Hi!
I would like to extract all headings (i.e. text with the style "Heading 1") from an ODT text document.
I know how to open and read ODT files using Python, but
can anybody point me to some examples or documentation on how to search for styles in a text document?
Thanks a lot!
Sabine Lorentz
Python: Extract Headings from ODT
-
professor_
- Posts: 4
- Joined: Sat Nov 07, 2009 3:50 pm
OpenOffice 3.1 on Ubuntu 9.10
Re: Python: Extract Headings from ODT
You can search for styles by setting the SearchStyles property of the SearchDescriptor. The SearchDescriptor is described here and searching text is described here. Here is a simple macro in Basic that finds all occurrences of the style Heading 1, prints the number of ranges found, and gets the first range using the getByIndex method.
Code:
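oDoc = ThisComponent
SDescript = oDoc.createSearchDescriptor()
' With SearchStyles set, the search string is treated as a paragraph style name
SDescript.setPropertyValue("SearchStyles", True)
SDescript.setSearchString("Heading 1")
FoundRanges = oDoc.findAll(SDescript)
Count = FoundRanges.Count
Print Count
' getByIndex is zero-based and fails if nothing was found
If Count > 0 Then FirstRange = FoundRanges.getByIndex(0)
OpenOffice 4.1 on Windows 10 and Linux Mint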
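Since the question was about Python, here is a rough Python translation of the Basic macro above. It is only a sketch: the function name heading_texts is made up for illustration, and doc is assumed to be a Writer document you already have in hand (for example from XSCRIPTCONTEXT.getDocument() inside a Python macro). It also assumes that each found range covers the text of the heading paragraph, which is worth checking on your own document.

```python
def heading_texts(doc, style_name="Heading 1"):
    """Return the text of every range in `doc` that uses `style_name`.

    `doc` is assumed to be an already loaded Writer document model;
    obtaining it is outside the scope of this sketch.
    """
    descriptor = doc.createSearchDescriptor()
    # With SearchStyles set, the search string names a paragraph style
    descriptor.setPropertyValue("SearchStyles", True)
    descriptor.setSearchString(style_name)

    found = doc.findAll(descriptor)
    if found is None:
        # Defensive: treat a missing result container as "no matches"
        return []

    # The result is an index container, so walk it by position
    return [found.getByIndex(i).getString() for i in range(found.getCount())]
```

Inside a macro you would call it as heading_texts(XSCRIPTCONTEXT.getDocument()) and get back a plain list of strings.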
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
-
professor_
- Posts: 4
- Joined: Sat Nov 07, 2009 3:50 pm
Re: Python: Extract Headings from ODT
Thanks a lot, I'll try this.
Now, I tried it and it works. Do you also know how to get the page number on which the found item is?
OpenOffice 3.1 on Ubuntu 9.10
Re: Python: Extract Headings from ODT
As far as I know (which isn't much in the case of Writer macros), only the ViewCursor knows about the page number. You can get the page on which a heading is located like this:
Code:
oDoc = ThisComponent
SDescript = oDoc.createSearchDescriptor()
SDescript.setPropertyValue("SearchStyles", True)
SDescript.setSearchString("Heading 1")
FoundRanges = oDoc.findAll(SDescript)
' Index 2 is the third match, so this assumes at least three headings exist
PickedRange = FoundRanges.getByIndex(2)
CurrController = oDoc.getCurrentController()
ViewCurs = CurrController.getViewCursor()
ViewCurs.gotoRange(PickedRange, False) 'the False means the cursor does not expand when it moves
' Use & so the page number is concatenated as text
Print "Heading is on page " & ViewCurs.Page
OpenOffice 4.1 on Windows 10 and Linux Mint
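For the Python side, the same view cursor trick might look roughly like the sketch below. The function name heading_pages is hypothetical, and doc is again assumed to be a Writer document that is open in a window (for example from XSCRIPTCONTEXT.getDocument()), because the view cursor belongs to the controller. Note that this moves the visible cursor around the document as it goes.

```python
def heading_pages(doc, style_name="Heading 1"):
    """Return (text, page) pairs for every range that uses `style_name`.

    `doc` is assumed to be a Writer document model that is open in a
    window, since the page number comes from the controller's view cursor.
    """
    descriptor = doc.createSearchDescriptor()
    # With SearchStyles set, the search string names a paragraph style
    descriptor.setPropertyValue("SearchStyles", True)
    descriptor.setSearchString(style_name)

    found = doc.findAll(descriptor)
    if found is None:
        # Defensive: treat a missing result container as "no matches"
        return []

    view_cursor = doc.getCurrentController().getViewCursor()
    results = []
    for i in range(found.getCount()):
        text_range = found.getByIndex(i)
        # False: jump to the range instead of extending the current selection
        view_cursor.gotoRange(text_range, False)
        results.append((text_range.getString(), view_cursor.getPage()))
    return results
```

Looping over all matches instead of picking a fixed index avoids an error when the document has fewer headings than expected.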