Python: Extract Headings from ODT

professor_
Posts: 4
Joined: Sat Nov 07, 2009 3:50 pm

Python: Extract Headings from ODT

Post by professor_ »

Hi!

I would like to extract all headings (i.e. text with the style "Heading 1") from an ODT text document.

I know how to open and read ODT files using Python, but:

can anybody point me to some examples or documentation on how to search for styles in a text document?


Thanks a lot!

Sabine Lorentz
OpenOffice 3.1 on Ubuntu 9.10
FJCC
Moderator
Posts: 9619
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Python: Extract Headings from ODT

Post by FJCC »

You can search for styles by setting the SearchStyles property of the SearchDescriptor to True, so that the search string is treated as a paragraph style name. The SearchDescriptor is described here and searching Text is described here. Here is a simple macro in Basic that finds all occurrences of the style Heading 1, prints the number of ranges found, and gets the first range using the getByIndex method.

Code:

oDoc = ThisComponent
SDescript = oDoc.createSearchDescriptor
SDescript.setPropertyValue("SearchStyles", True)
SDescript.setSearchString("Heading 1")
FoundRanges = oDoc.findAll(SDescript)
Count = FoundRanges.Count
Print Count
FirstRange = FoundRanges.getByIndex(0)
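
Since the original question asked about Python, here is a rough translation of the Basic macro above. It is only a sketch: the function name find_styled_text is made up for illustration, and it assumes that doc is an open Writer document and that each found range covers the whole heading paragraph, so getString() returns the heading text.

```python
def find_styled_text(doc, style_name="Heading 1"):
    """Return the text of every range found for the given paragraph style.

    `doc` is assumed to be an open Writer text document (a UNO object).
    """
    descriptor = doc.createSearchDescriptor()
    # With SearchStyles enabled, the search string is read as a style name.
    descriptor.setPropertyValue("SearchStyles", True)
    descriptor.setSearchString(style_name)
    found = doc.findAll(descriptor)
    # findAll returns an indexed container of text ranges.
    return [found.getByIndex(i).getString() for i in range(found.getCount())]
```

Inside a Python macro, the current document should be available as XSCRIPTCONTEXT.getDocument(), so the call would look like find_styled_text(XSCRIPTCONTEXT.getDocument()).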
OpenOffice 4.1 on Windows 10 and Linux Mint
professor_
Posts: 4
Joined: Sat Nov 07, 2009 3:50 pm

Re: Python: Extract Headings from ODT

Post by professor_ »

Thanks a lot, I'll try this.

I have now tried it, and it works. Do you also know how to get the page number on which a found item is located?
FJCC
Moderator
Posts: 9619
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Python: Extract Headings from ODT

Post by FJCC »

As far as I know (which isn't much in the case of Writer macros), only the ViewCursor knows about the page number. You can get the page on which a heading is located like this:

Code:

oDoc = ThisComponent
SDescript = oDoc.createSearchDescriptor
SDescript.setPropertyValue("SearchStyles", True)
SDescript.setSearchString("Heading 1")
FoundRanges = oDoc.findAll(SDescript)
PickedRange = FoundRanges.getByIndex(2)  'zero-based index: this picks the third match, so at least three headings must exist
CurrController = oDoc.getCurrentController()
ViewCurs = CurrController.getViewCursor()
ViewCurs.gotoRange(PickedRange, False)  'the False means the cursor does not expand when it moves
Print "Heading is on page " & ViewCurs.Page  '& concatenates the string and the page number
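
For the Python side, the same idea might look like the sketch below, which collects the page number for every heading instead of just one. The function name heading_pages is hypothetical, and it assumes doc is an open Writer document with a view (it moves the visible cursor as a side effect) and that each found range covers the heading paragraph.

```python
def heading_pages(doc, style_name="Heading 1"):
    """Return a list of (heading text, page number) pairs.

    `doc` is assumed to be an open Writer text document (a UNO object).
    """
    descriptor = doc.createSearchDescriptor()
    # With SearchStyles enabled, the search string is read as a style name.
    descriptor.setPropertyValue("SearchStyles", True)
    descriptor.setSearchString(style_name)
    found = doc.findAll(descriptor)

    # The view cursor is used here because it can report a page number.
    view_cursor = doc.getCurrentController().getViewCursor()
    results = []
    for i in range(found.getCount()):
        text_range = found.getByIndex(i)
        # False: jump to the range without expanding the selection.
        view_cursor.gotoRange(text_range, False)
        results.append((text_range.getString(), view_cursor.getPage()))
    return results
```

Because the view cursor is moved for every match, you may want to remember its original position and restore it afterwards.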