[Solved] [Python] Read Writer text paragraph by paragraph

_savage
Posts: 187
Joined: Sun Apr 21, 2013 12:55 am

Post by _savage »

I would like to remotely read the content of a loaded document, paragraph by paragraph. This is how I connect to OpenOffice and load the document:

>>> import uno
>>> local = uno.getComponentContext()
>>> resolver = local.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", local)
>>> context = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
>>> desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
>>> document = desktop.loadComponentFromURL("file:///bla.doc", "_blank", 0, ())
>>> cursor = document.Text.createTextCursor()
So far I've gotten as far as

>>> document.Text.getString()
but that returns the entire text as one plain string.

What I would like to do is go from paragraph to paragraph, query the kind/type of each paragraph, and then read the text of that one paragraph only. Even better, could I noodle through the text of the paragraph and find formatting, like bold or italic?

Any help is appreciated :) Thanks!
Last edited by Hagar Delest on Sat Aug 10, 2013 11:12 pm, edited 2 times in total.
Reason: tagged [Solved].
Mac 10.14 using LO 7.2.0.2, Gentoo Linux using LO 7.2.3.2 headless.
FJCC
Moderator
Posts: 9274
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: [Python] How to read text from Writer paragraph by parag

Post by FJCC »

This Basic code walks through the paragraphs of a document and, within each paragraph, produces a Portion for every differently formatted section of text.

oText = ThisComponent.Text
ParaEnum = oText.createEnumeration() 'makes a collection of paragraphs.
While ParaEnum.hasMoreElements()
	Para = ParaEnum.nextElement()
	PortionEnum = Para.createEnumeration()
	While PortionEnum.hasMoreElements()
		Portion = PortionEnum.nextElement()
		Print Portion.String
	Wend
Wend
My oText variable is equivalent to your document.Text. The Python code would look much the same. Of course, you don't want to just print the Portion, but I hope that gets you started.
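As a rough idea of how the same two nested loops might look in Python, here is a minimal sketch that wraps the hasMoreElements()/nextElement() pattern in a generator. The helper names iter_uno and paragraphs_with_portions are made up for this example and are not part of PyUNO. It also assumes you want to skip text tables, which the paragraph enumeration returns alongside ordinary paragraphs.

```python
def iter_uno(enumerable):
    """Yield each element of a UNO object that offers createEnumeration()."""
    enum = enumerable.createEnumeration()
    while enum.hasMoreElements():
        yield enum.nextElement()


def paragraphs_with_portions(text):
    """Yield (paragraph, [portion strings]) for every paragraph of a Writer text."""
    for para in iter_uno(text):
        # Text tables show up in the same enumeration; skip anything
        # that is not a real paragraph.
        if not para.supportsService("com.sun.star.text.Paragraph"):
            continue
        yield para, [portion.getString() for portion in iter_uno(para)]
```

With the document from the first post it could be used as `for para, parts in paragraphs_with_portions(document.Text): print(parts)`.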
OpenOffice 4.1 on Windows 10 and Linux Mint
Post by _savage »

Thank you, FJCC, that's exactly what I was looking for!

For completeness' sake, here's the equivalent Python code:

parenum = document.Text.createEnumeration()
while parenum.hasMoreElements():
    par = parenum.nextElement()
    # the enumeration also returns text tables, which have no portions to enumerate
    if not par.supportsService("com.sun.star.text.Paragraph"):
        continue
    # check par.ParaStyleName here for heading, body text, or any other paragraph style
    textenum = par.createEnumeration()
    while textenum.hasMoreElements():
        text = textenum.nextElement()
        # check text.CharPosture, text.CharWeight, and other character properties here
        print(text.getString())
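To make the CharPosture/CharWeight check a bit more concrete, here is a small sketch of how a single portion might be classified. The function name portion_flags and the returned dictionary layout are my own invention for illustration. It assumes that CharWeight is a float compared against the FontWeight.BOLD constant (150.0), and that PyUNO hands back CharPosture as an enum object whose value attribute holds the member name; counting anything at or above BOLD as bold, and OBLIQUE as italic, is a judgment call rather than a rule.

```python
BOLD_WEIGHT = 150.0  # numeric value of com.sun.star.awt.FontWeight.BOLD


def portion_flags(portion):
    """Return the text of a portion together with bold/italic flags."""
    weight = portion.CharWeight
    posture = portion.CharPosture
    # Assumption: the posture is an enum-like object with a .value string
    # such as "ITALIC"; fall back to the raw object if it is not.
    posture_name = getattr(posture, "value", posture)
    return {
        "text": portion.getString(),
        "bold": weight >= BOLD_WEIGHT,
        "italic": posture_name in ("ITALIC", "OBLIQUE"),
    }
```

Inside the inner loop above, `portion_flags(text)` could replace the print call, and the results could be collected per paragraph.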