[Solved] [Python] Read Writer text paragraph by paragraph

Posted: Wed Aug 07, 2013 1:52 pm
by _savage
I would like to remotely read the content of a loaded document, paragraph by paragraph. After the initial connection to OpenOffice, I load the document and create a text cursor:

Code:

>>> import uno
>>> local = uno.getComponentContext()
>>> resolver = local.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", local)
>>> context = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
>>> desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
>>> document = desktop.loadComponentFromURL("file:///bla.doc", "_blank", 0, ())
>>> cursor = document.Text.createTextCursor()
So far I've gotten to:

Code:

>>> document.Text.getString()
but that gives me the whole text as a single plain string.

What I would like to do is go from paragraph to paragraph, query the kind/type of each paragraph, and then read the text for that one paragraph only. Even better, could I step through the text of the paragraph and find formatting, such as bold or italic?

Any help is appreciated :) Thanks!

Re: [Python] How to read text from Writer paragraph by parag

Posted: Wed Aug 07, 2013 2:59 pm
by FJCC
This Basic code produces a Portion for every differently formatted section of text in a document.

Code:

oText = ThisComponent.Text
ParaEnum = oText.createEnumeration() 'enumerates the paragraphs (and any text tables)
While ParaEnum.hasMoreElements()
	Para = ParaEnum.nextElement()
	If Para.supportsService("com.sun.star.text.Paragraph") Then 'text tables cannot be enumerated like paragraphs
		PortionEnum = Para.createEnumeration()
		While PortionEnum.hasMoreElements()
			Portion = PortionEnum.nextElement()
			Print Portion.String
		Wend
	End If
Wend
My oText variable is equivalent to your document.Text. The Python code would look much the same. Of course, you don't want to just print the Portion, but I hope that gets you started.
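
The hasMoreElements()/nextElement() pair can be wrapped in a Python generator so that the nested loop reads like ordinary iteration. This is only a minimal sketch: `iter_enumeration` and `iter_portions` are hypothetical helper names, and the argument is assumed to be the `document.Text` object from the first post.

```python
def iter_enumeration(container):
    """Yield each element of a UNO object that offers createEnumeration()."""
    enum = container.createEnumeration()
    while enum.hasMoreElements():
        yield enum.nextElement()


def iter_portions(text):
    """Yield (paragraph, portion) pairs, one per uniformly formatted run."""
    for para in iter_enumeration(text):
        # Text tables appear in the same enumeration; skip anything
        # that is not a paragraph, since it cannot be enumerated this way.
        if not para.supportsService("com.sun.star.text.Paragraph"):
            continue
        for portion in iter_enumeration(para):
            yield para, portion
```

With a loaded document, something like `for para, portion in iter_portions(document.Text): print(portion.getString())` should then print each run in turn.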

Re: [Python] How to read text from Writer paragraph by parag

Posted: Thu Aug 08, 2013 11:04 pm
by _savage
Thank you, FJCC, that's exactly what I was looking for :bravo:

For completeness' sake, here's the equivalent Python code:

Code:

parenum = document.Text.createEnumeration()
while parenum.hasMoreElements():
    par = parenum.nextElement()
    # the enumeration also returns text tables, which cannot be enumerated like paragraphs
    if not par.supportsService("com.sun.star.text.Paragraph"):
        continue
    # check par.ParaStyleName here for Heading or body text or any other paragraph styling
    textenum = par.createEnumeration()
    while textenum.hasMoreElements():
        text = textenum.nextElement()
        # check text.CharPosture and text.CharWeight and other properties here
        print(text.getString())
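
As a rough illustration of the property checks mentioned in the comments, the loop could collect the paragraph style and bold/italic flags for each portion. This is a sketch under assumptions: `describe_portions` is a made-up name, 150.0 is the value of the `com.sun.star.awt.FontWeight.BOLD` constant, and the `FontSlant` enum is assumed to arrive in Python as an object whose `value` attribute holds the member name.

```python
BOLD = 150.0  # value of com.sun.star.awt.FontWeight.BOLD


def describe_portions(document):
    """Return one dict per text portion with its style and basic formatting."""
    rows = []
    paragraphs = document.Text.createEnumeration()
    while paragraphs.hasMoreElements():
        par = paragraphs.nextElement()
        if not par.supportsService("com.sun.star.text.Paragraph"):
            continue  # e.g. a text table
        portions = par.createEnumeration()
        while portions.hasMoreElements():
            portion = portions.nextElement()
            posture = portion.CharPosture
            rows.append({
                "style": par.ParaStyleName,
                "text": portion.getString(),
                # Treats BOLD and anything heavier (ULTRABOLD, BLACK) as bold.
                "bold": portion.CharWeight >= BOLD,
                # Assumption: the enum exposes its member name as .value;
                # fall back to the raw object if it does not.
                "italic": getattr(posture, "value", posture) in ("ITALIC", "OBLIQUE"),
            })
    return rows
```

Each returned dict can then be filtered, for example to keep only the bold runs inside paragraphs whose style name starts with "Heading".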