[OpenOffice][Writer]Retrieving text from a Writer document

meltigel
Posts: 9
Joined: Fri Jun 14, 2019 10:19 am

Post by meltigel »

Hello everyone,
maybe I'm asking an obvious question, but how can I retrieve the text of a Writer document?

I have written a simple VB application. My goal is to insert some escape sequences into a Writer document. When I open this document from my application (which holds a table mapping each "escape sequence" to a "sentence"), the "macro" needs to read the text in the document, look it up in the table, and write the corresponding sentence back into the document.

I understand that there are two kinds of cursor (the view cursor and the text cursor), but how to use them is not very clear to me. Can someone explain how to read from the document, or point me to a guide, tutorial or other documentation?

Thank you, everyone
OpenOffice 4.1.6 on Windows 10 x64
RoryOF
Moderator
Posts: 34570
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Post by RoryOF »

The definitive work on OO macros is downloadable from http://www.pitonyak.org/oo.php

Note that you should forget VB: OpenOffice BASIC differs from it at the lower levels needed for document manipulation.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JeJe
Volunteer
Posts: 2756
Joined: Wed Mar 09, 2016 2:40 pm

Post by JeJe »

If I'm following you, you want to replace a unique string in a document with a string from elsewhere? If so, you can use a search descriptor as follows; this will replace the first occurrence of "fish" with "gas":

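Code:

Sub Main
    Dim mySearch As Object, myResult As Object
    ' Describe what to look for: a literal string, not a regular expression
    mySearch = ThisComponent.createSearchDescriptor()
    mySearch.SearchString = "fish"
    mySearch.SearchRegularExpression = False
    ' findFirst returns the first matching text range, or Null if nothing matches
    myResult = ThisComponent.findFirst(mySearch)
    If Not IsNull(myResult) Then
        myResult.String = "gas"   ' overwrite the found text in place
    End If
End Sub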
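A possible extension of this example (a sketch, not part of the original post): calling findNext in a loop visits every occurrence instead of only the first one, which leaves room to inspect each match before deciding what to write. It assumes the same literal "fish" -> "gas" pair; the Sub name is arbitrary.

```basic
Sub ReplaceEveryOccurrence
    Dim oDoc As Object, oSearch As Object, oFound As Object
    oDoc = ThisComponent
    ' Literal search for "fish" (no regular expression)
    oSearch = oDoc.createSearchDescriptor()
    oSearch.SearchString = "fish"
    oSearch.SearchRegularExpression = False
    oFound = oDoc.findFirst(oSearch)
    Do While Not IsNull(oFound)
        ' A match could be inspected here before deciding what to write
        oFound.String = "gas"
        ' Continue searching from the end of the text just written
        oFound = oDoc.findNext(oFound.End, oSearch)
    Loop
End Sub
```

Searching again from oFound.End, rather than from the start of the match, avoids finding the same text twice if the replacement happens to contain the search string. For plain one-to-one substitutions, a replace descriptor with replaceAll does the same job in a single call.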
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)

Post by meltigel »

RoryOF wrote:The definitive work on OO macros is downloadable from http://www.pitonyak.org/oo.php

Note that you should forget VB, OpenOffice BASIC is dissimilar at the lower levels one needs for document manipulation.
I have already seen his work, but I can't find what I need there...

Post by meltigel »

JeJe wrote: If I'm following you then you want to replace a unique string in a document with a string from elsewhere? [...]
The problem is that I have plenty of escape sequences; it is not a single fixed word... I already knew about find & replace, but it is not what I need, because I have to evaluate what is written and replace it with the matching word accordingly, for example "fish" -> "car", "dog" -> "wheel", and so on. Also, the user can add more words, so the list is not fixed...

Post by JeJe »

If you want the whole text of the document, it's as simple as:

wholetext = ThisComponent.text.string
ThisComponent.text.string = newtext

But the result needs to be small enough to fit into a string variable. You can also enumerate the paragraphs.

Otherwise this page shows you how to use a text cursor:

https://wiki.openoffice.org/wiki/Writer/API/Text_cursor

Post by meltigel »

JeJe wrote: If you want the whole text of the document it's as simple as
wholetext = ThisComponent.text.string
[...]
This code works but, as you said, it retrieves ALL the text... Is there a way to cycle through the words and process them one at a time?
Lupp
Volunteer
Posts: 3535
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Post by Lupp »

A general note on questions that speak loosely of "text" and ask for macro guidance: what looks like text can have a complex structure. In particular, there may be frames, nested frames, frames nested in table cells (and tables nested in frames...).
If you can guarantee that your "text" is exclusively the body text of the document, please say so explicitly. Knowing there are "no tables at all" would also help in many cases.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München

Post by meltigel »

Lupp wrote:Concerning questions talking unspecifically of text and asking for macro guidance: What is seen as text can have a complex structure. In specific there may be frames, nested frames, frames nested into table cells, (... of tables nested into frames...).
If you can assure your "text" is the body text of the document exclusively, please do so explicitly. "No tables at all" would also be helpful in many cases.
The document is mixed: there is "plain text" and there are also tables. But I read some documentation and saw that table text can be accessed the same way as plain text. There are no frames or nested objects, only text and tables.

Post by Lupp »

Do you need the lookup table for replacements inside the Writer doc? Tables are much easier to handle in Calc documents.
Use a replace descriptor as already suggested and loop through your table, although I'm afraid the usage will need to be a bit more elaborate. In any case, a few hundred 'Replace All' operations on the document may be much more efficient (and less error-prone) than Basic code that tries to emulate all of them in a single pass.
The one nearly indispensable condition is: no replacement may create a new "escape sequence" that has not yet been processed.

You may find most of the needed information here:
https://api.libreoffice.org/docs/idl/re ... iptor.html
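A minimal sketch of that approach, assuming the lookup table has been handed over as two parallel arrays (aFind, aReplace and their contents are hypothetical; in meltigel's case they would come from the VB application). One replace descriptor is reused, and replaceAll is called once per table row:

```basic
Sub ReplaceFromTable
    Dim oDoc As Object, oReplace As Object
    Dim aFind As Variant, aReplace As Variant
    Dim i As Integer
    ' Hypothetical lookup table: aFind(i) is replaced by aReplace(i)
    aFind = Array("fish", "dog")
    aReplace = Array("car", "wheel")
    oDoc = ThisComponent
    ' One replace descriptor, reused for every row of the table
    oReplace = oDoc.createReplaceDescriptor()
    oReplace.SearchRegularExpression = False
    oReplace.SearchCaseSensitive = True
    For i = LBound(aFind) To UBound(aFind)
        oReplace.SearchString = aFind(i)
        oReplace.ReplaceString = aReplace(i)
        ' Replace every occurrence of this row's search string
        oDoc.replaceAll(oReplace)
    Next i
End Sub
```

Because the rows are applied one after another, a replacement that contains another row's search string would be replaced again by a later row; that is exactly the condition stated above. As far as I know, the document-level search also covers text in table cells, but that is worth checking against a real document.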

Post by JeJe »

You can go through the text one word at a time by creating a text cursor and using gotoNextWord; see the link I posted.
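A minimal, untested sketch of the word-by-word approach, using the cursor methods gotoStart, gotoEndOfWord and gotoNextWord described on that wiki page. LookUpWord is a hypothetical stand-in for the VB-side table. Assumptions worth checking: as I understand it, a cursor created from the body text does not step into tables, so table text needs separate handling, and Writer's word boundaries may split an escape sequence that contains punctuation.

```basic
Sub ProcessEachWord
    Dim oText As Object, oCursor As Object
    Dim sWord As String, sNew As String
    oText = ThisComponent.getText()
    oCursor = oText.createTextCursor()
    oCursor.gotoStart(False)
    Do
        ' Select from the current position to the end of the word
        oCursor.gotoEndOfWord(True)
        sWord = oCursor.getString()
        If Len(sWord) > 0 Then
            sNew = LookUpWord(sWord)
            If sNew <> sWord Then oCursor.setString(sNew)
        End If
        ' Collapse the selection and move to the start of the next word
    Loop While oCursor.gotoNextWord(False)
End Sub

' Hypothetical lookup: returns the replacement, or the word unchanged
Function LookUpWord(sWord As String) As String
    Select Case sWord
        Case "fish"
            LookUpWord = "car"
        Case "dog"
            LookUpWord = "wheel"
        Case Else
            LookUpWord = sWord
    End Select
End Function
```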

Or you can go through each paragraph in turn with an enumeration (from Andrew Pitonyak's book):

Code:

Sub EnumerateParagraphs
    REM Author: Andrew Pitonyak
    Dim oParEnum 'Enumerator used to enumerate the paragraphs
    Dim oPar     'The enumerated paragraph
    REM Enumerate the paragraphs.
    REM Tables are enumerated along with paragraphs
    oParEnum = ThisComponent.getText().createEnumeration()
    Do While oParEnum.hasMoreElements()
        oPar = oParEnum.nextElement()
        REM This avoids the tables. Add an else statement if you want to
        REM process the tables.
        If oPar.supportsService("com.sun.star.text.Paragraph") Then
            MsgBox oPar.getString(), 0, "I found a paragraph"
        ElseIf oPar.supportsService("com.sun.star.text.TextTable") Then
            Print "I found a TextTable"
        Else
            Print "What did I find?"
        End If
    Loop
End Sub
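As a hedged illustration of the table branch mentioned in the comments, the sketch below reads the cells of each table the enumeration meets, assuming simple tables without split or merged cells. getCellNames and getCellByName are methods of the TextTable object, and each cell returns its text through getString (setString would replace it).

```basic
Sub ProcessTables
    Dim oParEnum As Object, oPar As Object
    Dim oCell As Object, aNames As Variant
    Dim i As Integer
    oParEnum = ThisComponent.getText().createEnumeration()
    Do While oParEnum.hasMoreElements()
        oPar = oParEnum.nextElement()
        If oPar.supportsService("com.sun.star.text.TextTable") Then
            ' For a simple table the names look like "A1", "B1", "A2", ...
            aNames = oPar.getCellNames()
            For i = LBound(aNames) To UBound(aNames)
                oCell = oPar.getCellByName(aNames(i))
                MsgBox oCell.getString(), 0, "Cell " & aNames(i)
            Next i
        End If
    Loop
End Sub
```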
Last edited by JeJe on Fri Jun 14, 2019 1:12 pm, edited 1 time in total.

Post by RoryOF »

Store the find and replace strings as a dictionary in Python, then, for every word in the dictionary, replace the text with its definition (the replacement string). It is simple to write a small routine that extends the dictionary with new words and replacements.

Sample code given in
https://stackoverflow.com/questions/205 ... dictionary

https://www.daniweb.com/programming/sof ... ext-python
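For completeness, a rough Basic counterpart of the dictionary idea, since Python is ruled out later in the thread. It rests on a hypothetical convention that every escape sequence is written between double hashes, such as ##fish##. A single regular-expression findAll gathers all candidates first and each one is looked up once, so a replacement is never searched again and cannot turn into a new escape sequence that gets processed (the risk Lupp points out). EscapeLookUp and its entries are placeholders for the real table.

```basic
Sub ReplaceEscapeSequences
    Dim oDoc As Object, oSearch As Object, oFound As Object
    Dim oRange As Object
    Dim i As Long
    oDoc = ThisComponent
    oSearch = oDoc.createSearchDescriptor()
    ' Hypothetical escape-sequence format: a name enclosed in ## ... ##
    oSearch.SearchString = "##[A-Za-z0-9_]+##"
    oSearch.SearchRegularExpression = True
    ' findAll collects every matching range before anything is replaced
    oFound = oDoc.findAll(oSearch)
    ' Work from the last match to the first as a precaution, so that
    ' rewriting one match cannot shift matches not yet processed
    For i = oFound.getCount() - 1 To 0 Step -1
        oRange = oFound.getByIndex(i)
        oRange.setString(EscapeLookUp(oRange.getString()))
    Next i
End Sub

' Hypothetical table: unknown sequences are left unchanged
Function EscapeLookUp(sKey As String) As String
    Select Case sKey
        Case "##fish##"
            EscapeLookUp = "car"
        Case "##dog##"
            EscapeLookUp = "wheel"
        Case Else
            EscapeLookUp = sKey
    End Select
End Function
```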

Post by meltigel »

Lupp wrote: Do you need the lookup table for replacements inside the Writer doc? [...]
The table needs to be stored inside the VB application, because no one should be able to see or manipulate it. I will try the enumeration...

Post by meltigel »

RoryOF wrote: Set the find and replace strings as a dictionary using Python, then for all words in dictionary replace the text with the definition (the replace) string. [...]

I can't use Python, because I have already written the application in VB and need to stick with it.
Villeroy
Volunteer
Posts: 31264
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Post by Villeroy »

For VB or VBA you need a Microsoft product.
Please edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice

Post by meltigel »

I know, but for wide distribution I need more compatibility, hence the use of OO Automation...

Post by RoryOF »

Be aware that OpenOffice code will not work on Microsoft Office, and contrariwise.

Post by Lupp »

VBA (for Excel, e.g.) can be moved to AOO / LibO only at a very low level (somewhat better to LibO). User code that relies on the API will rarely (next to never) be portable to VBA. If you need VBA, you need MS Office. And that's the purpose of VBA: user lock-in. Though AOO / LibO don't intend this in the same sense as their commercial competitor does, the effect is unavoidably present in the other direction as well.

I made a demo consisting of 3 files:
1) The processing file containing the replacement table and the code. (lookupAndProcess.ods)
2) A small text file as the example to work on. (textForProcessing.odt)
3) The result obtained by running the sub 'processIt' from lookupAndProcess.ods. (textForProcessing.odt_rep.odt)
The result also demonstrates what was meant by
Lupp wrote:The one hardly dispensable condition is: No replacement will create a new "escape sequence" not yet finally processed.
You may run the process to verify it, and you may try to port it to VBA, or to get it to run in Excel. Lots of fun!
Create an empty folder, download the three files into that folder, inspect the two text files and close them again, then open the spreadsheet file and run the Sub. ...
Attachments
textForProcessing.odt_rep.odt (27.86 KiB)
lookupAndProcess.ods (12.76 KiB)
textForProcessing.odt (27.84 KiB)

Post by meltigel »

I will check your files soon, and thank you in advance. So, you're telling me that what Word Automation does cannot be accomplished with Writer Automation? Of course I know that the code will be completely different... I ask because at my workplace some coworkers managed to automate both Excel and Calc (import/export of some articles), with differences in the code... I was wondering whether the same level of automation can be achieved with Word/Writer, but judging by everyone's answers, it seems not...

Post by Lupp »

My demo may show that AOO / LibO can do it better; I cannot know, since I haven't had access to MS Office for about 15 years now. What I can say anyway is that automation code will not be compatible beyond its most obvious structure, and even that only if the code is structured somewhat "top down".

Post by JeJe »

There seems to be some mixing up of things here. Automating OO or LO has nothing to do with VBA or Microsoft Office; it means controlling OO or LO externally...

https://www.openoffice.org/udk/common/m ... ation.html

I presume that's what you mean? You say VB, but it's not clear whether you mean VB6, VB.NET or some other VB.

I briefly experimented with VB6 and OO automation and it seemed to work fine. You'll need to try what you're trying to do to see whether it will work or not.
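To make the distinction concrete, here is a hedged sketch of what an external controller usually does: get the Desktop, load a document, read its text and close it. It is written in OpenOffice Basic; a VB program would normally reach the same objects through the COM object "com.sun.star.ServiceManager" (CreateObject, then createInstance("com.sun.star.frame.Desktop")) and call the same methods. The file path is a hypothetical placeholder.

```basic
Sub LoadAndRead
    Dim oDesktop As Object, oDoc As Object
    Dim sUrl As String
    Dim aArgs(0) As New com.sun.star.beans.PropertyValue
    ' Hypothetical path; ConvertToURL turns it into a file:/// URL
    sUrl = ConvertToURL("C:\temp\example.odt")
    ' Load the document without showing a window
    aArgs(0).Name = "Hidden"
    aArgs(0).Value = True
    oDesktop = createUnoService("com.sun.star.frame.Desktop")
    oDoc = oDesktop.loadComponentFromURL(sUrl, "_blank", 0, aArgs())
    ' Show the first 200 characters of the body text
    MsgBox Left(oDoc.getText().getString(), 200), 0, "Start of document"
    ' Close the document; changes are not saved
    oDoc.close(True)
End Sub
```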