Inserting an XML format sequence

Creating a macro - Writing a Script - Using the API (OpenOffice Basic, Python, BeanShell, JavaScript)
RoryOF
Moderator
Posts: 34611
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Inserting an XML format sequence

Post by RoryOF »

When OpenOffice - in particular Writer - handles text, it inserts XML formatting. If I insert a line of XML code to force a manual format in the middle of a normal text flow, OpenOffice applies its internal XML formatting to my line of code, rendering it ineffective. To take a simple example: if I insert

Code: Select all

<text:line-break/>
[Note: this is only a trivial example for illustration]

OpenOffice converts this to

Code: Select all

&lt;text:line-break/&gt;
Also converted would be the characters ' " & if they were used in the desired codeline.
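
For illustration only, the conversion described above can be reproduced outside OpenOffice. This is a minimal Python sketch, assuming Writer escapes the same five characters named in this thread; the helper name as_stored is hypothetical and this is not OpenOffice's own code.

```python
from xml.sax.saxutils import escape

# escape() handles & < > by default; the quote characters are added here
# on the assumption that Writer escapes them too, as reported in the thread.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def as_stored(typed_text):
    """Return typed text roughly as it would appear inside content.xml."""
    return escape(typed_text, QUOTES)


print(as_stored("<text:line-break/>"))
```

Anything typed into the document is treated as character data, which is why the angle brackets never reach the file as markup.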

Is there any way (some Escape sequence) to tell OpenOffice not to parse the selected text into XML, but to pass it through unchanged to the file?

Why do I ask? For a project under construction I would like to be able to insert certain XML formatting sequences into the text OpenOffice produces. I can see other, more complex methods of achieving this, but an escape sequence surrounding my codelines would be the simplest.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JeJe
Volunteer
Posts: 2779
Joined: Wed Mar 09, 2016 2:40 pm

Re: Inserting an XML format sequence

Post by JeJe »

How are you inserting it? If you insert that string as a string then it stays as it is.

Code: Select all

thiscomponent.currentcontroller.viewcursor.string = "<text:line-break/>"

Doesn't change.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)

Re: Inserting an XML format sequence

Post by RoryOF »

I would like to insert my codeline "escaped" in some way into a Writer document actually being edited, not built by macro. So ideally something like

text text text [Escape] <text:line-break/> [/Escape] text text text

But I have noted your method (for which, thanks) and will try that later.

Re: Inserting an XML format sequence

Post by JeJe »

If I copy <text:line-break/> from here and paste it into Writer, it's the same thing: it doesn't change.

I presume you're copying it from somewhere/something which puts it onto the clipboard as XML, not text, so it gets pasted as XML?

Edit:

Or dragging and dropping and the same thing is happening?

Re: Inserting an XML format sequence

Post by RoryOF »

Have a look at content.xml

You'll see

Code: Select all

&lt;text:line-break/&gt;
I want it to remain unchanged.
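
A quick way to look at content.xml without unpacking the document by hand is sketched below in Python. It assumes only that the .odt file is an ordinary zip package with a UTF-8 content.xml; the function name read_content_xml is hypothetical.

```python
import zipfile


def read_content_xml(odf_path):
    """Return content.xml from an ODF package (.odt, .ods) as text."""
    with zipfile.ZipFile(odf_path) as package:
        return package.read("content.xml").decode("utf-8")
```

Searching the returned string for &lt;text:line-break/&gt; shows whether the typed sequence was stored as escaped text.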
MrProgrammer
Moderator
Posts: 4903
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Inserting an XML format sequence

Post by MrProgrammer »

RoryOF wrote:Is there any way (some Escape sequence) to tell OpenOffice not to parse the selected text into XML, but to pass it through unchanged to the file?
I think there is no method to pass XML directly into the document. However, perhaps you can use a technique I created to work with hyperlinks in a spreadsheet. There are two types: those created with the =HYPERLINK() function, and those created with the Insert → Hyperlink dialog. I normally use the former, which I call "dynamic" since the link's name and URL can be manipulated by functions or the user interface, but sometimes I encounter the latter in documents from others. Those can't be manipulated easily, only by the dialog, so I'll call them "static". For example, one might want to change the URL in static hyperlinks to alter some part of the path. There's no good way to do that for dozens of them. I think Writer only offers static hyperlinks.

My idea was to manipulate the XML to change the link into text. I can manipulate the text with spreadsheet functions/features. Then I change the text back into a link. The latter is similar to what you want to accomplish.

To change the link into text, I change < to «, " to …, and > to ». The method depends on having «…» be otherwise unused in the document. I chose «…» because they are easy to type on a Mac, and are visually similar to the XML characters they represent. One can use others, of course.

p='<text:p><text:a xlink:href="([^"]+)">([^<]+)</text:a></text:p>'
r='<text:p>«text:a xlink:href=…\1…»\2«/text:a»</text:p>'
unzip -p file.ods content.xml | sed -Ee "s|$p|$r|g" >content.xml
zip   -m file.ods content.xml

The static hyperlinks are gone and the cells now contain simple text: «text:a xlink:href=…❴URL❵…»❴Text❵«/text:a» where ❴URL❵ and ❴Text❵ represent the URL and text from the static hyperlink. Now I can use Edit → Find & Replace to change all the URL paths at once. Or all the texts at once. I could also change the text into a =HYPERLINK() function call.
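
The same link-to-text substitution can be sketched in Python, which may be convenient where sed is not available. This is an illustration under the same assumptions as the sed command above: the link is the only content of its paragraph and carries no attributes other than xlink:href. The function name link_to_text is hypothetical.

```python
import re

# Same pattern as the sed expression: a paragraph holding a single link.
LINK = re.compile(
    r'<text:p><text:a xlink:href="([^"]+)">([^<]+)</text:a></text:p>'
)

# The markup characters of the link become « … » so that Calc sees plain text.
AS_TEXT = r"<text:p>«text:a xlink:href=…\1…»\2«/text:a»</text:p>"


def link_to_text(content_xml):
    """Rewrite static hyperlinks as marker text, leaving the rest unchanged."""
    return LINK.sub(AS_TEXT, content_xml)
```

Links that carry additional attributes will not match this pattern and are left as they are.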

To change the text back into a link, I change « to <, … to ", and » to >. This is easier since no parsing is needed. unzip -p pipes the content to tr, which modifies it before the result is written to content.xml. zip -m moves content.xml into the zip file, that is, it replaces the copy in the archive and deletes the extracted file.

unzip -p file.ods content.xml | tr '«…»' '<">' >content.xml
zip   -m file.ods content.xml

For your case, I'd insert «text:line-break/» into the document at the desired locations, then use the commands below, since you have no quote to contend with.

unzip -p file.odt content.xml | tr '«»' '<>' >content.xml
zip   -m file.odt content.xml

You will no doubt practice first on a copy of the actual file.
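
One caveat for Linux users: GNU tr works on bytes rather than characters, so the tr commands above may not handle multibyte characters such as « and » the way tr on macOS does. A character-aware alternative is sketched below in Python. The function name restore_markup and the MARKERS table are hypothetical, and the sketch writes a new file instead of modifying the original.

```python
import zipfile

# Marker characters typed in Writer, mapped to the markup they stand for.
MARKERS = str.maketrans({"«": "<", "»": ">"})


def restore_markup(src_path, dst_path, table=MARKERS):
    """Copy an ODF package, translating marker characters in content.xml."""
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        # Entries are copied in their original order; passing the ZipInfo
        # keeps each entry's compression method (mimetype stays stored).
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                data = data.decode("utf-8").translate(table).encode("utf-8")
            zout.writestr(item, data)
```

A call such as restore_markup("copy.odt", "restored.odt") leaves copy.odt untouched. Extra entries can be added to the table for quotation-mark or ampersand stand-ins.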

I know it is possible to use a macro to access and manipulate the content of static hyperlinks. However the API is complex, and the method I use here is much easier for me than researching which services/interfaces I would need for a macro. This may be especially true in Writer documents where one may have messy situations like hyperlinks in tables, or in sections, or … which the macro will have to navigate to access them all. For your situation you just need to know that <text:line-break/> is valid XML at the point you want to insert it.
Edit: I see now that line-break is just an example; there are other XML sequences. But if you substitute «» for <>, °¨ for the two quotation marks, and ∞ for & in your sequence, then use unzip/tr/zip to change them, this should cover all cases. You may need to write &amp; as ∞amp;
Edit: You may not need a replacement for &. In XML it is used in &lt;, &gt;, &apos;, and &quot;, but you don't need to use those since tr allows you to insert them.
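
Since a misplaced marker can leave content.xml unparseable, it may be worth checking the result before zipping it back. Below is a minimal Python sketch of such a check; the function name is_well_formed is hypothetical. It only tests XML well-formedness, not whether the element is allowed at that position by the ODF schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_data):
    """Return True if xml_data (bytes or str) parses as well-formed XML."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError:
        return False
    return True
```

A file that fails this check will most likely be rejected or repaired by OpenOffice when it is opened.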
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).

Re: Inserting an XML format sequence

Post by RoryOF »

Thank you for your contribution, Mr Programmer. I'll look at it in greater detail tomorrow - I have been working all day on this and some other projects (not computer related) and "sufficient unto the day is the evil thereof"!

I have boiled the problem down to the expansion by the XML parser in OpenOffice of five characters (" ' & < >), each of which is expanded into a literal sequence similar to &amp; (I illustrate with the sequence for &). If I can manage to "escape" these characters, so that they are inserted directly into the file, the OO XML parser does most of the work, and the file(s) I would be writing immediately show the effects I require; post-processing by macro is a less elegant solution, although possible.
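
As a sketch of the post-processing route, the Python below reverses the expansion only between the [Escape] … [/Escape] markers suggested earlier in the thread. Those markers are hypothetical, not an OpenOffice feature, and the sketch assumes Writer stores the marked text in one unbroken run (no spans or space elements inserted between the markers). The function name unescape_marked is also hypothetical.

```python
import re
from xml.sax.saxutils import unescape

# Hypothetical markers typed around the sequence in the Writer document.
MARKED = re.compile(r"\[Escape\]\s*(.*?)\s*\[/Escape\]", re.DOTALL)

# unescape() handles &lt; &gt; &amp; by default; the quotes are added here.
QUOTES = {"&quot;": '"', "&apos;": "'"}


def unescape_marked(content_xml):
    """Turn escaped text between the markers back into raw markup."""
    return MARKED.sub(lambda m: unescape(m.group(1), QUOTES), content_xml)
```

Text outside the markers keeps its escaping, so ordinary uses of the five characters elsewhere in the document are unaffected.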

I have some more investigation to do into the XML code used by OO, perhaps tomorrow.

Re: Inserting an XML format sequence

Post by JeJe »

I think I'm getting you now... this may be irrelevant to what you want to do, but when saving those characters in Writer as RTF they're preserved unchanged in the RTF file produced. I don't know if there's a way to embed a section of RTF in an XML file.