XML advice

Creating a macro - Writing a Script - Using the API (OpenOffice Basic, Python, BeanShell, JavaScript)
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

XML advice

Post by RoryOF »

I'm putting this query here, under Macros and UNO API, because the answer may be used in writing a complex macro/extension, although it could fit elsewhere on the Forum.

I am considering writing a major extension for Writer which would use two formats, one for ease of editing and the other for final printing, and which would build a complex final file structure. To allow for later afterthoughts, I would like it to be able to switch back to its setup stage and recreate that, so that the setup can be modified.

My question is this: are there defined "private tags" in XML which I can use to mark internal structures, something like this
<privatetag:block1> text text text and other valid XML tags </privatetag:block1>
and which the OO parser will suppress on output?

I've looked into XML documentation but am drowning in it (and in other matters: I'm in the 14th century and the 18th-19th centuries on other research). A pointer to such tags, or to the section of the XML definition that deals with them, would be much appreciated.

The idea is that I can mark my internal structures to permit switching between the setup display and the editing display. I can write code to insert, locate and remove such tags, and to process other existing structures I will be using based on the presence or absence of such private tags. What I want to know is whether XML has a ready-made set waiting for me to use.

Rory
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: XML advice

Post by John_Ha »

Rory

Unless AOO has pre-defined private tags of the kind you describe, I don't think it will work. I did some tests some time ago.

1. Edit content.xml to add data between matching private tags. See "<private-tag>This text is between private tags</private-tag>" four lines from the end.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:textooo="http://openoffice.org/2013/office" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" office:version="1.2">
	<office:scripts/>
	<office:font-face-decls>
		<style:font-face style:name="Mangal1" svg:font-family="Mangal"/>
		<style:font-face style:name="Times New Roman" svg:font-family="&apos;Times New Roman&apos;" style:font-family-generic="roman" style:font-pitch="variable"/>
		<style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
		<style:font-face style:name="Mangal" svg:font-family="Mangal" style:font-family-generic="system" style:font-pitch="variable"/>
		<style:font-face style:name="Microsoft YaHei" svg:font-family="&apos;Microsoft YaHei&apos;" style:font-family-generic="system" style:font-pitch="variable"/>
		<style:font-face style:name="SimSun" svg:font-family="SimSun" style:font-family-generic="system" style:font-pitch="variable"/>
	</office:font-face-decls>
	<office:automatic-styles/>
	<office:body>
		<office:text>
			<text:sequence-decls>
				<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
				<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
				<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
				<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
			</text:sequence-decls>
			<text:p text:style-name="Standard">Test</text:p>
			<private-tag>This text is between private tags</private-tag>
		</office:text>
	</office:body>
</office:document-content>
2. Put the edited content.xml back into the .odt file.

The file now opens with AOO without any problems.

3. Make a trivial edit and save the file. The private tags and the data they hold do not survive the open/save cycle: they no longer appear in content.xml.

I think this is because AOO applies an XML check to content.xml when the file is opened, when it is saved, or both. Any tags it does not recognise, and the data they hold, are deleted.
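Steps 1 to 3 can also be scripted, which makes the test easy to repeat. The sketch below uses only Python's standard library; `inject_fragment` and `fragment_survived` are hypothetical helper names, and inserting the text just before the first `</office:text>` is a string-level shortcut that assumes content.xml contains that closing tag (as in the listing above), not a full XML rewrite.

```python
import io
import zipfile

FRAGMENT = "<private-tag>This text is between private tags</private-tag>"


def inject_fragment(odt_bytes: bytes, fragment: str = FRAGMENT) -> bytes:
    """Return a copy of an .odt with `fragment` inserted before </office:text> in content.xml."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                closing = b"</office:text>"
                if closing not in data:
                    raise ValueError("content.xml has no </office:text>")
                data = data.replace(closing, fragment.encode("utf-8") + closing, 1)
            # Reusing each original ZipInfo keeps entry order and compression type,
            # so "mimetype" stays first and uncompressed, as ODF packages require.
            dst.writestr(info, data)
    return out.getvalue()


def fragment_survived(odt_bytes: bytes, fragment: str = FRAGMENT) -> bool:
    """True if content.xml still contains the injected fragment verbatim."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        return fragment.encode("utf-8") in zf.read("content.xml")
```

To repeat the test, read an .odt with `open("test.odt", "rb").read()`, write the result of `inject_fragment` to a new file, open that in AOO, make a trivial edit, save, and call `fragment_survived` on the saved file; going by the result above it would be expected to return False.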

Would a workaround be to use a Comment field? Or another field? Or a hidden section? Or somehow butcher a style definition to hold the data? Or link to an OLE object containing the data?
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
RoryOF
Moderator

Re: XML advice

Post by RoryOF »

I don't wish to use existing OO structures, as they may be used in the main file and confusion might arise between their use in the file and in the background. I already use a complex Comment macro to steer my editing. As my initial thoughts are to build the file using Sections, hidden sections do not look like a suitable candidate. It might be possible to use a Master Document structure, where each area of the constructed file would be a separate sub-file in the Master Document. I've never had to use Master Documents in Writer editing (even with 3-million-word files), so that would be a new exploration for me.

I'll wait and see if there are other contributions, while brooding about other possible approaches.
JeJe
Volunteer
Posts: 2764
Joined: Wed Mar 09, 2016 2:40 pm

Re: XML advice

Post by JeJe »

What's the difference between the editing and printing versions? If it's things like fonts, margins and so on, then you can do this with different paragraph styles: one set of styles for editing and a complementary set for printing, and run through the document replacing one style with the other. Alternatively, use a single set of styles and change the characteristics of the styles for printing. Styles can also be hidden, so you can have information you can hide/unhide by setting whether the style is hidden or not.
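The style-swapping idea could be sketched as a Python macro along these lines. The style names are hypothetical; the paragraph loop uses the usual Writer text enumeration (`getText().createEnumeration()`), which yields top-level paragraphs and tables of the main text, so paragraphs inside tables and frames are not visited by this sketch.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical style names: each editing style is paired with one printing style.
EDIT_TO_PRINT: Dict[str, str] = {
    "Edit Body": "Print Body",
    "Edit Heading": "Print Heading",
}


def iter_paragraphs(doc) -> Iterator:
    """Yield the top-level paragraphs of a Writer document's main text (tables are skipped)."""
    enum = doc.getText().createEnumeration()
    while enum.hasMoreElements():
        element = enum.nextElement()
        if element.supportsService("com.sun.star.text.Paragraph"):
            yield element


def swap_paragraph_styles(paragraphs: Iterable, mapping: Dict[str, str]) -> int:
    """Apply `mapping` to each paragraph's style; return how many paragraphs changed."""
    changed = 0
    for para in paragraphs:
        new_style = mapping.get(para.ParaStyleName)
        if new_style is not None:
            para.ParaStyleName = new_style
            changed += 1
    return changed
```

From a Python macro it would be called as `swap_paragraph_styles(iter_paragraphs(XSCRIPTCONTEXT.getDocument()), EDIT_TO_PRINT)`, and the inverted mapping `{v: k for k, v in EDIT_TO_PRINT.items()}` switches back. Only the mapping logic is exercised offline here; the UNO part is untested.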

As it's a Writer document, you can also put any information you want in a custom document property. I guess you could also keep a separate text file to store anything else you want, and have your document's open event look for the accompanying file.
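One way to use a custom property for this is to serialise all of the setup information into a single string. A minimal sketch, assuming JSON as the encoding; `SectionSpec` and its fields are hypothetical, chosen to hold what would be needed to recreate a removed Section.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SectionSpec:
    """Hypothetical record of what is needed to rebuild one Section."""
    name: str          # Section name, e.g. "block1"
    start_marker: str  # whatever marks the start of its text (bookmark name, marker character...)
    end_marker: str    # whatever marks the end of its text
    hidden: bool = False


def to_property_value(specs: List[SectionSpec]) -> str:
    """Serialise the section list to one string for a string-valued custom property."""
    return json.dumps([asdict(s) for s in specs], ensure_ascii=False)


def from_property_value(value: str) -> List[SectionSpec]:
    """Rebuild the section list from the stored string."""
    return [SectionSpec(**d) for d in json.loads(value)]
```

In a Python macro, storing the string would look something like `props = doc.getDocumentProperties().getUserDefinedProperties()` followed by `props.addProperty("ZZtagSections", 128, value)` (128 is `PropertyAttribute.REMOVEABLE`), or `props.setPropertyValue("ZZtagSections", value)` if the property already exists. The property name is hypothetical and this UNO part is untested here.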
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
RoryOF
Moderator

Re: XML advice

Post by RoryOF »

No, it's not font, margins etc. It is marking areas of text that will be in different structures between construction, editing and printing.

I'll give you an example: If I have a Section in an OO file, and remove that Section, leaving its content in the main text, I would like to leave tags of some type that allowed some macro code to recreate that Section. There might be 30 or more such Sections.
 Edit: I'll phrase my request a different way: are there hidden markers I could use in a Writer text, which OO won't display or remove, which I will write macro code to manipulate? In order not to interfere with Writer's editing functionality, I do not wish to use Writer's existing markers such as bookmarks, cross references, comments etc. 
JeJe
Volunteer

Re: XML advice

Post by JeJe »

How about bookmarks to mark the beginning and end of the section? You can differentiate between your tag bookmarks and your other bookmarks by the name, e.g. start the name with Ztag or ZZ; then they'll also appear at the end of the bookmark list. You can also use the bookmark name to store other information. There are some restrictions on the characters you can use, but bookmark names look like they can be really long.
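The naming scheme can be kept in two small helpers, one to build start/end names and one to pick the complete pairs out of all bookmark names. In a Python macro the names would come from `doc.getBookmarks().getElementNames()`. The `ZZtag_` prefix and the `_start`/`_end` suffixes are a hypothetical convention, and since the exact character restrictions aren't stated above, the sketch conservatively keeps only letters, digits and underscores.

```python
import re
from typing import Dict, Iterable, Tuple

PREFIX = "ZZtag_"  # hypothetical prefix combining the "Ztag"/"ZZ" suggestion


def marker_names(section_name: str) -> Tuple[str, str]:
    """Return the (start, end) bookmark names for a section."""
    # Assumption: letters, digits and underscores are safe in bookmark names.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", section_name)
    return f"{PREFIX}{safe}_start", f"{PREFIX}{safe}_end"


def find_marker_pairs(bookmark_names: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Group tag bookmarks into complete start/end pairs, ignoring ordinary bookmarks."""
    starts: Dict[str, str] = {}
    ends: Dict[str, str] = {}
    for name in bookmark_names:
        if not name.startswith(PREFIX):
            continue  # an ordinary user bookmark
        body = name[len(PREFIX):]
        if body.endswith("_start"):
            starts[body[: -len("_start")]] = name
        elif body.endswith("_end"):
            ends[body[: -len("_end")]] = name
    # Only sections with both markers present can be recreated.
    return {key: (starts[key], ends[key]) for key in starts if key in ends}
```

A start name without its matching end (for example after an accidental deletion) is left out of the result, which gives the macro a simple way to spot damaged markers before trying to rebuild a Section.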

Or you could mark the beginning and end of the section with a character style called tag or something, make that style hidden, and put whatever info you want in text formatted with that style. It's easy to hide/show the tags by changing the hidden property of the style.
Last edited by JeJe on Mon Feb 05, 2018 2:39 pm, edited 1 time in total.
JeJe
Volunteer

Re: XML advice

Post by JeJe »

I didn't see your edit of the post before that last post.

The difficult bit sounds like protecting some text so it can't be deleted inadvertently when you can't see it, but you can have keyboard and mouse listeners and code to protect whatever you need. I don't see why a hidden character style doesn't do what you want.
RoryOF
Moderator

Re: XML advice

Post by RoryOF »

Thanks; your thoughts are helpful. Another method might be to generate a Master Document, where each of my Sections is a component file of the master document and will remain a separate file, but linked by the Master Document structure.

There is no urgency on this; it is a project I'm slowly thinking about, which _I_ don't need, and it may be overly complex for the target audience.
JeJe
Volunteer

Re: XML advice

Post by JeJe »

You're reminding me of the Organon extension, if you haven't seen that:

https://github.com/XRoemer/Organon
John_Ha
Volunteer

Re: XML advice

Post by John_Ha »

RoryOF wrote:I'll phrase my request a different way: are there hidden markers I could use in a Writer text, which OO won't display or remove, which I will write macro code to manipulate?
Could you use a non-displaying text character? See Unicode control characters.

A quick and dirty workaround would be to use tiny (1 point?) text coloured white, which would be invisible to the reader.

Or apply a style, say Style 100, to a character or paragraph, where Style 100 looks identical to the style actually in use at the position where you want Marker 100 to be; the presence of Style 100 then acts as the marker.

A hidden section still seems to be a good idea - can you not work with it?
Last edited by John_Ha on Mon Feb 05, 2018 3:37 pm, edited 1 time in total.
RoryOF
Moderator

Re: XML advice

Post by RoryOF »

I must look at Organon again. I did advise the author in its early stages.

Another similar project, no longer supported, is AuthorSupportTool, also at
https://extensions.openoffice.org/en/pr ... upporttool
John_Ha
Volunteer

Re: XML advice

Post by John_Ha »

See hidden section.odt below, where there is a section called Bill, containing the text Bill, between the two Freds. It is completely invisible to the user until she goes to Format > Sections... and unhides it.
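A Unicode control character would be non-printing, but I cannot see a way to add one with Insert > Special character in TNR. (Note that U+2422 itself is BLANK SYMBOL ␢ from the Control Pictures block, a visible glyph rather than a control character.)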

Help is useful ...
Hiding Text
You can use fields and sections to hide or display text in your document if a condition is met.
Before you can hide text, you must first create a variable to use in the condition for hiding the text.
To Create a Variable
1. Click in your document and choose Insert - Fields - Other.
2. Click the Variables tab and click "Set Variable" in the Type list.
3. Click "General" in the Format list.
4. Type a name for the variable in the Name box, for example, Hide.
5. Enter a value for the variable in the Value box, for example, 1.
6. To hide the variable in your document, select Invisible.
7. Click Insert and Close.
To Hide Text
1. Click in the document where you want to add the text.
2. Choose Insert - Fields - Other and click the Functions tab.
3. Click "Hidden Text" in the Type list.
4. Enter a statement in the Condition box. For example, using the variable you previously defined, enter Hide==1.
5. Type the text that you want to hide in the Hidden text box.
6. Click Insert and Close.
To Hide a Paragraph
1. Click in the paragraph where you want to add the text.
2. Choose Insert - Fields - Other and click the Functions tab.
3. Click "Hidden Paragraph" in the Type list.
4. Enter a statement in the Condition box. For example, using the variable you previously defined, enter Hide==1.
5. Click Insert and Close.

You must enable this feature by removing the tick mark from the View - Hidden Paragraphs menu. When the tick mark is present, you cannot hide any paragraph.

To Hide a Section

1. Select the text that you want to hide in your document.
2. Choose Insert - Section.
3. In the Hide area, select Hide, and then enter an expression in the Condition box. For example, using the variable you previously defined, enter Hide==1.
4. Click Insert.
Related Topics
Conditional Text
Querying User Data in Fields or Conditions
Displaying Hidden Text
Creating Non-printing Text
Insert - Fields - Other
Insert - Section
List of Operators
hidden section.odt
John_Ha
Volunteer

Re: XML advice

Post by John_Ha »

I have worked it out.

See non printing characters.odt.

I opened Notepad++ and went to Edit > Character Panel. This opened a panel with 255 characters, a number of which did not display. I copied a few of the non-displaying ones, pasted them into a document and saved it as non printing characters.odt. As you can see, only the "#" displays, which I think was ACK, but content.xml shows that the other hidden characters (DCS, DCS, SS3, RI and HOP) are in the XML.

You can easily add them to any text by setting up an AutoCorrect entry where, say, %%M1 is replaced by, say, DCS, which you paste into the With box. Typing %%M1 then inserts your Marker_1 as DCS. Or use AutoHotkey to do the same thing.

EDIT: I could see the markers in the code below when I wrote the post (and can again if I edit the post), but as they are non-printing they don't display here either. You will have to extract content.xml to see them, or copy the text from ">" to "#" to copy them.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:textooo="http://openoffice.org/2013/office" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" office:version="1.2">
	<office:scripts/>
	<office:font-face-decls>
		<style:font-face style:name="Mangal1" svg:font-family="Mangal"/>
		<style:font-face style:name="Times New Roman" svg:font-family="&apos;Times New Roman&apos;" style:font-family-generic="roman" style:font-pitch="variable"/>
		<style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
		<style:font-face style:name="Mangal" svg:font-family="Mangal" style:font-family-generic="system" style:font-pitch="variable"/>
		<style:font-face style:name="Microsoft YaHei" svg:font-family="&apos;Microsoft YaHei&apos;" style:font-family-generic="system" style:font-pitch="variable"/>
		<style:font-face style:name="SimSun" svg:font-family="SimSun" style:font-family-generic="system" style:font-pitch="variable"/>
	</office:font-face-decls>
	<office:automatic-styles/>
	<office:body>
		<office:text>
			<text:sequence-decls>
				<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
				<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
				<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
				<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
			</text:sequence-decls>
			<text:p text:style-name="Standard">#</text:p>
		</office:text>
	</office:body>
</office:document-content>
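A macro working with such markers needs to find them in a paragraph's text (in Python-UNO, `para.getString()`) and strip them for the final version. A minimal sketch, assuming the C1 control characters listed above are used as markers; the name-to-character table is a hypothetical assignment. It also checks each marker against the XML 1.0 `Char` production, which excludes C0 controls other than tab, line feed and carriage return but admits the C1 range, so markers chosen from C1 can be written into content.xml.

```python
from typing import List, Tuple

# Hypothetical marker assignments, using the C1 control characters from the test file.
MARKERS = {
    "\u0090": "DCS",  # DEVICE CONTROL STRING
    "\u008F": "SS3",  # SINGLE SHIFT THREE
    "\u008D": "RI",   # REVERSE LINE FEED (reverse index)
    "\u0081": "HOP",  # HIGH OCTET PRESET
}


def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 'Char' production, i.e. may appear in content.xml."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def find_markers(text: str) -> List[Tuple[int, str]]:
    """Return (index, marker name) for every marker character in a paragraph's text."""
    return [(i, MARKERS[ch]) for i, ch in enumerate(text) if ch in MARKERS]


def strip_markers(text: str) -> str:
    """Remove all marker characters, e.g. when producing the final print version."""
    return "".join(ch for ch in text if ch not in MARKERS)
```

For example, in the text `"Fred\u0090Bill\u0081Fred"` the markers are found at positions 4 (DCS) and 9 (HOP), and stripping them leaves `"FredBillFred"`.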
Attachments
non printing characters.odt