[Solved] Masking & and ß in content.xml
Posted: Tue May 04, 2010 4:32 pm
by quietschie
Hi everybody,
I'm currently using templates to generate documents automatically from a Java application.
I've tried to escape the character '&' in several ways, but they all failed: & & \&
The one thing that didn't fail was <![CDATA[ & ..., so the & works now.
But the file itself seems to be corrupted: OpenOffice tells me there is an error at a certain position (where there is an 'r' ???). After I open the XML file in a text editor, change the encoding to UTF-8 and update the archive, it works.
But the character 'ß' is now replaced by '?' (I'm from Germany, and yes, we use this character).
My questions now are: why do I have to change the encoding? (Does this have to do with the way I access the .jar / .ott? It works for other data without '&'.)
The other question is: why doesn't <![CDATA[ ... protect the ß? (I'm not sure, but I think ß worked without <![CDATA[ when there was no & in the data.)
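A quick way to see why CDATA cannot help with the ß: a CDATA section only tells the parser not to treat & and < as markup; the bytes inside it still have to be valid in the encoding named in the XML declaration. The small Java sketch below (class and method names are made up for illustration) parses the same document twice, both times with a UTF-8 declaration, once encoded as UTF-8 and once as ISO-8859-1. The JDK's built-in parser should accept the first and reject the second, because the single byte 0xDF is not a valid UTF-8 sequence.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class CdataEncodingDemo {
    /** Returns true if the given bytes parse as well-formed XML. */
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (Exception e) {
            // Malformed byte sequences surface here as an exception from the parser
            return false;
        }
    }

    public static void main(String[] args) {
        // Same text, same UTF-8 declaration, CDATA around the content in both cases
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<p><![CDATA[Stra\u00DFe & Weg]]></p>";
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(parses(xml.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```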
Can anyone help me with this? I'm going slightly mad.
Thanks for any help
Quietschie
Re: masking & and ß in content.xml
Posted: Tue May 04, 2010 4:44 pm
by acknak
As far as I know, the only characters in your text that you need to worry about are &, ", ', <, and >. These can all be represented by the predefined XML entities &amp;, &quot;, &apos;, &lt; and &gt;.
Any other characters must be valid in whatever encoding your xml header specifies, but otherwise there are no restrictions I can think of.
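A minimal sketch of that escaping in Java, assuming the text is inserted as element content or an attribute value in content.xml; escapeXml is a hypothetical helper, not part of any library. Non-ASCII characters such as ß pass through unchanged; whether they come out right depends only on writing the result in the declared encoding.

```java
class XmlEscape {
    /** Replaces the five XML special characters with their predefined entities. */
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);        // everything else is copied as-is
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Fish & Chips <b>")); // prints: Fish &amp; Chips &lt;b&gt;
    }
}
```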
It might be easier if you can attach the document (or xml) that's causing the errors. You can use the "Upload Attachment" link (below the message entry area after you click "POST REPLY").
[Forum] How to attach a document here
Re: masking & and ß in content.xml
Posted: Tue May 04, 2010 5:16 pm
by quietschie
Hi,
I was wrong with my guess (that ß worked without <![CDATA[ when there was no & in the data); I just tested this again.
The header in my XML file says it's UTF-8 encoded, and I also changed the encoding in my Java application to UTF-8 with
System.setProperty( "file.encoding", "UTF-8" );
In my opinion, that should do the same thing as opening the file in a text editor, changing the encoding to UTF-8 and saving it (after that, the file is no longer corrupted).
I'm also not sure whose property the encoding is: the document's, the system's, or that of the program opening the document? I'm confused.
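The encoding is a property of the bytes: it is decided at the moment a String is turned into bytes, and the declaration in the XML header only tells the reader how to interpret those bytes, so the two have to agree. Setting file.encoding with System.setProperty once the JVM is running is generally not reliable, because the JVM usually determines its default charset at startup. A safer habit is to name the charset every time text is encoded, as in this sketch (ExplicitCharset and encode are illustrative names):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharset {
    /** Encodes text with the given charset instead of the platform default. */
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text);
        } // closing the writer flushes the encoder into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String sharpS = "\u00DF"; // the character 'ß'
        System.out.println(encode(sharpS, StandardCharsets.UTF_8).length);      // 2 (0xC3 0x9F)
        System.out.println(encode(sharpS, StandardCharsets.ISO_8859_1).length); // 1 (0xDF)
    }
}
```

The same String yields different bytes depending only on the charset passed to the writer, which is why the header and the writer must name the same one.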
Re: masking & and ß in content.xml
Posted: Tue May 04, 2010 8:28 pm
by acknak
The ß character in your document is not valid UTF-8; it's probably Latin-1/ISO-8859-1 (my guess). The character is present as a single byte with value 0xdf. If it were valid UTF-8, it should be encoded as two bytes: 0xC3 0x9F.
I have no idea how you would handle that in Java.
Have you considered using the ODF-DOM package for generating ODF documents from Java? As I understand it, that should take care of all these low-level details for you.
http://odftoolkit.org/projects/odfdom/pages/Home
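If you want to check a generated content.xml for this kind of problem from Java, one option is a strict UTF-8 decoder that reports malformed input instead of silently replacing it. This is only a sketch (Utf8Check and isValidUtf8 are illustrative names); it reproduces the diagnosis above: the lone byte 0xDF is rejected, while 0xC3 0x9F is accepted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true if the bytes are valid UTF-8; malformed input is reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xDF};             // 'ß' as written by an ISO-8859-1 writer
        byte[] utf8 = {(byte) 0xC3, (byte) 0x9F};  // 'ß' encoded as UTF-8
        System.out.println(isValidUtf8(latin1));   // false
        System.out.println(isValidUtf8(utf8));     // true
    }
}
```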
Re: masking & and ß in content.xml
Posted: Tue May 04, 2010 10:05 pm
by quietschie
Thanks for the hint... somehow I didn't think of looking at the bytes. I'll try replacing the byte for ß with the UTF-8 bytes for ß.
I hadn't thought of the package you mentioned, but I don't think it could get any easier than my current solution:
I'm accessing the content.xml in the .ott directly: reading the file into a String, manipulating it and writing it back.
I think that's the easiest and most flexible way, because users can change the .ott directly in OpenOffice.org.
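For the read/modify/write approach described above, the encoding problems should go away if content.xml is decoded and re-encoded with an explicit UTF-8 charset. The sketch below works on streams, so it can be fed from a FileInputStream and to a FileOutputStream; OdfContentRewriter, rewrite and readAll are hypothetical names, not part of any library. It keeps the entry order and leaves stored (uncompressed) entries uncompressed, since ODF expects the mimetype entry to come first and uncompressed. Text inserted through the edit function still needs & escaped as &amp;; with that, neither CDATA nor hand-made byte sequences should be needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class OdfContentRewriter {
    /** Copies an ODF package from in to out, passing content.xml through edit. */
    static void rewrite(InputStream in, OutputStream out, UnaryOperator<String> edit)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = readAll(zin);
                ZipEntry copy = new ZipEntry(entry.getName()); // deflated unless changed below
                if (entry.getName().equals("content.xml")) {
                    // Decode and re-encode with an explicit charset, never the platform default
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = edit.apply(xml).getBytes(StandardCharsets.UTF_8);
                } else if (entry.getMethod() == ZipEntry.STORED) {
                    // Keep uncompressed entries (such as "mimetype") uncompressed;
                    // ZipOutputStream needs their size and CRC before the data is written
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    copy.setMethod(ZipEntry.STORED);
                    copy.setSize(data.length);
                    copy.setCrc(crc.getValue());
                }
                zout.putNextEntry(copy);
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    /** Reads the rest of the current stream (here: the current zip entry) into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```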
I'll try something with the bytes and post my solution if I find one.
Thanks a lot.
Quietschie
Re: Masking & and ß in content.xml
Posted: Tue May 04, 2010 10:52 pm
by acknak
Re: Masking & and ß in content.xml
Posted: Wed May 05, 2010 10:05 am
by quietschie
Hi,
I tried it out, and it works fine with just a small 4-liner
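public String escapeAll(String s) {
    // Wrap the text in a CDATA section so '&' is not treated as markup
    s = "<![CDATA[" + s + "]]>";
    // 0xC3 0x9F is the UTF-8 encoding of 'ß'; new String(b) decodes it with the platform
    // default charset, so writing the result with that same charset emits exactly those bytes
    byte[] b = new byte[] {(byte) 0xC3, (byte) 0x9F};
    s = s.replace("ß", new String(b)); // literal replacement, no regex needed
    return s;
}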
Thanks again for the right hint!