[Solved] Content.xml error... what is wrong?

Corfy
Volunteer
Posts: 35
Joined: Mon Oct 08, 2007 1:30 am
Location: near Indianapolis, IN, USA

[Solved] Content.xml error... what is wrong?

Post by Corfy »

I downloaded some ODT files from the Internet and tried to open them.

This one file gives me a content.xml format error. The error seems to indicate this line:

Code: Select all

<text:p text:style-name="P1"><text:span text:style-name="T1">Ananian, C. Scott.</text:span> "A Linux Lament: As Red Hat Prepares to Go Public, One Linux Hacker's Dreams of IPO Glory Are Crushed by the Man." Salon magazine, July 30, 1999. <:br> <<text:a xlink:type="simple" xlink:href="http://www.salon.com/tech/feature/1999/07/30/redhat_shares/index.html">http://www.salon.com/tech/feature/1999/07/30/redhat_shares/index.html</text:a>> <:br> "Questions Not to Ask on Linux-Kernel." May 1998. <:br> <<text:a xlink:type="simple" xlink:href="http://lwn.net/980521/a/nonfaq.html">http://lwn.net/980521/a/nonfaq.html</text:a>></text:p>
Specifically, the problem is at column 635, which would coincide with the / in the last "</text:p>".

I don't know enough XML to be able to figure out what the problem is (although I might be able to if the line weren't so complex). I was hoping that someone here might notice what it is and tell me how I might go about fixing it.

There was another file from the same source that also had a content.xml error, but I was able to fix that one. That had to do with the formatting of what it erroneously thought was an email address.

EDIT: It looks to me as if it is getting an unexpected "end text", but I don't know why. From what I can tell, all the start text and end text seem to match up.
Last edited by Corfy on Wed Jan 09, 2008 12:13 am, edited 1 time in total.
---
Laugh at life or life will laugh at you.
OOo 2.4 on K/Ubuntu Hardy Heron 8.04 and WinXP Pro and Home
Repossess your computer!
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Content.xml error... what is wrong?

Post by acknak »

That is some very strange-looking XML (at least to my non-expert eyes). E.g. it contains three instances of "<:br>", which is not an element I recognize, and each one is an opening tag with no matching end tag. That's why you get the error: there are unclosed tags when the parser gets to the close-paragraph tag.
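
A rough way to reproduce this kind of failure outside the office suite is to feed a cut-down fragment to a plain XML parser. The sketch below uses Python's standard expat binding without namespace processing, on the assumption that this is close enough to what OOo does internally; the helper name `first_xml_error` and the shortened fragments are made up for illustration and are not from the original file.

```python
import xml.parsers.expat


def first_xml_error(fragment):
    """Return (message, line, column) for the first well-formedness
    error in `fragment`, or None if it parses cleanly."""
    # No namespace_separator is given, so no namespace processing is done
    # and prefixed names such as "text:p" are treated as plain tag names.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(fragment, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        return (xml.parsers.expat.ErrorString(err.code), err.lineno, err.offset)
    return None


# Shortened stand-ins for the paragraph quoted in the first post.
broken = '<text:p>One<:br>Two<:br>Three.</text:p>'
fixed = '<text:p>One<text:line-break/>Two<text:line-break/>Three.</text:p>'

print(first_xml_error(broken))  # some error tuple; exact wording depends on the parser
print(first_xml_error(fixed))   # None
```

The point is only that a parser cannot know a "<:br>" was meant to be empty: it stays open until something closes it, so the complaint tends to show up at the next closing tag rather than at the "<:br>" itself.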

This looks like a fragment of HTML that has been pasted almost verbatim into the ODF file.

It's also possible that the forum [code] tags have messed it up. They are not really suitable for posting data without change; if you need that, better use an attachment. Save the fragment in a text file and attach it.
AOO4/LO5 • Linux • Fedora 23

Re: Content.xml error... what is wrong?

Post by Corfy »

In HTML, you don't need an end code for BR, so I just assumed the same was true of XML. I told you I didn't know much about XML.

The document contained a grand total of 54 "<:br>", but the first one appeared in that line. So I removed all instances of "<:br>", saved it, rezipped it, renamed it, and the file opened perfectly.
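
For anyone who would rather not unzip, edit and rezip by hand, the same steps can be scripted. This is only a sketch under a few assumptions: the file is an ordinary ODF zip package, the stray tag appears literally as "<:br>" in content.xml, and the function name `replace_in_content_xml` and its parameters are invented for this example.

```python
import zipfile


def replace_in_content_xml(src_path, dst_path, bad=b"<:br>", good=b""):
    """Copy the ODF package at src_path to dst_path, replacing every
    occurrence of `bad` in content.xml with `good`.
    Returns the number of occurrences that were replaced."""
    count = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # infolist() keeps the original entry order, so "mimetype"
        # stays first if it was first in the source package.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                count = data.count(bad)
                data = data.replace(bad, good)
            # ODF packages expect "mimetype" to be stored uncompressed;
            # everything else is deflated.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=method)
    return count
```

Calling `replace_in_content_xml("broken.odt", "fixed.odt")` (both file names hypothetical) would drop the tags as I did by hand. Passing `good=b"<text:line-break/>"` instead should keep the line breaks, though I have not tried that on this file.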

It looks like the problem came from the Bibliography section, where they tried to lump three publications from one author into one XML paragraph with linebreaks after each publication. Taking out the linebreaks doesn't bother me a bit. It may or may not conform to official bibliography formats, but it is good enough for me.

Thanks for your help.

BTW, from what I can tell, in this case, the CODE tag handled the text from the XML file perfectly. I copied that and compared it to the original, and it matches exactly, letter by letter. So while I understand what you are saying about the CODE tag, in this case at least, it worked perfectly.

Re: [SOLVED] Content.xml error... what is wrong?

Post by acknak »

It seems a little strange that this should happen, because OOo is perfectly capable of converting HTML into ODF, if the bibliography software asked for that. Seems that maybe something isn't communicating quite right.

If you're into editing content.xml, here's what a paragraph broken into three lines looks like:

Code: Select all

<text:p text:style-name="Standard">
One<text:line-break/>
Two<text:line-break/>
Three.
</text:p>

"... the CODE handled the text from the XML file perfectly."

Ok, thanks for checking.

It certainly should handle it, but I know for sure that it plays with spaces sometimes. I also know that the HTML used to present it does not use a "preformatted" wrapper, which means that phpBB has to encode the contents, opening the door for mistakes.