File unreadable / error in content.xml

CannedMan
Posts: 225
Joined: Wed Aug 04, 2010 12:06 am

File unreadable / error in content.xml

Post by CannedMan »

Although I have numerous backups, even those from previous days fail to open. When opening them in LibreOffice, I get the following error message:
Lesefeil.
Fann formatfeil i fila i underdokumentet content.xml ved 2,136719(row,col).

Reading error.
Found formatting error in the file in the subdocument content.xml at 2,136719(row,col)
When opening it in OpenOffice, I instead get this less helpful message:
Lesefeil.
Feil ved lesing av fil.

Reading error.
Error in reading file.
This happens even with previous backups of the file. Opening the most recent file gives an error at 2,136719(row,col), the same position as for the versions from yesterday evening and morning and for the file from two days ago. Were it just the latest version, it would have made at least a little sense, but the backups (file versions which I have previously worked on and saved) have all of a sudden become unreadable as well, which leaves me clueless as to what is causing this.
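For anyone wanting to check the reported position themselves: an .odt file is a ZIP archive, so content.xml can be pulled out and run through an XML parser. Below is a minimal Python sketch, not something anyone in this thread ran; the function name `find_xml_error` is made up for illustration. It only tests XML well-formedness, and further down the thread the XML is reported as valid, so a check like this may well come back clean even though LibreOffice refuses the file.

```python
import io
import zipfile
import xml.parsers.expat


def find_xml_error(odt_bytes: bytes, member: str = "content.xml"):
    """Return None if `member` is well-formed XML, else (line, column, message).

    The line is 1-based and the column 0-based, as reported by expat.
    """
    # An .odt file is a ZIP archive; read the requested member from it.
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        data = archive.read(member)
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None
```

Usage would be along the lines of `find_xml_error(open("Fred.odt", "rb").read())`, and the same call with `member="styles.xml"` checks the styles file.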

Versions:

LibreOffice:
Version: 7.1.1.2 (x64) / LibreOffice Community
Build ID: fe0b08f4af1bacafe4c7ecc87ce55bb426164676
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: nn-NO (nn_NO); UI: nn-NO
Calc: CL

OpenOffice (downloaded today to see if it was just a LibreOffice hiccough):
Apache OpenOffice 4.1.10
AOO4110m2(Build:9807) - Rev. b1cdbd2c1b
2021-04-19 19:30

And finally, I forgot to mention, I repaired my LibreOffice installation (in case that might be the problem) and did a clean new installation of OpenOffice; neither of these actions had any effect on the result.
Attachments
Aurēlius Ambrŏsius on the Suicidal Death of Flāvius Valentīniānus.odt
(37.44 KiB) Downloaded 512 times
Last edited by CannedMan on Wed Jul 28, 2021 7:10 pm, edited 1 time in total.
Apache OpenOffice 4.1.5 / LibreOffice 7.0.0.3 on Windows 10 (x64)
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: File unreadable / error in content.xml

Post by Villeroy »

Something is wrong with styles.xml. I don't have the time and nerve to fix it.
If you have some (template) file with the same set of styles, you can try to transplant styles.xml
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice

Post by CannedMan »

Oh, is that it? So if I paste the styles from the template I used into the document which is dysfunctional, that might solve it?

Post by CannedMan »

I opened both styles.xml files in Notepad++ and ran a compare, and they match. The template I used opens just fine. I attempted pasting the template’s styles into the original document, but that (as expected) had no effect.
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Post by RoryOF »

I note that the file has many "rsid" (no quotes) tags in it. OpenOffice, as far as I remember, will not handle rsid tags, so best to experiment with LibreOffice only.

content.xml is reported valid by XML Copy Editor. I'll act on Villeroy's posting and check styles.xml - that validates OK.

I don't like the size of styles.xml - it looks to me to be too large, which suggests too many styles in use.
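To put a number on "too many styles", the style definitions can be counted per family. This is only a sketch: `count_styles` is a made-up helper name, and it assumes the standard ODF style namespace. Whether a large count is actually harmful remains a guess.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace for style:* elements and attributes.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def count_styles(xml_text: str) -> dict:
    """Count <style:style> definitions in styles.xml or content.xml, per family."""
    root = ET.fromstring(xml_text)
    counts = {}
    for element in root.iter(f"{{{STYLE_NS}}}style"):
        family = element.get(f"{{{STYLE_NS}}}family", "(no family)")
        counts[family] = counts.get(family, 0) + 1
    return counts
```

Run on styles.xml it counts the named styles; run on content.xml it counts the automatic styles (P1, P2, T1 and so on) created by direct formatting.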
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Post by John_Ha »

I cannot understand the XML so I merely recovered the text. The error is where the xxxx I added appears. There was an XML closing tag missing which I added but it did not fix it - it was at the "43 Liverpool ..." text.

Also, there are numerous instances of square box characters in the file which usually means a non-printable character. When I move the cursor across one, the column increases by 1 but the position increases by 3 so I am not sure where I should be looking for the error.
Clipboard01.png
As Rory says, AOO does not support rsid tags (used by MS Word when two documents are merged so as to record which words came from which document) so stick to LO.
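The text-only recovery described above can be approximated in a few lines of Python. This is a sketch under assumptions, not the method actually used for the attached file: `extract_paragraphs` is a made-up name, content.xml must still parse as XML, and only headings and paragraphs in the standard ODF text namespace are collected.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace for text:p, text:h, text:span and friends.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_paragraphs(content_xml: str) -> list:
    """Return the plain text of every heading and paragraph, in document order."""
    root = ET.fromstring(content_xml)
    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    # itertext() joins text across nested spans, so a run split into
    # "4" and "3 Liverpool" comes back as one string.
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]
```

All formatting is lost. Spaces and tabs stored as elements (text:s, text:tab) are dropped, and text that sits in tracked-change records is returned along with the live text, so the result needs a read-through.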
Attachments
text only.odt
(35.07 KiB) Downloaded 258 times
Last edited by John_Ha on Wed Jul 28, 2021 8:02 pm, edited 1 time in total.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.

Post by RoryOF »

I doubt you have full content, John, unless the file is extraordinarily heavily formatted.

In the original file I looked at the error location CannedMan reported in content.xml; the xml code looked OK - there were similar instances in exactly the same form.

Post by John_Ha »

Rory

I think I had the full content but, like you, the syntax was reported as OK.

Post by RoryOF »

Ambrose.png
Here is the XML for the area of the error. The reported location of the error is between name and = "P27" on the fourth line down.

OpenOffice won't open the file, giving an error message. Earlier, in case the accented characters in the file name were causing a problem, I renamed it to Fred.odt, but that made no difference to OpenOffice.

Post by RoryOF »

@John_Ha or CannedMan: please try the attached version, renamed to Fred.odt. I have no LibreOffice version running on this computer. I have removed the Ambrose of Milan line, which I think to be a bibliographic reference.
Attachments
Fred.odt
(37.29 KiB) Downloaded 287 times

Post by CannedMan »

I think that might be a correct observation; when I looked at the XML, I saw that the error indeed was in the bibliography, in which there is a combination of tables and straight-up text. I will take a look at Fred now.

Post by CannedMan »

Sorry, I got the same read error (2,136396(row,col)). I think the entire bibliography might have to go (but that is no problem at all). I tried to do this myself, but ended up with a completely unreadable file. The styles are not important either; if the information of which styles to use is still present, I can simply re-apply the template.

Post by CannedMan »

John_Ha wrote:I cannot understand the XML so I merely recovered the text. The error is where the xxxx I added appears. There was an XML closing tag missing which I added but it did not fix it - it was at the "43 Liverpool ..." text.

Also, there are numerous instances of square box characters in the file which usually means a non-printable character. When I move the cursor across one, the column increases by 1 but the position increases by 3 so I am not sure where I should be looking for the error.
Clipboard01.png
As Rory says, AOO does not support rsid tags (used by MS Word when two documents are merged so as to record which words came from which document) so stick to LO.
I downloaded the contents, and it does appear to be complete. I will have a thorough look at it now.

Post by RoryOF »

Try this file - I think I have removed the entire bibliography, but you should have the content.
Attachments
Aurēlius Ambrŏsius no biblio.odt
(34.08 KiB) Downloaded 293 times

Post by CannedMan »

Well, there is a new error, so that's an improvement. Now it finds an error in 2,115502(row,col).

Post by CannedMan »

John_Ha wrote:Also, there are numerous instances of square box characters in the file which usually means a non-printable character. When I move the cursor across one, the column increases by 1 but the position increases by 3 so I am not sure where I should be looking for the error.
As Rory says, AOO does not support rsid tags (used by MS Word when two documents are merged so as to record which words came from which document) so stick to LO.
Does AOO/LO have problems with styles that use non-English characters? Or styles that have long names?

Also, just for clarity: The document has never been touched by Microsoft. I exclusively use LO or if need be AOO (I really would love if AOO could get full OpenType support).

Post by RoryOF »

CannedMan wrote:
Does AOO/LO have problems with styles that use non-English characters? Or styles that have long names?

Also, just for clarity: The document has never been touched by Microsoft. I exclusively use LO or if need be AOO (I really would love if AOO could get full OpenType support).
Funny characters in file names have (in the past, on Windows) been known to cause problems.

The rsid tags, which OpenOffice does not support, must then have come from LibreOffice.

Post by John_Ha »

Rory

Both fred.odt (136396) and the other file give errors with LO, the latter at 115502.

As the non-printing characters seem to count as either 1 or 3 in Notepad++, I don't know whether AOO/LO count them as 1 or 3, so I am not sure exactly where to look.
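A likely explanation for the 1-versus-3 discrepancy, offered as an assumption rather than anything confirmed here, is UTF-8 encoding: a character such as ’ is one column on screen but three bytes in the file, and ē is two bytes. If the reported column is a byte offset, it can be converted to a character offset as sketched below; `byte_col_to_char_col` is a made-up helper name.

```python
def byte_col_to_char_col(line: bytes, byte_col: int) -> int:
    """Convert a 0-based byte offset in a UTF-8 line to a 0-based character offset."""
    # errors="ignore" drops a trailing partial sequence if byte_col
    # happens to fall in the middle of a multi-byte character.
    return len(line[:byte_col].decode("utf-8", errors="ignore"))


line = "Aurēlius’s x".encode("utf-8")
print(byte_col_to_char_col(line, 14))  # byte 14 is the "x"
```

Whether AOO/LO report the column in bytes or in characters is not established in this thread, so it is worth looking at both candidate positions.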
Last edited by John_Ha on Wed Jul 28, 2021 9:12 pm, edited 1 time in total.

Post by CannedMan »

RoryOF wrote:I don't like the size of styles.xml - it looks to me to be too large, which suggests too many styles in use.
Too many styles available or too many instances of styling?

Post by RoryOF »

John_Ha wrote:Rory

Both fred.odt (136396) and the other file give errors with LO, the latter at 115502.

As the non-printing characters seem to count as either 1 or 3 in Notepad++, I don't know whether AOO/LO count them as 1 or 3, so I am not sure exactly where to look.
Before uploading the files, I verified that they validated correctly.

Post by John_Ha »

CannedMan wrote:Also, just for clarity: The document has never been touched by Microsoft. I exclusively use LO or if need be AOO (I really would love if AOO could get full OpenType support).
I really doubt that, as the text is butchered the way MS Word butchers text in .docx files, and possibly because it has rsid tags. I suspect you have pasted something which originated as a .docx file, even if it was actually an .odt file. For example, the "43 ..." in the Liverpool text was split into "4" and "3..." as separate bits of tagged text. AOO never does that; Word .docx files do it all the time.

When you get a working file, use the Styles pop-up and compare the Applied (i.e. used) styles with the total number of styles.

Post by RoryOF »

CannedMan wrote:
RoryOF wrote:I don't like the size of styles.xml - it looks to me to be too large, which suggests too many styles in use.
Too many styles available or too many instances of styling?
Too many styles defined, in my view, but that is just a guess.

Post by John_Ha »

RoryOF wrote:Before uploading the files, I verified that they validated correctly.
The original file validated correctly for me but still gave the error. It's a bit strange ...

Post by CannedMan »

RoryOF wrote:
CannedMan wrote:
RoryOF wrote:I don't like the size of styles.xml - it looks to me to be too large, which suggests too many styles in use.
Too many styles available or too many instances of styling?
Too many styles defined, in my view, but that is just a guess.
Is there any documentation giving specifics on this? I have never come across any specific limitations such as this, but I have experienced this kind of error before.

Post by RoryOF »

CannedMan wrote:
Is there any documentation giving specifics on this? I have never come across any specific limitations such as this, but I have experienced this kind of error before.
Not that I know of.

Post by RoryOF »

@CannedMan: did you run some form of Compare on this file and an earlier version?

All the internal components check out as being in correct XML format, but there are traces of some mechanism having revised or amalgamated the file; both John_Ha and I are baffled, and can do no more.

At least John was able to recover your actual text - you will have to redo the formatting, and I suggest that you should keep Timed/Dated backups every few minutes as you redo, in case of collapse of the newly edited file.
Edit: I suspect, and this is merely a guess, that your standard template has become corrupt or upset in some way, and I would not be surprised if future and existing work using that template exhibited a similar problem. It might become necessary to remake your standard template, redoing your style definitions, and hopefully avoiding introducing the upset we are currently seeing.
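The timed/dated backups suggested above can be scripted. A minimal sketch, assuming a made-up helper name `timestamped_backup` and a backup folder of your choosing; it copies the file under a name carrying the date and time, so an older copy is only overwritten if two backups fall within the same second.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_backup(source: Path, backup_dir: Path,
                       now: Optional[datetime] = None) -> Path:
    """Copy `source` into `backup_dir` as <stem>_<YYYYMMDD-HHMMSS><suffix>."""
    now = now or datetime.now()
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{source.stem}_{now:%Y%m%d-%H%M%S}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the file timestamps
    return target
```

Calling it after each save, for example `timestamped_backup(Path("essay.odt"), Path("D:/backups"))` with placeholder paths, leaves a trail of dated copies. It is worth opening a backup now and then to confirm it still loads.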

Post by John_Ha »

One thing I did notice is that each tracked change edit seems to create a new paragraph (and/or text?) style, presumably giving the proliferation of styles.

I was also surprised to see LO uses rsid tags in tracked change paragraph styles but nowhere else. While AOO does not support rsid tags it handled the LO file with them without problems.

Post by RoryOF »

@John_Ha: I think we may be looking at the wrong control; LibreOffice Version Control seems to insert rsid tags, which are unique to a Save session (read: as I understand this) and do not indicate which User was responsible for these changes. Track changes, in addition to the markings for Version Control (or similar markings), adds User information to identify responsibility for such changes.

I am now thinking that the rsid markings in styles.xml (Version Control markings?) may be causing the problem. I am now going to search to see if there is any method of cleansing styles.xml of these rsid tags and their related information, leaving styles.xml clean, but containing the same style definitions without history.
Edit: There were a small number of rsid instances in styles.xml, which I removed by hand. That did not cure the problem. There are so many rsid instances (278) in content.xml that manual removal is not practical.

@CannedMan: I think you should post your file to https://ask.libreoffice.org/en/questions/ as there are volunteers there who have more in depth knowledge of the LibreOffice version of .odt format than on this forum. 
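Removing the rsid instances from content.xml by hand may be impractical, but a regular expression can do it. The sketch below assumes the attributes are written as `officeooo:rsid="..."` and `officeooo:paragraph-rsid="..."`, which is how LibreOffice appears to store them; that spelling and the helper name `strip_rsid` are assumptions to check against the actual file. Since removing them from styles.xml did not cure the problem, this may not cure it either.

```python
import re

# Assumed attribute spelling; verify it against the real content.xml.
RSID_ATTR = re.compile(r'\s+officeooo:(?:paragraph-)?rsid="[^"]*"')


def strip_rsid(xml_text: str):
    """Return (cleaned_xml, number_of_attributes_removed)."""
    return RSID_ATTR.subn("", xml_text)
```

The cleaned text then has to be written back into a copy of the .odt archive in place of the original content.xml. Work on a copy only.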

Post by CannedMan »

You guys are just awesome.

I ended up realising yesterday evening before going to bed that the best approach was to use the raw text and reformat it; that didn’t take too much time. I then split the bibliography and the raw text into two separate files. I am very happy that I had a backup of my style template, as that failed with the same error today; I opened the backup and then used that to format the two new files, then created a master document from these. Hopefully that will solve the issue. I will most definitely check whether there are any remnants of MS Word’s joss og mannskit (grrrrr!) hidden in the template somewhere, and maybe redo that from scratch at some point in the future. I am now making regular backups to a different destination (not just relying on Google Drive’s version backlog) to have redundancy beyond what should be deemed reasonable.

The main issue, rescuing the text, was solved early on. It seems that the rsids have to do with tracking changes, right? I did switch that on by mistake at some point during the history of the file. What I would really like to learn is whether there are any hard limits on the number of styles a document can have, and on how many instances of style application are possible in a document before it collapses, but given the replies by @RoryOF above, that seems difficult to gauge.

As to asking the LO community, well, maybe I should do that, but I always felt at home here, and as some of you may know, I have used AOO for a number of years, all the way until OpenType support was announced for LO; it really is the only reason I switched.

Post by John_Ha »

Rory - you are right - rsid stands for Revision Identifier for Style Definition. See Rsid Class.

I discovered LO uses rsid tags for style changes and, while AOO doesn't, AOO handles an LO file with rsid tags.
Last edited by John_Ha on Thu Jul 29, 2021 1:30 pm, edited 2 times in total.