
[Tutorial] Format error discovered in sub-document

Posted: Sat Jan 28, 2017 7:25 pm
by Hagar Delest
Here is a tutorial to fix documents (mainly .odt) that show the following error message: Format error discovered in the file in sub-document content.xml at position 2,155278(row,col).
The row is always 2 but the column differs depending on your document.
This is based on this post from John_Ha. Other tricks are given in this (long) topic: [Hint] How did I fix my ODT file.

Here is what you see in such a case:
[Image: Content_01.png, the error message]

1. Open the file for the surgery
First make a copy of the file (in case something goes wrong). Then open the file with an archive manager:
  • Right-click the file, choose Open with, then select an archive manager
  • Otherwise, change the extension of the file from .odt to .zip
You should now see the contents of the archive, including content.xml.
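If no archive manager is at hand, the same step can be done from a script. This is only a minimal sketch using Python's standard zipfile module; the function names list_members and read_member are made up for this illustration, and the file name in the usage note is a placeholder for your own copy of the document.

```python
import zipfile


def list_members(odt_file):
    """Return the names of the files stored inside the .odt archive."""
    # An .odt file is a zip archive, so no renaming to .zip is needed here.
    with zipfile.ZipFile(odt_file) as archive:
        return archive.namelist()


def read_member(odt_file, member="content.xml"):
    """Return the raw bytes of one file inside the archive."""
    with zipfile.ZipFile(odt_file) as archive:
        return archive.read(member)
```

For example, list_members("copy-of-document.odt") should include content.xml, and read_member("copy-of-document.odt") returns its bytes. Work on the copy, not on the original file.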

2. Do the surgery

Posted: Sat Jan 28, 2017 7:26 pm
by Hagar Delest
To edit the file, you need an XML editor, such as Notepad++ with its XML Tools plugin (used in the steps below).
Then, with the editor, open the content.xml file from the archive.
The editor should report a problem at the same row and column position as the error message (or one column next to it).
Note that at this point the XML structure is not correct, so the editor cannot format it.
The file is therefore displayed as only 2 lines, the second being a huge one.
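The reported position can also be cross-checked with a script. The sketch below uses Python's standard xml.etree.ElementTree parser, which stops at the first well-formedness error and reports its line and column. The function name first_xml_error is an assumption for this example, and the column may differ slightly from the one Writer shows, because the parser counts columns from 0.

```python
import xml.etree.ElementTree as ET


def first_xml_error(xml_bytes):
    """Return (row, col, message) for the first XML error, or None if the XML parses."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        row, col = err.position  # row is 1-based, col is 0-based
        return row, col, str(err)
    return None
```

Calling it on the bytes of content.xml returns None once the file is well-formed. Like the editor, it only reports one error at a time.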

Place your cursor at that position:

Notice that in this case an office:name attribute is repeated where it does not belong (yellow highlighting in the picture below).
Thus, delete the string: office:name="__Annotation__765_9324755062"

Edit Apr. 2, 2018: in fact, both office:name="__Annotation__714_93247550611111" and office:name="__Annotation__765_9324755062" were wrongly inserted in the middle of the Style P1 definition, and all instances of them need to be deleted. See Re: Format error discovered and Re: [Solved] Read-Error.

Try to identify a text string close to the change. It will help later when you check how much the repair affected the document.

3. Check the result

Posted: Sat Jan 28, 2017 7:28 pm
by Hagar Delest
Now check if the XML file is correct:

Note: there may be other errors in the document. If so, repeat the fix until none remain.

You can now use the pretty-print feature of the editor to format the XML.
In Notepad++, go to Plugins > XML Tools > Pretty print XML with line breaks.
The file is then displayed with a readable structure:

In Notepad++, you have to Linearise the XML before saving it; otherwise lots of tabs and newlines are saved in the file and then appear in the repaired document.
Save the modified content.xml file in the archive.
Close the archive and change its extension back to .odt if you had changed it to .zip.
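If you repair content.xml outside an archive manager, it has to be put back into the archive. The following is a minimal sketch with Python's standard zipfile module, not the method used in this tutorial. The function name replace_member is hypothetical. It writes a new archive rather than modifying the old one, and it keeps the original order and compression of every entry, because the OpenDocument packaging rules expect the mimetype entry to come first and to be stored uncompressed.

```python
import zipfile


def replace_member(src_file, dst_file, member, new_data):
    """Copy the archive src_file to dst_file, swapping in new_data for one member."""
    with zipfile.ZipFile(src_file) as src, zipfile.ZipFile(dst_file, "w") as dst:
        for info in src.infolist():  # infolist() keeps the original entry order
            if info.filename == member:
                data = new_data
            else:
                data = src.read(info.filename)
            # Fresh entry with the same name, timestamp and compression method.
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            entry.compress_type = info.compress_type
            entry.external_attr = info.external_attr
            dst.writestr(entry, data)
```

A call such as replace_member("broken-copy.odt", "repaired.odt", "content.xml", fixed_bytes) uses placeholder file names; fixed_bytes stands for the repaired content.xml as bytes.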

Your file should now open in Writer.
Check its content, especially the part related to the change made in the content.xml file.
If you had spotted a specific text string close to the position where you applied changes, search for it.
In some cases, significant parts of the content.xml file have to be deleted, which removes data from the recovered file. You will have to type that content again and reapply its formatting.

Fixing docx files with SAXParse error (LibreOffice bug)

Posted: Sun Jan 29, 2017 1:20 pm
by Hagar Delest
Editing an XML file can also be useful to fix the bug specific to LibreOffice when saving in .docx (SAXParse error).
See John_Ha's post describing the 3 possible methods: Self-help methods to fix .docx files with SAXParse error.
Note: the 3 methods are alternatives; you don't need to apply all of them!

The 2nd method (explained in this post) is the closest to this tutorial. However, a .docx file can have multiple occurrences of this bug. If there are too many of them, the 2 other methods may be quicker in the end.

Re: [Tutorial] Format error discovered in sub-document

Posted: Wed Apr 04, 2018 12:19 am
by John_Ha
Since Hagar's posts above we have had more examples posted to the forum and have been able to investigate the problem in more detail.

It appears that there are (at present) two different problems which require slightly different solutions. The sequence for fixing either problem is as follows (see full details in Hagar's posts above).

1. Open the file - you will get an error message like the image below. Record the position number (3309 in this example) - it tells you where the error is located in the file.

[Image: error message.png, the error message]

2. Unzip the .odt file and extract the content.xml file.

3. Open content.xml with an XML editor and click in the file until the editor shows that the cursor is at or close to the number (3309) you recorded. You have now found the location of the error. Once you have found the location you may find it easier to "pretty print" the file so it is easier to see what is happening.

4. Repair the error as described for the two cases below. If you pretty printed in Step 3, Linearise the XML (it undoes the pretty printing) before saving the file.

5. Save content.xml.

6. Insert content.xml back into the .odt file.

The .odt file is now repaired.
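Step 3 above (finding the recorded position) can also be done without clicking through the file. The sketch below simply cuts out the text around a given row and column. The function name context_at and the width parameter are assumptions for this illustration. It counts characters, so in a file with non-ASCII text before the error the excerpt may be shifted slightly from the position Writer reports.

```python
def context_at(xml_text, row, col, width=40):
    """Return the text within `width` characters of (row, col); row is 1-based."""
    line = xml_text.split("\n")[row - 1]
    start = max(col - width, 0)
    return line[start:col + width]
```

For instance, context_at(text, 2, 3309) shows up to 80 characters around the position recorded in Step 1, where text is content.xml decoded as UTF-8.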

The two cases are as follows:

Case 1: Multiple added office:name="__Annotation__714_93247550611111" attributes

These additions appear in the middle of the first style definition in the file and corrupt it. They should not be there, so the fix is to delete all occurrences of them and thereby restore the style definition.

annotation error.gif
Note that the P1 Style has been corrupted by the addition of the Annotation.
You need to delete ALL occurrences of the annotation until the P1 Style has been corrected.
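For files with many stray annotations, the deletion could be scripted. The sketch below rests on an assumption that is not confirmed for every file: that the stray office:name="__Annotation__..." text always sits inside a start tag whose name begins with style:. It deliberately leaves office:name attributes on other elements alone, since those may be legitimate. The function name strip_stray_annotation_names is made up for this example; check the result with an XML syntax checker afterwards.

```python
import re

# Any tag whose name starts with "style:" (assumes no ">" inside attribute values).
STYLE_TAG_RE = re.compile(r'<style:[^<>]*>')
# The stray attribute, including the whitespace in front of it.
STRAY_RE = re.compile(r'\s+office:name="__Annotation__[^"]*"')


def strip_stray_annotation_names(xml_text):
    """Remove office:name="__Annotation__..." attributes from style: tags only."""
    return STYLE_TAG_RE.sub(lambda m: STRAY_RE.sub("", m.group(0)), xml_text)
```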

After making any correction(s) it is sensible to use the editor's XML Syntax Checker to check the XML is grammatically correct. Correct any further errors which are shown to exist.

The additions are associated with comments which are applied to a range of characters. See [Solved] Read-Error, [Solved] Format error discovered and [Solved] Format error discovered in the sub-document for example files with the error and explanations of how the files were fixed. A bug report Issue 127745 - Read Error: Format error discovered ... at n,nnnn (row,col) has been raised so they can be investigated.

Case 2: Repeated attributes such as w:themeShade, w:themeColor and w:cstheme

These repeated attribute definitions can appear anywhere in the file, any number of times. The fix is to find all repeats and delete only the repeats, leaving just one occurrence of each attribute. So, in the example below, delete w:themeColor="accent1" in the red box.

theme colour.gif
When an attribute like w:themeColor is repeated you should delete the repeats and leave just one occurrence.
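When there are many repeats, removing them by hand gets tedious, so here is a rough sketch of how the same rule (keep the first occurrence, drop the later ones) could be applied to every start tag. It is regex-based, not a real XML parser, and assumes that nothing in comments or CDATA sections looks like a tag. The function name drop_repeated_attributes is hypothetical.

```python
import re

# A start tag: name, one or more attributes, then the closing "/>" or ">".
TAG_RE = re.compile(
    r'<([A-Za-z_][\w:.-]*)'
    r'((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))+)'
    r'(\s*/?>)'
)
# One attribute, including the whitespace in front of it.
ATTR_RE = re.compile(r'\s+([\w:.-]+)\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def drop_repeated_attributes(xml_text):
    """Within each start tag, keep the first occurrence of every attribute name."""
    def fix_tag(match):
        seen = set()
        kept = []
        for attr in ATTR_RE.finditer(match.group(2)):
            if attr.group(1) not in seen:
                seen.add(attr.group(1))
                kept.append(attr.group(0))
        return "<" + match.group(1) + "".join(kept) + match.group(3)
    return TAG_RE.sub(fix_tag, xml_text)
```

If the repeated copies carry different values, this keeps the first value, which may not be the one you want; in that case edit by hand.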

After making any correction(s) it is sensible to use the editor's XML Syntax Checker to check the XML is grammatically correct. Correct any further errors which are shown to exist.

Although these errors can occur in .odt files, they also occur in .docx files which have been created or edited by LibreOffice. See [Tutorial] How to fix SAXParse errors in LibreOffice files for full instructions on how to fix them.

Why do these errors occur?

We are not sure, and investigation is continuing. It is suspected that Case 1 errors are caused when MS Word is used to edit an .odt file and adds a comment attached to a highlighted range of characters. We do not understand how Case 2 errors occur. SAXParse errors are caused by a known LO bug.

Re: [Tutorial] Format error discovered in sub-document

Posted: Wed Jun 27, 2018 10:56 am
by John_Ha
Another type of format error has now been observed where a single character in a definition was changed.

Case 3: Single character in a definition is changed

See [Solved] Format error discovered in the file in sub-document where a spreadsheet .ods file was corrupted. The fix was to edit content.xml and change "pable" back to "table".

[Image: Clipboard01.gif]

Re: [Tutorial] Format error discovered in sub-document

Posted: Mon Feb 18, 2019 5:32 pm
by John_Ha
I have undertaken a little more analysis of errors of this kind. I had previously created a bug report for it, namely Issue 127745 - Read Error: Format error discovered ... at n,nnnn (row,col).

I have conducted some more tests and I have now come to the following conclusions:

1. The error seems to arise when an AOO or LO user sends an .odt file (or .docx file if LO) to a person who uses MS Word, and that person adds comments to a range of characters. Note that adding comments does not require Edit > Changes > Record ... to be switched on, but a person adding comments is usually also recording changes, so it is a little difficult to separate the two as potential causes.

2. I am now not sure how the corruption happens. Does MS Word corrupt the original .odt file and return a corrupted file to the AOO user? Or does MS Word not corrupt the file, but AOO cannot handle what MS Word sends back, and AOO then corrupts the file?

3. If I correct a corrupted file by deleting the repeated attributes I then get different behaviours with AOO and LO:

When AOO 4.1.6 saves the corrected file under another name, AOO corrupts the corrected file. Hence the new corruption is definitely introduced by AOO.

When LO saves the corrected file under another name, LO does not corrupt the corrected file.

This suggests that LO may be more stable than AOO when exchanging files with MS Word.