[Tutorial] Format error discovered in sub-document

Postby Hagar Delest » Sat Jan 28, 2017 7:25 pm

Here is a tutorial for fixing documents (mainly .odt) that show the following error message: Format error discovered in the file in sub-document content.xml at position 2,155278 (row,col).
The row is always 2, but the column differs from one document to another.
This is based on this post from John_Ha. Other tricks are given throughout this (long) topic: [Hint] How did I fix my ODT file.

Here is what you see in such a case:
Content_01.png


1. Open the file for the surgery
First make a copy of the file (in case something goes wrong). Then open the file with an archive manager:
  • Right-click the file, choose Open with, then select an archive manager.
  • Otherwise, change the extension of the file from .odt to .zip and open it.
You should now see the contents of the file:
Content_02.png
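For readers who prefer a script to an archive manager, this first step can be sketched in Python 3.9+ with only the standard library. This is a minimal sketch, not part of the original method: the function name `backup_and_open` and the `.bak.odt` name for the safety copy are illustrative assumptions. Because an .odt file is an ordinary zip archive, `zipfile` can list its members and read content.xml without renaming the file.

```python
import shutil
import zipfile
from pathlib import Path


def backup_and_open(odt_path: str) -> tuple[list[str], str]:
    """Back up the document, then return (archive member names, content.xml text).

    Hypothetical helper: the copy is named <stem>.bak<suffix>, e.g. report.bak.odt.
    """
    src = Path(odt_path)
    backup = src.with_name(src.stem + ".bak" + src.suffix)
    shutil.copy2(src, backup)  # untouched copy in case the surgery goes wrong

    with zipfile.ZipFile(src) as zf:  # an .odt is a plain zip archive
        names = zf.namelist()
        content = zf.read("content.xml").decode("utf-8")
    return names, content
```

The returned names should match what an archive manager shows (mimetype, content.xml, styles.xml, META-INF/manifest.xml and so on).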
Hagar Delest
Moderator
 
Posts: 28291
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

2. Do the surgery

Postby Hagar Delest » Sat Jan 28, 2017 7:26 pm

To edit the file, you need an XML editor, such as Notepad++ with its XML Tools plugin (the editor used in the steps below).
Then open the content.xml file from the archive with the editor.
It should warn you that there is indeed a problem at the same row and column position (or one column next to it).
Note that at this point the XML structure is invalid, so the editor cannot format it.
The file is therefore displayed as only 2 lines, the second one being huge.

Place your cursor at that position:
Content_03.png
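The position from the error message can also be turned directly into a snippet of the surrounding text. In the sketch below, the helper `show_context` and its `width` parameter are hypothetical, and it assumes the row and column are 1-based, as in the message `2,155278`. Since the editor may report a position one column away, look at a few characters on either side rather than trusting the exact offset.

```python
def show_context(xml_text: str, row: int, col: int, width: int = 80) -> str:
    """Return up to `width` characters on each side of (row, col).

    Assumes 1-based row and column, as shown in Writer's error message.
    """
    line = xml_text.splitlines()[row - 1]  # content.xml is usually just 2 lines
    i = col - 1                            # convert to a 0-based index
    return line[max(0, i - width): i + width]
```

For example, `show_context(text, 2, 155278)` would print the region where the repeated attribute sits.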

Notice that in this case an "office:name" attribute is repeated, which is not allowed in XML (yellow highlighting in the picture below).
So delete the string office:name="__Annotation__765_9324755062":
Content_04.png

Edit Apr. 2, 2018: in fact, both office:name="__Annotation__714_93247550611111" and office:name="__Annotation__765_9324755062" have been wrongly inserted in the middle of the Style P1 definition, and all instances of them need to be deleted. See Re: Format error discovered and Re: [Solved] Read-Error.

Try to identify a text string close to the change. It will help you later to check how far the resulting changes reach into the document.

3. Check the result

Postby Hagar Delest » Sat Jan 28, 2017 7:28 pm

Now check that the XML syntax is correct:
Content_05.png

Note: there may be other errors in the document. If so, repeat the steps above until none remain.
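The syntax check can also be scripted. This sketch uses Python's `xml.etree.ElementTree`, whose `ParseError.position` attribute gives a (line, column) pair; the helper name `first_error` is an assumption. Expat, the parser behind ElementTree, counts columns from 0, so its column may differ by one from the one in Writer's message.

```python
import xml.etree.ElementTree as ET


def first_error(xml_text: str):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # expat's column is 0-based
        return line, column, str(err)
    return None
```

Run it after each repair and keep fixing until it returns `None`, which mirrors the "repeat until none remain" advice above.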

You can now use the Pretty-print view feature of the editor to format the XML.
In Notepad++, go to Plugins > XML Tools > Pretty print XML with line breaks.
The file is then displayed with a readable structure:
Content_06.png

In Notepad++, you have to linearise the XML before saving it; otherwise, lots of tabs and newlines will be saved in the file and will then appear in the repaired document.
Save the modified content.xml file in the archive.
Close the archive and change its extension back to .odt if you had changed it to .zip.
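If you would rather not edit inside an archive manager, the repacking step can be sketched as follows. Python's `zipfile` cannot replace a member in place, so this rebuilds the archive into a new file. It writes the `mimetype` entry first and uncompressed, as the ODF package format requires. The function name `replace_content_xml` is hypothetical, and writing to a separate output file keeps the original untouched until the repaired copy is confirmed to open.

```python
import zipfile


def replace_content_xml(odt_path: str, new_xml: str, out_path: str) -> None:
    """Copy odt_path to out_path, swapping in new_xml as content.xml."""
    if out_path == odt_path:
        raise ValueError("write to a new file, then rename it once it opens correctly")
    with zipfile.ZipFile(odt_path) as src, zipfile.ZipFile(out_path, "w") as dst:
        # ODF packages expect 'mimetype' as the first entry, stored uncompressed.
        dst.writestr("mimetype", src.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for item in src.infolist():
            if item.filename == "mimetype":
                continue
            if item.filename == "content.xml":
                data = new_xml.encode("utf-8")
            else:
                data = src.read(item.filename)
            # Other parts are written deflated; original timestamps are not kept.
            dst.writestr(item.filename, data, compress_type=zipfile.ZIP_DEFLATED)
```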

Your file should now open in Writer.
Check its content, especially the part related to the change made in the content.xml file.
If you noted a specific text string close to the position where you made the changes, search for it.
In some cases, significant parts of the content.xml file have to be deleted, which removes data from the recovered file. You'll have to type that content again and reapply its formatting.

Fixing docx files with SAXParse error (LibreOffice bug)

Postby Hagar Delest » Sun Jan 29, 2017 1:20 pm

Editing an XML file can also fix the LibreOffice-specific bug that occurs when saving in .docx format (SAXParse error).
See John_Ha's post describing the 3 possible methods: Self-help methods to fix .docx files with SAXParse error.
Note: the 3 methods are alternatives; you don't need to apply all of them!

The 2nd method (explained in this post) is the closest to this tutorial. However, a .docx file can have multiple occurrences of this bug. If there are many of them, the 2 other methods may be quicker in the end.
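Since .odt and .docx files are both zip packages of XML parts, one quick way to see which sub-document is broken is to try parsing every XML part. This is a sketch with a hypothetical helper, `scan_package`; it only detects well-formedness errors, such as the repeated attributes discussed in this thread, not content that parses but is otherwise wrong. In a .docx the main text is usually in word/document.xml.

```python
import xml.etree.ElementTree as ET
import zipfile


def scan_package(path: str) -> dict[str, tuple[int, int]]:
    """Return {part name: (line, column)} for every XML part that fails to parse."""
    bad = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # .docx relationship parts end in .rels but are XML too
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError as err:
                bad[name] = err.position  # expat column is 0-based
    return bad
```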

Re: [Tutorial] Format error discovered in sub-document

Postby John_Ha » Wed Apr 04, 2018 12:19 am

Since Hagar's posts above, more examples have been posted to the forum and we have been able to investigate the problem in more detail.

It appears that there are (at present) two different problems, which require slightly different solutions. The sequence for fixing either problem is as follows (see Hagar's posts above for full details):

1. Open the file. You will get an error message like the one in the image below. Record the number (3309 in this example); it tells you where the error is located in the file.

error message.png

2. Unzip the .odt file and extract the content.xml file.

3. Open content.xml with an XML editor and click in the file until the editor shows that the cursor is at or close to the number (3309) you recorded. You have now found the location of the error. Once you have found it, you may find it easier to "pretty print" the file so you can see what is happening.

4. Repair the error as described for the two cases below. If you pretty printed the file in Step 3, linearise the XML (this undoes the pretty printing) before saving the file.

5. Save content.xml.

6. Insert content.xml back into the .odt file.

The .odt file is now repaired.

The two cases are as follows:

Case 1: Multiple added "office:name="__Annotation__714_93247550611111""

These additions appear in the middle of the first style definition in the file and corrupt it. They should not be there, so the fix is to delete all occurrences of them to restore the style definition.

annotation error.gif
Note that the P1 Style has been corrupted by the addition of the Annotation.
You need to delete ALL occurrences of the annotation until the P1 Style has been corrected.
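For Case 1, the deletion can be limited to the corrupted style so that legitimate annotations elsewhere are left alone (in ODF, comments applied to a range of characters legitimately carry an office:name on their office:annotation elements). The sketch below rests on assumptions: the helper `clean_style`, the default style name P1 (taken from the example above) and the pattern for the stray attribute are all hypothetical. It only edits the text from the style's start tag up to the next `</style:style>`.

```python
import re

# One stray attribute, e.g.  office:name="__Annotation__765_9324755062"
STRAY = re.compile(r'\s+office:name="__Annotation__[^"]*"')


def clean_style(xml_text: str, style_name: str = "P1") -> tuple[str, int]:
    """Delete stray annotation names inside one style definition.

    Returns the repaired text and the number of attributes removed.
    """
    start_tag = re.compile(r'<style:style\b[^>]*?\bstyle:name="' + re.escape(style_name) + '"')
    m = start_tag.search(xml_text)
    if m is None:
        return xml_text, 0
    start = m.start()
    # Restrict the edit to this style: up to its closing tag, or the end of the text.
    end = xml_text.find("</style:style>", start)
    if end == -1:
        end = len(xml_text)
    fixed, count = STRAY.subn("", xml_text[start:end])
    return xml_text[:start] + fixed + xml_text[end:], count
```

Checking the returned count against the number of annotations you saw in the editor is a cheap sanity check before saving.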

After making any corrections, it is sensible to use the editor's XML syntax checker to confirm that the XML is well-formed. Correct any further errors it reports.

The additions are associated with comments that are applied to a range of characters. See [Solved] Read-Error, [Solved] Format error discovered and [Solved] Format error discovered in the sub-document for example files with the error and explanations of how the files were fixed. A bug report, Issue 127745 - Read Error: Format error discovered ... at n,nnnn (row,col), has been raised so that they can be investigated.

Case 2: Repeated attributes such as w:themeShade, w:themeColor and w:cstheme

These repeated attribute definitions can appear anywhere in the file, multiple times and in different places. The fix is to find all repeats and delete only the repeats, leaving just one occurrence of each attribute. So, in the example below, delete w:themeColor="accent1" in the red box.

theme colour.gif
When an attribute like w:themeColor is repeated, you should delete the repeats and leave just one occurrence.
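The Case 2 rule (keep one occurrence of each attribute in a tag, delete the repeats) can be sketched with regular expressions. This is a rough sketch, not a full XML parser: the helper name `drop_repeated_attributes` is hypothetical, it assumes double-quoted attribute values that contain no `<` or `>`, and where repeats have different values it keeps the first one, which may not be the one you want.

```python
import re

# A start or empty-element tag such as <w:color ... />; skips </...>, <?...?> and <!--...-->.
TAG = re.compile(r"<[A-Za-z_][^<>]*>")
# One attribute with a double-quoted value, e.g.  w:themeColor="accent1"
ATTR = re.compile(r'\s+([\w.:-]+)="[^"]*"')


def drop_repeated_attributes(xml_text: str) -> str:
    """Within each tag, keep the first occurrence of every attribute and delete the repeats."""

    def fix_tag(tag: re.Match) -> str:
        seen = set()  # attribute names already kept in this tag

        def keep_first(attr: re.Match) -> str:
            name = attr.group(1)
            if name in seen:
                return ""  # a repeat: delete it
            seen.add(name)
            return attr.group(0)  # first occurrence: keep it unchanged

        return ATTR.sub(keep_first, tag.group(0))

    return TAG.sub(fix_tag, xml_text)
```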

After making any corrections, it is sensible to use the editor's XML syntax checker to confirm that the XML is well-formed. Correct any further errors it reports.

Although these errors can occur in .odt files, they also occur in .docx files that have been created or edited by LibreOffice. See [Tutorial] How to fix SAXParse errors in LibreOffice files for full instructions on how to fix them.

Why do these errors occur?

We are not sure; investigation to understand these errors is continuing. It is suspected that Case 1 errors are caused when MS Word is used to edit an .odt file and adds a comment attached to a highlighted range of characters. We do not understand how Case 2 errors occur. SAXParse errors are caused by a known LO bug.
AOO 4.1.6, Windows 7 Home 64 bit

See the Writer Manual, the Writer FAQ, the Writer Tutorials and the Writer guide.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
John_Ha
Volunteer
 
Posts: 6002
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: [Tutorial] Format error discovered in sub-document

Postby John_Ha » Wed Jun 27, 2018 10:56 am

Another type of format error has now been observed, in which a single character in a definition was changed.

Case 3: Single character in a definition is changed

See [Solved] Format error discovered in the file in sub-document, where a spreadsheet (.ods) file was corrupted. The fix was to edit content.xml and change "pable" back to "table".

Clipboard01.gif

