[Tutorial] Format error discovered in sub-document

Postby Hagar Delest » Sat Jan 28, 2017 7:25 pm

Here is a tutorial to fix documents (mainly .odt) that show the following error message: Format error discovered in the file in sub-document content.xml at position 2,155278(row,col).
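The row is always 2, but the column differs from one document to another: content.xml is stored as only two lines, the XML declaration followed by the whole document on one very long line.
This is based on this post from John_Ha. Other tricks are given in this (long) topic: [Hint] How did I fix my ODT file.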

Here is what you see in such a case:
Content_01.png


1. Open the file for the surgery
First make a copy of the file (in case something goes wrong). Then open the file with an archive manager (an .odt file is actually a ZIP archive):
  • Right-click the file, choose Open with, then select an archive manager.
  • Otherwise, change the extension of the file from .odt to .zip and open it.
You should now see the content of the file:
Content_02.png
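If you prefer a script to an archive manager, the sketch below (Python standard library only) makes the backup copy and lists what is inside the package. The function name backup_and_list and the ".backup" file name are illustrative choices, not part of any tool.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List


def backup_and_list(odt_path: str) -> List[str]:
    """Copy the document next to itself, then list the entries of the package."""
    src = Path(odt_path)
    # e.g. report.odt -> report.backup.odt (illustrative naming scheme)
    backup = src.with_name(src.stem + ".backup" + src.suffix)
    shutil.copy2(src, backup)
    # An .odt file is a ZIP archive, so zipfile can open it directly
    with zipfile.ZipFile(src) as zf:
        return zf.namelist()
```

For a typical text document the list includes mimetype, content.xml, styles.xml and META-INF/manifest.xml.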
Hagar Delest
Moderator
 
Posts: 27661
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

2. Do the surgery

Postby Hagar Delest » Sat Jan 28, 2017 7:26 pm

To edit the file, you need to install an XML editor such as Notepad++ with its XML Tools plugin (used in the rest of this tutorial).
Then open the content.xml file from the archive with the editor.
It should report a problem at the same row and column position (or one column next to it).
Note that at this point the XML structure is invalid, so the editor cannot format (pretty-print) it.
The file is therefore displayed as only 2 lines, the second one being huge.

Place your cursor at that position:
Content_03.png
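To look at the reported position without scrolling through a huge line, a small helper can print the text around it. This is a sketch: context_at is a hypothetical name, and it assumes the row and column from the error message are both counted from 1. Being one column off is harmless, since a window of text is shown on both sides.

```python
def context_at(xml_text: str, row: int, col: int, width: int = 60) -> str:
    """Return the text around (row, col), marking the position with [HERE].

    Row and column are assumed to be counted from 1, as in the error message.
    """
    line = xml_text.splitlines()[row - 1]
    i = col - 1                       # convert to a 0-based index
    start = max(0, i - width)
    return line[start:i] + "[HERE]" + line[i:i + width]
```

For example, after extracting content.xml and reading it as text, context_at(text, 2, 155278) shows the neighbourhood of the position given in the error message above.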

In this case, notice that the office:name attribute appears twice in the same tag (yellow highlighting in the picture below). XML does not allow an attribute to be repeated within one element, which is why the file cannot be read.
So delete one of them, here the string office:name="__Annotation__765_9324755062":
Content_04.png

Note that it's not clear whether the faulty one is the first or the second declaration.
Complete the following steps and, if needed, start again from the copy and delete the first one this time (hence the advantage of keeping the original file).
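The same deletion can be scripted for a single start tag. This sketch (drop_duplicate_attrs is a hypothetical helper) keeps either the first or the last occurrence of each repeated attribute, matching the advice above to try one and then the other. It assumes attribute values are in double quotes, as LibreOffice writes them, and that you pass it only the tag that contains the error.

```python
import re

# One attribute inside a start tag: a space, a (possibly prefixed) name, ="value"
ATTR = re.compile(r'\s([\w.:-]+)="[^"]*"')


def drop_duplicate_attrs(start_tag: str, keep: str = "first") -> str:
    """Remove repeated attributes from one start tag, keeping the first or last."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    by_name = {}
    for m in ATTR.finditer(start_tag):
        by_name.setdefault(m.group(1), []).append(m)
    to_drop = []
    for matches in by_name.values():
        if len(matches) > 1:
            to_drop += matches[1:] if keep == "first" else matches[:-1]
    # Cut from the end so the offsets of earlier matches stay valid
    for m in sorted(to_drop, key=lambda x: x.start(), reverse=True):
        start_tag = start_tag[:m.start()] + start_tag[m.end():]
    return start_tag
```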

Note a piece of document text close to the change. It will help you later to check how much of the content was affected by the repair.

3. Check the result

Postby Hagar Delest » Sat Jan 28, 2017 7:28 pm

Now check whether the XML file is well-formed:
Content_05.png

Note: there may be other errors in the document. If so, repeat the previous step until none remains.
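The check can also be done outside the editor with Python's built-in parser. This sketch (first_xml_error is a hypothetical name) returns None for a well-formed file, otherwise the (line, column) of the first error, so you can run it again after each fix until it returns None. The column comes from expat and is counted from 0, so it may be one off from the position shown by LibreOffice or your editor.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple, Union


def first_xml_error(xml_data: Union[str, bytes]) -> Optional[Tuple[int, int]]:
    """Return None if the XML is well-formed, else (line, column) of the first error."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        return err.position   # column is counted from 0 by expat
    return None
```

You can feed it directly from the copy of the document, e.g. first_xml_error(zipfile.ZipFile("copy.odt").read("content.xml")), where "copy.odt" is only an example name.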

You can now use the pretty-print feature of the editor to format the XML.
In Notepad++, go to Plugins > XML Tools > Pretty print XML with line breaks.
The XML is then displayed with a readable structure:
Content_06.png

In Notepad++, you have to linearise the XML before saving it; otherwise lots of tabs and newlines are saved in the file and then appear in the repaired document.
Save the modified content.xml file back into the archive.
Close the archive and change its extension back to .odt if you had changed it to .zip.
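If you script the repair instead of using an archive manager, note that Python's zipfile cannot replace a member in place, so the sketch below rewrites the whole package into a new file. It puts the mimetype entry first and uncompressed, as the OpenDocument packaging format expects. replace_member and the file names are illustrative; always write to a new file and keep the original untouched.

```python
import zipfile


def replace_member(src: str, dst: str, member: str, data: bytes) -> None:
    """Write a copy of the archive `src` to `dst`, with `member` replaced by `data`."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        # ODF packages expect "mimetype" as the first entry, stored uncompressed
        if "mimetype" in names:
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for item in zin.infolist():
            if item.filename == "mimetype":
                continue
            payload = data if item.filename == member else zin.read(item.filename)
            # Fresh ZipInfo: same name, timestamp and attributes, deflate compression
            info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
            info.external_attr = item.external_attr
            info.compress_type = zipfile.ZIP_DEFLATED
            zout.writestr(info, payload)
```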

Your file should now open in Writer.
Check its content, especially the part related to the change made in the content.xml file.
If you had spotted a specific text string close to the position where you applied changes, search for it.
In some cases, significant parts of the content.xml file have to be deleted, which removes that content from the recovered file. You'll then have to type it again and reapply its formatting.
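To check whether the text you noted during the surgery survived, you can also search the repaired content.xml directly. A minimal sketch, assuming you pass the raw bytes of content.xml; text_survived is a hypothetical name. It concatenates all text nodes, so a phrase split across formatting spans is still found, but runs of several spaces (which ODF stores as text:s elements) are not reproduced, so search for a phrase with single spaces.

```python
import xml.etree.ElementTree as ET


def text_survived(content_xml: bytes, needle: str) -> bool:
    """True if `needle` appears in the concatenated text of content.xml."""
    root = ET.fromstring(content_xml)
    # itertext() yields every text node in document order, across spans
    return needle in "".join(root.itertext())
```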

Fixing docx files with SAXParse error (LibreOffice bug)

Postby Hagar Delest » Sun Jan 29, 2017 1:20 pm

Editing an XML file can also be useful to fix a bug specific to LibreOffice that affects files saved in .docx format (SAXParse error).
See John_Ha's post describing the 3 possible methods: Self-help methods to fix .docx files with SAXParse error.
Note: the 3 methods are alternatives; you don't need to apply all of them!

The 2nd method (explained in this post) is the closest to this tutorial. However, a .docx file can contain multiple occurrences of this bug. If there are too many of them, the other 2 methods may be quicker in the end.
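A .docx file is also a ZIP archive of XML parts, so before editing you need to know which part is broken. This sketch (broken_parts is a hypothetical name) parses every .xml and .rels entry of an archive, .docx or .odt, and reports the first error position in each part that fails. It only locates the problems; it does not apply any of the 3 methods.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Tuple


def broken_parts(archive_path: str) -> Dict[str, Tuple[int, int]]:
    """Map each XML part that fails to parse to (line, column) of its first error."""
    problems = {}
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Only XML parts; images and other binaries are skipped
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError as err:
                problems[name] = err.position
    return problems
```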