
[Solved] LibreOffice File format error found at SAXParse

Posted: Wed Nov 30, 2016 12:46 pm
by Jo21
Hello,

I have a similar problem with a file that I can no longer open, apparently linked to the use of correction mode (track changes).

The file is too large to be posted here, but it is available here: https://wetransfer.com/downloads/c37edd ... 121/a82eae

If anyone has a solution, that would be great.


Thank you, Jo

Re: [Solved] File format error found at SAXParse

Posted: Wed Nov 30, 2016 1:16 pm
by RoryOF
The uploaded file at the above site is not an .odt file - I think it is a retagged .docx. I have fixed some internal errors and on retagging the fix as a .docx it seems to open correctly. Please send me a PM with your email address so I can send it to you.

Re: [Solved] File format error found at SAXParse

Posted: Wed Nov 30, 2016 1:34 pm
by John_Ha
Welcome to the forum.

Try this file, in which I have attempted to correct it.

Several points:

1 The file name ends in .odt, but the file is actually a .docx file.

Question 1: Did you rename the file to .docx?

2 As the file is a .docx, it was presumably created either with MS Word or with LibreOffice; AOO cannot write .docx files.

Question 2: Are you using LibreOffice? MS Word? Or did your editor use LibreOffice? MS Word? It would be useful to know the history of the file to understand when the corruption occurred.

3 The file has had Edit > Changes applied: yet again a problem with Edit > Changes.

4 The .docx file showed the error "w:themeColor redefined".

I therefore extracted \word\document.xml from the .docx file: a .docx file is actually a ZIP file, so I just unZIPped it. I opened document.xml in Notepad++, searched for w:themeColor, and deleted the second instance of w:themeColor="accent1" each time it occurred, leaving the trailing / as below. I then put document.xml back into the .docx file, which then opened normally.

Note that it is easier to find the repeated occurrences if you "pretty print" the XML using the XML Tools plugin for Notepad++. If you do, be sure to Linearise the XML before saving it; otherwise lots of tabs and newlines will be saved in the file, and they then appear in the repaired document.


<w:rPr>
    <w:sz w:val="20"/>
    <w:szCs w:val="20"/>
    <w:highlight w:val="yellow"/>
    <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
    <!-- Invalid: w:themeColor appears twice in the tag below; delete the second one -->
    <w:color w:val="5B9BD5" w:themeColor="accent1" w:themeColor="accent1"/>
</w:rPr>
Second instance of w:themeColor="accent1" deleted each time it occurred, leaving the trailing /
This image shows Notepad++ without invoking the "pretty print" add-on.
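For anyone who would rather not hand-edit a multi-megabyte document.xml, the same repair (unZIP, delete the second copy of each repeated attribute, put document.xml back) can be scripted. This is a minimal Python sketch, not the tool used in this thread: the function names `dedupe_attributes` and `repair_docx` are hypothetical, the tag-matching regular expressions are a simplification that assumes no comments or CDATA sections containing quote characters, and it assumes document.xml is UTF-8 (what Word normally writes). Always work on a copy of the damaged file.

```python
import re
import zipfile

# Simplified patterns (an assumption, not a full XML parser): a tag is "<" ... ">"
# with quoted attribute values; comments/CDATA containing quotes are not handled.
TAG_RE = re.compile(r'<(?:[^<>"\']|"[^"]*"|\'[^\']*\')+>')
ATTR_RE = re.compile(r'(\s+)([^\s=/>]+)\s*=\s*("[^"]*"|\'[^\']*\')')


def dedupe_attributes(xml_text: str) -> str:
    """Keep the first occurrence of each attribute in every tag, drop later repeats."""
    def fix_tag(tag_match):
        seen = set()

        def fix_attr(attr_match):
            name = attr_match.group(2)
            if name in seen:
                return ""  # e.g. the second w:themeColor="accent1"
            seen.add(name)
            return attr_match.group(0)

        return ATTR_RE.sub(fix_attr, tag_match.group(0))

    return TAG_RE.sub(fix_tag, xml_text)


def repair_docx(src, dst, part="word/document.xml"):
    """Copy every entry of the .docx (a ZIP) from src to dst, de-duplicating
    attributes in `part`. src and dst may be paths or binary file objects."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part:
                # Assumes the part is UTF-8 encoded.
                data = dedupe_attributes(data.decode("utf-8")).encode("utf-8")
            # Reuses each entry's name, timestamp and compression method.
            zout.writestr(item, data)
```

Usage would be along the lines of `repair_docx("broken.docx", "repaired.docx")` (hypothetical file names). Because only the attribute text is removed and every other ZIP entry is copied unchanged, there is no pretty-print/linearise step to forget.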

Re: [Solved] File format error found at SAXParse

Posted: Wed Nov 30, 2016 6:30 pm
by Jo21
Question 1: Did you rename the file to .docx?
Yes, it was a .docx file renamed to .odt.

Question 2: Are you using LibreOffice? MS Word? Or did your editor use LibreOffice? MS Word? It would be useful to know the history of the file to understand when the corruption occurred.
Yes, the document was created with MS Word. I opened it with MS Word and created some correction notes. Then I made a mistake: I opened the file with LibreOffice 5 and added some new correction notes. After that, it was impossible to open the file with either MS Word or LibreOffice.

Thanks!

Re: [Solved] File format error found at SAXParse

Posted: Wed Nov 30, 2016 6:39 pm
by RoryOF
For information: simply changing the file extension does not change the internal structure of a file. To change the internal structure, use File > Save As, select the new file type from the File type drop-down, and keep "Automatic file name extension" checked.
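To see what a file really is regardless of its extension, one can look inside the ZIP container. A minimal Python sketch, relying on the usual package markers: ODF documents (.odt) contain a `mimetype` entry naming the format, while .docx files contain `[Content_Types].xml` and `word/document.xml`. The function name `sniff_office_format` and its return strings are hypothetical.

```python
import zipfile


def sniff_office_format(path) -> str:
    """Guess the real format of a word-processing file from its contents, not its name.
    `path` may be a file name or a binary file object."""
    if not zipfile.is_zipfile(path):
        return "not a ZIP container (e.g. legacy .doc or plain text)"
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())
        if "mimetype" in names:
            # ODF packages store their media type as plain ASCII text.
            return z.read("mimetype").decode("ascii").strip()
        if "[Content_Types].xml" in names and "word/document.xml" in names:
            return "docx (Office Open XML)"
    return "unknown ZIP-based format"
```

For a .docx that was simply renamed to .odt, `sniff_office_format("document.odt")` (hypothetical file name) would still report `docx (Office Open XML)`: the label changed, the contents did not.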

Re: [Solved] File format error found at SAXParse

Posted: Wed Nov 30, 2016 7:01 pm
by John_Ha
Jo21 wrote:Question 1: Did you rename the file to .docx? Yes, it was a .docx file renamed to .odt
As Rory says above, the file is a .docx file which has had its .docx qualifier changed to .odt with the Windows Rename command. You should never do that, as the file qualifier must agree with what is inside the file. The file qualifier is like the label on a tin of tomatoes: if you take the label off and replace it with a label saying strawberry jam, the contents of the tin are still tomatoes, not strawberry jam.

If you want to convert a file to a different format (change what is inside the tin and give the tin the correct label) you must open the file with OpenOffice, and then go File > Save As ..., and choose what format to save it as.

SAXParse errors seem to be common in .docx files edited with LibreOffice. Exchanging files between Word and AOO/LO, where edits have been recorded with Edit > Changes, often seems to cause problems.

Re: [Solved] File format error found at SAXParse

Posted: Thu Dec 01, 2016 12:07 pm
by Jo21
Yes, I know about the problems with changing the file extension. But my file can no longer be opened, so it's impossible to use Save As...
I would like to know if it's possible to recover the file, in .docx or .odt, with its correction notes. After that, I'll surely be more careful when working on a .docx document, renaming it to .odt before!!
Many thanks
Jo

Re: [Solved] File format error found at SAXParse

Posted: Thu Dec 01, 2016 12:11 pm
by RoryOF
Download the repaired file using the link (the underlined blue "this file") in John_Ha's message above. It opened for me: 17 or 19 pages, about grey-headed parrots.

Re: [Solved] File format error found at SAXParse

Posted: Thu Dec 01, 2016 12:53 pm
by Jo21
Great, I can open the file! :bravo:

Thank you very much! :D

Jo

Re: [Solved] File format error found at SAXParse

Posted: Thu Dec 01, 2016 12:53 pm
by keme
Jo21 wrote:Yes, I know about the problems with changing the file extension. But my file can no longer be opened, so it's impossible to use Save As...
I would like to know if it's possible to recover the file, in .docx or .odt, with its correction notes. After that, I'll surely be more careful when working on a .docx document, renaming it to .odt before!!
Even though you state that you are aware of the problems arising from a changed filename extension, and possibly in spite of RoryOF's and John_Ha's advice above, there is some ambiguity in your statement.

No offense intended. Just to make it clear: Do not change the filename extension on an existing file (using a file manager) to make it open as a different storage format. That file type indicator is only a name, and changing it will not change the actual storage format, or "file type".
You can call your dachshund "horse" as much as you want. It is just a name. It will not make the saddle fit, and raisins may still be lethal.

If by "renaming" you mean opening the file, then using "Save as", picking a different storage format and using the default filename extension for that, you are in the clear.

Re: [Solved] File format error found at SAXParse

Posted: Sat Dec 03, 2016 1:24 am
by erica.rama
Hello, I have the same problem with this file.
Could someone help me? It's very important.
Thanks

https://drive.google.com/file/d/0B72Iju ... sp=sharing

Re: [Solved] File format error found at SAXParse

Posted: Sat Dec 03, 2016 5:23 pm
by John_Ha
Welcome to the forum.

Try >>> this file <<<. The error was the repeated occurrence of the w:themeShade attribute. I just deleted the second occurrence of w:themeShade=?? whenever it appeared.

The formatting is very poor in the original file. The most worrying thing is the repeated use of new line, as opposed to new paragraph. Because a new line does not create a new paragraph, you will probably run into the maximum paragraph length of 64k characters. Also, there are a lot of multiple tabs, which make editing very tricky!
Edit: My apologies. I erroneously added the many tabs and line returns during the repair process, as I did not "Linearise the XML" after having "pretty printed" it in the editor to make analysis easier.

Your original file opens in AOO, which seems to ignore the doubled attribute, so I merely copied everything, pasted it into an empty document, and saved it. Try >>> this file <<< instead.
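Since the doubled attribute differed between files in this thread (w:themeColor in the first, w:themeShade here), it may help to list every repeated attribute before editing anything. A minimal Python sketch under the same simplifying assumption as a hand search: tags are matched with a regular expression rather than a real XML parser, so comments or CDATA sections containing quotes are not handled. The function name `count_duplicate_attributes` is hypothetical.

```python
import re
from collections import Counter

# Simplified patterns (an assumption): a tag is "<" ... ">" with quoted attribute values.
TAG_RE = re.compile(r'<(?:[^<>"\']|"[^"]*"|\'[^\']*\')+>')
ATTR_RE = re.compile(r'\s([^\s=/>]+)\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def count_duplicate_attributes(xml_text: str) -> Counter:
    """Count, per attribute name, how many redundant repeats occur inside single tags."""
    dupes = Counter()
    for tag in TAG_RE.finditer(xml_text):
        names = Counter(m.group(1) for m in ATTR_RE.finditer(tag.group(0)))
        for name, n in names.items():
            if n > 1:
                dupes[name] += n - 1  # the first occurrence is legitimate
    return dupes
```

Applied to the decoded word/document.xml of a damaged file, the result names the attribute(s) to search for (for example `w:themeShade`) and roughly how many deletions to expect.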

File format error found at SAXParse

Posted: Mon Dec 05, 2016 11:44 am
by Coffaro
Hello, using LibreOffice I saved the document in DOCX format, and now it comes up with this error:

File format error found at
SAXParseException: '[word/document.xml line 2]: Attribute
w:themeShade redefined
', Stream 'word/document.xml', Line, Column 159269(row,col).

If anyone can fix this corruption, please help me. It's a very important document...
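The error message gives a position (line 2, column 159269); the column is so large because the body of document.xml typically sits on one very long line after the XML declaration. To see the text around such a position, a minimal Python sketch using the standard library parser is shown below. It is an illustration, not what LibreOffice does internally: the name `locate_parse_error` is hypothetical, and Python's expat-based parser numbers columns from 0, so its column may differ by one from the LibreOffice message.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(xml_text: str, context: int = 60):
    """Parse xml_text; on failure return (error message, text around the error).

    Returns None if the text is well-formed. Line numbers are 1-based and
    column numbers 0-based, as reported by expat.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, col = err.position
        text_line = xml_text.splitlines()[line - 1]
        start = max(col - context, 0)
        return str(err), text_line[start:col + context]
    return None
```

For a repeated attribute the message reads like `duplicate attribute: line 2, column ...`, and the excerpt shows the offending tag, e.g. after `locate_parse_error(zipfile.ZipFile("broken.docx").read("word/document.xml").decode("utf-8"))` with a hypothetical file name.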

Re: [Solved] File format error found at SAXParse

Posted: Mon Dec 05, 2016 4:45 pm
by acknak
There are detailed instructions in the posts above. viewtopic.php?p=403228#p403228

Or, you can make the document available: attach it here (<128k) or upload the document somewhere and leave a link so we can download it.