[Solved] LibreOffice File format error found at SAXParse

Jo21
Posts: 4
Joined: Wed Nov 30, 2016 12:17 pm

[Solved] LibreOffice File format error found at SAXParse

Post by Jo21 »

Hello,

I have run into a similar problem: a file that I can no longer open, apparently linked to the use of correction mode (tracked changes).

The file is too large to attach here, so it is available at: https://wetransfer.com/downloads/c37edd ... 121/a82eae

If anyone has a solution, that would be great.


Thank you, Jo
OpenOffice 3.1 on Windows Vista
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: [Solved] File format error found at SAXParse

Post by RoryOF »

The uploaded file at the above site is not an .odt file - I think it is a retagged .docx. I have fixed some internal errors and on retagging the fix as a .docx it seems to open correctly. Please send me a PM with your email address so I can send it to you.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: [Solved] File format error found at SAXParse

Post by John_Ha »

Welcome to the forum.

Try this file, in which I have attempted to correct the problem.

Several points:

1 The file name ends in .odt, but the file is actually a .docx file.

Question 1: Did you rename the file from .docx to .odt?

2 As the file is a .docx it was presumably created either with MS Word or with LibreOffice - AOO cannot write .docx files.

Question 2: Are you using LibreOffice? MS Word? Or did your editor use LibreOffice? MS Word? It would be useful to know the history of the file to understand when the corruption occurred.

3 The file has had Edit > Changes ... applied - yet again a problem with Edit > Changes.

4 The .docx file showed the error "w:themeColor redefined".

I therefore extracted word\document.xml from the .docx file. A .docx file is actually a ZIP file, so I just unzipped it. I opened document.xml in Notepad++, searched for w:themeColor, and deleted the second instance of w:themeColor="accent1" wherever it occurred within a single tag, leaving the trailing / in place as below. I then put document.xml back into the .docx file, which then opened normally.

Note that it is easier to find the repeated occurrences if you "pretty print" the XML using the XML Tools plugin for Notepad++. If you do, be sure to Linearise the XML before saving it or lots of tabs and newlines will be saved in the file which then appear in the repaired document.
[Image: Clipboard01.png]


<w:rPr>
    <w:sz w:val="20"/>
    <w:szCs w:val="20"/>
    <w:highlight w:val="yellow"/>
    <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
    <w:color w:val="5B9BD5" w:themeColor="accent1" w:themeColor="accent1"/>
</w:rPr>
Second instance of w:themeColor="accent1" deleted each time it occurred, leaving the trailing /
This image shows Notepad++ without invoking the "pretty print" add-on.
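
For anyone who prefers a script to hand editing, the manual fix described above can be sketched in Python. This is only a minimal sketch, not a tested repair tool. The names drop_repeated_attributes and repair_docx are made up for illustration. It assumes that document.xml is UTF-8, that no attribute value contains a raw < or >, and that keeping the first copy of a repeated attribute is what you want. Always work on a copy of the file.

```python
import re
import zipfile

# One start tag or empty-element tag, e.g. <w:color w:val="5B9BD5" .../>
# (assumption: attribute values never contain a raw '<' or '>').
TAG_RE = re.compile(r'<[A-Za-z_][^<>]*>')
# One attribute inside a tag: whitespace, name, '=', quoted value.
ATTR_RE = re.compile(r'\s+([^\s=<>/]+)\s*=\s*("[^"]*"|\'[^\']*\')')


def drop_repeated_attributes(xml_text):
    """Keep only the first occurrence of each attribute name in every tag."""
    def fix_tag(tag_match):
        seen = set()

        def keep_first(attr_match):
            name = attr_match.group(1)
            if name in seen:
                return ""  # second and later copies are dropped
            seen.add(name)
            return attr_match.group(0)

        return ATTR_RE.sub(keep_first, tag_match.group(0))

    return TAG_RE.sub(fix_tag, xml_text)


def repair_docx(src_path, dst_path, member="word/document.xml"):
    """Write a copy of src_path to dst_path with `member` de-duplicated."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                # Assumption: the part is UTF-8 encoded.
                text = data.decode("utf-8")
                data = drop_repeated_attributes(text).encode("utf-8")
            # Reusing the ZipInfo keeps each entry's name and compression type.
            dst.writestr(info, data)
```

Calling repair_docx("broken.docx", "repaired.docx") writes a new file and leaves the original untouched; both file names are placeholders. Because the text is never pretty printed, no extra tabs or newlines end up in the repaired document.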
Last edited by John_Ha on Tue Jan 03, 2017 4:14 pm, edited 2 times in total.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Jo21
Posts: 4
Joined: Wed Nov 30, 2016 12:17 pm

Re: [Solved] File format error found at SAXParse

Post by Jo21 »

Question 1: Did you rename the file to .docx?
Yes, it was a .docx file renamed to .odt.

Question 2: Are you using LibreOffice? MS Word? Or did your editor use LibreOffice? MS Word? It would be useful to know the history of the file to understand when the corruption occurred.
Yes, the document was created with MS Word. I opened it with MS Word and added some correction notes. Then I made a mistake: I opened the file with LibreOffice 5 and added some more correction notes. After that it was impossible to open the file with either MS Word or LibreOffice.

Thanks !
OpenOffice 3.1 on Windows Vista
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: [Solved] File format error found at SAXParse

Post by RoryOF »

For information: simply changing the file extension does not change the internal structure of a file. To change the internal structure, use File > Save As, select the new file type from the File type dropdown, and leave "automatic file name extension" checked.
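
To check what a file really is, whatever its extension says, you can look inside the ZIP container. The following is a minimal sketch; the function name sniff_office_format is made up for illustration. It relies on two conventions: ODF files such as .odt carry a small entry called mimetype, and OOXML files such as .docx carry [Content_Types].xml, with the main text of a .docx in word/document.xml. It is a rough check only and does not validate the file.

```python
import zipfile


def sniff_office_format(path):
    """Guess the real storage format of a file from its ZIP contents."""
    if not zipfile.is_zipfile(path):
        return "not a ZIP file"
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        if "mimetype" in names:
            # ODF packages state their type in this entry, for example
            # application/vnd.oasis.opendocument.text for .odt
            return zf.read("mimetype").decode("ascii", "replace").strip()
        if "[Content_Types].xml" in names:
            if "word/document.xml" in names:
                return "docx (OOXML)"
            return "OOXML package (not a word-processing document)"
    return "unknown ZIP container"
```

For a .docx file that has merely been renamed to .odt, this sketch would still report "docx (OOXML)".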
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: [Solved] File format error found at SAXParse

Post by John_Ha »

Jo21 wrote:Question 1: Did you rename the file to .docx? Yes, it was an .docx file renamed in .odt
As Rory says above, the file is a .docx file whose .docx extension has been changed to .odt with the Windows Rename command. You should never do that, as the file extension must agree with what is inside the file. The file extension is like the label on a tin of tomatoes: if you take the label off and replace it with one saying strawberry jam, the contents of the tin are still tomatoes, not strawberry jam.

If you want to convert a file to a different format (change what is inside the tin and give the tin the correct label), you must open the file with OpenOffice, go to File > Save As ..., and choose the format to save it in.
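
The same conversion can also be done without the menus. The sketch below assumes LibreOffice is installed and its soffice program is on the PATH; it uses LibreOffice's --headless, --convert-to and --outdir options, which Apache OpenOffice may not support. The function names are made up for illustration. Like Save As, it only works on a file that can still be opened, so it will not rescue a corrupted document.

```python
import subprocess


def build_convert_command(src_path, target_format="odt", out_dir=".",
                          soffice="soffice"):
    """Build the LibreOffice command line for a real format conversion."""
    return [soffice, "--headless", "--convert-to", target_format,
            "--outdir", out_dir, src_path]


def convert(src_path, **options):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(build_convert_command(src_path, **options), check=True)
```

For example, convert("report.docx") would ask LibreOffice to write report.odt into the current folder, with the contents and the label both changed; the file name is a placeholder.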

SAXParse errors seem to be common in .docx files edited with LibreOffice. Exchanging files between Word and AOO/LO where edits have been recorded with Edit > Changes ... often seems to cause problems.
LO 6.4.4.2, Windows 10 Home 64 bit

Jo21
Posts: 4
Joined: Wed Nov 30, 2016 12:17 pm

Re: [Solved] File format error found at SAXParse

Post by Jo21 »

Yes, I know about the problems with changing the file extension. But my file can no longer be opened, so it is impossible to use Save As...
I would like to know whether it is possible to recover the file, as .docx or .odt, with the correction notes. After that I will certainly be more careful when working on a .docx document, renaming it to .odt first!!
Many thanks
Jo
OpenOffice 3.1 on Windows Vista
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: [Solved] File format error found at SAXParse

Post by RoryOF »

Download the repaired file using the link (the underlined blue "this file") in John_Ha's message above. It opened for me as 17 or 19 pages, about grey-headed parrots.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Jo21
Posts: 4
Joined: Wed Nov 30, 2016 12:17 pm

Re: [Solved] File format error found at SAXParse

Post by Jo21 »

Great, I can open the file. :bravo:

Thank you very much ! :D

Jo
OpenOffice 3.1 on Windows Vista
keme
Volunteer
Posts: 3699
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: [Solved] File format error found at SAXParse

Post by keme »

Jo21 wrote:Yes, I know about the problems with changing the file extension. But my file can no longer be opened, so it is impossible to use Save As...
I would like to know whether it is possible to recover the file, as .docx or .odt, with the correction notes. After that I will certainly be more careful when working on a .docx document, renaming it to .odt first!!
Even though you state that you are aware of the problems arising from a changed filename extension, and possibly in spite of RoryOF's and John_Ha's advice above, there is some ambiguity in your statement.

No offense intended. Just to make it clear: Do not change the filename extension on an existing file (using a file manager) to make it open as a different storage format. That file type indicator is only a name, and changing it will not change the actual storage format, or "file type".
You can call your dachshund "horse" as much as you want. It is just a name. It will not make the saddle fit, and raisins may still be lethal.

If by "renaming" you mean opening the file, using "Save As", picking a different storage format, and using the default filename extension for that format, you are in the clear.
Apache OO 4.1.12 and LibreOffice 7.5, mostly on Ms Windows 10
erica.rama
Posts: 1
Joined: Sat Dec 03, 2016 1:18 am

Re: [Solved] File format error found at SAXParse

Post by erica.rama »

Hello, I have the same problem with this file.
Could someone help me? It's very important.
Thanks

https://drive.google.com/file/d/0B72Iju ... sp=sharing
OpenOffice 3.1 on Windows Seven
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: [Solved] File format error found at SAXParse

Post by John_Ha »

Welcome to the forum.

Try >>> this file <<<. The error was the repeated occurrence of w:themeShade. I just deleted the second occurrence of w:themeShade=?? wherever it appeared.

The formatting is very poor in the original file. The most worrying is the repeated use of new line, as opposed to new paragraph. Because new line does not create a new paragraph, you will probably run into the maximum paragraph length of 64k characters per paragraph. Also, there are a lot of multiple tabs - this does make editing very tricky!
Edit: My apologies - I erroneously added the many tabs and line returns myself during the repair process, as I did not "Linearise the XML" after having "pretty printed" it in the editor to make analysis easier.

Your original file opens in AOO, which seems to ignore the doubled tag, so I merely copied everything, pasted it into an empty document, and saved it. Try >>> this file <<< instead. 
Last edited by John_Ha on Tue Jan 03, 2017 3:57 pm, edited 2 times in total.
LO 6.4.4.2, Windows 10 Home 64 bit

Coffaro
Posts: 5
Joined: Mon Dec 05, 2016 11:39 am

File format error found at SAXParse

Post by Coffaro »

Hello, using LibreOffice I saved the document in DOCX format, and now it comes up with this error:

File format error found at
SAXParseException: '[word/document.xml line 2]: Attribute w:themeShade redefined',
Stream 'word/document.xml', Line, Column 159269(row,col).

Can anyone fix this corruption? It's a very important document... Please help me.
LibreOffice 5.2.3.3. on Windows 8
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: [Solved] File format error found at SAXParse

Post by acknak »

There are detailed instructions in the posts above. viewtopic.php?p=403228#p403228

Or, you can make the document available: attach it here (<128k) or upload the document somewhere and leave a link so we can download it.
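
If you want to see exactly where the parser stops before editing anything, a small script can report it. This is a minimal sketch using Python's built-in expat parser, not the parser LibreOffice itself uses, so the reported position may differ slightly from the one in the error dialog. The function name find_first_xml_error and the context parameter are made up for illustration.

```python
import zipfile
import xml.parsers.expat


def find_first_xml_error(docx_path, member="word/document.xml", context=60):
    """Return details of the first XML error in `member`, or None if it parses."""
    with zipfile.ZipFile(docx_path) as zf:
        data = zf.read(member)
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        pos = parser.ErrorByteIndex  # byte position where parsing stopped
        start = max(0, pos - context)
        return {
            "message": str(err),
            "line": err.lineno,
            "column": err.offset,
            # A short stretch of text around the error, for searching
            # in an editor.
            "snippet": data[start:pos + context].decode("utf-8", "replace"),
        }
    return None
```

A repeated attribute such as w:themeShade should show up as a "duplicate attribute" message. Only the first error is reported, so run it again after each fix until it returns None.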
AOO4/LO5 • Linux • Fedora 23