[Solved] LibreOffice File format error found at SAXParse


Postby lajeandom » Tue Dec 13, 2016 12:14 pm

Hello all, URGENT

I saved a file as .docx and it is now corrupted, and it is due today for a client. I have tried everything, even editing the source code in Visual Studio, but I am hopeless with anything related to code.

Please help me out.

Here's a link to the file:

https://www.dropbox.com/s/hu2me5fl02cd7 ... .docx?dl=0

Thank you very much in advance for the help!
OpenOffice 3.1 on windows 10
lajeandom
Posts: 4
Joined: Tue Dec 13, 2016 12:07 pm

Re: [Solved] File format error found at SAXParse

Postby RoryOF » Tue Dec 13, 2016 3:35 pm

The repaired file is attached. Please check that all content and formatting are as you require.
Attachments
DAJFR_KEYS1 repaired.docx
(909.29 KiB)
Apache OpenOffice 4.1.6 on Xubuntu 18.04.3 (mostly 64 bit version) and very infrequently on Win2K/XP
RoryOF
Moderator
Posts: 29458
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: [Solved] File format error found at SAXParse

Postby lajeandom » Tue Dec 13, 2016 4:18 pm

RoryOF wrote: The repaired file is attached. Please check that all content and formatting are as you require.


Wow! You are an angel! I lost so many hours today trying to fix this problem. I don't know how you did it but it works! Thank you a million times.

Re: [Solved] File format error found at SAXParse

Postby John_Ha » Tue Dec 13, 2016 5:33 pm

You need to report this as a bug in LibreOffice because, until it is fixed, it will keep happening. See How to Report Bugs in LibreOffice
AOO 4.1.6, Windows 7 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files; see here for the many reasons why.
John_Ha
Volunteer
Posts: 6773
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: [Solved] File format error found at SAXParse

Postby lajeandom » Wed Dec 14, 2016 7:05 am

John_Ha wrote: You need to report this as a bug in LibreOffice because, until it is fixed, it will keep happening. See How to Report Bugs in LibreOffice


Could you summarize in a few words what the bug was? I don't really know how to research it or how to phrase it.

Re: [Solved] File format error found at SAXParse

Postby John_Ha » Wed Dec 14, 2016 1:12 pm

lajeandom wrote: Could you summarize in a few words what the bug was? I don't really know how to research it or how to phrase it.

Report it as:

Title: SAXParse exception error - multiple occurrences of attribute re-defined in document.xml

When opening the attached file [upload your broken file in the bug report], which was saved by LO as a .docx file, I get the error message [your error message - it will be something like "SAXParseException:'[word/document.xml line 2]: Attribute w:themeShade redefined',Stream 'word/document.xml',Line, Column 159269(row,col)"].

Analysis of \word\document.xml shows repeated occurrences of [the attribute being defined twice - something like w:themeShade] as in [sample line of code from your file].

See the thread [Solved] File format error found at SAXParse at https://forum.openoffice.org/en/forum/v ... 23#p373226 which has several examples of .docx files with repeated attributes.

Resolution: Prevent LO from writing these attributes twice.


If you upload your broken file to Dropbox, I will extract a sample line of code for you to use in your bug report. If you no longer have your broken file, let me know: you could use one of the files in this thread, and I will write the words for you.
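If you would rather pull out a sample line yourself, here is a minimal Python sketch (not part of the instructions in this thread) that opens the .docx as a ZIP archive and lists tags in word/document.xml that define the same attribute more than once. It scans with regular expressions instead of an XML parser, because a standard parser rejects the broken file with the same "redefined" error. The script name and function name are hypothetical, and the patterns assume attribute values contain no literal > character.

```python
import re
import sys
import zipfile
from collections import Counter

# One start tag or empty-element tag, e.g. <w:color w:val="FF0000"/>.
# Closing tags (</...>) and the <?xml ...?> declaration are not matched.
TAG_RE = re.compile(r"<[A-Za-z_][^<>]*>")
# One attribute with its leading whitespace, e.g.  w:themeShade="80"
ATTR_RE = re.compile(r"""\s+([A-Za-z_][\w.:-]*)\s*=\s*(?:"[^"]*"|'[^']*')""")


def find_repeated_attributes(xml_text, limit=5):
    """Return up to `limit` (attribute_name, tag_text) pairs for tags
    that define the same attribute more than once."""
    hits = []
    for tag in TAG_RE.finditer(xml_text):
        counts = Counter(m.group(1) for m in ATTR_RE.finditer(tag.group(0)))
        for name, n in counts.items():
            if n > 1:
                hits.append((name, tag.group(0)))
                if len(hits) >= limit:
                    return hits
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_dupes.py broken.docx
    with zipfile.ZipFile(sys.argv[1]) as docx:
        xml_text = docx.read("word/document.xml").decode("utf-8")
    for name, tag in find_repeated_attributes(xml_text):
        print(f"{name} repeated in: {tag}")
```

For example, python find_dupes.py fred.docx prints each offending tag, which can be pasted in as the [sample line of code from your file] in the report.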

Self-help methods to fix .docx files with SAXParse error

Postby John_Ha » Wed Dec 14, 2016 3:00 pm

These problems seem to arise only in documents created by LibreOffice.

Three self-help methods to fix LibreOffice .docx files with SAX parse errors. You only need to use one of them!

1 AOO seems to be able to open these files ...

... so download Apache OpenOffice from http://www.openoffice.org/download/index.html. Create a new user on your PC and install AOO for that user only. AOO and LO seem to interact, in that LO grabs some of the AOO properties; installing AOO under a separate user completely isolates it from LO. Open the .docx file with AOO and save it as a .odt file. Then uninstall AOO and delete the added user.

2 Follow the directions given at ...

... https://forum.openoffice.org/en/forum/v ... 6&#p403228. This requires you to unzip the .docx file, extract the \word\document.xml file, and remove all the occurrences of the repeated attribute specified in the error message you get when you open the .docx file. Note that more than one attribute may be repeated in the file, so you may have to do the same for the other repeated attribute(s). Repeated attributes reported here include w:themeShade, w:themeColor and w:cstheme. Files uploaded to this thread have had many (30+?) repeats.
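The same edit can be scripted. The Python sketch below (not taken from the linked directions) copies the .docx to a new file and, inside word/document.xml only, keeps the first occurrence of each attribute in a tag and deletes the later repeats. That is a variation on the manual method, which removes every occurrence; keeping one preserves whatever formatting it carried. Function and file names are hypothetical, the regular expressions assume attribute values contain no literal > character, and you should keep the original file and check the result.

```python
import re
import sys
import zipfile

# One start tag or empty-element tag, e.g. <w:color w:val="FF0000"/>.
TAG_RE = re.compile(r"<[A-Za-z_][^<>]*>")
# One attribute with its leading whitespace, e.g.  w:themeShade="80"
ATTR_RE = re.compile(r"""\s+([A-Za-z_][\w.:-]*)\s*=\s*(?:"[^"]*"|'[^']*')""")


def drop_repeated_attributes(tag):
    """Keep the first occurrence of each attribute in one tag; drop repeats."""
    seen = set()

    def keep_first(m):
        name = m.group(1)
        if name in seen:
            return ""  # a repeat: delete it, leading whitespace included
        seen.add(name)
        return m.group(0)

    return ATTR_RE.sub(keep_first, tag)


def repair_xml(xml_text):
    """Apply drop_repeated_attributes to every start tag in the text."""
    return TAG_RE.sub(lambda m: drop_repeated_attributes(m.group(0)), xml_text)


def repair_docx(src, dst):
    """Copy the archive src to dst, repairing word/document.xml on the way."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = repair_xml(data.decode("utf-8")).encode("utf-8")
            # Passing the original ZipInfo keeps its name and compression type.
            zout.writestr(item, data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python repair_docx.py broken.docx repaired.docx
    repair_docx(sys.argv[1], sys.argv[2])
```

For example, python repair_docx.py fred.docx fred-repaired.docx. Only word/document.xml is changed; the other parts of the archive are copied with unchanged contents.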

3 Extract \word\document.xml from the .docx file and strip off all the XML tags to leave just the text

Windows:

Rename the file from fred.docx to fred.ZIP.
Double click fred.ZIP.
Navigate to the \word folder.
Drag document.xml onto the desktop.
- Install Notepad++ and the XML Tools plug-in. Open document.xml with Notepad++. Go to Plugins > XML Tools > Pretty print XML with line breaks. Delete the XML tags, leaving just the text.
- Alternatively, search Google for "pretty print" and upload document.xml to a pretty-print website, which will format it. Delete the XML tags.

Linux:

Rename the file from fred.docx to fred.ZIP.
Unzip fred.ZIP - you may need to install a ZIP utility on Linux.
Navigate to the word folder.
Extract document.xml.
- Install an XML editor. Open document.xml with the XML editor and pretty print it. Delete the XML tags, leaving just the text.
- Alternatively, search Google for "pretty print" and upload document.xml to a pretty-print website, which will format it. Delete the XML tags.
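On either platform, the renaming and unzipping steps can also be done with a short Python sketch, since a .docx is an ordinary ZIP archive and Python's zipfile module reads it directly. It copies word/document.xml out unchanged without parsing it, so the repeated attributes do not get in the way. The script and function names are hypothetical.

```python
import sys
import zipfile
from pathlib import Path


def extract_document_xml(docx_path, out_path="document.xml"):
    """Copy word/document.xml out of a .docx into out_path and return its text.

    No renaming to .zip is needed. The bytes are copied as-is and never
    parsed, so a file with repeated attributes still extracts cleanly.
    """
    with zipfile.ZipFile(docx_path) as docx:
        data = docx.read("word/document.xml")
    Path(out_path).write_bytes(data)
    return data.decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_xml.py fred.docx  (writes ./document.xml)
    extract_document_xml(sys.argv[1])
```

For example, python extract_xml.py fred.docx leaves document.xml in the current folder, ready for the editor or pretty-print step.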

Edit: The easiest way to delete all the XML tags is to use Find and Replace with regular expressions. This should work in LO as long as you do not exceed the character limit for a paragraph (64k in AOO).

It works fine in Notepad++. Open document.xml and pretty print it (this needs the XML Tools plug-in; if you skip this step, you will end up with a single paragraph). Go to Search > Replace..., enter <[^>]+> as the search argument and leave the replace argument blank. Select Regular expression as the search mode. Click Replace All.

All XML tags are deleted and you are left with just the text. 
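The Find and Replace step can be sketched in Python as well, assuming document.xml has already been extracted. Instead of pretty printing, this sketch turns each closing </w:p> (the end of a Word paragraph) into a line break, then deletes the tags with the same <[^>]+> pattern and converts escapes such as &amp; back into characters. It is a shortcut under stated assumptions, not the method in the post: tabs (<w:tab/>) and manual line breaks (<w:br/>) are simply dropped, and the script name is hypothetical.

```python
import re
import sys
from xml.sax.saxutils import unescape

TAG_RE = re.compile(r"<[^>]+>")  # the same pattern as in the Notepad++ step


def xml_to_text(xml_text):
    """Delete all XML tags from document.xml, keeping one line per paragraph."""
    # Mark paragraph ends first; this stands in for the pretty-print step,
    # without which all the text would run together on one line.
    marked = xml_text.replace("</w:p>", "\n")
    text = TAG_RE.sub("", marked)
    # Turn XML escapes (&lt; &gt; &amp; &quot; &apos;) back into characters.
    return unescape(text, {"&quot;": '"', "&apos;": "'"})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python xml_to_text.py document.xml > fred.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        print(xml_to_text(f.read()))
```

For example, python xml_to_text.py document.xml > fred.txt gives a plain-text file with one line per paragraph.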
Last edited by John_Ha on Tue Oct 17, 2017 9:02 pm, edited 5 times in total.

Re: Self-help methods to fix .docx files with SAX Parse error

Postby lajeandom » Fri Dec 16, 2016 10:45 am

Thanks for all the valuable information, guys. I will research it and report the bug if needed, as soon as possible. I copied and pasted all your posts, so if this happens again to someone I know (or to me, but I am staying away from .docx files now, lol), at least I will know how to solve the issue.

Re: [Solved] LibreOffice File format error found at SAXParse

Postby John_Ha » Fri Dec 16, 2016 1:37 pm

lajeandom wrote: I am staying away from .docx files now

That is an extremely wise decision. See [Tutorial] Differences between Writer and MS Word files for why you should always work in and save files as .odt.

lajeandom wrote: if this happens again to someone that I know ...

The SAXParse error is a LibreOffice problem, not an AOO problem. I have posted Fixing .docx files with SAXParse error in the LO Forum so that LO users will find the post.