Hello all, URGENT
I saved a file as .docx and it is now corrupted, and it is due today for a client. I tried everything, even modifying the source code in Visual Studio, but I suck at everything related to code.
Please help me out.
Here's a link to the file:
https://www.dropbox.com/s/hu2me5fl02cd7 ... .docx?dl=0
Thank you very much in advance for the help!
[Solved] LibreOffice File format error found at SAXParse
OpenOffice 3.1 on windows 10
Re: [Solved] File format error found at SAXParse
The repaired file is attached. Please check that all content and formatting are as you require.
Attachment: DAJFR_KEYS1 repaired.docx (909.29 KiB)
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: [Solved] File format error found at SAXParse
RoryOF wrote: The repaired file is attached. Please check that all content and formatting are as you require.
Wow! You are an angel! I lost so many hours today trying to fix this problem. I don't know how you did it but it works! Thank you a million times.
OpenOffice 3.1 on windows 10
Re: [Solved] File format error found at SAXParse
You need to report this as a bug with LibreOffice as, until it is fixed, it will continue to happen. See How to Report Bugs in LibreOffice
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Re: [Solved] File format error found at SAXParse
John_Ha wrote: You need to report this as a bug with LibreOffice as, until it is fixed, it will continue to happen. See How to Report Bugs in LibreOffice
Could you summarize in a few words what the bug was? I don't really know how to research it or how to phrase it.
OpenOffice 3.1 on windows 10
Re: [Solved] File format error found at SAXParse
lajeandom wrote: Could you summarize in a few words what the bug was? I don't really know how to research it or how to phrase it.
Report it as:
Title: SAXParse exception error - multiple occurrences of attribute re-defined in document.xml
When opening the attached file [upload your broken file in the bug report] which was saved by LO as a .docx file, I get the error message [your error message - it will be something like "SAXParseException:'[word/document.xml line 2]: Attribute w:themeShade redefined',Stream 'word/document.xml',Line, Column 159269(row,col)"].
Analysis of \word\document.xml shows repeated occurrences of [the attribute being defined twice - something like w:themeShade] as in [sample line of code from your file].
See the thread [Solved] File format error found at SAXParse at viewtopic.php?f=7&t=80923#p373226 which has several examples of .docx files with repeated attributes.
Resolution: Prevent LO writing these attributes twice.
If you upload your broken file to Dropbox I will extract a sample line of code for you to use in your bug report. Let me know if you no longer have your broken file, as you could use one of the files in this thread - I will write the words for you.
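If you are comfortable running a script, the error location for the bug report can also be found without opening the file in LO. This is only a minimal sketch, assuming Python 3 is installed and that the problem really is a repeated attribute in word/document.xml; the function names and the file name fred.docx are made up for illustration. It feeds document.xml to Python's built-in expat parser and reports the first well-formedness error it hits.

```python
import xml.parsers.expat
import zipfile


def first_xml_error(xml_bytes):
    """Return (message, line, column) for the first XML error, or None if the XML parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        # ErrorString turns expat's numeric code into text such as "duplicate attribute".
        message = xml.parsers.expat.ErrorString(err.code)
        return message, err.lineno, err.offset
    return None


def first_error_in_docx(docx_path):
    """Check word/document.xml inside a .docx file (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as archive:
        return first_xml_error(archive.read("word/document.xml"))


if __name__ == "__main__":
    # "fred.docx" is a placeholder; point this at a copy of your broken file.
    print(first_error_in_docx("fred.docx"))
```

It stops at the first error only, so a file with many repeats has to be fixed and checked again until the function returns None.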
LO 6.4.4.2, Windows 10 Home 64 bit
Self-help methods to fix .docx files with SAXParse error
These problems seem to arise only in documents created by LibreOffice. Be sure to work on a copy of the file in case something goes wrong.
Three self-help methods to fix LibreOffice .docx files with SAX parse errors. You only need to use one of them!
1 AOO seems to be able to open these files ...
Download Apache OpenOffice from http://www.openoffice.org/download/index.html. Create a new user on your PC and install AOO for that user only. AOO and LO seem to interact, in that LO grabs some of the AOO properties, and installing under a separate user completely isolates AOO from LO. Open the .docx file with AOO. Save it as a .odt file. Uninstall AOO and delete the added user.
2 Follow the directions given at ...
... viewtopic.php?f=101&t=86936&#p403228. This requires you to unzip the .docx file, extract the \word\document.xml file, and remove the repeated occurrences of the attribute named in the error message you get when you open the .docx file. Note that there may be more than one attribute repeated in the file, so you may have to do this for the other repeated attribute(s) too. Repeated attributes reported here include w:themeShade, w:themeColor and w:cstheme. Files uploaded to this thread have had many (30+?) repeats.
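For anyone who would rather script method 2 than edit by hand, here is a rough sketch of the same idea. It rests on assumptions of mine, not on anything LO documents: that keeping the first copy of a repeated attribute and dropping the later ones is the right repair, and that no attribute value contains a raw < or > character. All function and file names are hypothetical. Work on a copy of the file.

```python
import re
import zipfile

# A start tag or empty-element tag; assumes no raw "<" or ">" inside attribute values.
TAG_RE = re.compile(r'<[A-Za-z_][^<>]*>')
# One attribute such as  w:themeShade="BF"  including its leading whitespace.
ATTR_RE = re.compile(r'''\s+([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')''')


def dedupe_tag(tag_match):
    """Drop every attribute whose name was already seen in this tag (keeps the first)."""
    seen = set()

    def keep_first(attr_match):
        name = attr_match.group(1)
        if name in seen:
            return ""
        seen.add(name)
        return attr_match.group(0)

    return ATTR_RE.sub(keep_first, tag_match.group(0))


def dedupe_attributes(xml_text):
    """Apply the per-tag clean-up to every tag in the document."""
    return TAG_RE.sub(dedupe_tag, xml_text)


def repair_docx(src_path, dst_path):
    """Copy the .docx, rewriting only word/document.xml."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = dedupe_attributes(data.decode("utf-8")).encode("utf-8")
            dst.writestr(item, data)


if __name__ == "__main__":
    # Placeholder file names; the original file is left untouched.
    repair_docx("fred.docx", "fred_repaired.docx")
```

This handles all repeated attributes in one pass, whatever their names, so it does not need the attribute name from the error message. Check the repaired file carefully, as the script cannot know which copy of a repeated attribute LO intended.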
3 Extract \word\document.xml from the .docx file and strip off all the XML tags to leave just the text
Windows:
Rename the file from fred.docx to fred.ZIP.
Double click fred.ZIP.
Navigate to the \word folder.
Drag document.xml onto the desktop.
- Install Notepad++ and the XML Tools plug-in. Open document.xml with Notepad++. Go to Plugins > XML Tools > Pretty print XML with line breaks. Delete the XML tags leaving just the text.
- Alternatively, Google "pretty print" and upload document.xml to a pretty-print web site, which will format it. Delete the XML tags.
Linux:
Rename the file from fred.docx to fred.ZIP.
Unzip fred.ZIP - you may need to install a ZIP utility on Linux.
Navigate to the \word folder.
Extract document.xml.
- Install an XML editor. Open document.xml with the XML editor and format it "pretty print". Delete the XML tags leaving just the text.
- Alternatively, Google "pretty print" and upload document.xml to a pretty-print web site, which will format it. Delete the XML tags.
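The unzip and pretty-print steps above can also be done on either system with a few lines of Python, without renaming the file or installing an editor. This is a sketch only, with made-up function and file names. It does not do real indented pretty printing; it just puts each tag on its own line by breaking wherever one tag directly follows another, which still works when the XML is too broken for a proper XML tool to format.

```python
import zipfile


def read_document_xml(docx_path):
    """Read word/document.xml straight out of the .docx (a .docx is a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def add_line_breaks(xml_text):
    """Start a new line wherever one tag is immediately followed by another."""
    return xml_text.replace("><", ">\n<")


def remove_line_breaks(xml_text):
    """Undo add_line_breaks, putting the XML back on one line."""
    return xml_text.replace(">\n<", "><")


if __name__ == "__main__":
    # Placeholder file names; adjust to your own copy of the file.
    pretty = add_line_breaks(read_document_xml("fred.docx"))
    with open("document_pretty.xml", "w", encoding="utf-8") as out:
        out.write(pretty)
```

Only the gaps between adjacent tags are touched, so text inside the document, including runs that consist of spaces, is left alone.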
Edit (method 1): ... but AOO only displays things before the error. Everything after the error is not displayed and, worse, it all gets permanently deleted from the file if you save it, so you cannot do the other fixes!
Edit (method 3, Windows): I had a file with about 30 errors and I had to find them manually using Notepad++. I downloaded XML Copy Editor and found it much easier to use, as it stepped through the file finding each line with an error. However, XML Copy Editor would not pretty print because of the errors, so I needed to use Notepad++ to pretty print the file, which I then saved. I edited the saved file with XML Copy Editor, saved it, and used Notepad++ to re-linearise it. XML Copy Editor missed some errors when using F2 to step through the file. However, issuing the pretty print command in XML Copy Editor located these errors.
Edit (method 3): The easiest way to delete all the XML tags is by using Find and Replace with Regular Expressions. It should work in LO as long as you do not break the character limit for a paragraph (64k in AOO). It works fine in Notepad++. Open document.xml. Pretty print it (this needs the XML Tools plugin; if you don't pretty print, you will end up with a single paragraph). Go to Search > Replace ..., with search argument <[^>]+> and replace argument blank. Tick Regular Expressions. Click Replace All. All XML tags are deleted and you are left with just the text.
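The same find-and-replace can be run as a small Python script, using the search pattern given above. Two things in this sketch are my own additions and not part of the method as described: it adds a line break after each closing </w:p> tag so that paragraphs stay separate, and it converts entities such as &amp;amp; back to plain characters. The function and file names are invented for the example.

```python
import html
import re
import zipfile

# Same pattern as the Find and Replace search argument: anything between < and >.
TAG_RE = re.compile(r"<[^>]+>")


def xml_to_text(xml_text):
    """Strip all XML tags, keeping one line of text per Word paragraph."""
    # Assumption: a line break after each closing paragraph tag keeps paragraphs apart.
    with_breaks = xml_text.replace("</w:p>", "</w:p>\n")
    text_only = TAG_RE.sub("", with_breaks)
    # Turn escaped characters such as &amp; and &lt; back into & and <.
    return html.unescape(text_only)


def docx_to_text(docx_path):
    """Pull the plain text out of word/document.xml inside a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        return xml_to_text(archive.read("word/document.xml").decode("utf-8"))


if __name__ == "__main__":
    # Placeholder file names.
    with open("fred.txt", "w", encoding="utf-8") as out:
        out.write(docx_to_text("fred.docx"))
```

As with the manual method, all formatting is lost and only the text is recovered. Because nothing is parsed as XML, the repeated attributes do not get in the way.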
Last edited by John_Ha on Tue Oct 12, 2021 6:06 pm, edited 8 times in total.
LO 6.4.4.2, Windows 10 Home 64 bit
Re: Self-help methods to fix .docx files with SAX Parse error
Thanks for all the precious information, guys. I will research and post the bug if needed asap. I copied and pasted all your posts so if this happens again to someone that I know (or myself, but I am staying away from docx files now lol) at least I will know how to solve the issue.
OpenOffice 3.1 on windows 10
Re: [Solved] LibreOffice File format error found at SAXParse
lajeandom wrote: I am staying away from docx file now
That is an extremely wise decision. See [Tutorial] Differences between Writer and MS Word files for why you should always work in and save files as .odt.
lajeandom wrote: if this happens again to someone that I know ...
The SAXParse error is a LibreOffice problem, not an AOO problem. I have posted Fixing .docx files with SAXParse error in the LO Forum so that LO users will find the post.
LO 6.4.4.2, Windows 10 Home 64 bit