
Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 12:25 pm
by konstantinos
Hi all, I have this problem when trying to open my file. Unfortunately, I realized too late that I should always save my work as .odt, and the damage is done now. Can anyone help me fix my document, as it is for an academic publication? Lesson learned...

Re: Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 12:39 pm
by FJCC
Repairing the document requires editing the XML content. There is a tutorial about this here. You can also post the document here, share it via Dropbox, Google Drive, or MediaFire, or email it to someone. I can send you my email address via private message if you are interested.

Re: Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 12:53 pm
by John_Ha
Your file is almost certainly repairable and all content will be recovered. For an explanation of SAXParse errors and how to fix them see [Tutorial] How to fix SAXParse errors in LibreOffice files

Alternatively send the file to FJCC - use the email button next to his post.

Note one warning from the tutorial: if you opened the broken file and then saved it under the same name, overwriting the original, all truncated text will have been lost. In that case your only hope is [Tutorial] How to find and un-delete Writer temporary files, which covers:

a) detailed instructions on how to recover your file as it was when you last opened or saved it, or as it was when it was last saved with AutoRecovery;

b) how to find previous versions of the file in the folder it is located in, but which have since been deleted;

c) how to un-delete the temporary files Writer wrote while you were editing the file, and then deleted. This will recover your file as it was when you last opened or saved it, and is probably your best hope. As it was a .docx file, follow the instructions for recovering a .odt file.

See [Tutorial] Differences between Writer and MS Word files for why you should always work in and save files as .odt.

Re: Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 5:22 pm
by FJCC
I got a copy of the file from the OP. It opens in OpenOffice without complaint, but many of the images are missing. LibreOffice complains as follows:
Screen Shot 2018-07-02 at 16.25.53.png

I cannot find the problem in the XML code. Can someone else take a look?

Re: Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 5:34 pm
by John_Ha
FJCC

I have sent you a PM with my email ID. Send me the file and I will look at it.

Re: Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 7:42 pm
by John_Ha
I have inspected the file.

The file is a .docx file and uses MS Word "not part of the OOXML Standard" text boxes. AOO does not support text boxes (LO does), so text boxes and the items within them do not display in AOO.

The file opens with OpenOffice and displays 23 pages, where page 23 is as shown below. There is substantial content after page 23 in the XML file. "triangle" is at line 6,745 in Notepad++ (pretty printed).

last page.gif

There are numerous <mc:AlternateContent> tags which are Microsoft's way of saying "The following content is not part of the OOXML standard". AOO does not understand anything between <mc:AlternateContent> tags. There are several occurrences before line 6,745 (the first occurs at line 4,129) so some content is not being displayed by AOO in the 23 pages I can see.
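
A rough way to reproduce that kind of search outside an editor is sketched below. It assumes the XML has already been pretty printed so that each tag sits on its own line; the line numbers it reports depend entirely on how the file was pretty printed, so they will only match the numbers above if the same formatting is used. The function name and sample text are made up for illustration.

```python
def find_tag_lines(xml_text, tag="mc:AlternateContent"):
    """Return 1-based line numbers on which an opening <tag ...> occurs."""
    needle = "<" + tag
    hits = []
    for number, line in enumerate(xml_text.splitlines(), start=1):
        start = line.find(needle)
        while start != -1:
            # Look at the character after the name so that a longer tag
            # name sharing the same prefix is not counted by mistake.
            end = start + len(needle)
            following = line[end:end + 1]
            if following in ("", ">", " ", "/", "\t"):
                hits.append(number)
                break
            start = line.find(needle, start + 1)
    return hits


sample = (
    "<w:p>\n"
    "<mc:AlternateContent>\n"
    "<mc:Choice/>\n"
    "</mc:AlternateContent>\n"
    "<w:p/>\n"
    "<mc:AlternateContent>\n"
)
print(find_tag_lines(sample))  # [2, 6]
```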

When I check the XML syntax, Notepad++ gives the following error, which I do not understand. The error seems to be about where I would expect it, that is, just after the last content I can see.

error.gif
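
The same kind of syntax check can be run without Notepad++. This is a minimal sketch using Python's built-in expat parser, not the check that was actually run on this file; the function name and the broken sample are made up for illustration. It reports only the first place where the XML stops being well formed.

```python
import xml.parsers.expat


def first_xml_error(xml_bytes):
    """Return (line, column, message) of the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final chunk, so unclosed tags are reported too.
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as error:
        message = xml.parsers.expat.ErrorString(error.code)
        return error.lineno, error.offset, message
    return None


# Hypothetical broken input: <b> is never closed before </a>.
broken = b"<a>\n<b>\n</a>"
print(first_xml_error(broken))
print(first_xml_error(b"<a><b/></a>"))
```

Note that the reported position is where the parser noticed the problem, which can be some distance after the tag that actually caused it.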

I think therefore there are two options.

1. I remove all the XML tags from document.xml. This will leave just the text content, so at least the text will be saved.

2. I have posted the code causing the problem below to see if anyone can identify the error in the XML. The tags look like a valid pair. If that error can be fixed then the .docx file should open properly, with all content visible, in LO. Unfortunately I cannot find a similar set of tags to use as a template to make a correction. I am fairly certain that the " pic " is incorrect in both tags but I do not know what to replace it with.
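
Option 1 can be sketched roughly as follows. This is not necessarily how the tags were removed here; it is a minimal illustration that uses regular expressions instead of an XML parser, on the assumption that the file is not well formed and a parser would reject it. It treats each closing w:p (the Word paragraph element) as a line break. The function name and sample are made up, and an attribute value containing ">" would confuse this simple pattern.

```python
import html
import re


def strip_tags(xml_text):
    """Crudely reduce WordprocessingML to plain text, one paragraph per line."""
    # Turn each paragraph end into a line break before the tags are removed.
    text = re.sub(r"</w:p\s*>", "\n", xml_text)
    # Remove everything that looks like a tag.
    text = re.sub(r"<[^>]*>", "", text)
    # Decode entities such as &amp; back into plain characters.
    text = html.unescape(text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample = (
    "<w:p><w:r><w:t>First &amp; second</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>triangle</w:t></w:r></w:p>"
)
print(strip_tags(sample))
```

All formatting, images, and layout are lost this way; only the words survive.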

Re: Missing large part of text due to error SAX

Posted: Mon Jul 02, 2018 7:54 pm
by John_Ha
The text only is attached.