[Solved] Format error discovered in … content.xml
Posted: Mon Aug 19, 2024 5:39 pm
by deryck
I am getting an error message in OpenOffice 4.1.15 stating “Format error discovered in the file in sub-document content.xml at 2,8010(row,col)”.
Can anyone PLEASE assist with advice?
Re: Urgent need of assistance - error message
Posted: Mon Aug 19, 2024 6:25 pm
by RoryOF
The quickest fix is to upload the file in question to the Forum.
Re: Urgent need of assistance - error message
Posted: Mon Aug 19, 2024 7:02 pm
by keme
As RoryOF says, the best advice for a speedy and secure recovery is to have someone with experience look at the file. The exception is if it has been stored in a file system with file versioning enabled (e.g. OneDrive cloud storage), in which case it may be wise to revert to a previous version first. Set a copy of the latest version aside before you do, so that reverting does not accidentally overwrite most or all of your work.
If the file content cannot be disclosed, e.g. for security/privacy reasons, it may still be worthwhile for you to attempt a manual rescue. It may be possible to resolve by hand, but you need some basic knowledge of the ODF file structure and a working understanding of the XML format. It is most likely a tedious job, so consider how much time it would take (and whether it is possible at all) to recreate the document from scratch instead.
Background
Your document is a "package" consisting of multiple files zipped together. The "content.xml" component is a plaintext file coded in XML. It has two "lines", better thought of as "records". The first record is a "file header" with statements about how the content is formatted (standards, versions). The second record is the actual content. The content record consists of XML elements, nested one within another, containing the textual content and also textual pointers to format/structure building blocks and other types of content (graphics, audio).
Getting started
To work on the file at "bit fiddler" level, you change the filename extension to "zip" so the individual component files are viewable. Do this on a copy of your file, and keep the original untouched. Then unpack the zip so individual components (including the content.xml) are individually editable.
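The copy-then-unpack step can also be scripted. The following is only a minimal sketch in Python using the standard library; the function name unpack_copy and the file name in the usage note are made up for illustration.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_copy(document: Path, workdir: Path) -> Path:
    """Copy an ODF document into workdir, unpack the copy, and return the
    path of the extracted content.xml. The original file is only read."""
    workdir.mkdir(parents=True, exist_ok=True)
    copy_path = workdir / (document.stem + ".zip")
    shutil.copy2(document, copy_path)        # work on a copy, never the original
    target = workdir / "unpacked"
    with zipfile.ZipFile(copy_path) as package:
        package.extractall(target)           # writes content.xml and the other components
    return target / "content.xml"
```

For example, unpack_copy(Path("damaged.odt"), Path("rescue")) would leave the original where it is and put the individual components under rescue/unpacked.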
Strategy
The "2,8010" part of the message shows that the error was discovered in the "content record" of the file (line #2), at position 8010 (counting character by character). This does not mean that the cause of the error is actually at that position. It may be there, or anywhere before. Position 8010 is simply where the parser found that the file could no longer be sensibly parsed. It may be an XML closing tag out of sequence with its matching open tag, or it may be something more, or less, obvious. This is where the "working understanding of XML" requirement kicks in. E.g.: If there are closing tags in this region, look for the preceding matching open tags, and ensure that they are properly nested.
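To look at that region without scrolling through a very long line, a small helper can print the text around a reported position. This is a sketch only: context_at is a made-up name, and it assumes the reported row and column are 1-based and counted in characters, which may not match exactly how OpenOffice counts.

```python
def context_at(xml_text: str, row: int, col: int, width: int = 200) -> str:
    """Return up to `width` characters on either side of a reported position.

    Assumption: row and col are 1-based and col counts characters.
    """
    lines = xml_text.split("\n")
    line = lines[row - 1]                 # row 2 is the long content record
    start = max(0, col - 1 - width)       # do not run off the start of the line
    return line[start:col - 1 + width]
```

With the text of content.xml read into a string, context_at(text, 2, 8010) returns the stretch of the content record around the reported position. Since the cause may lie earlier, a larger width helps when nothing looks wrong nearby.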
See also this page about XML syntax rules.
Good luck!
Re: Urgent need of assistance - error message
Posted: Mon Aug 19, 2024 7:17 pm
by deryck
OK, so how do I upload the file?
Re: Urgent need of assistance - error message
Posted: Mon Aug 19, 2024 7:22 pm
by LastUnicorn
To upload a sample file look below the box you have been typing in to post here. There you will see a button that says Post Reply – click on that button and a new text box will appear. Look below that text box and you will see a tab labelled Attachments. Click on that tab and you will see options to post a file along with your new comment.
Re: Format error discovered in … content.xml
Posted: Mon Aug 19, 2024 8:08 pm
by deryck
The file is too big!! 1365 KB
Re: Format error discovered in … content.xml
Posted: Mon Aug 19, 2024 8:10 pm
by LastUnicorn
If your attachment is too large then use a file sharing site like MediaFire or similar. Upload your file to the sharing site, then include a link to it in your reply.
Re: Format error discovered in … content.xml
Posted: Tue Aug 20, 2024 2:05 am
by robleyd
If you wish, you can email the file to me and I'll have a look at it. I've sent you a PM with my email address.
Re: Format error discovered in … content.xml
Posted: Tue Aug 20, 2024 2:40 am
by MrProgrammer
deryck wrote: ↑Mon Aug 19, 2024 5:39 pm
I am getting an error message in OpenOffice 4.1.15 stating “Format error discovered in the file in sub-document content.xml at 2,8010(row,col)”.
Here is a link to the repaired file.
| Edit: 2024-08-20: Link disabled for privacy. |
It now opens, though many of your images are anchored to the page and overlap at the top of the document. I don't know if that problem existed prior to my repair.
For any important document it is crucial to ensure you have adequate backups of it in case the file becomes damaged. This repair was easy. We have had other Format error discovered in … content.xml posts where the file is damaged beyond repair.
I will delete the link to your repaired file after about a week for privacy. Let me know if you want it deleted sooner.
The repair was done with the [Tutorial] Delete duplicate attributes tool and took ten seconds. You should either learn how to use this tool yourself or, even better, switch to LibreOffice, where the issue has been fixed.
Attributes read : 6788
Attributes written: 6779
Attributes removed: 9
updating: content.xml
zip warning: Local Entry CRC does not match CD: content.xml
(deflated 88%)
content.xml updated in ~/Downloads/AJHTL_Foster_DeKlerk_Lekaota_2024[39]a.odt
The tool found this in content.xml, where … represents a long string of ones:
<style:style
office:name="__Annotation__137_91089840111…"
office:name="__Annotation__224_91089840111…"
office:name="__Annotation__234_91089840111…"
office:name="__Annotation__240_91089840111…"
office:name="__Annotation__292_91089840111…"
office:name="__Annotation__301_91089840111…"
office:name="__Annotation__344_91089840111…"
office:name="__Annotation__370_91089840111…"
office:name="__Annotation__401_91089840111…"
office:name="__Annotation__555_91089840111…"
style:name="Table1"
style:family="table"
>
The ten repeated office:name attributes are the error which prevented your file from opening. XML does not allow the same attribute name to appear more than once in a tag. The tool deleted nine of them, all but the first one.
<style:style
office:name="__Annotation__137_91089840111…"
style:name="Table1"
style:family="table"
>
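For readers who want to see the idea in code, here is a rough Python sketch of the same kind of cleanup. It is not the tool from the tutorial. The name drop_duplicate_attributes and the regular expressions are assumptions made for illustration, and the sketch assumes content.xml has no comments or CDATA sections containing text that looks like a tag. It keeps the first occurrence of each attribute name within a start tag and drops the rest.

```python
import re

# One attribute: whitespace, name, "=", quoted value.
ATTR = re.compile(r'''\s+([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')''')
# A start tag or empty-element tag: name, zero or more attributes, optional "/".
START_TAG = re.compile(
    r'''<([A-Za-z_][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)(\s*/?)>'''
)


def drop_duplicate_attributes(xml_text: str):
    """Return (cleaned_text, removed_count), keeping only the first
    occurrence of each attribute name inside every start tag."""
    removed = 0

    def fix_tag(match):
        nonlocal removed
        seen = set()
        kept = []
        for name, value in ATTR.findall(match.group(2)):
            if name in seen:
                removed += 1              # a repeat of an earlier attribute: drop it
                continue
            seen.add(name)
            kept.append(" " + name + "=" + value)
        return "<" + match.group(1) + "".join(kept) + match.group(3) + ">"

    return START_TAG.sub(fix_tag, xml_text), removed
```

Run on a tag like the one above, this would leave a single office:name and report how many attributes it removed. Line breaks between attributes are collapsed to single spaces, which does not change the meaning of the XML. Putting the cleaned content.xml back into the document package is a separate step and is not shown here.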
If you need assistance with anchoring images, first read [Tutorial] Some useful hints on using images. Then, if the image anchoring problem is still not solved, open a new topic to discuss that. This topic concerns the content.xml problem.
If this solved your problem, please go to your first post, use the Edit button, and add [Solved] to the start of the Subject field. Select the green checkmark icon at the same time.
| Edit: 2024-09-04: Link to original file removed for privacy. |
Re: Format error discovered in … content.xml
Posted: Tue Aug 20, 2024 7:58 am
by deryck
Gee, thanks very very much!!
Your assistance is much appreciated!
I have downloaded it, opened it and checked it, and all the relevant text I had corrected is intact. Yep, I understand the image issue and will definitely look at doing that better.
You may delete it whenever you want to, thanks again.
Deryck