
[Solved] Can't open file: Format error in content.xml

Posted: Tue Jul 26, 2022 2:37 am
by KevBowler300
I'm getting the error:

Read-Error.
Format error discovered in the file in sub-document content.xml at 2,2756481 (row,col).

This is the first time I've ever gotten such a message in years of using OO. It's quite a large spreadsheet (7MB) that I hope can be fixed.

 Edit: Changed subject, was Can't open file because Read-Error 
The important part of the message is the second line, after the very general "Read-Error" 
-- MrProgrammer, forum moderator 

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 2:51 am
by FJCC
There is a tutorial here about how to fix that. Your error will be different than the one shown there but the basic idea is the same. If you have trouble, you can post the document on a publicly available site such as a cloud drive or a file sharing site.

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 4:11 am
by KevBowler300

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 5:39 am
by robleyd
Tried, but the content.xml is too big - over 155 MB - for any of my text editors to open.

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 7:16 am
by MrProgrammer
This is quite a mess:
Found                       Expected
</table:table-cell<         </table:table-cell>       Bad end of tag
table:formula= …"           table:formula="…"         Missing quote in attribute value
<vable:table-cell …>        <table:table-cell …>      Misspelled tag

However after fixing those easy problems, numerous errors remain:
Opening and ending tag mismatch: p line 2 and table-cell
fice:value-type="float" office:value="75"><text:p>75>/text:p></table:table-cell>

Opening and ending tag mismatch: table-cell line 2 and table-row
 table:style-name="ce738" table:number-columns-repeated="11"/></table:table-row>

Opening and ending tag mismatch: table-row line 2 and table
:table-cell table:number-columns-repeated="48"/></table:table-row></table:table>

Opening and ending tag mismatch: table line 2 and spreadsheet
/table:sort></table:database-range></table:database-ranges></office:spreadsheet>

Opening and ending tag mismatch: spreadsheet line 2 and body
table:database-range></table:database-ranges></office:spreadsheet></office:body>

Opening and ending tag mismatch: body line 2 and document-content
ble:database-ranges></office:spreadsheet></office:body></office:document-content

Premature end of data in tag document-content line 2

A tag mismatch means the tags are not nested properly, for example:
Right          Wrong          Wrong
<a>            <a>            <a>
   <b>            <b>            <b>
   </b>           </a>
</a>           </b>           </a>
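
For anyone who wants to see the rule spelled out, here is a minimal sketch of how a parser notices a mismatch: it keeps a stack of open tags and compares each closing tag with the most recent one. The function name check_nesting and the list-of-strings input are made up for illustration; a real XML parser does much more than this.

```python
def check_nesting(tags):
    """Return None if the tags nest properly, else a short description.

    `tags` is a list such as ["a", "b", "/b", "/a"], where a leading
    slash marks a closing tag (a simplification for illustration).
    """
    open_tags = []
    for tag in tags:
        if not tag.startswith("/"):
            open_tags.append(tag)          # opening tag: remember it
            continue
        name = tag[1:]
        if not open_tags:
            return f"</{name}> has no opening tag"
        expected = open_tags.pop()         # most recently opened tag
        if expected != name:
            return f"expected </{expected}> but found </{name}>"
    if open_tags:
        return f"<{open_tags[-1]}> is never closed"
    return None


print(check_nesting(["a", "b", "/b", "/a"]))   # the "Right" column
print(check_nesting(["a", "b", "/a", "/b"]))   # first "Wrong" column
print(check_nesting(["a", "b", "/a"]))         # second "Wrong" column
```

Both "Wrong" columns fail at the same place: </a> arrives while <b> is still open.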

Since your file has over 5 million tags, it is impractical for me to manually examine them to determine how to fix the tag mismatch problems. I know of no programs which could do that automatically. Some of the mismatches could be simple spelling errors, like vable (problem #3), though I think that is unlikely. Restore your file from a backup. Your operating system may be making backups of files each time they're changed. If not, this is surely a feature all modern operating systems can provide.

robleyd wrote: Tue Jul 26, 2022 5:39 am Tried, but the content.xml is too big - over 155 MB - for any of my text editors to open.
On my Mac I use the tool xmllint to analyze the XML. It runs in a few seconds. I can open the XML in TextEdit to view the details of the analysis, though opening the file takes it about a minute and a half. I used sed to fix the first three problems. It runs in a few seconds. TextEdit, xmllint, and sed are all included with macOS.
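
For anyone without xmllint, a rough Python equivalent of the first step (finding where the XML stops being well-formed) might look like the sketch below. It is not what was used in this thread. It assumes the .ods file is still a readable zip archive and that only content.xml is damaged; the function name first_xml_error and the chunk size are made up for illustration. Like xmllint, it only reports the first error, and expat counts columns from 0, so the position may not match the OpenOffice message exactly.

```python
import sys
import zipfile
from xml.parsers import expat


def first_xml_error(ods_path, member="content.xml", chunk_size=1 << 20):
    """Return (line, column, message) for the first well-formedness
    error in `member`, or None if the whole stream parses cleanly.

    The file is fed to the parser in chunks, so a very large
    content.xml never has to be held in memory at once.
    """
    parser = expat.ParserCreate()
    with zipfile.ZipFile(ods_path) as archive:
        with archive.open(member) as stream:
            try:
                while True:
                    chunk = stream.read(chunk_size)
                    if not chunk:
                        parser.Parse(b"", True)   # signal end of input
                        return None
                    parser.Parse(chunk, False)
            except expat.ExpatError as err:
                return err.lineno, err.offset, expat.ErrorString(err.code)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ods.py damaged.ods
    print(first_xml_error(sys.argv[1]))
```

Work on a copy of the spreadsheet, never the original.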

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 7:49 am
by FJCC
I got a version that opens in Calc by fixing the first three errors that MrProgrammer found. If the OP sends me a private message with an email address, I will send the file. It is very late for me, so I will not do anything about this for several hours.

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 8:19 am
by KevBowler300
FJCC wrote: Tue Jul 26, 2022 7:49 am I got a version that opens in Calc by fixing the first three errors that MrProgrammer found. If the OP sends me a private message with an email address, I will send the file. It is very late for me, so I will not do anything about this for several hours.
Sent you the PM.

MrProgrammer wrote: Tue Jul 26, 2022 7:16 am Since your file has over 5 million tags, it is impractical for me to manually examine them to determine how to fix the tag mismatch problems. I know of no programs which could do that automatically. Some of the mismatches could be simple spelling errors, like vable (problem #3), though I think that is unlikely. Restore your file from a backup. Your operating system may be making backups of files each time they're changed. If not, this is surely a feature all modern operating systems can provide.
It's got a lot of data I've accumulated over the years, I should probably break it up into multiple files or clear out some old stuff once it's working. It's just weird that I've been using that file for quite some time and never had an issue.

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 3:33 pm
by FJCC
I'll note that the characters that were present in the damaged document differed from the correct characters by a value of 2 in the ASCII table. That is

Present   Corrected
   <          >
[space]       "
   v          t
I don't know much about such things, but that suggests to me some flaky memory in your computer.
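
A quick way to check the "differs by 2" observation is to XOR the character codes of each pair from the table above. This is only a sketch; the names PAIRS and flipped_bits are made up for illustration.

```python
# (character found in the damaged file, character it should have been)
PAIRS = [("<", ">"), (" ", '"'), ("v", "t")]


def flipped_bits(found, expected):
    """XOR of the two character codes: a 1 bit marks each position
    where the two characters differ."""
    return ord(found) ^ ord(expected)


for found, expected in PAIRS:
    diff = flipped_bits(found, expected)
    print(f"{found!r} {ord(found):08b}   "
          f"{expected!r} {ord(expected):08b}   xor {diff:08b}")
```

In all three pairs the XOR is 00000010, so each wrong character is exactly one bit away from the right one, and it is the same bit position every time.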

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 4:57 pm
by MrProgrammer
The basic "Latin-1" characters (the ASCII range) are in four groups of 32:
   000xxxxx - Control characters like tab and newline
   001xxxxx - Non-letters like period, quote, parentheses, …
   010xxxxx - Upper case letters A to Z and a few others
   011xxxxx - Lower case letters a to z and a few others
The digits are a subset of the second group 00110000 to 00111001.
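
As a small sketch of the grouping above: the top three bits of the 8-bit code select the group, which is the same as shifting the code right by five bits. The names GROUPS and group_of are made up for illustration.

```python
GROUPS = {
    0b000: "control character",
    0b001: "non-letter (space, punctuation, digits)",
    0b010: "upper case letter or nearby symbol",
    0b011: "lower case letter or nearby symbol",
}


def group_of(ch):
    """Return the top three bits (0 to 3) of a basic character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("outside the four groups listed above")
    return code >> 5   # drop the five low bits, keep 000..011


for ch in ["\t", '"', "7", "A", "a"]:
    print(f"{ord(ch):08b}  {ch!r:6}  {GROUPS[group_of(ch)]}")
```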

FJCC wrote: Tue Jul 26, 2022 3:33 pm I'll note that the characters that were present in the damaged document differed from the correct characters by a value of 2 in the ASCII table.
I remember we have seen other cases of character substitution with single-bit errors. For me, Calc is still unable to read the file after I fix these three characters, but I would expect that given the "tag mismatch" errors that remain, so I'm glad you were able to solve those problems.
[Image: BitError2.gif]
 Edit: Image replaced since karolus (below) is of course correct. Haha. I just added space between the "nibbles" of the bits which I had labelled as "Hex". 

Re: Can't open file because Read-Error

Posted: Tue Jul 26, 2022 5:33 pm
by karolus
[Nitpicking] Instead of …Hex I would name it …Bin, because it's a binary representation[/Nitpicking]

Re: Can't open file: Format error in content.xml

Posted: Tue Jul 26, 2022 9:16 pm
by John_Ha
Do a deep and thorough test of your PC's memory as it looks like a hardware fault in the memory.

Make sure you have a proper independent backup which is not overwritten.

Enable Always make a backup copy in Tools > Options > Load/Save (Properties? on Mac). Be aware the backup file is created/overwritten when you open the file, make a change(s) and then save the file.

Re: Can't open file: Format error in content.xml

Posted: Wed Jul 27, 2022 11:21 am
by KevBowler300
I've now got the file and it's working. Thanks for the help everyone