
[Hint] How did I fix my ODT file

Posted: Thu Jan 10, 2008 2:41 am
by ajut001
Hello,

I had a problem with a 20-page ODT file (text and pictures). The problem was: when I tried to open it,
I got the message "Error reading file" under OO 2.3.1 (both the Linux and Windows versions).

It took some hours to figure out how to fix it, so I want to share my solution with other OO users :)

First, let's call our non-opening ODT file "bad.odt".
  1. make a backup FIRST -> "$ cp bad.odt bad_original.odt"
  2. make a new directory -> "$ mkdir repair"
  3. copy bad.odt to the repair directory -> "$ cp bad.odt repair"
  4. change the current directory to repair -> "$ cd repair"
  5. unzip bad.odt -> "$ unzip bad.odt"
  6. after unzipping you get a bunch of files and directories under repair; find content.xml and open it with your favorite text editor -> "$ kate content.xml"
  7. use the "find" function to check whether you have the XML tag "<office:automatic-styles>" (somewhere near the beginning of the document) and the XML tag "</office:automatic-styles>" (somewhere in the middle of the document). If you do, delete them and all the data between them. Be sure that you don't delete more or less!
  8. save content.xml (keep the original name and location!)
  9. delete the copy of bad.odt so it does not get packed into the new file, then zip the extracted data back into one ODT document -> "$ rm bad.odt" and "$ zip -r ./bad_repaired.odt ./*"
  10. try to open the repaired document -> "$ ooffice ./bad_repaired.odt"
... and if you are lucky, OO will be able to open your document again ;)
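For anyone who would rather not edit content.xml by hand, steps 5 to 9 can be sketched with Python's standard `zipfile` and `re` modules. This is a minimal sketch, not Ajut's method verbatim: it assumes the opening tag is written exactly as `<office:automatic-styles>` and appears once, and the function name `strip_automatic_styles` is hypothetical. It also writes the `mimetype` entry first and uncompressed, as the ODF packaging rules ask, which a plain `zip -r ./*` does not guarantee.

```python
import re
import zipfile

# Whole automatic-styles element, tags included (non-greedy, spans newlines).
AUTO_STYLES = re.compile(
    r"<office:automatic-styles>.*?</office:automatic-styles>", re.DOTALL)

def strip_automatic_styles(src: str, dst: str) -> int:
    """Copy ODF package `src` to `dst` without the automatic styles in
    content.xml. Returns how many style blocks were removed (0 or 1)."""
    removed = 0
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        # ODF packages should start with an uncompressed "mimetype" entry.
        if "mimetype" in zin.namelist():
            zout.writestr(zipfile.ZipInfo("mimetype"), zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for item in zin.infolist():
            if item.filename == "mimetype":
                continue  # already written first
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                # surrogateescape keeps any undecodable bytes intact
                text = data.decode("utf-8", "surrogateescape")
                text, removed = AUTO_STYLES.subn("", text, count=1)
                data = text.encode("utf-8", "surrogateescape")
            zout.writestr(item, data)
    return removed
```

Usage would be `strip_automatic_styles("bad.odt", "bad_repaired.odt")`, run on a copy. As with the manual method, text that referred to the removed styles loses its formatting.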

Well, I got my text and pictures back, but the price was: no styles (font size, bold, headings etc.)

If your document still does not open and you get a message like "Format error discovered in the file in sub-document content.xml at ...",
then you broke the XML structure and must go back to "STEP ONE" and try to be more careful when deleting things.

PS1: if you get CRC errors when unzipping the ODT, then my solution probably can't help you :(
PS2: I also tried inserting "bad.odt" into a new document, but still got the "Error reading file" message :(
PS3: and the "META-INF/manifest.xml file" trick did not help either :(

If someone wants to investigate my broken ODT file, it can be downloaded from -> http://adsl213.pointclark.net/Eksam.odt

BR,
Ajut

Re: [Hint] How did I fix my ODT file

Posted: Thu Jan 10, 2008 5:45 am
by TerryE
Note that I changed the title of your post. (admins can do that :-P)

What Ajut describes is that it is possible (due to bugs in Writer) to create an ODT file which will not then load back into OOo. This is often due to invalid styles, and by opening the ODT with a ZIP tool and editing the raw XML in content.xml as he describes, you can often recover the majority of the content. If you want, you can also use a binary chop to find out which of the actual styles is causing the problem. This takes a little more time, but you then recover (almost) the entire content.
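The binary chop can be sketched as a recursive split over the list of automatic-style names. This is a minimal sketch under two assumptions: `fails(subset)` is a hypothetical check you perform yourself (rebuild the file keeping only the styles in `subset` and try to open it), and the document fails to load exactly when at least one bad style is kept. Recursing into both halves finds every offending style rather than just the first, which matters when, as here, several styles are broken. The simulated check at the bottom uses the style names from this thread purely for illustration.

```python
from typing import Callable, List, Sequence

def find_bad_styles(styles: Sequence[str],
                    fails: Callable[[Sequence[str]], bool]) -> List[str]:
    """Binary chop: return every style whose presence stops the document loading.

    `fails(subset)` must report whether a copy of the document that keeps
    only the automatic styles in `subset` still fails to open.
    """
    if not styles or not fails(styles):
        return []                 # this half is clean, discard it
    if len(styles) == 1:
        return list(styles)       # narrowed down to one culprit
    mid = len(styles) // 2
    return (find_bad_styles(styles[:mid], fails)
            + find_bad_styles(styles[mid:], fails))


# Simulated check standing in for "rebuild the file and try to open it".
broken = {"T155", "T159", "T162"}
candidates = [f"T{n}" for n in range(150, 170)]
print(find_bad_styles(candidates, lambda subset: any(s in broken for s in subset)))
# ['T155', 'T159', 'T162']
```

Each call to `fails` costs one rebuild-and-open, so with a few bad styles among hundreds this takes far fewer attempts than trying styles one by one.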

In this case the problem was to do with three offending styles T155, T159 and T162 which are used to frame three of his formulae. These all have the same problem: They use a style:text-position attribute within a style:text-properties tag to place the text on the line. According to the ODF spec:
  • Use the style:text-position formatting property to specify whether text is positioned above or below the baseline and to specify the relative font height that is used for this text. This attribute can have one or two values.

    The first value must be present and specifies the vertical text position as a percentage that relates to the current font height or it takes one of the values sub or super. Negative percentages or the sub value place the text below the baseline. Positive percentages or the super value place the text above the baseline. If sub or super is specified, the application can choose an appropriate text position.

    The second value is optional and specifies the font height as a percentage that relates to the current font-height. If this value is not specified, an appropriate font height is used. Although this value may change the font height that is displayed, it never changes the current font height that is used for additional calculations.
So a subscript might have style:text-position="-50% 50%". The spec doesn't lay down any limits for the text position, but having had a play around, I found that the Writer document loader only accepts sub values of up to 101% (yes, that extra 1 is bizarre). These three styles have sub values of -223%, -116% and -125%. Set these to -100% and the doc loads fine.
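The manual fix can be automated with a regular expression over the raw content.xml text. This is a minimal sketch: the ±100% limit comes from TerryE's experiments above, not from the ODF spec, and `clamp_text_positions` is a hypothetical helper. It edits the text instead of re-serialising the XML, so namespace prefixes stay untouched; it leaves `sub`/`super` alone and, unlike Ajut's replacements, keeps the optional font-height value (an assumption that the loader only objects to the position).

```python
import re

# style:text-position="<pos>%" with an optional "<height>%"; the keywords
# sub/super do not match and are left alone.
TEXT_POSITION = re.compile(
    r'style:text-position="(-?\d+(?:\.\d+)?)%((?:\s+\d+(?:\.\d+)?%)?)"')

def clamp_text_positions(xml: str, limit: float = 100.0) -> str:
    """Clamp every numeric text position in raw XML text to +/- `limit` %."""
    def fix(m):
        pos = float(m.group(1))
        if abs(pos) <= limit:
            return m.group(0)                  # in range: keep as written
        clamped = limit if pos > 0 else -limit
        # keep the optional font-height value (group 2) unchanged
        return f'style:text-position="{clamped:g}%{m.group(2)}"'
    return TEXT_POSITION.sub(fix, xml)


print(clamp_text_positions('<style:text-properties style:text-position="-283% 100%"/>'))
# <style:text-properties style:text-position="-100% 100%"/>
```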

So what we have is a bug in Writer. I need to play with it a bit more and then raise this one as an Issue.

BTW Ajut, you will find it a lot easier to do this sort of doc if you use styles rigorously. Set up a text style called eqn, say, with no language (so you don't get spelling errors for your symbols) and a default font and emphasis that make your equations stand out, and use a keyboard shortcut to apply the styles you want.

Re: [Hint] How did I fix my ODT file

Posted: Thu Jan 10, 2008 5:44 pm
by ajut001
First: thanks, TerryE, for the very quick and professional reply to my post :)

I made the recommended replacements (see below) in the original "content.xml" and got my layout back :).

The original document was created by a copy-paste process from different PPT files in OOo. The only changes
the user made were font styles and sizes, so it may be a bug in the copy-paste layer.
The user did not try to reopen (save->close->open) the document during the creation process, and so she got the
error the next time she tried to open the saved document.

BR,
Ajut

Replacements:
---
<style:text-properties style:text-position="-283% 100%" -> <style:text-properties style:text-position="-100%"
<style:text-properties style:text-position="-116% 100%" -> <style:text-properties style:text-position="-100%"
<style:text-properties style:text-position="-125% 100%" -> <style:text-properties style:text-position="-100%"
---

Re: [Hint] How did I fix my ODT file

Posted: Thu Jan 10, 2008 8:47 pm
by TerryE
Ajut,

Humm, how do you fancy helping me track down a rather nasty bug in Writer? We regularly get people who say "help me, I suddenly can't read my (usually ODT) file", but without hard test cases it is difficult to replicate the bug(s) and therefore impossible to get the developers on the case. If we can identify and eliminate this one, we will help a lot more users than just this one. Now you've added two more bits of information: (a) a possible path by which the corruption occurred, and (b) the inference, from your reference to "user", that you are probably some form of IT support guy, so we can probably have a deeper conversation on this.

I've been looking at the code for the XML exporter and the XML importer. It seems to use a standard framework generated from the XML DTD, with a whole load of stubs to do the filling in, so that the internal structures can be mapped to XML and vice versa. The issue is that the outbound validation is a lot less strict than the inbound (why bother validating the outbound? It's valid already, isn't it?). Hence you can get into the situation where you can create content which you can save, but not reload the next time you open the document. I suspect the assumption is that you can't create invalid data because the GUI has the validation you need, but what if you create the content by pasting in rich text and thereby bypass the normal GUI?

OK, I'll have a play to see if I can generate a synthetic case, but could I ask a favour: could you see if your user has the original PPT/DOC that created Section 67: "Sirge parameetrilised ja kanoonilised võrrandid" ("Parametric and canonical equations of a line")? If so, let me know. If necessary we should be able to cut it down to that one slide/page as a test case. I'll mail you my mail ID if you want to send me any attachments. If you want to pass on any private material, you can do it via that email, and we can post a public highlight later.

Re: [Hint] How did I fix my ODT file

Posted: Fri Jan 11, 2008 6:03 am
by TerryE
I managed to use an RTF to create the same effect. See http://qa.openoffice.org/issues/show_bug.cgi?id=76465.

The issue is badly titled but this is the same underlying bug.

Re: [Hint] How did I fix my ODT file

Posted: Tue Jan 22, 2008 11:27 pm
by rodrigo
Great post!!!

It worked perfectly for me!! (the thing about the <office:automatic-styles> tags).

I had a big, important ODT file, which was the result of many rounds of editing back and forth between MS Word (Office 2003) and Writer (2.3.1). I had tried several things, including suggestions from other forums, but nothing worked.

I deleted everything between <office:automatic-styles> .. </office:automatic-styles>
AND NOW I CAN OPEN THE FILE!!!

thanks a lot, this helped me A LOT!!! :D :D

regards,
rodrigo

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 12:15 am
by Anodos12
I didn't try this method, but I did get this message:

"Format error discovered in the file in sub-document content.xml at position 2,155278(row,col)."

Can this problem be fixed via the method described here, or does the broken XML code need repair in some other way?

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 12:20 am
by Anodos12
Crap, didn't read carefully enough: my file is an .odp file, not an .odt file.

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 1:06 am
by acknak
It doesn't matter, all the ODF file formats share the same basic structure, so the same approach can work for any of them.
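Because every ODF document is a ZIP package, the `mimetype` member tells you which kind of file you have, and content.xml sits in the same place in each. A minimal sketch using Python's `zipfile`; the `ODF_TYPES` table covers only the three types discussed in this thread, and `describe_package` is a hypothetical name.

```python
import zipfile

ODF_TYPES = {
    "application/vnd.oasis.opendocument.text": "Writer (.odt)",
    "application/vnd.oasis.opendocument.spreadsheet": "Calc (.ods)",
    "application/vnd.oasis.opendocument.presentation": "Impress (.odp)",
}

def describe_package(path: str) -> str:
    """Report the document type of an ODF package and whether content.xml exists."""
    with zipfile.ZipFile(path) as z:
        names = z.namelist()
        # The mimetype member is plain ASCII text naming the document type.
        mime = z.read("mimetype").decode("ascii").strip() if "mimetype" in names else ""
    kind = ODF_TYPES.get(mime, "unknown type " + repr(mime))
    state = "present" if "content.xml" in names else "missing"
    return f"{kind}, content.xml {state}"
```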

If you want to attach your file, we can try to help.

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 1:13 am
by Anodos12
Great, yes, thank you, this took hours of work, I greatly appreciate it.

Actually, I can't, the file is 259 KB.

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 2:15 am
by acknak
You can use one of the free file sharing sites, such as filecrunch.com or mediafire.com.

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 3:55 am
by Anodos12

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 5:26 am
by acknak
Try this one: Signs_recovered.odp

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 6:08 am
by Anodos12
Wonderful. If you're ever in Chicago and want a deep dish pizza on me, just shoot me an email. You don't even have to meet me, I'll just have it delivered to your hotel. :P

Re: [Hint] How did I fix my ODT file

Posted: Wed Mar 05, 2008 4:28 pm
by acknak
Just FYI: the document had some invalid character data on slide 92. All I had to do was replace that invalid text with some valid characters and the document could be opened correctly. Of course, I have no idea what the correct text on slide 92 should be, so you'll still have to fix that.

It only took a couple of minutes; simple fix.
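Invalid character data like this can be located without reading the whole file by eye. A minimal sketch, assuming "invalid" means characters outside the XML 1.0 `Char` production (such as stray control characters) or bytes that are not valid UTF-8; `find_invalid_chars` is a hypothetical helper. It reports 1-based line and column numbers so you can jump to the spot in a text editor.

```python
import re

# Anything outside the XML 1.0 "Char" production. Undecodable bytes are mapped
# to U+DC80..U+DCFF by "surrogateescape", so they are reported too.
BAD_CHAR = re.compile("[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")

def find_invalid_chars(data: bytes, limit: int = 20):
    """Return up to `limit` (line, column, code point) tuples, 1-based,
    for characters that may not appear in an XML 1.0 document."""
    text = data.decode("utf-8", errors="surrogateescape")
    found = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in BAD_CHAR.finditer(line):
            found.append((lineno, m.start() + 1, hex(ord(m.group()))))
            if len(found) >= limit:
                return found
    return found
```

To scan a package, read the member first, e.g. `find_invalid_chars(zipfile.ZipFile("broken.odp").read("content.xml"))` (file name illustrative). A reported code point between `0xdc80` and `0xdcff` stands for a raw byte that was not valid UTF-8.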

Re: [Hint] How did I fix my ODT file

Posted: Tue Aug 12, 2008 10:55 pm
by Bostonaholic
acknak, if I upload an ODS file, do you think you could try and fix it for me? It is a Calc file that is VERY important. I've tried the method described here and it does not seem to work.

Thanks.

Re: [Hint] How did I fix my ODT file

Posted: Tue Aug 12, 2008 10:56 pm
by Hagar Delest
Just do it, we can try.

Re: [Hint] How did I fix my ODT file

Posted: Tue Aug 12, 2008 11:03 pm
by Bostonaholic

Re: [Hint] How did I fix my ODT file

Posted: Tue Aug 12, 2008 11:46 pm
by Hagar Delest
Well, there are a lot of encoding errors; I can't fix them all on my old laptop (it takes too much time). You should try it yourself with a good XML parser; it's not that difficult.
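As a sketch of the "good XML parser" approach, Python's built-in `xml.etree.ElementTree` can check every XML member of the package in one pass and report where parsing first fails. This is a minimal sketch, and `check_package_xml` is a hypothetical name. It only checks well-formedness (like `xmllint` without a schema), not ODF validity, and each file only reports its first error, so re-run it after every fix.

```python
import zipfile
import xml.etree.ElementTree as ET

def check_package_xml(path: str) -> dict:
    """Parse every .xml member of an ODF package. Broken members map to the
    (line, column) of the first error, or to a message if unzipping fails."""
    problems = {}
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            if not name.endswith(".xml"):
                continue
            try:
                ET.fromstring(z.read(name))
            except zipfile.BadZipFile as exc:   # e.g. a CRC error
                problems[name] = str(exc)
            except ET.ParseError as exc:
                problems[name] = exc.position   # column counts from 0
    return problems
```

An empty dict means every XML member parsed. A CRC error is reported as a message, since at that point the compressed data itself is damaged.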

NB: you can upload files here if they are smaller than 128KB.

Re: [Hint] How did I fix my ODT file

Posted: Wed Aug 13, 2008 2:05 am
by acknak
Ok, here's a recovered file.

This was a strange one, and took quite a bit of work to fix. The "diff" document contains a list of all the things that I changed to try and fix the file. I make no guarantee that there are no more errors. In fact, if you really care about the data, you'll throw this out and go back to your last known-good backup. You do have a backup, don't you? Ok, ok, this is probably salvageable if you don't have a backup, but you'll want to be very careful because there are still errors in your data (there are a few I found but did not change).

Here is a sample of the errors in the file:
diff.png
The light-gray text is context; the darker lines are the ones with changes. The "-" lines contain the errors; the "+" lines are the fixed version. The orange highlights show the errors.

I only fixed the errors in the XML code; I left your data unchanged. Some of the XML errors were enough to prevent the file from loading, some were not, so there's no guarantee that I've found all the problems, but the file does pass "xmllint" and the ODF Validator (http://tools.services.openoffice.org/odfvalidator/), so that's a fairly strong indication that the file structure is ok.

Here are the odd bits:
• first, the errors are all single-character differences, and the character code of each error is always one less than the correct character code.
• second, the errors are bunched in two small parts of the file (each about 70 lines, or 0.7%): lines 3078-3143 and 3278-3385

This does not look like random memory corruption; I have no idea what might cause this pattern of errors.
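The off-by-one pattern can be checked mechanically against a line and its corrected version. This sketch (the helper name is hypothetical) lists each position where the damaged character is exactly one code point below the correct one and counts the bits that differ. Subtracting one does not flip a fixed bit: "e" to "d" differs in one bit, but "p" to "o" (0x70 to 0x6F) differs in five, so this is not a single stuck or flipped bit.

```python
def off_by_one_changes(bad: str, good: str):
    """List (index, bad_char, good_char, bits_flipped) for every position
    where `bad` holds the character one code point below `good`."""
    if len(bad) != len(good):
        raise ValueError("compare lines of equal length")
    changes = []
    for i, (b, g) in enumerate(zip(bad, good)):
        if ord(b) == ord(g) - 1:
            flipped = bin(ord(b) ^ ord(g)).count("1")  # differing bits
            changes.append((i, b, g, flipped))
    return changes


print(off_by_one_changes("tdxt:o", "text:p"))
# [(1, 'd', 'e', 1), (5, 'o', 'p', 5)]
```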

However, because there are no errors in the XML from the rest of the file, it may be that your data and formulas are ok as well. You just need to carefully check the context lines (light gray) in the diff to see if there are any problems in your content.

There are some: I saw some very suspicious spelling errors, which I highlighted in green in the diff (also visible in the sample image above). If you agree that those are errors, you'll need to fix them in the spreadsheet. Scan the other context lines in the diff document to see if there are any other problems. Remember, the other errors all seem to be one-off, so a change in a number, say from 100 to 000, could be rather hard to see.

And if you have any ideas what circumstances might have triggered this, I'd be interested to hear about it.

Re: [Hint] How did I fix my ODT file

Posted: Wed Aug 13, 2008 4:58 am
by Bostonaholic
Wow, thank you so much acknak!!! You're a savior.

I had been working on content.xml and found those weird one-off errors everywhere. It was taking me forever to get through all of them, and I thought I had found them all, but it still wouldn't open. I haven't a clue what happened; maybe it was because they used to be XLS files that I then converted to ODS??? But they had been working fine as ODS for a while, so who knows.

Again, thanks.

Re: [Hint] How did I fix my ODT file

Posted: Fri Jan 09, 2009 12:26 pm
by soti
Dear all,

I have a similar problem. I have a 100-page document, which is going to be a book full of formulas, that I can't open after saving it normally with OOo Writer 3.0.0. I have tried the previous ideas: I removed the automatic styles from content.xml. I have also tried the ODF Validator, which gives me an error:

:Fatal:SAXException:Attribute name "manifest:full-x" associated with an element type "manifest:file-entry" must be followed by the ' = ' character.

It seems to me that an = character is missing somewhere. Does anyone have any idea where the error might be?
I have checked META-INF/manifest.xml, but it's around 1.6 MB, so I cannot search through it manually...

Gergely

Re: [Hint] How did I fix my ODT file

Posted: Fri Jan 09, 2009 1:06 pm
by soti
Hello,

I fixed the problem, and I thought I should share the solution with everyone. I downloaded RXP (an XML parser), which is available for both Windows and Unix. With it I scanned all the XML files I got by unzipping the original ODT document. Somewhere in META-INF/manifest.xml there was an error, as if some bytes had been changed to others; it looked like:
manifest:full-x@g@+@t@ject 615/Configurations2/progressbar/

instead of:

manifest:full-path="Object 615/Configurations2/progressbar/

So I just changed it back and it worked fine. I don't know what could have caused the problem: I use a brand new computer with the newest openSUSE, and the hard disk hasn't caused me problems before.

regards,
Gergely

Re: [Hint] How did I fix my ODT file

Posted: Fri Jan 09, 2009 6:19 pm
by acknak
Nice work! Thanks for the information.

Re: [Hint] How did I fix my ODT file

Posted: Tue Feb 03, 2009 8:29 am
by goatsxc
I have two Calc files with similar problems ("format error discovered in the sub-document content.xml at 2,19926871(row,col)"). I saved them earlier this afternoon, when they were working fine. Now I can't open either Calc document. I've tried the steps in the initial post (I'm having trouble re-zipping the files; I get some sort of error with 7-Zip). I was hoping someone in the community could take a look at the Calc files and see if they are fixable (important data I'm obviously hoping to recover):

http://www.mediafire.com/?sharekey=40fa ... f6e8ebb871

Thanks in advance!

Re: [Hint] How did I fix my ODT file

Posted: Tue Feb 03, 2009 3:23 pm
by goatsxc
I think I correctly unzipped, edited the XML file, and then re-zipped things. When I try to open the new ODS file, however, I get this error message:

format error discovered in the sub-document $(ARG1) at $(ARG2)(row,col)

There doesn't seem to be much info about it on the forums or when I search Google. Any ideas?

Re: [Hint] How did I fix my ODT file

Posted: Thu May 27, 2010 2:05 pm
by pamindic
Ajut001's instructions to remove the automatic styles section from content.xml worked for me to recover my corrupt .ods file.
Much appreciated.

Re: [Hint] How did I fix my ODT file

Posted: Tue Nov 23, 2010 9:48 pm
by anonymouschick
Hey guys, I had the same problem this morning and I'm still working on it. I have just one question about the way you fixed this: did you type those commands in a terminal? I'm sorry if I'm too new at this; I'm just desperate to get my file back, and as far as I've looked this seems to be the best way to do it. I might be confusing something, though, because I tried using a terminal and I guess I got it completely wrong (I'm on Linux).
Please help me if you can.

Re: [Hint] How did I fix my ODT file

Posted: Tue Nov 23, 2010 9:53 pm
by Hagar Delest
No terminal needed.

Re: [Hint] How did I fix my ODT file

Posted: Thu Dec 09, 2010 1:47 am
by garbledtext
Hello all,

I am a bit of a computer layman, so please be patient. After I replaced a .png image in my .odt document and then deleted the original from its folder, OOo promptly froze, then crashed, and now when I open the document I get the error "format error discovered in the file in sub-document content.xml", except mine ends with "at 1,0(row,col)". I found some people with a similar error, and they all seem to be directed here.

To start off, I found the OP's command-line style advice a little cryptic. I gleaned from it that I was supposed to zip, then unzip the file, and that after unzipping the folder would contain content.xml. Initially I used XP's built-in zip utility to zip it. When I unzipped it I simply got back the original file: "fixme.odt". Hmm... I don't know why I expected something different to happen...

I figured I must be doing something wrong (possibly stemming from my lack of understanding of what XP's built-in zip utility is capable of), so I downloaded WinZip. I read on a different forum that an .odt file is actually a compressed file already, so I simply opened the .odt file in WinZip and extracted straight from there. Voila, I got all the expected files, including content.xml.

The problem I'm having now is that when I try to open content.xml in Notepad or WordPad I get crazy garbled characters, with those squares and all those funny foreign currency symbols. I take this to be a very bad sign. I read online that this type of text means the file contains binary information rather than text, but I don't know how valid that is or whether that information is at all helpful. I don't care about the formatting, I simply want the content back. Is there any way to get it, or am I screwed?
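An error at 1,0(row,col) points at the very start of content.xml, so a quick look at its first bytes shows whether the entry is still text at all. A minimal sketch with Python's `zipfile`; it assumes a readable content.xml begins with `<` (as the `<?xml ...?>` declaration OOo writes does), and the function name and the 10% NUL-byte threshold are arbitrary choices for illustration, not a known recovery rule.

```python
import zipfile

def inspect_content(path: str, member: str = "content.xml") -> str:
    """Give a rough verdict on whether `member` in an ODF package looks like XML."""
    with zipfile.ZipFile(path) as z:
        data = z.read(member)          # raises BadZipFile on a CRC error
    if not data:
        return "empty"
    head = data[3:] if data.startswith(b"\xef\xbb\xbf") else data  # skip a BOM
    head = head.lstrip()               # skip leading ASCII whitespace
    if head.startswith(b"<"):
        return "looks like XML"
    nul_share = data.count(0) / len(data)
    if nul_share > 0.1:
        return f"binary: {nul_share:.0%} NUL bytes, probably overwritten"
    return "not XML at the start; first bytes: " + repr(head[:16])
```

If the verdict is "binary", editing content.xml by hand is unlikely to help; text might still be salvageable from a backup or from other copies of the file.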