[Solved] ASCII garbage in non-text areas

joelstern
Posts: 12
Joined: Sun Jan 13, 2008 5:33 pm

Post by joelstern »

I am fairly new to OpenOffice.
I have a .doc file that was converted from a .PDF file. It reads fine in Office 2003 but when I try to read it with Writer I get ASCII garbage in non-text areas (pictures etc.).

Any help would be appreciated.
Last edited by joelstern on Tue Jan 15, 2008 12:33 am, edited 1 time in total.
hol.sten
Volunteer
Posts: 495
Joined: Mon Oct 08, 2007 1:31 am
Location: Hamburg, Germany

Re: ASCII garbage in non-text areas

Post by hol.sten »

joelstern wrote:I am fairly new to OpenOffice.
With which OOo version on which operating system?
joelstern wrote:I have a .doc file that was converted from a .PDF file.
With which tool? Can you provide an example?
OOo 3.2.0 on Ubuntu 10.04 • OOo 3.2.1 on Windows 7 64-bit and MS Windows XP
joelstern
Posts: 12
Joined: Sun Jan 13, 2008 5:33 pm

Re: ASCII garbage in non-text areas

Post by joelstern »

I have OpenOffice 2.3.0 on XP Professional. I have copies of both the PDF and the .doc files, but the .doc file is too large to attach.
foxcole
Volunteer
Posts: 1507
Joined: Mon Oct 08, 2007 1:31 am
Location: Minneapolis, Minnesota

Re: ASCII garbage in non-text areas

Post by foxcole »

joelstern wrote:I have OpenOffice 2.3.0 on XP Professional. I have copies of both the PDF and the .doc files, but the .doc file is too large to attach.
Please read the Survival Guide (see the link in my sig line below).

A .doc file converted from a .pdf file will always have extraneous and erroneous data in it. There's no tool yet that can make that conversion cleanly, not even Acrobat itself. It is a bit surprising that Word would hide that data, or perhaps is simply unable to display it.

There's no reason to upload the entire .doc file. Please just find a page that displays the problem, save it as a separate file, make sure it still displays the problems you're seeing, then upload that as a sample.
Cheers!
---Fox

OOo 3.2.0 Portable, Windows 7 Home Premium 64-bit
joelstern
Posts: 12
Joined: Sun Jan 13, 2008 5:33 pm

Re: ASCII garbage in non-text areas

Post by joelstern »

I could not select the problem areas from the .doc file directly. I've put the .doc document through another conversion and produced the attached file.
Attachments
blazeware sample1.odt
(16.69 KiB) Downloaded 270 times
foxcole
Volunteer
Posts: 1507
Joined: Mon Oct 08, 2007 1:31 am
Location: Minneapolis, Minnesota

Re: ASCII garbage in non-text areas

Post by foxcole »

joelstern wrote:I could not select the problem areas from the doc files directly.
Perhaps I should have said, "delete all the other pages." Would that have made a difference?
joelstern wrote:I've put the .doc document through another conversion and produced the attached file.
Thank you, but I'm a little confused. I thought you were working with .doc files in Writer. This one's an .odt Writer file, so it would be better to be able to see a .doc example that displays correctly in Word but not in Writer.

Also, you haven't yet answered hol.sten's question: What did you use to convert the PDF file? That could provide a clue or two as to what's going on with the file.

All I can tell you based on the attachment is that the image data appears to have been changed or removed, so the program can't recognize or use what remains as an image. File signatures for JPG start out with the bytes FF D8 FF, but the next bytes should be either FE 00 or E1, according to my sources. The next bytes in the file you have are E0 00, so I'm not sure that's a legitimate JPG code, but I'm also not sure how Word could display those images correctly if it isn't. I'll have to dig around some more and see if I can find it online. Maybe it's a Microsoft JPG format.

EDIT: Well, it is a valid header: FF D8 FF E0 is how JPEGs in JFIF-compliant format start.
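
For anyone who wants to run the same check on their own files, here is a minimal Python sketch of the signature test described above. The function name `classify_jpeg_header` and its labels are made up for illustration. It only looks at the first few bytes, so it says nothing about whether the rest of the image data is intact.

```python
def classify_jpeg_header(data: bytes) -> str:
    """Return a rough label for the first bytes of an image file.

    Hypothetical helper for illustration: it inspects the signature only.
    """
    # Every JPEG starts with FF D8 (start of image) followed by FF (a marker).
    if len(data) < 4 or data[:3] != b"\xff\xd8\xff":
        return "not a JPEG"
    marker = data[3]
    if marker == 0xE0:
        # APP0 segment: JFIF files carry the identifier "JFIF\0" at offset 6,
        # right after the two-byte segment length.
        return "JPEG (JFIF)" if data[6:11] == b"JFIF\x00" else "JPEG (APP0)"
    if marker == 0xE1:
        # APP1 segment: Exif files carry the identifier "Exif" at offset 6.
        return "JPEG (Exif)" if data[6:10] == b"Exif" else "JPEG (APP1)"
    return "JPEG (other marker 0x%02X)" % marker


# Usage, assuming an image has been extracted to a file named sample.jpg:
#     with open("sample.jpg", "rb") as f:
#         print(classify_jpeg_header(f.read(16)))
```

A file whose fourth and fifth bytes are E0 00, like the one in the attachment, falls into the APP0 branch.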
Cheers!
---Fox
joelstern
Posts: 12
Joined: Sun Jan 13, 2008 5:33 pm

Re: ASCII garbage in non-text areas

Post by joelstern »

I started with a seven-page PDF. I emailed it to my son, who has a full version of Adobe. He converted it to a .doc file, opened it successfully in Office, then emailed it to me. I could not read it and sent it back to him. He loaded my attachment and again read it successfully.

I can't select any part of my copy of the .doc file with the programs I have, so I converted it to a .txt file, cut and pasted the first few paragraphs, then saved it with Writer so I could show you something.
foxcole
Volunteer
Posts: 1507
Joined: Mon Oct 08, 2007 1:31 am
Location: Minneapolis, Minnesota

Re: ASCII garbage in non-text areas

Post by foxcole »

joelstern wrote:I started with a seven-page PDF. I emailed it to my son, who has a full version of Adobe. He converted it to a .doc file, opened it successfully in Office, then emailed it to me. I could not read it and sent it back to him. He loaded my attachment and again read it successfully.

I can't select any part of my copy of the .doc file with the programs I have, so I converted it to a .txt file, cut and pasted the first few paragraphs, then saved it with Writer so I could show you something.
Oh, I see. Thank you!
Plain .txt files can't hold images, so that's one more conversion layer that could affect what we're seeing in the Writer file.

I'd be happy to work with you privately if you wish. I have Acrobat 7.0 and Word 2003, so maybe we can re-create the file conversion to .doc and see what happens... and hopefully find a way to get you a file you can work with. Please PM me if you're interested in pursuing that route.
Cheers!
---Fox