PDF to odt

Post by **jhorned** » Tue Sep 01, 2015 9:09 pm

I have some documents that were sent to me as PDFs. When I try to work on one, it won't let me. I found a place that said to use "Open with", but OpenOffice is not included in the options. How do I convert a PDF document to ODT? Thanks, Joseph

Post by **RoryOF** » Tue Sep 01, 2015 9:13 pm

PDF files were intended to be unchangeable. To work on them, by way of a major edit, you must pass them through an OCR program (Optical Character Recognition) which converts them to editable text. Alternately, you can use an application such as Adobe Acrobat (commercial) to edit them and leave them as PDF files. There may be downloadable programs similar to Adobe Acrobat, but I don't know of them.

Post by **jhorned** » Tue Sep 01, 2015 9:24 pm

Thank you for your help. I will see what I can find. Joseph

Post by **Zizi64** » Tue Sep 01, 2015 9:32 pm

There is an extension named PDF Import. It works in the Draw application. Once it is installed, you can open PDF files and (partially!) edit text-based PDFs in Draw. (LibreOffice includes this extension by default.)

LO can also embed an ODF text document in the exported PDF file. You can read this hybrid file with PDF reader software (as you usually can with PDF files), and you can 'unpack' and edit it with the Writer application.

Post by **jhorned** » Tue Sep 01, 2015 9:40 pm

Thanks I will try that. Joseph

Post by **esperantisto** » Fri Sep 04, 2015 9:05 am

RoryOF wrote:…pass them through an OCR program…

ABBYY FineReader 11 and later can save OCR results to ODT; however, the document structure is a bit weird.

Post by **RoryOF** » Fri Sep 04, 2015 9:21 am

The output of most OCR programs is, as esperantisto says, "a bit weird". This is because they often default to attempting to recreate the exact layout of the original and may be putting text in little frames, rather than outputting it as an unformatted stream. To get the text out of the little frames into an unformatted stream can be quite a challenge.

Using linux there are quite powerful OCR tools which do not suffer from the above problem (if set correctly). However, one point to make: no matter what OCR application one uses there are errors. Typically, depending on the quality and typeface of the original, the accuracy with which it was laid on the scanner platen (twist or skew of the page), shading due to the gutter of the binding (if a book), one might get an accuracy of 98%. So in an 80,000 word book there are (roughly) 450,000 characters; of these about 9000 will be inaccurately recognised. Spellcheck on the result may pick up some of these errors, but definitely not all. A common error is confusion of commas and full stops. Words like minimum often cause recognition problems.

So be prepared to proofread carefully. If one is scanning tables of figures they are fraught with difficulty and must be proofread again and again to avoid error.
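The error estimate above is simple arithmetic, and a minimal sketch makes it easy to rerun for other book lengths or accuracy figures. The function name `expected_ocr_errors` is hypothetical, and the character count and accuracy are just the rough figures quoted in this post, not measurements.

```python
def expected_ocr_errors(total_chars: int, accuracy: float) -> int:
    """Estimate how many characters an OCR pass will get wrong.

    total_chars: approximate number of characters in the document.
    accuracy:    fraction of characters recognised correctly (0.0 to 1.0).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Errors are simply the share of characters that were not recognised correctly.
    return round(total_chars * (1.0 - accuracy))


# Figures from the post: an 80,000 word book is roughly 450,000 characters.
book_chars = 450_000
print(expected_ocr_errors(book_chars, 0.98))  # 9000
```

Even at 99% accuracy the same book would still contain several thousand wrong characters, which is why proofreading is not optional.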

Post by **jhorned** » Fri Sep 04, 2015 2:18 pm

Thank You

Post by **jhorned** » Fri Apr 08, 2016 2:33 pm

Thanks

Post by **RoryOF** » Thu Apr 21, 2016 9:44 am

It is worth noting that Calibre, the cross platform e-book reader/library program, can often convert a PDF file into plain text, if the authors of the PDF file have not set options to prevent such conversion.

Any such converted PDF file may (read: WILL) have errors of recognition and it will be your responsibility to proof-read and correct the output before making any serious use of it. In particular any numeric information needs to be very carefully scrutinised. Problems with text recognition are often easily corrected from context, but accurate rendering of punctuation can be a problem; this is particularly important in legal documents.

Post by **saam123** » Tue Sep 27, 2016 2:33 am

RoryOF wrote:PDF files were intended to be unchangeable. To work on them, by way of a major edit, you must pass them through an OCR program (Optical Character Recognition) which converts them to editable text. Alternately, you can use an application such as Adobe Acrobat (commercial) to edit them and leave them as PDF files. There may be downloadable programs similar to Adobe Acrobat, but I don't know of them.

Thanks for your suggestion. Could you please tell me whether there is any third-party plugin or API that does the same thing? I have a large number of PDF documents to convert to ODT.

Post by **RoryOF** » Tue Sep 27, 2016 8:15 pm

I can't answer for Windows versions, but for linux there are several free and/or opensource pdf to odt conversion utilities.
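For a bulk job, one possible approach is to drive a headless office install from a script. The sketch below is an illustration only, not a tested recipe. It assumes a LibreOffice `soffice` binary on the PATH with PDF import available, and it assumes that the import filter name `writer_pdf_import` is accepted by your version. The helper names `build_convert_command` and `convert_folder` are hypothetical. It performs no OCR, so scanned (image-only) PDFs will still need an OCR pass first, and the output needs the same proofreading as described earlier in this thread.

```python
from pathlib import Path
import subprocess


def build_convert_command(pdf: Path, outdir: Path) -> list[str]:
    """Build the soffice command line for converting one PDF to ODT.

    Assumption: the 'writer_pdf_import' filter makes the PDF open in Writer
    rather than Draw; check this against your own installation.
    """
    return [
        "soffice",
        "--headless",                     # run without a GUI
        "--infilter=writer_pdf_import",   # assumed import filter name
        "--convert-to", "odt",
        "--outdir", str(outdir),
        str(pdf),
    ]


def convert_folder(src: Path, outdir: Path) -> None:
    """Convert every *.pdf directly inside src, one soffice call per file."""
    outdir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(src.glob("*.pdf")):
        # check=True raises CalledProcessError if soffice exits with an error.
        subprocess.run(build_convert_command(pdf, outdir), check=True)


# Example call (paths are placeholders):
# convert_folder(Path("incoming_pdfs"), Path("converted_odt"))
```

Converting one file per call is slower than passing many files at once, but it makes it easier to see which PDF failed.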
