PDF to odt
I have some documents that were sent to me as PDF files. When I try to work on a document, it won't let me. I found advice to use "Open with", but OpenOffice is not among the options. How do I convert a PDF document to ODT? Thanks, Joseph
Open Office 3/3 installed on Vista
Re: pdf to odt
PDF files were intended to be unchangeable. To make major edits, you must pass them through an OCR (Optical Character Recognition) program, which converts them to editable text. Alternatively, you can use an application such as Adobe Acrobat (commercial) to edit them and leave them as PDF files. There may be downloadable programs similar to Adobe Acrobat, but I don't know of any.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: pdf to odt
Thank you for your help. I will see what I can find. Joseph
Open Office 3/3 installed on Vista
Re: pdf to odt
There is an extension named PDF Import. It works in the Draw application. If you install it, you can open PDF files and (partially!) edit text-based PDFs in Draw. (LibreOffice includes this extension by default.)
LibreOffice can also embed an ODF text document in the exported PDF file. You can read this hybrid file with PDF reader software, as you would any other PDF, and you can "unpack" and edit it with the Writer application.
Tibor Kovacs, Hungary; LO7.5.8 /Win7-10 x64Prof.
PortableApps/winPenPack: LO3.3.0-7.6.2;AOO4.1.14
Please, edit the initial post in the topic: add the word [Solved] at the beginning of the subject line - if your problem has been solved.
Re: pdf to odt
RoryOF wrote: …pass them through an OCR program…
ABBYY FineReader 11 and later can save OCR results to ODT; however, the document structure is a bit weird.
AOO 4.2.0 (of 2015) / LO 7.x / Win 7 / openSUSE Linux Leap 15.4 (64-bit)
Re: PDF to odt
The output of most OCR programs is, as esperantisto says, "a bit weird". This is because they often default to attempting to recreate the exact layout of the original and may be putting text in little frames, rather than outputting it as an unformatted stream. To get the text out of the little frames into an unformatted stream can be quite a challenge.
On Linux there are quite powerful OCR tools which do not suffer from the above problem (if set correctly). However, one point to make: no matter which OCR application one uses, there will be errors. Typically, depending on the quality and typeface of the original, how squarely the page was laid on the scanner platen (twist or skew), and shading due to the gutter of the binding (if a book), one might get an accuracy of 98%. An 80,000-word book has (roughly) 450,000 characters, so at 98% accuracy about 9,000 of them will be wrongly recognised. Spellcheck on the result may pick up some of these errors, but definitely not all. A common error is confusion of commas and full stops. Words like "minimum" often cause recognition problems.
So be prepared to proofread carefully. If one is scanning tables of figures they are fraught with difficulty and must be proofread again and again to avoid error.
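The estimate above is simple arithmetic, and it can be rerun for other book lengths or accuracy figures. This is only a sketch: the function name `expected_ocr_errors` is made up for illustration, and the 450,000 characters and 98% accuracy are the rough figures quoted in the post, not measurements.

```python
def expected_ocr_errors(total_chars: int, accuracy: float) -> int:
    """Estimate how many characters an OCR pass gets wrong.

    total_chars: number of characters in the scanned text
    accuracy:    fraction of characters recognised correctly (0.98 = 98%)
    """
    return round(total_chars * (1.0 - accuracy))


# Figures from the post: an 80,000-word book is roughly 450,000 characters.
print(expected_ocr_errors(450_000, 0.98))  # prints 9000
```

Even a seemingly high accuracy such as 99.5% would still leave over two thousand wrong characters in a book of that size, which is why the proofreading step cannot be skipped.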
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: PDF to odt
It is worth noting that Calibre, the cross platform e-book reader/library program, can often convert a PDF file into plain text, if the authors of the PDF file have not set options to prevent such conversion.
Any such converted PDF file may (read: WILL) have errors of recognition and it will be your responsibility to proof-read and correct the output before making any serious use of it. In particular any numeric information needs to be very carefully scrutinised. Problems with text recognition are often easily corrected from context, but accurate rendering of punctuation can be a problem; this is particularly important in legal documents.
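For anyone who prefers the command line, Calibre ships a converter called `ebook-convert` that takes an input file and an output file. The sketch below wraps it from Python. It assumes Calibre is installed and `ebook-convert` is on the PATH; the helper names `calibre_command` and `pdf_to_text` are made up for illustration. As far as I know, Calibre reads the text already stored in the PDF rather than running OCR, so a scanned, image-only PDF will probably give little or no text.

```python
import shutil
import subprocess
from pathlib import Path


def calibre_command(pdf: Path, txt: Path) -> list[str]:
    """Build the ebook-convert call: input file first, output file second."""
    return ["ebook-convert", str(pdf), str(txt)]


def pdf_to_text(pdf: Path) -> Path:
    """Convert one PDF to plain text next to the original (hypothetical helper)."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    txt = pdf.with_suffix(".txt")
    # check=True raises CalledProcessError if Calibre reports a failure.
    subprocess.run(calibre_command(pdf, txt), check=True)
    return txt


# Example (not run here): pdf_to_text(Path("report.pdf")) would write report.txt
```

The resulting text file can then be opened in Writer and saved as ODT, after proofreading as described above.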
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: pdf to odt
RoryOF wrote: PDF files were intended to be unchangeable. To work on them, by way of a major edit, you must pass them through an OCR program (Optical Character Recognition) which converts them to editable text. Alternately, you can use an application such as Adobe Acrobat (commercial) to edit them and leave them as PDF files. There may be downloadable programs similar to Adobe Acrobat, but I don't know of them.
Thanks for your suggestion. Could you please tell me whether there is any third-party plugin or API that does the same thing? I have a large number of PDF documents to convert to ODT.
Last edited by saam123 on Wed Sep 28, 2016 1:19 am, edited 1 time in total.
OpenOffice 3.1 on Windows Vista
Re: PDF to odt
I can't answer for Windows versions, but for linux there are several free and/or opensource pdf to odt conversion utilities.
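For the bulk case, one possible route is LibreOffice's own headless converter, driven from a small script. This is a sketch under assumptions, not a tested recipe: it assumes LibreOffice is installed with `soffice` on the PATH, and that the `writer_pdf_import` filter (the name commonly cited for opening a PDF in Writer rather than Draw) is available in your version. The helper names `build_command` and `convert_folder` are made up for illustration. It only helps with PDFs that contain real text; scanned pages still need OCR first, and the layout of the result will need checking.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf: Path, out_dir: Path, soffice: str = "soffice") -> list[str]:
    """Build one headless conversion call for a single PDF."""
    return [
        soffice,
        "--headless",                    # no GUI
        "--infilter=writer_pdf_import",  # assumption: open the PDF in Writer
        "--convert-to", "odt",
        "--outdir", str(out_dir),
        str(pdf),
    ]


def convert_folder(src_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every *.pdf in src_dir and return the expected .odt paths."""
    soffice = shutil.which("soffice")
    if soffice is None:
        raise RuntimeError("soffice not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        # check=True stops the batch on the first failed conversion.
        subprocess.run(build_command(pdf, out_dir, soffice), check=True)
        converted.append(out_dir / (pdf.stem + ".odt"))
    return converted


# Example (not run here): convert_folder(Path("pdfs"), Path("odt_out"))
```

Converting one file at a time is slower than passing many files in a single call, but it makes it easier to see which document caused a failure.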
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS