PDF to odt

jhorned
Posts: 7
Joined: Sat May 26, 2012 6:59 pm

PDF to odt

Post by jhorned »

I have some documents that were sent to me as PDFs. When I try to work on a document, it won't let me. I found advice to use "Open with", but OpenOffice is not among the options. How do I convert a PDF document to ODT? Thanks, Joseph
Open Office 3/3 installed on Vista
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: pdf to odt

Post by RoryOF »

PDF files were intended to be unchangeable. To work on them, by way of a major edit, you must pass them through an OCR program (Optical Character Recognition) which converts them to editable text. Alternately, you can use an application such as Adobe Acrobat (commercial) to edit them and leave them as PDF files. There may be downloadable programs similar to Adobe Acrobat, but I don't know of them.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
jhorned
Posts: 7
Joined: Sat May 26, 2012 6:59 pm

Re: pdf to odt

Post by jhorned »

Thank you for your help. I will see what I can find. Joseph
Open Office 3/3 installed on Vista
Zizi64
Volunteer
Posts: 11363
Joined: Wed May 26, 2010 7:55 am
Location: Budapest, Hungary

Re: pdf to odt

Post by Zizi64 »

There is an extension named PDF Import. It works in the Draw application. Once it is installed, you can open PDF files and (partially!) edit text-based PDFs in Draw. (LibreOffice includes this extension by default.)

LibreOffice can also embed an ODF text document in an exported PDF file. You can read this hybrid file with any PDF reader, as you would an ordinary PDF, and you can 'unpack' and edit it with the Writer application.
Tibor Kovacs, Hungary; LO7.5.8 /Win7-10 x64Prof.
PortableApps/winPenPack: LO3.3.0-7.6.2;AOO4.1.14
Please, edit the initial post in the topic: add the word [Solved] at the beginning of the subject line - if your problem has been solved.
jhorned
Posts: 7
Joined: Sat May 26, 2012 6:59 pm

Re: pdf to odt

Post by jhorned »

Thanks I will try that. Joseph
Open Office 3/3 installed on Vista
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: pdf to odt

Post by esperantisto »

RoryOF wrote:…pass them through an OCR program…
ABBYY FineReader 11 and later can save OCR results to ODT; however, the document structure is a bit weird.
AOO 4.2.0 (of 2015) / LO 7.x / Win 7 / openSUSE Linux Leap 15.4 (64-bit)
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: PDF to odt

Post by RoryOF »

The output of most OCR programs is, as esperantisto says, "a bit weird". This is because they often default to attempting to recreate the exact layout of the original and may be putting text in little frames, rather than outputting it as an unformatted stream. To get the text out of the little frames into an unformatted stream can be quite a challenge.

On Linux there are quite powerful OCR tools which do not suffer from the above problem (if set up correctly). However, one point to make: no matter what OCR application one uses, there will be errors. Typically, depending on the quality and typeface of the original, the accuracy with which it was laid on the scanner platen (twist or skew of the page), and shading due to the gutter of the binding (if a book), one might get an accuracy of 98%. So in an 80,000 word book there are (roughly) 450,000 characters; at 98% accuracy, 2% of these, about 9,000, will be inaccurately recognised. Spellcheck on the result may pick up some of these errors, but definitely not all. A common error is confusion of commas and full stops. Words like minimum often cause recognition problems.

So be prepared to proofread carefully. If one is scanning tables of figures they are fraught with difficulty and must be proofread again and again to avoid error.
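The error estimate above is plain arithmetic and is easy to redo for other document sizes or accuracy figures. A minimal sketch, where the function name is made up for illustration and the 450,000 character and 98% figures are the rough ones quoted above, not measurements:

```python
def estimated_errors(total_chars, accuracy_percent):
    """Rough count of characters an OCR pass is likely to get wrong."""
    # The share of wrong characters is whatever the accuracy leaves over.
    return round(total_chars * (100 - accuracy_percent) / 100)


# The 80,000 word book from above: roughly 450,000 characters at 98% accuracy.
print(estimated_errors(450_000, 98))

# The same accuracy, expressed per 1,000 characters of text.
print(estimated_errors(1_000, 98))
```

Even a small gain in accuracy changes the proofreading load noticeably, which is why scan quality matters so much.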
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
jhorned
Posts: 7
Joined: Sat May 26, 2012 6:59 pm

Re: PDF to odt

Post by jhorned »

Thank You
Open Office 3/3 installed on Vista
jhorned
Posts: 7
Joined: Sat May 26, 2012 6:59 pm

Re: PDF to odt

Post by jhorned »

Thanks
Open Office 3/3 installed on Vista
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: PDF to odt

Post by RoryOF »

It is worth noting that Calibre, the cross-platform e-book reader/library program, can often convert a PDF file into plain text, if the authors of the PDF file have not set options to prevent such conversion.

Any such converted PDF file may (read: WILL) have errors of recognition and it will be your responsibility to proof-read and correct the output before making any serious use of it. In particular any numeric information needs to be very carefully scrutinised. Problems with text recognition are often easily corrected from context, but accurate rendering of punctuation can be a problem; this is particularly important in legal documents.
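For anyone who wants to script this, Calibre ships a command-line converter called ebook-convert, which picks the output format from the extension of the output file name. A minimal sketch, assuming Calibre is installed and ebook-convert is on the PATH; the function names and the file name report.pdf are hypothetical:

```python
import subprocess
from pathlib import Path


def calibre_text_command(pdf_path):
    """Build the ebook-convert call that writes a .txt file next to the PDF."""
    pdf = Path(pdf_path)
    return ["ebook-convert", str(pdf), str(pdf.with_suffix(".txt"))]


def pdf_to_text(pdf_path):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    cmd = calibre_text_command(pdf_path)
    subprocess.run(cmd, check=True)
    return Path(cmd[2])


# Only show the command here; call pdf_to_text() to actually run it.
print(calibre_text_command("report.pdf"))
```

The resulting text file still needs the careful proofreading described above.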
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
saam123
Posts: 2
Joined: Tue Sep 27, 2016 2:29 am

Re: pdf to odt

Post by saam123 »

RoryOF wrote:PDF files were intended to be unchangeable. To work on them, by way of a major edit, you must pass them through an OCR program (Optical Character Recognition) which converts them to editable text. Alternately, you can use an application such as Adobe Acrobat (commercial) to edit them and leave them as PDF files. There may be downloadable programs similar to Adobe Acrobat, but I don't know of them.
Thanks for your suggestion. Could you please tell me whether there is any third-party plugin or API that does the same thing? I have a large number of PDF documents to convert to ODT.
Last edited by saam123 on Wed Sep 28, 2016 1:19 am, edited 1 time in total.
OpenOffice 3.1 on Windows Vista
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: PDF to odt

Post by RoryOF »

I can't answer for Windows, but for Linux there are several free and/or open-source PDF to ODT conversion utilities.
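One possible route for a bulk job, offered as a sketch and not as a tested recipe: LibreOffice can be run without its user interface and asked to convert files from the command line. The code below assumes LibreOffice is installed with its PDF import component, that the soffice program is on the PATH, and that the import filter is named writer_pdf_import; the function and folder names are hypothetical. It only works for text-based PDFs; scanned pages still need OCR first.

```python
import subprocess
from pathlib import Path


def odt_command(pdf, outdir):
    """Build a headless LibreOffice call that converts one PDF to ODT."""
    return [
        "soffice",
        "--headless",
        "--infilter=writer_pdf_import",  # assumed name of the Writer PDF import filter
        "--convert-to", "odt",
        "--outdir", str(outdir),
        str(pdf),
    ]


def convert_folder(folder, outdir):
    """Convert every .pdf in folder, one at a time, stopping at the first failure."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(odt_command(pdf, outdir), check=True)


# Only show the command for one file here; call convert_folder() to run the batch.
print(odt_command(Path("invoice.pdf"), Path("converted")))
```

Converting one file per call is slower than handing LibreOffice the whole list, but it makes it obvious which document caused a failure.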
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS