
[Solved] PDF to text conversion

Posted: Tue Feb 26, 2019 10:48 pm
by RosaliyLynne
Does OpenOffice have a way to convert a PDF file back into either text or OpenOffice format?

Re: pdf to text conversion

Posted: Tue Feb 26, 2019 10:55 pm
by RoryOF
No. Frequently one can select text in a PDF and Copy/Paste it into an .odt file; if it is a very large file, it is often better to pass the PDF through an OCR (Optical Character Recognition) application, which will translate the PDF into editable text. Some or much reformatting and correction may be required.
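For PDFs that already contain selectable text (not scanned images), the copy/paste step can be scripted. The following is only a rough sketch, assuming the separate Poppler command-line tool `pdftotext` is installed and on the PATH; it is not part of OpenOffice, and the helper names `build_pdftotext_command` and `pdf_to_text` are made up for illustration. A scanned PDF would still need OCR, and the output will usually still need reformatting.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, keep_layout=True):
    """Return the argument list for Poppler's pdftotext CLI (hypothetical helper)."""
    pdf_path = Path(pdf_path)
    # Default output: same name as the PDF, with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout where it can.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, keep_layout=True):
    """Run pdftotext and return the path of the text file it wrote."""
    # Assumption: Poppler utilities are installed; fail clearly if they are not.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install the Poppler utilities first")
    cmd = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Calling `pdf_to_text("resume.pdf")` would write `resume.txt` next to the PDF, and the text can then be pasted into an .odt file and tidied up by hand.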

Re: pdf to text conversion

Posted: Wed Feb 27, 2019 12:26 am
by John_Ha
See [Tutorial] How do I view or edit a PDF file with OpenOffice?

You will find much useful information in the User Guides, the Writer, Base and Calc Tutorials and the AOO Frequently Asked Questions. May I suggest you bookmark the pages.

Re: pdf to text conversion

Posted: Wed Feb 27, 2019 9:05 am
by Zizi64
LibreOffice can save in "hybrid PDF" format. The resulting file contains both the ODF and the PDF version of the document at the same time. You can view and print it with any PDF reader, and you can still edit it with LibreOffice.

I know this is not a solution for an existing PDF file that contains the text as a picture or as text labels... Note: the original PDF format was not developed for re-editing.

Re: pdf to text conversion

Posted: Wed Feb 27, 2019 11:49 am
by John_Ha
Zizi64 wrote: The LibreOffice can save into "hybrid PDF format".

So can AOO.

Zizi64 wrote: Note: the original PDF format was not developed for re-editing.

PDF stands for Portable Document Format and was designed by Adobe for ease of reading on any system.

See the tutorial. PDF files can easily be fully edited with Adobe Acrobat. I think that the PDF format must be protected by patents or similar because, while many applications can write PDF files, very few exist which can edit PDF files.

Edit: This may soon change - see which says:

Adobe Systems made the PDF specification available free of charge in 1993. In the early years PDF was popular mainly in desktop publishing workflows, and competed with a variety of formats such as DjVu, Envoy, Common Ground Digital Paper, Farallon Replica and even Adobe's own PostScript format.

PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008, at which time control of the specification passed to an ISO Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell, and distribute PDF compliant implementations.

PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized and their specification is published only on Adobe’s website. Many of them are also not supported by popular third-party implementations of PDF.

On July 28, 2017, ISO 32000-2:2017 (PDF 2.0) was published. ISO 32000-2 does not include any proprietary technologies as normative references.

Re: pdf to text conversion

Posted: Thu Feb 28, 2019 2:30 am
by RosaliyLynne
Thank you all for your responses. Adobe actually has such a program, but it requires a subscription and was taking so long to install that I contacted Adobe and cancelled that attempt. I ended up manually retyping the 2-page document (a resume for a friend) and saving it in Office format before converting it to PDF. This way it can be edited again in future. The original non-PDF file was on her work computer, but her position was eliminated and she had not backed it up to a flash drive. All is well though. Again, thank you all.