[Tutorial] How do I view or edit a PDF file with OpenOffice?

Post by John_Ha

You cannot view or edit a PDF file with Apache OpenOffice (AOO) alone. You need the PDF Import Extension, which is already bundled with LibreOffice (LO).

View a PDF

It is best to view PDFs with a dedicated PDF viewer such as Adobe Reader or Foxit Reader. OpenOffice alone cannot view PDFs, and viewing with the Oracle PDF Import Extension described below is not as good as with a proper PDF viewer.

Edit a PDF

If you want to make minor, cosmetic edits to a PDF using OpenOffice Draw (not Writer), you can install the Oracle PDF Import Extension (for OpenOffice 3.x). The extension is shipped with LO. Go to File > Open and navigate to the PDF file.

If you want to make major edits to a PDF, buy Adobe Acrobat. I do not think any alternative product comes close to what Adobe Acrobat can do.

You can obtain free PDF editors such as PDFEscape, but they all offer far less functionality than Adobe Acrobat. See List of PDF software.

A (poor) workaround is to use a file conversion site such as http://www.zamzar.com to convert the PDF to an editable format such as .odt, .doc or .docx. Edit the converted file and create a new PDF from it. You are likely to have to do a lot of cleaning up.

PDF Split and Merge (PDFsam) is an excellent utility which allows you to make changes to a PDF at the page level by splitting PDFs into individual pages, merging multiple PDFs etc.

I have never found the Oracle PDF Import Extension to be that useful because of how it works, which depends on the type of PDF:

1 The PDF is a "normal PDF" - as in the vast majority of cases

In this case, when you open the PDF with AOO (File > Open > navigate to the PDF), the PDF file is opened in Draw, not Writer.

The text on the Draw page is split into many single-line paragraphs and is all but impossible to edit satisfactorily, except for the most minor edits such as changing a date. There is no concept of a paragraph, nor of text flowing to the next line, and sometimes even a single line is split into several text boxes, which makes alignment tricky.

After editing you save the file as a PDF.

2 The PDF is a "hybrid PDF" which contains a .odt file embedded in the PDF alongside the normal PDF content - this is VERY unusual!

Hybrid PDFs are extremely unusual, but if you are fortunate enough to have one, when you open it with AOO (File > Open > navigate to the PDF), the extension opens the .odt file embedded in the PDF using Writer and, not surprisingly, you can edit the .odt file with Writer.

Both AOO and LO can create hybrid PDFs. With a document open, go to File > Export as PDF > General and tick Embed this document inside the PDF.
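If you want to check whether a PDF is a hybrid before opening it, the following is a minimal Python sketch. It rests on an assumption that is not stated in this tutorial: that the exporter marks a hybrid PDF with an /AdditionalStreams entry in the PDF trailer. Treat it as a heuristic only. The function names are made up for this illustration.

```python
from pathlib import Path

# Assumption: hybrid PDFs written by AOO/LO carry an /AdditionalStreams
# entry in the trailer that points at the embedded ODF file.
# This is a heuristic check, not a full PDF parse.
HYBRID_MARKER = b"/AdditionalStreams"


def looks_like_hybrid_pdf(data: bytes) -> bool:
    """Return True if the bytes look like a PDF with an embedded ODF stream."""
    if not data.startswith(b"%PDF-"):
        return False  # not a PDF at all
    return HYBRID_MARKER in data


def is_hybrid_pdf_file(path: str) -> bool:
    """Read a file from disk and apply the same heuristic."""
    return looks_like_hybrid_pdf(Path(path).read_bytes())
```

For example, is_hybrid_pdf_file("report.pdf") (a hypothetical file name) should return True for a PDF exported with the embed option ticked, if the assumption above holds. A False result means you can expect the PDF to open in Draw.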

The add-on description says:
Best results with 100% layout accuracy can be achieved with the "PDF/ODF hybrid file" format, which this extension also enables. A hybrid PDF/ODF file is a PDF file that contains an embedded ODF source file as well as the normal PDF content. Hybrid PDF/ODF files will be opened in OpenOffice Writer as an ODF file without any layout changes.

The PDF Import Extension also allows you to import and modify PDF documents for non hybrid PDF/ODF files. PDF documents are imported in Draw to preserve the layout and to allow basic editing. This is the perfect solution for changing dates, numbers or small portions of text with a minimum loss of formatting information for simple formatted documents.

Documents with more sophisticated layouts, such as those created with professional Desktop Publishing applications that use special fonts and complex vector graphics are not suitable for the PDF Import Extension.

Copying text and images from a PDF open in Adobe Reader - if the contents are copyable.

You can easily copy text from a PDF by highlighting it and pressing Ctrl+C, or using Edit > Copy. Each individual line in a PDF is stored as a separate paragraph, so the pasted text arrives as lots of single lines, each ending with an End of paragraph marker (¶) which you will need to delete. See [Tutorial] How do I remove end_of_paragraph marks? for instructions on how to remove them - the OOo FBTools add-on is excellent for removing them and is highly recommended.
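If you prefer to clean up the pasted text outside OpenOffice, the same idea can be sketched in a few lines of Python. This is only an illustration, not what the FBTools add-on does internally. It assumes that a blank line marks a real paragraph break and that every other line break is just the end of a printed line. Text copied from some PDFs has no blank lines at all, in which case everything ends up in one paragraph. The function name join_pdf_lines is made up for this example.

```python
import re


def join_pdf_lines(text: str) -> str:
    """Merge single-line paragraphs copied from a PDF into flowing paragraphs.

    Assumption: a blank line is a real paragraph break; any other
    line break is only the end of a printed line.
    """
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    joined = []
    # Split on blank lines (which may contain stray spaces).
    for para in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if lines:
            # Rejoin the printed lines with single spaces.
            joined.append(" ".join(lines))
    return "\n\n".join(joined)
```

Paste the copied text into a plain text file, run it through join_pdf_lines, and paste the result into Writer. Words hyphenated across a line break are not repaired and still need fixing by hand.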

To copy an image from a PDF, left-click the image, right-click the highlighted area and choose Copy image.

Adobe Reader's Take a snapshot is very useful. Go to Edit > Take a snapshot, click the top-left corner of what you want to copy, drag the mouse to the bottom-right corner and release. The highlighted area is copied to the clipboard. Paste it into a graphics editor and save the image. You can zoom the PDF so the copied area is larger than the screen - just drag so that the screen scrolls.

Note that some PDFs are set by the author so that copying text and/or images is prohibited. Similarly, some images are broken into many small parts to make copying difficult. In some PDFs the text is actually an image.

Converting a PDF to a .doc file

Several web sites offer to convert PDF files to various formats. One of the best is Convert PDF to WORD, which converts a PDF file to a .doc file while preserving paragraph flow and much of the formatting.

Hybrid PDF / AOO (or LO) files.

Thanks to Villeroy for providing a PDF hybrid file:
Villeroy wrote: This PDF file is an ODT/PDF hybrid created from the English .odt file of the immensely helpful Writer tutorial.

You can open the odt component with OpenOffice or with LibreOffice and edit the .odt file. The PDF component can be viewed with any PDF viewer.

The same trick works with spreadsheets, presentations and drawings if you have OpenOffice and the PDF extension; or if you have LibreOffice, where the extension is shipped with LO.

Adobe's PDF standard

PDF files can easily be fully edited with Adobe Acrobat. While many applications can write PDF files, very few can edit them, and none has the capability of Adobe Acrobat.

This may change in future - see https://en.wikipedia.org/wiki/PDF which says:
Adobe Systems made the PDF specification available free of charge in 1993. In the early years PDF was popular mainly in desktop publishing workflows, and competed with a variety of formats such as DjVu, Envoy, Common Ground Digital Paper, Farallon Replica and even Adobe's own PostScript format.

PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008, at which time control of the specification passed to an ISO Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell, and distribute PDF compliant implementations.

PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized and their specification is published only on Adobe’s website. Many of them are also not supported by popular third-party implementations of PDF.

On July 28, 2017, ISO 32000-2:2017 (PDF 2.0) was published. ISO 32000-2 does not include any proprietary technologies as normative references.