[Dropped] Convert PDF to ODS
I have received a PDF file that was derived from either a database or spreadsheet file. I would like to turn it into an Open Office spreadsheet. Is there a way to do this? Thanks.
Last edited by MrProgrammer on Tue Dec 10, 2024 6:37 pm, edited 1 time in total.
Reason: Dropped: Suggestions provided but no response from dbuster
dbuster, Open Office 4.1.14, Windows 11
- Hagar Delest
- Moderator
- Posts: 33629
- Joined: Sun Oct 07, 2007 9:07 pm
- Location: France
Re: Convert PDF to ODS
Hi and welcome to the forum!
Try a simple copy and paste first: depending on how the PDF was made, the table structure may have been kept.
Printing the PDF as an image and then running an OCR application on it may give you some results.
Note: you can also try LibreOffice (a portable version is enough); it may handle the copy-paste better.
But that is a lot of trouble, and redoing the whole file may be quicker, especially if copy-paste at least gives you the series as rows or columns.
LibreOffice 25.2 on Linux Mint Debian Edition (LMDE 7 Gigi) and 25.2 portable on Windows 11.
Re: Convert PDF to ODS
Too many detailed problems arise to trust that converting a PDF produces reliable data.
One of them is custom font encoding. There is no guarantee that the characters in a PDF document correspond to the code points of a standard text encoding such as Unicode.
Another is that objects can be placed at translated (shifted) positions, so the order of content in the file need not match what you see on the page.
However, in *simple* situations, extracting text from the PDF and/or Copy→Paste may go smoothly.
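To illustrate the font-encoding point, here is a minimal Python sketch of a sanity check you could run on text already extracted or pasted from a PDF. The function names and the 1% threshold are my own assumptions, not part of any tool. It only catches one symptom (private-use, unassigned, or replacement characters); a font that maps glyphs to wrong but valid letters would pass unnoticed.

```python
import unicodedata


def suspicious_chars(text):
    """Return characters hinting that the PDF text layer is not real Unicode."""
    flagged = []
    for ch in text:
        category = unicodedata.category(ch)
        # "Co" = private use, "Cn" = unassigned; U+FFFD is the replacement character.
        if category in ("Co", "Cn") or ch == "\ufffd":
            flagged.append(ch)
    return flagged


def looks_reliable(text, max_ratio=0.01):
    """Heuristic: True if at most max_ratio of the visible characters are suspicious.

    max_ratio is an arbitrary illustrative threshold, not a standard value.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return False  # nothing extracted at all is also a bad sign
    return len(suspicious_chars(text)) / len(visible) <= max_ratio
```

If `looks_reliable` returns False for your extracted text, copy-paste is unlikely to help and OCR or retyping is probably the realistic route.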
JJ ∙ https://forum.openoffice.org/pl/
LO (26.2) ∙ Python (3.13|3.10) ∙ Unicode 17 ∙ LᴬTEX 2ε ∙ XML ∙ Unix tools ∙ Linux (Rocky|CentOS)
Re: Convert PDF to ODS
There is a Linux utility, pdftotext, that may be of use, but be aware that it does not always (for reasons I do not know) produce strictly linear text output. With PDF book text, one can often find displaced chunks of text in the output file.
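For a table-like PDF, one possible approach is to run pdftotext with its `-layout` option and split each line into cells wherever two or more spaces occur, then open the resulting CSV in Calc and save it as ODS. The Python sketch below assumes pdftotext is installed and on the PATH; the helper names are hypothetical, and the two-space rule is only a heuristic: empty cells disappear and shift the columns, so the result needs checking by eye.

```python
import csv
import io
import re
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run pdftotext (assumed to be on PATH) and return the extracted text."""
    # "-layout" asks pdftotext to keep the physical layout; "-" writes to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


def layout_text_to_rows(text):
    """Split each non-empty line into cells at runs of two or more spaces."""
    rows = []
    for line in text.splitlines():  # also splits at form feeds between pages
        line = line.strip()
        if not line:
            continue
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that Calc can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Usage would be along the lines of `rows_to_csv(layout_text_to_rows(pdf_to_layout_text("report.pdf")))`, where `report.pdf` is a placeholder file name; write the returned string to a `.csv` file and open that in Calc.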
Apache OpenOffice 4.1.16 on Xubuntu 24.04.4 LTS