[Solved] Getting Pdf tables in excel

Tech_Sau
Banned
Posts: 6
Joined: Wed Jun 06, 2018 12:50 pm

Post by Tech_Sau »

Hi Guys,

Thanks for letting me join the forum, and please forgive my English, as I am not very good at it.
I have received a PDF that contains a lot of tables. I need the data in Excel to process it further. I converted the PDF to Excel, but the result was not accurate: some elements from different rows or columns have been combined.
Since the tables contain thousands of entries, it is difficult to sort them out manually. Can anyone please help me with this?

I don't even know which category my question belongs in, but if it doesn't belong here, I ask an admin to move it to the right one.

Thanks in advance to anyone who helps.
Last edited by Tech_Sau on Wed Jun 20, 2018 2:15 pm, edited 1 time in total.
RoryOF
Moderator
Posts: 34570
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Getting Pdf tables in excel

Post by RoryOF »

First, please note that this forum is intended for OpenOffice, whose spreadsheet component is Calc. While largely functionally equivalent to Excel, Calc is not an Excel clone, so there can be differences in behaviour.

PDF files are not intended to be edited, so your task may be impossible. I am not a Calc expert, so cannot help further.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
FJCC
Moderator
Posts: 9231
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Getting Pdf tables in excel

Post by FJCC »

If you can post the file, or some rows from it, and indicate where the problem is, we might be able to suggest a way to fix it. However, Rory's point that PDF is not intended to be edited is very important and it may not be possible to fix the file.
To upload a file, click Post Reply and look for the Upload Attachment tab below the box where you type a response.
OpenOffice 4.1 on Windows 10 and Linux Mint
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
jrkrideau
Volunteer
Posts: 3816
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: Getting Pdf tables in excel

Post by jrkrideau »

You probably need software that is specifically designed to convert PDF tables into a format that can be used in a spreadsheet.

I would suggest having a look at Tabula, a Java application that is specifically designed to do this.

I have only used this on 1 or 2 simple tables but so far the results have been good.
Edit: Oops, I forgot. Here is a link to a handy tutorial on Tabula:
https://computers.tutsplus.com/tutorial ... -cms-20843 
LibreOffice 7.3.7.2; Ubuntu 22.04
Tech_Sau
Banned
Posts: 6
Joined: Wed Jun 06, 2018 12:50 pm

Re: Getting Pdf tables in excel

Post by Tech_Sau »

jrkrideau wrote:You probably need software that is specifically designed to convert PDF tables into a format that can be used in a spreadsheet.

I would suggest having a look at Tabula, a Java application that is specifically designed to do this.

I have only used this on 1 or 2 simple tables but so far the results have been good.
Edit: Oops, I forgot. Here is a link to a handy tutorial on Tabula:
https://computers.tutsplus.com/tutorial ... -cms-20843 
Thanks for the reply; I will try doing this with Tabula.
Thanks a lot for the tutorial link as well. I hope it will help me solve my problem. :super:
Tech_Sau
Banned
Posts: 6
Joined: Wed Jun 06, 2018 12:50 pm

Re: Getting Pdf tables in excel

Post by Tech_Sau »

FJCC wrote:If you can post the file, or some rows from it, and indicate where the problem is, we might be able to suggest a way to fix it. However, Rory's point that PDF is not intended to be edited is very important and it may not be possible to fix the file.
To upload a file, click Post Reply and look for the Upload Attachment tab below the box where you type a response.
I know that PDF files cannot be edited. But I converted the file to spreadsheet format using an online tool, and found that while the original file contained more than 20 columns, the converted spreadsheet has only 11 columns.
When I checked, I found that some columns had been merged into one. Also, since there are thousands of entries, it is very difficult to sort them out manually.
Now I need help separating them into different columns.
The data are highly confidential, so I am not allowed to share them.
FJCC
Moderator
Posts: 9231
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Getting Pdf tables in excel

Post by FJCC »

Calc has the ability to split the contents of one column into two or more columns if there is a consistent character that marks the location where the splitting should happen. You can try using it through the menu Data -> Text To Columns. We might be able to make a specific suggestion if you can post a few characteristic rows of the data. Though the data are confidential, can you modify a few rows so that the structure is unchanged but the information gone? Try replacing every letter with x and every number with 1.
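For anyone who prefers to script these two steps outside Calc, here is a minimal sketch in Python. It is only an illustration: the function names mask_text and split_column, the sample rows, and the choice of a single space as the delimiter are all assumptions, not anything taken from the original file. The first function masks confidential values the way suggested above (letters become x, digits become 1); the second splits one merged column at a consistent delimiter, roughly what Data -> Text To Columns does.

```python
import re


def mask_text(text: str) -> str:
    """Replace every letter with 'x' and every digit with '1'.

    Punctuation and spacing are left alone, so the structure of the
    row (and any delimiter characters) stays visible.
    """
    masked = re.sub(r"[^\W\d_]", "x", text)  # any Unicode letter
    return re.sub(r"\d", "1", masked)        # any digit


def split_column(rows, column, delimiter=" "):
    """Split the cell at index `column` of every row on `delimiter`.

    Rows that yield fewer pieces are padded with empty strings so that
    all rows end up with the same number of columns.
    """
    pieces = [row[column].split(delimiter) for row in rows]
    width = max((len(p) for p in pieces), default=1)
    result = []
    for row, cells in zip(rows, pieces):
        padded = cells + [""] * (width - len(cells))
        result.append(row[:column] + padded + row[column + 1:])
    return result


# Hypothetical sample: the middle column holds two merged values.
sample = [
    ["1", "AB-12 Smith", "9"],
    ["2", "CD-34", "8"],
]
fixed = split_column(sample, column=1, delimiter=" ")
masked = [[mask_text(cell) for cell in row] for row in sample]
```

This only works when the merged values are separated by the same character in every row. If the delimiter also occurs inside a value (for example a space inside a name), the split will produce too many pieces and the rows need checking by hand.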
OpenOffice 4.1 on Windows 10 and Linux Mint
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
Lupp
Volunteer
Posts: 3535
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Getting Pdf tables in excel

Post by Lupp »

I don't know what tool you used. I would not use online tools for confidential content anyway.

In a few cases I had to read tables from PDF files. I did it with OCR (ReadIris 14 Pro in my case) and it worked (sufficiently) well. However, I had to do a bit of supervision in the preview and again after import. An online tool may fail to allow for the first intermediary step.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
Tech_Sau
Banned
Posts: 6
Joined: Wed Jun 06, 2018 12:50 pm

Re: Getting Pdf tables in excel

Post by Tech_Sau »

FJCC wrote:Calc has the ability to split the contents of one column into two or more columns if there is a consistent character that marks the location where the splitting should happen. You can try using it through the menu Data -> Text To Columns. We might be able to make a specific suggestion if you can post a few characteristic rows of the data. Though the data are confidential, can you modify a few rows so that the structure is unchanged but the information gone? Try replacing every letter with x and every number with 1.
Thanks for your suggestion. I will use Calc and do it that way. I think it is simpler than the spreadsheet I am using.
Tech_Sau
Banned
Posts: 6
Joined: Wed Jun 06, 2018 12:50 pm

Re: Getting Pdf tables in excel

Post by Tech_Sau »

Lupp wrote:I don't know what tool you used. I would not use online tools for confidential content anyway.

In a few cases I had to read tables from PDF files. I did it with OCR (ReadIris 14 Pro in my case) and it worked (sufficiently) well. However, I had to do a bit of supervision in the preview and again after import. An online tool may fail to allow for the first intermediary step.

I never thought of using OCR. I was using another tool for this, but it had expired and the job was urgent, so I went for an online tool.
Anyway, thanks for the recommendation; I will check it out.