[Solved] Convert PDF table to spreadsheet
Posted: Wed Nov 06, 2013 9:54 pm
by 466385@tiscali.co.uk
Hi all,
I'm trying to analyse a very large table in a report delivered in PDF format. Adobe Reader offers to convert the table to .xlsx, but then asks me for $19.99 for a year's subscription. This isn't a lot of money, but I'm unlikely ever to need the conversion again, so I don't want to commit. Can anybody point to a reliable freeware option, please? I've had a good look at various tools, but they either don't work or won't work without prepayment.
I hope this doesn't sound too mean. I'd gladly make a contribution to the developer of an application that works.
Thanks,
Ian
Re: Convert PDF table to spreadsheet
Posted: Wed Nov 06, 2013 10:23 pm
by Villeroy
Copy&Paste?
Re: Convert PDF table to spreadsheet
Posted: Wed Nov 06, 2013 10:48 pm
by 466385@tiscali.co.uk
Copy and paste won't work - I've tried. There's something about the formatting of the table that prevents this. Even if I can highlight a full line (as example below), I get the result into one cell.
Ian
Re: Convert PDF table to spreadsheet
Posted: Wed Nov 06, 2013 10:50 pm
by Villeroy
Copy more than one row.
Re: Convert PDF table to spreadsheet
Posted: Wed Nov 06, 2013 11:11 pm
by acknak
PDF is not a document format; it's raw character-at-some-position data. I've seen PDF files that couldn't even produce the proper text, let alone any higher structure like lines, paragraphs or tables.
In other words, OO Writer has no way to figure out what text goes in what cell/row/column.
The best you can do is paste the text into some editor (Writer should work if you don't have anything else), modify it into delimited text (data values separated by commas or tabs), and then paste that into Calc. Calc can split the values based on the delimiters and interpret the text to give you numbers, if that's what you need.
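For anyone who would rather script that clean-up step than do it by hand, here is a minimal sketch in Python. It assumes the pasted text keeps each table row on its own line and separates columns with two or more whitespace characters; a real PDF may not behave that way. The function name `lines_to_csv`, the `min_gap` parameter and the sample rows are all made up for illustration and are not taken from the report discussed here.

```python
import csv
import io
import re


def lines_to_csv(text, min_gap=2):
    """Turn text copied from a PDF into comma-delimited text.

    Assumption: columns are separated by runs of at least `min_gap`
    whitespace characters, so single spaces inside a value are kept.
    """
    splitter = re.compile(r"\s{%d,}" % min_gap)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the copy
        writer.writerow(splitter.split(line))
    return out.getvalue()


# Hypothetical sample rows, only to show the shape of the input.
sample = "Site A   12   3.5\nSite B   7    10.25\n"
print(lines_to_csv(sample))
```

The result can be saved as a .csv file and opened in Calc, or pasted into Calc and split on commas. If the copied text puts every cell on its own line, or runs columns together with single spaces, this approach will not work as written and the splitting rule would need to change.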
Re: Convert PDF table to spreadsheet
Posted: Thu Nov 07, 2013 12:26 am
by 466385@tiscali.co.uk
Villeroy - won't work. acknak - the table has 650+ lines, so not a practical proposition. What you seem to be saying is that the only way to go is to pay Adobe's subscription. Since the original PDF format is theirs, I have to assume they'll know what they're doing. I'll start saving up.
Thanks anyway,
Ian
Re: Convert PDF table to spreadsheet
Posted: Thu Nov 07, 2013 1:18 am
by RoryOF
You could Google for a free or shareware PDF editor which will extract your data.
Re: Convert PDF table to spreadsheet
Posted: Thu Nov 07, 2013 2:21 am
by acknak
PDF is not a data transfer format. If you want the data, you should get it from the source in a proper data format. If that isn't possible for some reason, then you have to either a) spend some money or b) spend some time doing the conversion.
650 lines is not much--if you want some help, send it to me. Unless there's something unusual, it wouldn't take more than, oh, say 30 minutes.
Re: Convert PDF table to spreadsheet
Posted: Thu Nov 07, 2013 10:23 am
by 466385@tiscali.co.uk
Rory - "You could Google for a free or shareware PDF editor which will extract your data." - this is why I came here - I can't find anything reliable.
acknak - "... get it from the source in a proper data format ..." - I've asked. This is from a database that the public authority concerned is reluctant to manipulate for me. It's embedded in a 208 page report and itself covers 40 pages. Thanks for the offer, but I won't bother you with this.
I'll keep pressing the agency. I might even try out FOI. Meanwhile, I'll close this off as "solved" as I'm not getting anywhere.
Thanks anyway,
Ian
Re: Convert PDF table to spreadsheet SOLVED
Posted: Thu Nov 07, 2013 10:34 am
by RoryOF
I was deliberately vague in my reply, because my operating system (Xubuntu) differs from yours (XP) and I have no recent experience with PDF extraction on XP.
However, for what it's worth, I have extracted content (text, not tables) from PDFs using Linux tools; Calibre (my tool of choice) comes in Linux and Windows flavours and should extract the content. Other research suggests that you may then need to massage your tabular data into suitable form to be exported/imported as CSV. If you decide to try extraction, do please work on copies of the master file, in case of any catastrophe.
Re: Convert PDF table to spreadsheet SOLVED
Posted: Thu Nov 07, 2013 1:08 pm
by 466385@tiscali.co.uk
Hi Rory,
I've extracted spreadsheets from many types of document but this one is very peculiar. I'll post back on here when I do eventually crack it. Meanwhile, thanks anyway,
Ian