How to import data from PDF to Calc?

tim_p
Posts: 3
Joined: Sun Apr 15, 2012 10:52 pm

How to import data from PDF to Calc?

Post by tim_p »

Hi all, please be gentle, I'm new here.
I've got a small project for work: the aim is to track tyre sales to see trends in which tyres are becoming most popular.
The problem is that I've managed to import from a PDF, but the data is badly organised, for example:


	 01/04/2012
2	215/55R16
1	245/45R17
1	215/55R16
2	205/55R16
 	 01/08/2011
-1  195/65R15
That's the whole layout: the date the tyres were sold, then how many, and finally the size. I thought I could convert it into more meaningful data organised like this:


size            January February March etc.
215/55r16           2     4        0
215/55r16           0     1        2
But I can't seem to find a way to convert the data. Any guidance is much appreciated.
P.S. There are thousands of tyres, as it's 12 months of sales.
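Once every tyre line carries its own date, building the size-by-month table is a group-and-sum job. The following is a minimal Python sketch of that step only, not something posted in the thread. It assumes the lines have already been turned into (date, quantity, size) records and that dates are DD/MM/YYYY (so 01/04/2012 is 1 April 2012, which matches the "01-apr-12" mentioned later). The five records are the sample lines above; the function names are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# The sample lines from the post, as (date, quantity, size) records.
# Assumption: dates are DD/MM/YYYY.
records = [
    ("01/04/2012", 2, "215/55R16"),
    ("01/04/2012", 1, "245/45R17"),
    ("01/04/2012", 1, "215/55R16"),
    ("01/04/2012", 2, "205/55R16"),
    ("01/08/2011", -1, "195/65R15"),
]

def monthly_totals(rows):
    """Sum quantities per tyre size per (year, month)."""
    totals = defaultdict(lambda: defaultdict(int))
    for date_text, qty, size in rows:
        d = datetime.strptime(date_text, "%d/%m/%Y")
        # Upper-case the size so "215/55r16" and "215/55R16" are counted together.
        totals[size.upper()][(d.year, d.month)] += qty
    return totals

def print_table(totals):
    """Print one tab-separated row per size, one column per month seen."""
    months = sorted({m for per_size in totals.values() for m in per_size})
    header = ["size"] + [datetime(y, m, 1).strftime("%b %Y") for y, m in months]
    print("\t".join(header))
    for size in sorted(totals):
        # .get() avoids adding empty months to the totals while printing.
        print("\t".join([size] + [str(totals[size].get(m, 0)) for m in months]))

print_table(monthly_totals(records))
```

Because the output is tab-separated, it can be pasted into Calc and split on Tab. Calc's DataPilot may be an alternative for the same summary once the data sits in three columns.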

Title Edited. A descriptive title for posts helps others who are searching for solutions and increases the chances of a reply (Hagar, Moderator).
open office 3.2.1 win vista
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: i'm stuck

Post by acknak »

Importing or copy/paste from pdf is generally not a good approach--for the reasons you've already run into.

You may be able to clean it up after the import, but it's usually best to back up a step and try to get the data that went into making the pdf. Often that data will be available in some computer-readable format: a database or delimited text file (csv, for example).

Can you get in touch with the people who made the pdf?

If not, can you attach the pdf here, or give us a link to it? You can use the "Upload Attachment" link (below the message entry area after you click "POST REPLY"). For tips on posting large or confidential documents, see: [Forum] How to attach a document here
AOO4/LO5 • Linux • Fedora 23
tim_p
Posts: 3
Joined: Sun Apr 15, 2012 10:52 pm

Re: i'm stuck

Post by tim_p »

Hi, thanks for the quick reply. I can't post the PDF, as it's hundreds of pages long and that exceeds the size limit, and they won't supply the invoices as anything other than PDFs.
If I copy it into Writer, it comes out in the format shown in the attached file.
Many thanks, Tim
Attachments
test.odt
(10.2 KiB)
open office 3.2.1 win vista
kingfisher
Volunteer
Posts: 2127
Joined: Tue Nov 20, 2007 10:53 am

Re: i'm stuck

Post by kingfisher »

The pdf import extension may help.

While on the site, you may wish to look through the PDF-related extensions.
Apache OpenOffice 4.1.12 on Linux
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: i'm stuck

Post by acknak »

The pdf import extension is for editing pdfs using OOo; I'm not sure how it would help here. Copy/paste from a pdf reader may be good enough.

From the data in your sample, I was able to move it into Calc fairly easily by first replacing some of the spaces with tab characters, to separate the fields correctly:

In Writer:
1) Edit > Find & Replace
  • Search for: ^([^ ]+) +([^ ]+) +(.*) +([0-9.]+) +([0-9.]+) +([0-9.]+)$
    Replace with: $1\t$2\t$3\t$4\t$5\t$6
    Options/Regular expressions: YES
    Click "Replace all"
2) Copy the modified data; Edit > Paste Special > As unformatted text in Calc
3) Choose "Tab" as the delimiter; OK
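For anyone who would rather script this than use Writer's Find & Replace, the same regular expression can be applied line by line in Python. This is a sketch, not part of the steps above: Python's `re` writes the back-references as `\1` to `\6` instead of `$1` to `$6`, and the sample row is hypothetical, since the contents of the attached test.odt aren't reproduced here.

```python
import re

# Same pattern as the Writer step: two space-free leading fields,
# a free-text middle field, then three numeric fields at the end.
ROW = re.compile(r"^([^ ]+) +([^ ]+) +(.*) +([0-9.]+) +([0-9.]+) +([0-9.]+)$")

def to_tab_separated(line: str) -> str:
    """Return the line with its six fields joined by tabs.

    Lines that don't match (e.g. a bare date) come back unchanged,
    just as "Replace all" leaves them untouched in Writer.
    """
    return ROW.sub(r"\1\t\2\t\3\t\4\t\5\t\6", line)

if __name__ == "__main__":
    # Hypothetical invoice row, for illustration only.
    sample = "ABC123 215/55R16 Example Tyre 91V 2 55.00 110.00"
    print(to_tab_separated(sample))
```

The greedy `(.*)` backtracks until the last three space-separated tokens are all numeric, so any spaces inside the description stay in the third field.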

Here's what I get:
pdf_data_import.png (11.5 KiB)
AOO4/LO5 • Linux • Fedora 23
tim_p
Posts: 3
Joined: Sun Apr 15, 2012 10:52 pm

Re: How to import data from PDF to Calc?

Post by tim_p »

Hi all, thanks for the replies so far. I think the biggest problem we face is that, as they are invoices, on one day we could have had anywhere from 1 to 20 tyres delivered, so the problem is getting the date to appear next to every tyre row.
At the moment it looks like this:
Date
Tyre details
Tyre details
Date
Tyre details
And there is no way to know how many tyres will have been delivered on that day.
Hope this makes sense, Tim
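One way to get the date onto every tyre row, however many rows follow it, is to carry the last date seen forward ("fill down"). The following is a minimal Python sketch, assuming the text has been copied out of the PDF into plain lines laid out like the sample in the first post: a date line in DD/MM/YYYY, then lines of quantity and size. The two patterns and the `fill_down` name are illustrative assumptions; real invoice pages will contain other lines the patterns may need adjusting for.

```python
import re

# A line holding only a date such as "01/04/2012" (leading tabs/spaces allowed).
DATE_LINE = re.compile(r"^\s*(\d{2}/\d{2}/\d{4})\s*$")
# A tyre line: a possibly negative quantity, whitespace, then the size.
ITEM_LINE = re.compile(r"^\s*(-?\d+)\s+(\S+)\s*$")

def fill_down(lines):
    """Yield (date, qty, size), carrying the last date seen onto each tyre row."""
    current_date = None
    for line in lines:
        m = DATE_LINE.match(line)
        if m:
            current_date = m.group(1)
            continue
        m = ITEM_LINE.match(line)
        # Tyre rows seen before any date line are skipped.
        if m and current_date is not None:
            yield current_date, int(m.group(1)), m.group(2)
        # Anything else (headers, totals, blank lines) is ignored.

# The sample from the first post.
raw = """\t 01/04/2012
2\t215/55R16
1\t245/45R17
1\t215/55R16
2\t205/55R16
 \t 01/08/2011
-1  195/65R15"""

for date, qty, size in fill_down(raw.splitlines()):
    print(f"{date}\t{qty}\t{size}")
```

Each output line is date, quantity and size separated by tabs, so it can be pasted into Calc with Edit > Paste Special > Unformatted text and split on Tab.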
open office 3.2.1 win vista
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: How to import data from PDF to Calc?

Post by acknak »

It's going to be a lot of work to get that date extracted automatically. Since it's just one value for the whole batch, maybe a simple copy/paste would handle that. Calc will recognize a date entry pasted as-is from your sample document ("01-apr-12").
AOO4/LO5 • Linux • Fedora 23
kingfisher
Volunteer
Posts: 2127
Joined: Tue Nov 20, 2007 10:53 am

Re: How to import data from PDF to Calc?

Post by kingfisher »

I don't know whether it would help, but I have used a free application called something like "Nitro PDF Reader" on a Windows system to extract text from a PDF document.
Apache OpenOffice 4.1.12 on Linux