Extracting information

yuckysocks
Posts: 3
Joined: Mon Jan 28, 2008 8:57 pm

Extracting information

Post by yuckysocks »

Hello!

I'm looking for information on how to do the following. I'm new to the advanced features of word processors, so forgive me if there is a common way to do this whose "well known" name I couldn't guess. I DON'T think mail merge is what I'm trying to do.

I have a 250-page output from ACT, saved as a .doc file.
Each page is a contact record, but there are no attribute fields or anything. I want to extract all of the e-mail addresses, identified by the "@" character, and put each one into its own cell in a spreadsheet.

Additionally, if it's possible, I would like to export the contact names from this Writer file as well, and align them in the spreadsheet with their e-mail addresses. Because of an old version of ACT, unfortunately, this is the method I'm consigned to use.

Thanks, and hopefully we can make this happen :)

Alex

PS - Email is my best contact method.
yuckysocks
Posts: 3
Joined: Mon Jan 28, 2008 8:57 pm

Re: Extracting information

Post by yuckysocks »

Also,

I'm using OS X and OpenOffice.org 2.2.1.

The text document is actually an RTF file. Apologies for not being clear at first.
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Extracting information

Post by acknak »

I think you're going to have to provide a sample of the document you have: the exact layout will determine how to extract the data. If you can provide a part of your actual input file (with any private information anonymized), it would be best.

In the end, Writer is probably not the right tool to do this. E.g. you can use a pattern search to find all the email addresses, but I can't see any way to do anything useful with the results in Writer. Some kind of macro would be needed to process the selected addresses. Writer is a word-processor and text formatter, not a text processor.

There are generic text processing utilities that are part of Unix (and therefore should be included as part of OSX) that are designed to do this kind of job. The first step may well be to use OOo to convert the RTF file to a plain text file.
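For illustration only, here is a minimal Python sketch of the pattern-search step, assuming the RTF has already been saved as plain text. The function names `find_emails` and `write_column` are hypothetical, and the pattern is deliberately loose: it keeps any whitespace-delimited token containing a single "@", so it would also catch incomplete addresses.

```python
import csv
import io
import re

# Loose pattern (an assumption, not a full address validator):
# a run of ordinary characters, one "@", then another run.
EMAIL_PATTERN = re.compile(r"[^\s@<>,;]+@[^\s@<>,;]+")


def find_emails(text):
    """Return every "@"-containing token in text, in order of appearance."""
    return EMAIL_PATTERN.findall(text)


def write_column(values, out):
    """Write one value per row so each ends up in its own spreadsheet cell."""
    writer = csv.writer(out)
    for value in values:
        writer.writerow([value])


# Small in-memory demonstration; the sample line is made up for the example.
sample = "Fax State Zip\nE-mail Address someone@example.org Home Country\n"
buffer = io.StringIO()
write_column(find_emails(sample), buffer)
print(buffer.getvalue())
```

To use it on a real file, read the converted text file into a string, pass it to `find_emails`, and write the result to a `.csv` file opened with `newline=""`. Calc should be able to open that file as a single-column spreadsheet.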
AOO4/LO5 • Linux • Fedora 23
yuckysocks
Posts: 3
Joined: Mon Jan 28, 2008 8:57 pm

Re: Extracting information

Post by yuckysocks »

I'm also working concurrently with TextWrangler (if you have any familiarity with that). Here is one entry, taken after I saved as a .txt file:

Contact
Company Address 1
Contact Congressman Earl Address 2
Title Congressman, District 3 Address 3
Phone Ext. City
Fax State Zip
Salutation Congressman Earl Country
User Fields
Email Address Comments
Home/Phone
Alt Phone Ext. Home Address 1
Mobile Phone Home Address 2
Pager Home City
Home Phone Home State Home Zip
E-mail Address Katie.Drennan@blank Home Country


So in this case I would want "Congressman Earl...." and "Katie.Drennan@...." to be extracted.
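A rough Python sketch of how that pairing might work is below. It is based on this single sample, so everything about the layout is an assumption: that each record starts with a line containing only "Contact", that the name sits between "Contact" and "Address 2" on one line, and that the address is on the line starting with "E-mail Address". The function names are hypothetical, and real records may well break these rules.

```python
import csv
import io
import re

# Assumed layout: "Contact <name> Address 2" on one line of each record.
NAME_LINE = re.compile(r"^Contact\s+(.*?)\s*Address 2\s*$")
EMAIL_TOKEN = re.compile(r"[^\s@]+@[^\s@]+")


def split_records(text):
    """Group non-blank lines into records; a bare "Contact" line starts one."""
    records = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "Contact" or not records:
            records.append([])
        records[-1].append(stripped)
    return records


def extract_pair(record):
    """Return (name, email) for one record; missing values stay empty."""
    name, email = "", ""
    for line in record:
        match = NAME_LINE.match(line)
        if match and not name:
            name = match.group(1)
        if line.startswith("E-mail Address") and not email:
            found = EMAIL_TOKEN.search(line)
            if found:
                email = found.group(0)
    return name, email


def pairs_to_csv(pairs):
    """Render the pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Name", "Email"])
    writer.writerows(pairs)
    return out.getvalue()


# The sample record from this post, used as test input.
SAMPLE = """\
Contact
Company Address 1
Contact Congressman Earl Address 2
Title Congressman, District 3 Address 3
Phone Ext. City
Fax State Zip
Salutation Congressman Earl Country
User Fields
Email Address Comments
Home/Phone
Alt Phone Ext. Home Address 1
Mobile Phone Home Address 2
Pager Home City
Home Phone Home State Home Zip
E-mail Address Katie.Drennan@blank Home Country
"""

pairs = [extract_pair(record) for record in split_records(SAMPLE)]
print(pairs_to_csv(pairs))
```

Records with no name or no address come out with an empty cell rather than being dropped, so the rows stay aligned and the gaps can be checked by hand.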

Thanks for even pointing me in the right direction, if not solving my problem :) Sometimes the hardest part is just knowing where to start.

Alex
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Extracting information

Post by acknak »

Hmm, well I worked on it for a bit and got nowhere, so I guess I'm stumped. Your data is such an irregular mix of field names and data, with no specific structure that I can divine, that it would be difficult to suggest anything straightforward.

Maybe the other software will get the job done for you. Or, maybe someone else has a better approach.
foxcole
Volunteer
Posts: 1507
Joined: Mon Oct 08, 2007 1:31 am
Location: Minneapolis, Minnesota

Re: Extracting information

Post by foxcole »

Is this something you'll need to repeat often? If not, you'll probably save time by just copying and pasting. It's only 250 pages. You could also pay someone else to do it, which would save you the hassle.
Cheers!
---Fox

OOo 3.2.0 Portable, Windows 7 Home Premium 64-bit