Page 1 of 1

[Solved] Text CSV to Calc import columns and rows

Posted: Fri Jul 16, 2010 7:46 am
by bizzyboi
Yes I read FAQ and guides. It helped me get as far as now able to import a txt file created in OOO into CALC
BUT
the text separates itself across columns and not like Excel where one sentence or delimted section oif text gets placed in one row ro form one case, and
so one has one text string or in my situation I require one sentence per row in the first column field
so that i can code each senetnec for research analysis using one coumn for each variable I intend to code for.
Please help me get it right as I have an enormous corpus of text.

I trialed one file by using OOO to insert TABS after ecah full stop but that seemed not to work so when i asked the CALC import box to use
a period of full stop it seemed ahppy but arranged ecah sentence in pone column right across the page forever...HELP
Thanks
Researcher
Poor academic who cannot afford to buy Excel
and lives where piracy is the norm :( :ucrazy: and difficult to find legal copies even in institutions

Re: Txt CSV to CALC import columns and rows

Posted: Fri Jul 16, 2010 8:37 am
by Hagar Delest
Hi and welcome to the forum!

Can you upload a sample file of what you have and what is the result you want (see: How to attach a document here)?

Re: Txt CSV to CALC import columns and rows

Posted: Fri Jul 16, 2010 3:37 pm
by acknak
Calc does not separate text files at sentence boundaries. In fact, I'm surprised that Excel does it.

Calc needs each input row to have a specific delimiter; each row should be a separate line in the input file.

You can work around this by massaging the text in an editor, or in OOo Writer, before you import it into Calc. You can search and replace each sentence ending by a newline, or (in Writer) a paragraph break.

Here's an example using Writer:

1) File > New > Text Document
2) Press the keys: d, t, F3 (this inserts a paragraph of "dummy text")
3) Edit > Find & Replace
Search for: ([.?!]) +
Replace with: $1\n
Options/Regular expressions: YES
Click "Replace All"

You should now have each sentence in its own paragraph.

4) Edit > Select All
5) Edit > Copy
6) File > New > Spreadsheet
7) Edit > Paste

You should now have a spreadsheet with a sentence in each row of column A.

Note: The example assumes English punctuation, and still is not perfect at matching sentence ends. It is possible to do better, but not possible to do it perfectly.

It also assumes that the starting text has newlines only at the end of a paragraph. If your text is wrapped at a specific column by newlines, then all bets are off.

If that doesn't help, then please attach a sample input text file, as Hagar suggested, so we can see exactly what you're working with.

Re: Txt CSV to CALC import columns and rows

Posted: Fri Jul 16, 2010 9:30 pm
by bizzyboi
:P :super: Thanks Acknak
Oh wow. Many thanks it works perfectly.
Just one teensy question more.

Once imported into Calc spreadsheet, I select column and format cells to wrap to accommodate the longer sentences. Yes, it works. BUT in one or two cases a tiny little brown arrow appears on the extreme right edge or the row next to the column boundary and that sentence is not complete . Hmmm. But some of them are shorter than longer sentences which wrap successfully. Ok if I drag the column to make it wider the marker disappears and the whole sentence is shown. But I don't understand why as longer sentences appear but some shorter seem to be truncated with this brown arrow. Hmm strange? Any idea why?

Just to explain what I am up to:
But thanks it is just what I want I have something like 1,000 pages types pages of student writing to create a spreadsheet database for coding and analysis for certain errors of language. Basically I suspect that the apparently smaller errors are in fact a result to what linguists call topic comment structuring in the students' native language but I want to do a correlation between certain error types to confirm my hunch quantitatively. There may be other tyros of patterns of error I can detect once I have code the data. This is why I have to preserve the sentences as the errors are based on sentence structure. I also use concordance software for sightly different analyses of the same data sets. But Spreadsheet can be pretty powerful tools for analysis for which yes specialised software packages exist at great cost but actually are not necessary.. I find a spreadsheet, a concordancer and SPSS all I need.

So again many thanks for your kind reply.

Re: [Solved] Text CSV to Calc import columns and rows

Posted: Fri Jul 16, 2010 11:04 pm
by acknak
The colored arrow/triangle shows that the cell content has been truncated--that the cell contains some content that is not displayed. Manually enlarging the cell, enough so that all the text can be shown, will make the arrow disappear--as you noticed.

I don't know what causes it, but I see many times that Calc seems unable to perfectly adjust the cell size without a little help.

Re: Txt CSV to CALC import columns and rows

Posted: Sat Jul 17, 2010 8:06 am
by vasa1
acknak wrote:Calc does not separate text files at sentence boundaries. In fact, I'm surprised that Excel does it.

...
I just tried with some text that had periods, commas, semi-colons, and carriage returns. I used Microsoft Office Excel 2007 to open the file after renaming it test.csv.

Excel, without prompting or asking what type of separator was required, only
looked at the commas to separate text into new cells on the same row
and
looked at carriage returns to start a new row.

Further, it did not automatically set column widths to accommodate varying lengths of text.

I did not fiddle with any settings, so I presume the defaults were active.

Calc 3.2.0 asked me to choose the separator and automatically adjusted column width (but since I use Calc for all my work, I've fiddled with quite a few settings).

Here's the text (just a standard paste):

During the drive to the McKillops’ Reginald was possessed with a great peace,
which was not wholly to be accounted for by the fact that he had inveigled his feet
into shoes a size too small for them. I misgave more than ever, and having once launched
Reginald on to the McKillops’ lawn, I established him near a seductive dish of
marrons glacés, and as far from the Archdeacon’s wife as possible;
as I drifted away to a diplomatic distance I heard with painful distinctness
the eldest Mawkby girl asking him if he had seen San Toy.

It must have been ten minutes later, not more, and I had been having quite an enjoyable chat with
my hostess, and had promised to lend her The Eternal City and my recipe for rabbit mayonnaise, and was just
about to offer a kind home for her third Persian kitten, when I perceived, out of the corner of my eye, that Reginald was not where I had left him,
and that the marrons glacés were untasted. At the same moment I became aware that old Colonel Mendoza was essaying to tell his classic story of how he introduced golf into India, and that Reginald was in dangerous proximity. There are occasions when Reginald is caviare to the Colonel.

“When I was at Poona in ’76”—

“My dear Colonel,” purred Reginald, “fancy admitting such a thing! Such a give-away for one’s age! I wouldn’t admit being on this planet in ’76.” (Reginald in his wildest lapses into veracity never admits to being more than twenty-two.)

Re: [Solved] Text CSV to Calc import columns and rows

Posted: Sat Jul 17, 2010 10:45 am
by bizzyboi
I have just managed to download and install 3.2.1
first time round it didn't work and the install aborted and attempted to reverse itself after which open office didn’t work at all and I had to do a system restore, then unistall 3.2 and then download and install 3.2.1 again

Now I tested all the stuff again.
1) Acknak's advice all works thanks. I assume $1/n = a hard return ??????????? I wonder where to find these codes out. I remember writing macros for WordPerfect 20 years ago when it knew you meant hard return and put the appropriate code in when you pressed enter unlike Word and now Open Office in search and replace.
2) The little brown arrows I have experimented with why certain rows are truncated and others not. It seems to have to do with if the row is one space too small to display a word it truncates, if it is more spaces to small it will rap the word to the next line. Or something like that. So some shorter lines are truncated and some longer lines are wrapped etc.
3) If I laboriously put a hard return in with enter after each full stop then pasting the text into the spreadsheet seems to work like excel one sentence per row. Yup but the search and replace function is easier.

Thanks all :super:

Re: [Solved] Text CSV to Calc import columns and rows

Posted: Sat Jul 17, 2010 3:17 pm
by acknak
I assume $1/n = a hard return ??????????? I wonder where to find these codes out.
Yes, the "\n" here represents an inserted paragraph break.

For more, see:
Regular Expressions in Writer [OOo wiki]

Regular expression samples [OOoNinja]

There's also a list of "codes" in the online help, but it's just a bare list: click "Help" from the Find & Replace dialog window.