Page 1 of 1

CSV Quoted String Problem

Posted: Tue Feb 17, 2009 1:42 pm
by genesoo
I am building address lists as CSV files and opening with SCALC.
For Connecticut zip codes, 06042, 06460, and 06109, I am losing the leading zero upon Text import.
If I change the Column Type from Standard to Text, the leading zero is preserved.

Here is the problem, I am trying to avoid the manual overides at import by rebuilding the zip code field as a quoted string.
For example: Name,Address,Manchester,CT,'06042',phonenumber
Launching the CSV file with SCALC, the Column type is identified as Standard, Text Delimiter is defaulting to a Double Quote and I see both leading and trailing single quotes in the Text Import Preview window however when I get the sheet up the leading single quote is missing and the trailing single quote remains.
If I put my mouse cursor on this cell, the cell contents shows the same.
MS Excel does not do this. Excel treats the cell as text and does not performing any blank or leading zero stripping.
It appears that scalc is dropping the leading single quote and leaving the rest of the string alone. In this case it should not do anything with the field since Double Quote was the delimiter(I should have seen the zip code bracketed in single quotes).

If I repeat the file launch and change the Text Delimiter to be Single Quote consistent with the example above, the Zip code Column is identified as Standard and the 06042 appears without quotes in Preview mode. When you get to the sheet, there are no quotes and the leading zero is stripped.

These tests were run on the current version of SCALC, 3.0.1 OOO300m15(build:9379).

I tried the same tests using double quotes in the CSV contents and obtained similar results.

If anyone can confirm this or tell me if I am doing something wrong, I would appreciate it.

Re: CSV Quoted String Problem

Posted: Tue Feb 17, 2009 1:58 pm
by Villeroy
Store all your equally structured csv files in a dedicated directory.
File>New>Database...
[X]Connect to existing
Type: Text
Specify directory and import options
[X] Register database
[Tutorial] Using registered data sources in Calc

The exporting software should take care that all text, including numeric strings like zip codes, are quoted, so any importing software can handle the "numbers" as text automatically.

A leading quote in a spreadsheet tags a numeric text as text, so no automatic conversion takes place. The quote is not part of the cell's value. =LEFT(A1;1) never returns the leading quote.
Do the test:
Type 123 into a cell and '123 into another cell.

Re: CSV Quoted String Problem

Posted: Tue Feb 17, 2009 5:32 pm
by genesoo
Villeroy, I do not completely understand your response.

Is the purpose of the "Registered Data Sources In CALC" steps to define hard coded Import options like Type=TEXT to column(s) and associate that with a fixed directory? That is what I infer from the update and the tutorial but I am not sure.

Question regarding "A Leading Quote..." and "Do The Test ...": This assumes that you are entering the single quote within a Spreadsheet cell. How about when being entered in CSV format? If the single quote marks the beginning of a string then where does it end?

Thanks

Re: CSV Quoted String Problem

Posted: Tue Feb 17, 2009 7:11 pm
by Villeroy
Abandon all hope. Stick with Excel. I give up.

Re: CSV Quoted String Problem

Posted: Tue Feb 17, 2009 8:18 pm
by acknak
Ouch. Rough day, Villeroy?
Is the purpose of the "Registered Data Sources In CALC" steps to define hard coded Import options like Type=TEXT to column(s) and associate that with a fixed directory?
Yes, exactly. Calc's import of text data is not very sophisticated. If you don't create a data source, you'll have to set the necessary import options every time you import the data.

An OOo data source using text files as the storage will treat each text file in the directory as a database table, and you can assign types for each column, independent of what the data looks like. So, postal codes can be imported as text instead of numbers.
How about when being entered in CSV format? If the single quote marks the beginning of a string then where does it end?
The quotes in the CSV file are a completely separate matter. They simply delimit the data field and protect any delimiters that appear within the quotes. The presence of quotes in a CSV file has no effect on how Calc interprets the data within the quotes. I.e. Calc will import a quoted string of digits as a number, not as text.

The leading single quote is only effective when entering data from the keyboard; it has nothing to do with text data import.

Re: CSV Quoted String Problem

Posted: Wed Feb 18, 2009 7:39 am
by Villeroy
acknak wrote:Calc's import of text data is not very sophisticated.
But Excel's is. And Excel is the application genesoo refers to, not Calc.
http://www.creativyst.com/Doc/Articles/ ... SVAndExcel
 Edit: Ouch! Meanwhile they implemented the same bug in Calc. Quoted string "0123" is a number. How disappointing, but at least you can declare it as text. The database component treats it right. When you drag your csv from a database into Calc everything is fine. Quoted string "0123" is just that in the entire column. 

Re: CSV Quoted String Problem

Posted: Wed Feb 18, 2009 1:49 pm
by genesoo
acknak and Villeroy,

Thanks for the answers.
I don't like Excel and am trying to find a way to use Calc instead of going back MS.
I am new to reporting Issues and at this stage it seems appropriate to report this as a support issue so the whole problem can be addressed. I will study setting up the Import Template as a circumvention and also report the issue.

Thanks again for the help.

I am leaving this post open for a few days in case others have comments.

Re: CSV Quoted String Problem

Posted: Wed Feb 18, 2009 2:51 pm
by Villeroy
As far as I know, Calc is able to handle correct csv correctly. At least you can specify the column types. Whatever you see or like to see quoted, double-quoted, unquoted, tagged with leading quote is unclear to me.
Putting your files into a dedicated directory (Linux links do actually work, btw) and adding a simple database file should resolve all your problems.

[Tutorial] Using registered data sources in Calc
Things that work well or well enough in Base but not in Calc:
- You get row sets rather than hashes of 2D-values.
- You can copy multiple selections of row sets.
- No row limit of 65536 (with bad perfomance since csv is not a true database format)
- You can apply arbitrary sort orders (>3 fields) on row sets without threatening your data (ever destroyed a spreadsheet by sorting the wrong range?)
- You can apply better filters more easily.
- Dragged into a spreadsheet you get refreshable, auto-resizing import ranges with auto-adjusting formula ranges.
- Set up once for all equal sturctures, use anywhere in Calc and Writer.
- You can create simple reports or very professional ones using the Sun Report Builder
- Text is text and number is number. A quick test with 3 lines of csv reveals that a whole column is text if one of it's values is quoted.

Re: CSV Quoted String Problem

Posted: Wed Feb 18, 2009 4:48 pm
by acknak
Nice summary, Villeroy!

genesoo: By all means, report what you feel are issues or needed improvements: [Tutorial] Reporting bugs or suggestions, but first you should review the reports already filed that are currently open. Here's a search link that should get most of the relevant issues: Open "csv" issues for Calc

BTW, Villeroy, I ran into something the other day: I wanted to drag a column from a database into a Calc sheet. I could not find any way to do that--is it possible, or do I have to drag the entire table in, then delete everything but the one column? I guess I could make a new query for just the one column as well.

Any thoughts?

Re: CSV Quoted String Problem

Posted: Wed Feb 18, 2009 5:22 pm
by Villeroy
Dragging a datasource column from the right pane into Writer inserts fields like the ones we use for serial letters. Surprisingly this does nothing in Calc. Work-around: Open the database file (there is a context-menu in the left pane) and add a query SELECT "Field_X" FROM "Table_Y". Then drag the whole query object from the left pane.

@genesoo
I forgot to mention that such a linked database.odb contains no data. It starts as a configuration file to link the office suite with some external data, for instance a directory of equally structured csv files. You may add queries (filtered and sorted record sets), reports (data dumped into Writer's layout) or forms (not useful with read-only sources).
There is also one special type of database which includes a whole portable database in the odb-file, but that's another story.
 Edit: Some hours later. 
Once you leave the spreadsheet-paradigma and start using the whole software, it is possible to create simple repositories of csv data where one database connects to comma-separated files with .csv-suffix, another database connects to tab-separated files with .tab-suffix and a third one exposes all *.xyz-files with semicolons. I would perfer separated directories, but you can separate them by name-suffix as well.
Issue 1(quirk): Dragged columns don't tranfer into spreadsheets.
Issue 2 (missing feature): The driver does not support text tables separated by fixed field lenghts.
Issue 3(quirk): Character encoding can be fine-tuned after database creation (menu:Edit>Database>Properties...)

If you know how to utilize OOo-forms, you can create forms in ordinary external documents, not embedded in a particular database. Forms and subforms can be used to show related row sets from different databases. This should work well enough with small text-databases (<10,000 rows?).
I used to do this in OOo1.x with a tiny collection of dBase and spreadsheets. Seemingly the same technique works in recent versions, but I did not spend much time testing it.
Issue 4: In a read-only database it seems to be impossible to get the form's record set out of the form without writing macros.
Push the last button on toolbar "Form Navigation" while the focus is in a form or subform. A "secondary beamer" shows the displayed fields of the current form. Drag the row set into a document.