[Dropped] Linebreaks not working on CSV import?

Talk about anything at all....
Post Reply
fiona87
Posts: 8
Joined: Mon Oct 19, 2020 1:32 pm

[Dropped] Linebreaks not working on CSV import?

Post by fiona87 »

Hi there,

I am new here (and with openoffice and all this stuff to be honest) so forgive me if this is an obvious task.

I have some csv files and when opening in OpenOffice, it generally all seems good except a few times there are cases where it seems linebreaks are not detected. The result is an endless row that contains what should actually be many different rows.

Screenshot attached where you can see that data for values 177 to 193 have been all introduced in the same row :S

TIA!
Attachments
openoff.PNG
Last edited by MrProgrammer on Sun Nov 01, 2020 5:50 pm, edited 2 times in total.
Reason: Moved topic from Calc forum to General discussion since the problem is with the website scraper
OpenOffice 4.1.7 on Windows 10
User avatar
RusselB
Moderator
Posts: 6646
Joined: Fri Jan 03, 2014 7:31 am
Location: Sarnia, ON

Re: Linebreaks not working?

Post by RusselB »

Welcome to the Forums.
Without being able to examine the actual data, which I suspect we won't be able to do due to confidentiality reasons, I can't determine if the line breaks are actually there, or if there was a misgenerated line structure, which has been known to happen with some methods of CSV file generation.

If it's just the one line or occasional line(s) and you don't have to import from the same source often, then I'd just handle the problem manually, using cut & paste.
On the other hand, if this has happened a lot and/or you have to import from that source fairly often, then get in touch with the IT department of the company where the file was generated and mention the problem to them, so that they can investigate.
OpenOffice 4.1.7, LibreOffice 7.0.1.2 on Windows 7 Pro, Ultimate & Windows 10 Home (2004)
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
fiona87
Posts: 8
Joined: Mon Oct 19, 2020 1:32 pm

Re: Linebreaks not working?

Post by fiona87 »

Hi again,

This is all data scraped from google, so I have no issue sharing it :) I did leave only some of the rows around the issue so it's easier to check. Thanks!
Attachments
row issue.csv
(38.99 KiB) Downloaded 230 times
OpenOffice 4.1.7 on Windows 10
User avatar
RusselB
Moderator
Posts: 6646
Joined: Fri Jan 03, 2014 7:31 am
Location: Sarnia, ON

Re: Linebreaks not working?

Post by RusselB »

I don't know how that file was generated, but the problem appears to lie in that the 177 isn't in quotes, but the 178 (which appears to be appended to the 177 line) is in quotes.
I think the quotes are what are causing the difficulty.
OpenOffice 4.1.7, LibreOffice 7.0.1.2 on Windows 7 Pro, Ultimate & Windows 10 Home (2004)
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
FJCC
Moderator
Posts: 9231
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Linebreaks not working?

Post by FJCC »

The line that begins with 177 and many of the subsequent lines are terminated with line feeds instead of paragraph marks. The csv import filter will not see the line feeds as the end of a line. There may be a way to adjust this in the filter import options, I don't know.
OpenOffice 4.1 on Windows 10 and Linux Mint
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
fiona87
Posts: 8
Joined: Mon Oct 19, 2020 1:32 pm

Re: Linebreaks not working?

Post by fiona87 »

It's weird that the format changes because the entire file is created with same scraper, anyways I can fix it without doing it manually?

Thanks again!
OpenOffice 4.1.7 on Windows 10
Bill
Volunteer
Posts: 8929
Joined: Sat Nov 24, 2007 6:48 am

Re: Linebreaks not working?

Post by Bill »

The line feeds will work, but only if Calc sees them as line feeds and not as text. The line feeds are between quotation marks, so Calc does see them as text and ignores them. In the attached file, using a text editor, I deleted the quotation marks at the end of the lines and the extra quotation marks before and after the numbers at the beginning of each line.
Attachments
row issue177.csv
(1.71 KiB) Downloaded 223 times
AOO 4.1.14 on Ubuntu MATE 22.04
User avatar
MrProgrammer
Moderator
Posts: 4883
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Linebreaks not working?

Post by MrProgrammer »

Hi, and welcome to the forum.
fiona87 wrote:Screenshot attached …
Screen images are usually worthless for solving problems. Save yourself time and always attach a document demonstrating the difficulty (remove confidential information then use Post Reply, not Quick Reply, and don't attach a picture instead of the document itself). Thank you for the later attachment. It shows what is wrong.
fiona87 wrote:I have some csv files and when opening in OpenOffice, it generally all seems good except a few times there are cases where it seems linebreaks are not detected. The result is an endless row that contains what should actually be many different rows.
When you open the CSV file with Writer you will see that it contains garbage. I loaded your file in Writer using File → New → Text Document; Insert → File → row issue.csv → Open.
Garbage.odt
(22.34 KiB) Downloaded 226 times
Here is the record beginning with 176, which is OK.
   "176","DIA","Av. Baleares, s/n, 04738 Vícar, Almería","912170453","4,1","74","dia.es",,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Here is the record beginning with 177, which is not formatted correctly for CSV.
   "177","Decathlon Huércal De Almería","Parque Comercial Cemar Carretera Nacional 340 a"," Local 0
   5"," 04230 Huércal de Almería"," Almería""","950144927","4,0","2.456","decathlon.es","
There is an extra, unmatched quote after "decathlon.es", and so the import for 177 continues with the following record
   ""178""","Carrefour","Autopista A-7, Salida Km. 429 Parque Comercial Viapark, 04738 Vícar, Almer
   ía","604266409","3,8","2.384","carrefour.es","
and then, since another unmatched quote is at the end of the line after "carrefour.es", with
   ""179""","Supermercados adela","Calle Falcón, 04740 Roquetas de Mar, Almería","651924344","5,0",
   "5","Añadir sitio web","
and further mutilated lines.

You can see that the records have extra quotes in the CSV around the record numbers (178, 179, …). There are similar quoting problems in other records. What a mess!! Also in 177, the address field seems to be damaged.
"Parque Comercial Cemar Carretera Nacional 340 a"," Local 05"," 04230 Huércal de Almería"," Almería"""

For over 50 years, a popular computer programmer proverb is "Garbage in, garbage out." That is, if your data is garbage, when you feed it to a program for processing you will just get garbage as the result. That is what has happened here. Your CSV is malformed, thus you get garbage when you process it with Calc's text import. You must fix the CSV before you can import it. We don't know how your CSV became damaged, and that isn't an OpenOffice problem.
fiona87 wrote:anyways I can fix it without doing it manually?
Based on the small sample you provided, I can fix the sample using Writer with four Edit → Find & Replace operations:
• Search for a"," Local 05"," 04230 Huércal de Almería"," Almería"" → Replace with a Local 05 04230 Huércal de Almería Almería → Replace
 Edit: I am guessing at how to fix the garbage in the address; you will need to select what it should be 
• Search for [ ,]+$ → Replace with {leave field empty} → More → ✓ Regular expressions → Replace All → OK
 Edit: Remove trailing commas, preparing for step 4 
• Search for ^""(\d+)""" → Replace with "$1" → More → ✓ Regular expressions → OK
 Edit: Remove doubled quotes around record identifiers 178, 179 … 
• Search for ,"[^"]*$ → Replace with &" → More → ✓ Regular expressions → OK
 Edit: Add missing quote at end of line 
Uncheck Regular Expressions, then click Close. In [ ,] there is a space and a comma between the brackets.

Select everything, then File → New → Spreadsheet, Edit → Paste Special → Unformatted text → Uncheck Tab → Check Comma → OK. Using the CSV subset you provided, these four Writer edits imported the data in a reasonable manner. You may need to experiment with other edits, perhaps by hand, in Writer to fix the rest of your broken CSV. This may be a painful process. Your time may be better spent trying to get good CSV for import.
fiona87 wrote:the entire file is created with same scraper
Your "scraper" created bad CSV; that is not an OpenOffice problem.

If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the subject field. Select the green checkmark icon at the same time.

[Tutorial] Ten concepts that every Calc user should know
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
Post Reply