Hi there,
I am new here (and with openoffice and all this stuff to be honest) so forgive me if this is an obvious task.
I have some csv files and when opening in OpenOffice, it generally all seems good except a few times there are cases where it seems linebreaks are not detected. The result is an endless row that contains what should actually be many different rows.
Screenshot attached where you can see that data for values 177 to 193 have been all introduced in the same row :S
TIA!
[Dropped] Linebreaks not working on CSV import?
[Dropped] Linebreaks not working on CSV import?
Last edited by MrProgrammer on Sun Nov 01, 2020 5:50 pm, edited 2 times in total.
Reason: Moved topic from Calc forum to General discussion since the problem is with the website scraper
Reason: Moved topic from Calc forum to General discussion since the problem is with the website scraper
OpenOffice 4.1.7 on Windows 10
Re: Linebreaks not working?
Welcome to the Forums.
Without being able to examine the actual data, which I suspect we won't be able to do due to confidentiality reasons, I can't determine if the line breaks are actually there, or if there was a misgenerated line structure, which has been known to happen with some methods of CSV file generation.
If it's just the one line or occasional line(s) and you don't have to import from the same source often, then I'd just handle the problem manually, using cut & paste.
On the other hand, if this has happened a lot and/or you have to import from that source fairly often, then get in touch with the IT department of the company where the file was generated and mention the problem to them, so that they can investigate.
Without being able to examine the actual data, which I suspect we won't be able to do due to confidentiality reasons, I can't determine if the line breaks are actually there, or if there was a misgenerated line structure, which has been known to happen with some methods of CSV file generation.
If it's just the one line or occasional line(s) and you don't have to import from the same source often, then I'd just handle the problem manually, using cut & paste.
On the other hand, if this has happened a lot and/or you have to import from that source fairly often, then get in touch with the IT department of the company where the file was generated and mention the problem to them, so that they can investigate.
OpenOffice 4.1.7, LibreOffice 7.0.1.2 on Windows 7 Pro, Ultimate & Windows 10 Home (2004)
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
Re: Linebreaks not working?
Hi again,
This is all data scraped from google, so I have no issue sharing it I did leave only some of the rows around the issue so it's easier to check. Thanks!
This is all data scraped from google, so I have no issue sharing it I did leave only some of the rows around the issue so it's easier to check. Thanks!
- Attachments
-
- row issue.csv
- (38.99 KiB) Downloaded 230 times
OpenOffice 4.1.7 on Windows 10
Re: Linebreaks not working?
I don't know how that file was generated, but the problem appears to lie in that the 177 isn't in quotes, but the 178 (which appears to be appended to the 177 line) is in quotes.
I think the quotes are what are causing the difficulty.
I think the quotes are what are causing the difficulty.
OpenOffice 4.1.7, LibreOffice 7.0.1.2 on Windows 7 Pro, Ultimate & Windows 10 Home (2004)
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
Re: Linebreaks not working?
The line that begins with 177 and many of the subsequent lines are terminated with line feeds instead of paragraph marks. The csv import filter will not see the line feeds as the end of a line. There may be a way to adjust this in the filter import options, I don't know.
OpenOffice 4.1 on Windows 10 and Linux Mint
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
Re: Linebreaks not working?
It's weird that the format changes because the entire file is created with same scraper, anyways I can fix it without doing it manually?
Thanks again!
Thanks again!
OpenOffice 4.1.7 on Windows 10
Re: Linebreaks not working?
The line feeds will work, but only if Calc sees them as line feeds and not as text. The line feeds are between quotation marks, so Calc does see them as text and ignores them. In the attached file, using a text editor, I deleted the quotation marks at the end of the lines and the extra quotation marks before and after the numbers at the beginning of each line.
- Attachments
-
- row issue177.csv
- (1.71 KiB) Downloaded 223 times
AOO 4.1.14 on Ubuntu MATE 22.04
- MrProgrammer
- Moderator
- Posts: 4885
- Joined: Fri Jun 04, 2010 7:57 pm
- Location: Wisconsin, USA
Re: Linebreaks not working?
Hi, and welcome to the forum.
You can see that the records have extra quotes in the CSV around the record numbers (178, 179, …). There are similar quoting problems in other records. What a mess!! Also in 177, the address field seems to be damaged.
"Parque Comercial Cemar Carretera Nacional 340 a"," Local 05"," 04230 Huércal de Almería"," Almería"""
For over 50 years, a popular computer programmer proverb is "Garbage in, garbage out." That is, if your data is garbage, when you feed it to a program for processing you will just get garbage as the result. That is what has happened here. Your CSV is malformed, thus you get garbage when you process it with Calc's text import. You must fix the CSV before you can import it. We don't know how your CSV became damaged, and that isn't an OpenOffice problem.
• Search for a"," Local 05"," 04230 Huércal de Almería"," Almería"" → Replace with a Local 05 04230 Huércal de Almería Almería → Replace
• Search for [ ,]+$ → Replace with {leave field empty} → More → ✓ Regular expressions → Replace All → OK
• Search for ^""(\d+)""" → Replace with "$1" → More → ✓ Regular expressions → OK
• Search for ,"[^"]*$ → Replace with &" → More → ✓ Regular expressions → OK
Uncheck Regular Expressions, then click Close. In [ ,] there is a space and a comma between the brackets.
Select everything, then File → New → Spreadsheet, Edit → Paste Special → Unformatted text → Uncheck Tab → Check Comma → OK. Using the CSV subset you provided, these four Writer edits imported the data in a reasonable manner. You may need to experiment with other edits, perhaps by hand, in Writer to fix the rest of your broken CSV. This may be a painful process. Your time may be better spent trying to get good CSV for import.
If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the subject field. Select the green checkmark icon at the same time.
[Tutorial] Ten concepts that every Calc user should know
Screen images are usually worthless for solving problems. Save yourself time and always attach a document demonstrating the difficulty (remove confidential information then use Post Reply, not Quick Reply, and don't attach a picture instead of the document itself). Thank you for the later attachment. It shows what is wrong.fiona87 wrote:Screenshot attached …
When you open the CSV file with Writer you will see that it contains garbage. I loaded your file in Writer using File → New → Text Document; Insert → File → row issue.csv → Open. Here is the record beginning with 176, which is OK.fiona87 wrote:I have some csv files and when opening in OpenOffice, it generally all seems good except a few times there are cases where it seems linebreaks are not detected. The result is an endless row that contains what should actually be many different rows.
"176","DIA","Av. Baleares, s/n, 04738 Vícar, Almería","912170453","4,1","74","dia.es",,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Here is the record beginning with 177, which is not formatted correctly for CSV.
"177","Decathlon Huércal De Almería","Parque Comercial Cemar Carretera Nacional 340 a"," Local 0
5"," 04230 Huércal de Almería"," Almería""","950144927","4,0","2.456","decathlon.es","
There is an extra, unmatched quote after "decathlon.es", and so the import for 177 continues with the following record""178""","Carrefour","Autopista A-7, Salida Km. 429 Parque Comercial Viapark, 04738 Vícar, Almer ía","604266409","3,8","2.384","carrefour.es","and then, since another unmatched quote is at the end of the line after "carrefour.es", with
""179""","Supermercados adela","Calle Falcón, 04740 Roquetas de Mar, Almería","651924344","5,0", "5","Añadir sitio web","and further mutilated lines.
You can see that the records have extra quotes in the CSV around the record numbers (178, 179, …). There are similar quoting problems in other records. What a mess!! Also in 177, the address field seems to be damaged.
"Parque Comercial Cemar Carretera Nacional 340 a"," Local 05"," 04230 Huércal de Almería"," Almería"""
For over 50 years, a popular computer programmer proverb is "Garbage in, garbage out." That is, if your data is garbage, when you feed it to a program for processing you will just get garbage as the result. That is what has happened here. Your CSV is malformed, thus you get garbage when you process it with Calc's text import. You must fix the CSV before you can import it. We don't know how your CSV became damaged, and that isn't an OpenOffice problem.
Based on the small sample you provided, I can fix the sample using Writer with four Edit → Find & Replace operations:fiona87 wrote:anyways I can fix it without doing it manually?
• Search for a"," Local 05"," 04230 Huércal de Almería"," Almería"" → Replace with a Local 05 04230 Huércal de Almería Almería → Replace
Edit: I am guessing at how to fix the garbage in the address; you will need to select what it should be |
Edit: Remove trailing commas, preparing for step 4 |
Edit: Remove doubled quotes around record identifiers 178, 179 … |
Edit: Add missing quote at end of line |
Select everything, then File → New → Spreadsheet, Edit → Paste Special → Unformatted text → Uncheck Tab → Check Comma → OK. Using the CSV subset you provided, these four Writer edits imported the data in a reasonable manner. You may need to experiment with other edits, perhaps by hand, in Writer to fix the rest of your broken CSV. This may be a painful process. Your time may be better spent trying to get good CSV for import.
Your "scraper" created bad CSV; that is not an OpenOffice problem.fiona87 wrote:the entire file is created with same scraper
If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the subject field. Select the green checkmark icon at the same time.
[Tutorial] Ten concepts that every Calc user should know
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).