[Solved] Help needed with corrupt CSV file

Blaz Gindiciosi
Posts: 3
Joined: Mon Nov 04, 2024 5:34 pm

Post by Blaz Gindiciosi »

Hi,
I'm really hoping someone here can help me and save me days of work. :crazy:
I've been working on a large .csv spreadsheet that includes Japanese characters. When I opened it this morning it appeared to be corrupt: most of the content is simply missing, columns are merged, ? appears where text used to be, etc.
The whole spreadsheet is 1.67MB so I can't upload it, but here's a partial copy of the data. Is there any way I can upload the whole thing?

Can anyone here think of a way to retrieve the missing text? To me, it seems it's all just gone and that's the end of it, but I thought I'd post here in case anyone can help.

Let me know if I can send any further information and many thanks in advance,
Blaz
Attachments
Equine localization corrupt file.csv
(7.31 KiB)
Last edited by MrProgrammer on Mon Nov 11, 2024 5:03 pm, edited 2 times in total.
Reason: Tagged ✓ [Solved] File has been destroyed by incorrect encoding for import/export -- MrProgrammer, forum moderator
LibreOffice Version 7.4.3.2 (x64) on Windows 11 Home
Hagar Delest
Moderator
Posts: 33630
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Help needed with corrupt CSV file

Post by Hagar Delest »

Hi and welcome to the forum!

To upload bigger files, you need to use a 3rd party web site for sharing files.
Like mediafire if that still exists.

Note: why work directly on a CSV file in Calc? Why not keep an .ods file and export to CSV only when needed?
LibreOffice 25.2 on Linux Mint Debian Edition (LMDE 7 Gigi) and 25.2 portable on Windows 11.
Blaz Gindiciosi
Posts: 3
Joined: Mon Nov 04, 2024 5:34 pm

Re: Help needed with corrupt CSV file

Post by Blaz Gindiciosi »

Hi Hagar,
Thanks for such a quick reply.

Here is the file - hope it works: https://www.dropbox.com/scl/fi/36jckpya ... fonxa&dl=0

Our project requires the file to be in .csv so we are all working in this format. I've asked a colleague at work and it seems to be an issue with the .ods to .csv conversion.
His response:
'Apparently the data was saved as "fixed-width", and all lines longer than ~40 characters were truncated and only the first 40 characters were saved in such lines. There is no way to restore truncated data or those ???? to original data since the file doesn't store history or previous changes. We can try and convert this document back to CSV format at least but I'm not sure it will be helpful.'

This conversion happened a few days ago and I've saved and opened it a few times since then with no issues. Suddenly today it appears corrupted and looks like a week of work is lost. Lesson learned!

I'd really appreciate a second opinion on this.
LibreOffice Version 7.4.3.2 (x64) on Windows 11 Home
Jan_J
Posts: 195
Joined: Wed Apr 29, 2009 1:42 pm
Location: Poland

Re: Help needed with corrupt CSV file

Post by Jan_J »

Your file seems to be encoded as UTF-7. This is a rare format today, although it is perfectly valid.
Two control questions:
1. Are you sure that all the material inside your huge .csv is encoded using UTF-7? If not, you need to reorganize it so that everything uses one common encoding.
2. During the import, does Calc "know" that the input file is UTF-7 encoded? Maybe you need to specify the input encoding in the import dialog box.

I have not worked with Japanese, and I usually deal with UTF-8 encoding. A great advantage of UTF-7 is that it uses only ASCII code points (i.e. less than 128), which keeps the whole file legible as plain characters. UTF-8 gives smaller file sizes, at least for European languages. OpenOffice and LibreOffice are able to process both encodings, as well as many others.
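
For anyone who wants to see the difference, here is a small Python sketch that encodes the same text both ways. The sample strings are made up and are not taken from the attached file; the sketch relies only on Python's built-in `utf-7` and `utf-8` codecs.

```python
# Illustrative sample strings; they are not taken from the attached CSV.
JAPANESE = "日本語"
EUROPEAN = "café"


def encoded_forms(text):
    """Return the UTF-7 and UTF-8 byte strings for the same text."""
    return text.encode("utf-7"), text.encode("utf-8")


for sample in (JAPANESE, EUROPEAN):
    utf7, utf8 = encoded_forms(sample)
    # Every UTF-7 byte is below 128, so the file looks like plain ASCII.
    print(sample, utf7, len(utf7), all(byte < 128 for byte in utf7))
    print(sample, utf8, len(utf8))
```

Comparing the printed lengths shows the size trade-off for each sample.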
JJ ∙ https://forum.openoffice.org/pl/
LO (26.2) ∙ Python (3.13|3.10) ∙ Unicode 17 ∙ LᴬTEX 2ε ∙ XML ∙ Unix tools ∙ Linux (Rocky|CentOS)
Jan_J
Posts: 195
Joined: Wed Apr 29, 2009 1:42 pm
Location: Poland

Re: Help needed with corrupt CSV file

Post by Jan_J »

The full version is no longer UTF-7. It is (probably) irreversibly corrupted by an attempt to encode it with a character set that does not contain the characters used. An approximate conversion was applied, which put "?" (a literal question mark, ASCII/Unicode 63) wherever a character could not be converted exactly. This is common behaviour in much transcoding software.
You can see this effect e.g. in row 26, from character 132 onward:

In<b>????</b>
I am afraid this disaster cannot be reversed. The file contains no information about the original content.
JJ ∙ https://forum.openoffice.org/pl/
LO (26.2) ∙ Python (3.13|3.10) ∙ Unicode 17 ∙ LᴬTEX 2ε ∙ XML ∙ Unix tools ∙ Linux (Rocky|CentOS)
Blaz Gindiciosi
Posts: 3
Joined: Mon Nov 04, 2024 5:34 pm

Re: Help needed with corrupt CSV file

Post by Blaz Gindiciosi »

Hi Jan,

Thank you very much for looking.

I was afraid that might be the case, but wanted to check here. Any ideas on how to prevent these disasters in the future when working with CSV files?

OK, here we go from the top again.
LibreOffice Version 7.4.3.2 (x64) on Windows 11 Home
MrProgrammer
Moderator
Posts: 5430
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Help needed with corrupt CSV file

Post by MrProgrammer »

Blaz Gindiciosi wrote: Mon Nov 04, 2024 5:52 pm I've been working on a large .csv spreadsheet including Japanese characters and when I opened it this morning it appeared to be corrupt
CSV is a text format. It is not a spreadsheet. There is no universal standard for CSV format, so one has to be aware that import/export problems can occur if applications use different CSV conventions. Spreadsheets can import from CSV and export to CSV, but this must be done carefully. There are many opportunities for mistakes.
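
A small illustration of what a convention mismatch does, as a Python sketch with made-up data: the same text is read once with the separator it was written with and once with the wrong one. The helper name `parse` is invented for this example.

```python
import csv
import io

# Made-up sample written with ";" as the field separator.
SAMPLE = "id;text\r\n1;Hello, world\r\n"


def parse(text, delimiter):
    """Split CSV text into rows using the given field separator."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


print(parse(SAMPLE, ";"))  # two columns per row, as intended
print(parse(SAMPLE, ","))  # columns merged, and the text split at its comma
```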

Blaz Gindiciosi wrote: Mon Nov 04, 2024 7:04 pm I've asked a colleague at work and it seems to be an issue with the .ods to .csv conversion.
So the data was at one point in an ODS spreadsheet. How was this spreadsheet created? Did you type in 1.67 megabytes of data? If not, then it must have originated from a different source. We do not know how this spreadsheet was created. Perhaps the process which created it was in error. Or perhaps the data was imported into it from a web page or text file.

Are you positive that the import was done correctly? If not, the ODS spreadsheet is garbage and you cannot proceed any further. Hundreds of topics on the forum are created for difficulties caused by incorrect import of data into a spreadsheet from another source. If you are importing data I recommend that you read [Tutorial] Text to Columns. The tutorial does not deal much with character encoding but that is another source of trouble. When importing you must know what encoding was used for the input to Calc and then specify that in the Text Import dialog. Otherwise the imported data in the spreadsheet is garbage, and all further work is doomed to failure.
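
As a rough way to test a guess about the input encoding before importing, the bytes can be decoded strictly in Python. This is a sketch under assumptions: `first_strict_decoding` is a helper name invented here, and the candidate list is only an example. A clean decode does not prove the guess is right (some encodings accept almost any bytes), but a failure does prove it is wrong.

```python
def first_strict_decoding(raw, candidates=("utf-8", "shift_jis")):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            # Strict decoding raises an error instead of substituting characters.
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fits these bytes")


# Made-up sample: the same Japanese text saved in two different encodings.
for raw in ("日本語".encode("shift_jis"), "日本語".encode("utf-8")):
    print(first_strict_decoding(raw))
```

An encoding that passes this check is a reasonable candidate for the Character set list in the Text Import dialog, but it is still worth inspecting the preview there.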

Next, if you want to create a CSV text file from your correctly created spreadsheet, that's fine, but first save the ODS spreadsheet. Then you have a recovery point if the CSV creation is done incorrectly. Always keep the ODS spreadsheet until you have finished working on the project. As with input, you must know what character encoding is needed by the external application which will be reading the CSV file. If you use the wrong encoding the other application will find that the CSV file contains garbage.
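
For those who script their exports, the same precaution can be sketched in Python: encode strictly, write only if that succeeds, then read the file back and compare. The function name `export_csv_checked` and the default of UTF-8 are assumptions for illustration; use whatever encoding the consuming application actually requires.

```python
import csv
import io
from pathlib import Path


def export_csv_checked(rows, path, encoding="utf-8"):
    """Write rows as CSV, refusing to write if any character cannot be encoded."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # Strict encoding raises UnicodeEncodeError instead of silently writing "?".
    data = buffer.getvalue().encode(encoding)
    Path(path).write_bytes(data)
    # Read the file back and confirm that nothing changed on the way out.
    with open(path, newline="", encoding=encoding) as handle:
        expected = [[str(cell) for cell in row] for row in rows]
        if list(csv.reader(handle)) != expected:
            raise ValueError("CSV round trip does not match the source rows")
    return len(data)
```

Because the text is encoded in memory first, a failed export leaves no half-written CSV behind, and the saved spreadsheet remains the recovery point.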

CSV text files are useful for experienced computer professionals. If your project requires that format and you are not one, there is a risk that the project will fail until you find one to help you.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).