[Solved] Comparing Calc documents with different formats

Discuss the spreadsheet application
Post Reply
HalloCoffee
Posts: 4
Joined: Tue Mar 24, 2020 2:34 pm

[Solved] Comparing Calc documents with different formats

Post by HalloCoffee »

EDIT: Thanks everyone! I've managed to solve my issues. Could never have done it without your help, so thank you very very much. Stay safe!



Hello everyone. Thank you for reading my problem, and perhaps helping me find a solution

EDIT.

In attachment there are two sample files.

What I want to do is Identify the rows in Sample Two that have a matching row in Sample One.

To do this, I need to match column C in Sample Two (It contains dossier numbers) with the numbers in Sample one, namely those found in column K.

What is the problem? The dossier number in Sample Two are written like this 'B13 18849' The dossier number in Sample One are written in another format. Namely, '106587/B131884900-10001'.

Can I make a new column in Sample Two that compares both documents, and that looks for a (partial) match? (Note that the space between 'B131' and '8849' is only present in Sample two)

Thank you.
Attachments
Sample Two.ods
(10.97 KiB) Downloaded 139 times
Sample One.ods
(10.08 KiB) Downloaded 134 times
Last edited by robleyd on Wed Mar 25, 2020 11:35 am, edited 3 times in total.
Reason: Tag as Solved
OpenOffice 4.1.3 on Windows10
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Comparing two documents in Calc with different fomats

Post by John_Ha »

Please upload two small .ods files with the values and explain exactly what you are trying to do.

Use the Upload attachment tab below where you type (128 kB max); or use a file share site, Dropbox or Google Drive for a larger file.

I don't know what you mean by " nr ".
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
HalloCoffee
Posts: 4
Joined: Tue Mar 24, 2020 2:34 pm

Re: Comparing two documents in Calc with different fomats

Post by HalloCoffee »

Thank you , John. I edited the orignial question. I hope it is clearer now.
John_Ha wrote:Please upload two small .ods files with the values and explain exactly what you are trying to do.

Use the Upload attachment tab below where you type (128 kB max); or use a file share site, Dropbox or Google Drive for a larger file.

I don't know what you mean by " nr ".
OpenOffice 4.1.3 on Windows10
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Comparing two documents in Calc with different fomats

Post by John_Ha »

I can start you off.

1. It is easy (create a new column if required) to delete the space in 'B13 18849' using a Regular Expression search. Search for the first 3 characters, the space, and the last 5 characters, and write back without the space.

Code: Select all

Find  (...)([:space:])(.....)
Replace  $1$3
2. If the format of '106587/B131884900-10001' is always the same it is easy to extract the 'B1318849' as the eight characters following the / using a Regular expression. Search for any number of digits, for the /, for the next 8 characters, and for anything remaining. Write back the 8 characters.

Code: Select all

Find ([:digit:]*)(/)(........)(.*)
Replace $3 
You may need a more specific search to avoid finding other things.

You now have helper columns in each spreadsheet with the identically formatted eight characters "B1318849" in them.

See [Tutorial] How to record a macro (and Regular Expressions) for help with regular expressions.

Someone else should be able to help with the rest.

As a one time operation, copy everything into one spread sheet having added a column to identify which sheet the line is from.
Sort everything on the column with the "B1318849".
Use a helper column to show when the are duplicates - ie cell [col, n] and cell [col, n+1] are the same.

It will go wrong if your regular expression searches do not cope with all variations of the number formats.
Last edited by John_Ha on Tue Mar 24, 2020 5:47 pm, edited 1 time in total.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
HalloCoffee
Posts: 4
Joined: Tue Mar 24, 2020 2:34 pm

Re: Comparing two documents in Calc with different fomats

Post by HalloCoffee »

Thanks a lot John. This has helped me a lot. I hope I can figure this out quick, but I think you've helped a ton.
John_Ha wrote:I can start you off.

1. It is easy (create a new column if required) to delete the space in 'B13 18849' using a Regular Expression search. Search for the first 3 characters, the space, and the last 5 characters, and write back without the space.

Code: Select all

Find  (...)([:space:])(.....)
Replace  $1$3
2. If the format of '106587/B131884900-10001' is always the same it is easy to extract the 'B1318849' as the eight characters following the / using a Regular expression. Search for any number of digits, for the /, for the next 8 characters. Write back the 8 characters.

Code: Select all

Find ([:digit:]*)(/)(........)(.*)
Replace $3 
I have not tested the arguments above.

You now have helper columns in each spreadsheet with the identically formatted eight characters "B1318849" in them.

See [Tutorial] How to record a macro (and Regular Expressions) for help with regular expressions.

Someone else should be able to help with the rest.

As a one time operation, copy everything into one spread sheet having added a column to identify which sheet the line is from.
Sort everything on the column with the "B1318849".
Use a helper column to show when the are duplicates - ie cell [col, n] and cell [col, n+1] are the same.

It will go wrong if your regular expression searches do not cope with all variations of the number formats.
OpenOffice 4.1.3 on Windows10
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Comparing two documents in Calc with different fomats

Post by Villeroy »

I put 154557/B0711671-10062 in cell G4 of "Sample One.ods" and turned off "match whole cells" in the calculation options of "Sample Two.ods"
Then I tried this in "Sample Two.ods":

Code: Select all

=MATCH("/"&SUBSTITUTE($C3;" ";"")&"-";'file:///tmp/Sample One.ods'#$Sheet1.$G$1:$G$7;0)
Instead of the expected value 4 this formula returns the no match error #NA.
After copying the source column from "Sample One" to "Sample Two".Sheet2, the following formula does return the expected value 4.

Code: Select all

=MATCH("/"&SUBSTITUTE($C3;" ";"")&"-";$Sheet2.$G$1:$G$7;0)
If this is a bug, it affects OpenOffice4 and LibreOffice6.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Comparing two documents in Calc with different fomats

Post by John_Ha »

I told you someone (with better knowledge) would come along.

I only have a small brain and Villeroy's formula is too hard for me to get my mind round :super:

I prefer lots of small steps because even my little brain knows it does what I think it does.

Some other searches which work are below. I copied the data into a helper (coloured) column. I highlighted the data and then Edit > Find and Replace ..., in the Current Selection which searched only the highlighted data.

Code: Select all

(.*)( )(.*)
$1$3

([:digit:]{6})(/)([:alpha:][:digit:]{7})(.*)
$3
Incidentally, your business is breaking the most fundamental rule of data management which is "A piece of data should be stored in only one place". If the same piece of data is stored in more than one place they will ALWAYS become different over time and you will not know which is correct. Businesses should always use a database and not individual spreadsheets to manage the business. It's why Larry Ellisson is a multi-billionaire.
Clipboard01.gif
Attachments
Sample One jh.ods
(13.78 KiB) Downloaded 127 times
Sample Two jh.ods
(14.89 KiB) Downloaded 126 times
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Comparing two documents in Calc with different fomats

Post by Villeroy »

John_Ha wrote: Businesses should always use a database and not individual spreadsheets to manage the business. It's why Larry Ellisson is a multi-billionaire.
These data had been extracted from some kind of database anyway. I doubt that anyone typed all this into spreadsheets. As soon as your data leave the business database they become very difficult to compare, let alone when they come from different databases. Everything in one place and that place needs to be some company database.

My formula is just a simple MATCH with a link to a column in another file.
=MATCH(text ; column ; 0)
and the lookup text is a preceeding "/" the cell string without space and a trailing "-": "/B0711671-". The formula works fine when the search column is copied into the same document but fails with an external reference.
If the MATCH works, you get any other column value from the other file with =INDEX(other_column ; MATCH(text ; column ; 0))
Oh, and this does work:

Code: Select all

=INDEX('file:///tmp/Sample One.ods'#$Sheet1.$A$1:$A$7 ; MATCH("/"&SUBSTITUTE($C3;" ";"")&"-";$Sheet2.$G$1:$G$7;0))
It returns the value from the other file's column A where "/B0711671-" is matched within the search column that has been copied to this file.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
HalloCoffee
Posts: 4
Joined: Tue Mar 24, 2020 2:34 pm

Re: Comparing two documents in Calc with different fomats

Post by HalloCoffee »

Thanks for the answer. I can't get it to work, but I'll try and figure it out.

Villeroy wrote:I put 154557/B0711671-10062 in cell G4 of "Sample One.ods" and turned off "match whole cells" in the calculation options of "Sample Two.ods"
Then I tried this in "Sample Two.ods":

Code: Select all

=MATCH("/"&SUBSTITUTE($C3;" ";"")&"-";'file:///tmp/Sample One.ods'#$Sheet1.$G$1:$G$7;0)
Instead of the expected value 4 this formula returns the no match error #NA.
After copying the source column from "Sample One" to "Sample Two".Sheet2, the following formula does return the expected value 4.

Code: Select all

=MATCH("/"&SUBSTITUTE($C3;" ";"")&"-";$Sheet2.$G$1:$G$7;0)
If this is a bug, it affects OpenOffice4 and LibreOffice6.
OpenOffice 4.1.3 on Windows10
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Comparing two documents in Calc with different fomats

Post by Villeroy »

Save the 2 files to the same directory and open "Sample Two.ods"
Attachments
Sample Two.ods
(15.14 KiB) Downloaded 134 times
Sample One.ods
(18.11 KiB) Downloaded 139 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Post Reply