[Solved] Deleting duplicate rows
-
christian09
- Posts: 20
- Joined: Mon Dec 27, 2010 8:06 pm
[Solved] Deleting duplicate rows
I have a calc sheet w/ 600k rows of data
ie. Name, Address, phone, ID number as columns
I want to delete all duplicate rows. Many of the entries want to keep have Names muliple entries w/ different addresses and id's.
I tried =if (a1=a2;0;1) but I need to ID the complete row and don't know how to do it...tried =if(a1:d1=a2:d2;0;1) w/ and without parentheses, but can't get it.
Thanks
ie. Name, Address, phone, ID number as columns
I want to delete all duplicate rows. Many of the entries want to keep have Names muliple entries w/ different addresses and id's.
I tried =if (a1=a2;0;1) but I need to ID the complete row and don't know how to do it...tried =if(a1:d1=a2:d2;0;1) w/ and without parentheses, but can't get it.
Thanks
Last edited by christian09 on Sun Jan 09, 2011 6:11 pm, edited 1 time in total.
OpenOffice 3.2 on Windows Vista
- Robert Tucker
- Volunteer
- Posts: 1250
- Joined: Mon Oct 08, 2007 1:34 am
- Location: Manchester UK
Re: deleting duplicate rows
LibreOffice 7.x.x on Arch and Fedora.
-
christian09
- Posts: 20
- Joined: Mon Dec 27, 2010 8:06 pm
Re: deleting duplicate rows
Thanks, but this one finds duplicates in 1 column only. I have rows with 5 columns. My duplicates must match in Each column.
ie. if my columns are: Name, address, zip, id#
I may have a record w/ the same name 5 times, but the address will be different, or the zip diff etc.
thanks
OpenOffice 3.2 on Windows Vista
-
ken johnson
- Volunteer
- Posts: 918
- Joined: Sun May 31, 2009 1:35 am
- Location: Sydney, Australia
Re: deleting duplicate rows
Concatenate the 5 columns into 1 then use that column to find the duplicates.
Something like...
Ken Johnson
Something like...
Code: Select all
=A1&","&B1&","&C1&","&D1&","&E1AOO 4.1.3 on Ms Windows 10
Help improve this forum by clicking the EDIT button on your first post then adding "[Solved]" to its heading once a solution to your problem has been provided.
Help improve this forum by clicking the EDIT button on your first post then adding "[Solved]" to its heading once a solution to your problem has been provided.
-
christian09
- Posts: 20
- Joined: Mon Dec 27, 2010 8:06 pm
Re: deleting duplicate rows
thanks Ken, I almost have it...here's what I have so far.ken johnson wrote:Concatenate the 5 columns into 1 then use that column to find the duplicates.
Something like...Ken JohnsonCode: Select all
=A1&","&B1&","&C1&","&D1&","&E1
=IF (A1&","&B1&","&C1&","&D1&","&E1&","&F1&","&H1=A2&","&B2&","&C2&","&D2&","&E2&","&F2&","&H2;1;0)
Its supposed to give me a 1 for a duplicate(works on 1 column) but I can't make it give me a 1 using a the whole row(represented by A,B,C,D,E,F,H)
thanks
Steve
OpenOffice 3.2 on Windows Vista
Re: Deleting duplicate rows
You are making it too hard for yourself.
First concatenate the values of the 7 separate columns into 1 column ( for instance column K) then apply the duplicate search on that column.
So it will be code for column K (edited: I noticed you used 7 columns including H):
Copy down until you reach the last value in column A (code should update automagicly the cellreferences)
(it doesn't really matter if you can read it or not, the soul purpose of this column is to determine the duplicates)
Now to look for the duplicates.
The code for the column next to K:
Now copy this code down until you reach the last row with values in A
(code should update automagicly the cellreferences)
This will put "Dup" on the row that is a duplicate of the above.
To retain this values you need to copy this whole column and Paste Special it on the same spot with only "Values" selected. (This will remove the formula but retain the value in the column.)
Select all and sort on the column with "Dup" as values
That will place them together and then you can easily remove them
First concatenate the values of the 7 separate columns into 1 column ( for instance column K) then apply the duplicate search on that column.
So it will be code for column K (edited: I noticed you used 7 columns including H):
Code: Select all
=A1&B1&C1&D1&E1&F1&H1(it doesn't really matter if you can read it or not, the soul purpose of this column is to determine the duplicates)
Now to look for the duplicates.
The code for the column next to K:
Code: Select all
=IF(K1=K2;"Dup";"")(code should update automagicly the cellreferences)
This will put "Dup" on the row that is a duplicate of the above.
To retain this values you need to copy this whole column and Paste Special it on the same spot with only "Values" selected. (This will remove the formula but retain the value in the column.)
Select all and sort on the column with "Dup" as values
That will place them together and then you can easily remove them
____________
DiGro
AOO 4.1.16 (Dutch) on Windows 11 64-bit. Scanned with Ziggo Safe Online (F-Secure)
DiGro
AOO 4.1.16 (Dutch) on Windows 11 64-bit. Scanned with Ziggo Safe Online (F-Secure)
-
ken johnson
- Volunteer
- Posts: 918
- Joined: Sun May 31, 2009 1:35 am
- Location: Sydney, Australia
Re: Deleting duplicate rows
When you concatenate the different columns you should include the comma separator.
Say a cell has "ab" and the next cell in that row has "cd", "ab"&"cd"="abcd", while "ab"&","&"cd"="ab,cd"
Say on a different row in the same two columns there are, respectively "a" and "bcd", "a"&"bcd"="abcd", while "a"&","&"bcd"="a,bcd".
If these two rows are identical in all the other columns they will be incorrectly tagged as duplicates when the comma separator is not used.
After you have set up the column with the comma separated column values you need to select all the data and the column with the concatenated values then sort by that column because =IF(K1=K2;"Dup";"") can only pick up duplicates when they are in adjacent rows.
If sorting is a hassle you can use a different formula...in row 1 filled down as far as needed.
Ken Johnson
Say a cell has "ab" and the next cell in that row has "cd", "ab"&"cd"="abcd", while "ab"&","&"cd"="ab,cd"
Say on a different row in the same two columns there are, respectively "a" and "bcd", "a"&"bcd"="abcd", while "a"&","&"bcd"="a,bcd".
If these two rows are identical in all the other columns they will be incorrectly tagged as duplicates when the comma separator is not used.
After you have set up the column with the comma separated column values you need to select all the data and the column with the concatenated values then sort by that column because =IF(K1=K2;"Dup";"") can only pick up duplicates when they are in adjacent rows.
If sorting is a hassle you can use a different formula...
Code: Select all
=IF(COUNTIF(K$1:K1;K1)>1;"Duplicate";"")Ken Johnson
AOO 4.1.3 on Ms Windows 10
Help improve this forum by clicking the EDIT button on your first post then adding "[Solved]" to its heading once a solution to your problem has been provided.
Help improve this forum by clicking the EDIT button on your first post then adding "[Solved]" to its heading once a solution to your problem has been provided.
-
christian09
- Posts: 20
- Joined: Mon Dec 27, 2010 8:06 pm
Re: Deleting duplicate rows[solved]
Thank you both for your answers..I was making it hard on myself.
OpenOffice 3.2 on Windows Vista
-
christian09
- Posts: 20
- Joined: Mon Dec 27, 2010 8:06 pm
Re: Deleting duplicate rows[solved]
I really see alot of potential in using these formulas...I am new to calc....I would like to learn the logic of this language(the name which I do not know).....any suggestions where to look?thanks
OpenOffice 3.2 on Windows Vista
Re: [Solved] Deleting duplicate rows
First place to look is the Help of OOo.
You will find the commands as well as examples with it.
Besides that: look around in the user forums and don't hesitate to ask.
We probably all started that way
You will find the commands as well as examples with it.
Besides that: look around in the user forums and don't hesitate to ask.
We probably all started that way
____________
DiGro
AOO 4.1.16 (Dutch) on Windows 11 64-bit. Scanned with Ziggo Safe Online (F-Secure)
DiGro
AOO 4.1.16 (Dutch) on Windows 11 64-bit. Scanned with Ziggo Safe Online (F-Secure)