Remove duplicates from Calc
Remove duplicates from Calc
What would be the recommended way to remove duplicate entries from a column in Calc? In Word 2007 there is a simple 'Remove Duplicates' menu item; is there anything like this in Calc? If I recall, there was a way to do it in Calc but it was a bit more involved.
Thanks,
Todd
Thanks,
Todd
Re: Remove duplicates from calc
Select the entire range containing data to filter, then click on the menu Data > Filter > Standard Filter and:
1. Use a condition that is always TRUE, like field1 = Not empty
2. Click on the button more, select Remove Duplicate, select Copy to and put the address of an empty cell
The whole range (without duplicate) will be analyzed and copied at that new address.
1. Use a condition that is always TRUE, like field1 = Not empty
2. Click on the button more, select Remove Duplicate, select Copy to and put the address of an empty cell
The whole range (without duplicate) will be analyzed and copied at that new address.
LibreOffice 4.2.3.3. on Ubuntu 14.04
Re: Remove duplicates from calc
Hi squenson,
thanks for this explanation. Can you give me an idea how to keep the removed rows (since I have other columns with data in them)?
TIA
Zsolt
thanks for this explanation. Can you give me an idea how to keep the removed rows (since I have other columns with data in them)?
TIA
Zsolt
OOo 2.4.X on MS Windows Vista
Re: Remove duplicates from calc
I would use two new columns:
1. In the first one, put a sequential number starting from 1 in the first row, and drag it down (you get 2, 3, etc. in this column)
2. Sort the range according to the column that will eventually contain duplicate value
3. In the second new column, put a formula that tests the value if the same as the previous row. If yes, put 1, if not, put 0. Example: =IF(A3=A2;1;0). Copy the formula down the column up to the last row. Rows with a 1 are duplicate rows.
4. You can now use the filtered list technique mentioned in my previous post to select only these rows and copy them somewhere. Don't use, in that case, the remove duplicate as all rows you want to select have a 1 in the specified column.
Last, if you want to sort back your range to the initial state, select the whole range and sort it according to the sequential numbers you have created in the first step.
1. In the first one, put a sequential number starting from 1 in the first row, and drag it down (you get 2, 3, etc. in this column)
2. Sort the range according to the column that will eventually contain duplicate value
3. In the second new column, put a formula that tests the value if the same as the previous row. If yes, put 1, if not, put 0. Example: =IF(A3=A2;1;0). Copy the formula down the column up to the last row. Rows with a 1 are duplicate rows.
4. You can now use the filtered list technique mentioned in my previous post to select only these rows and copy them somewhere. Don't use, in that case, the remove duplicate as all rows you want to select have a 1 in the specified column.
Last, if you want to sort back your range to the initial state, select the whole range and sort it according to the sequential numbers you have created in the first step.
LibreOffice 4.2.3.3. on Ubuntu 14.04
Re: Remove duplicates from calc
Although this method seems to be very intuitive, somehow it doesn't seem to filter out the duplicates.squenson wrote:Select the entire range containing data to filter, then click on the menu Data > Filter > Standard Filter and:
1. Use a condition that is always TRUE, like field1 = Not empty
2. Click on the button more, select Remove Duplicate, select Copy to and put the address of an empty cell
The whole range (without duplicate) will be analyzed and copied at that new address.
After fumbling with it for 10-15 minutes, I ended up using the 2-steps method suggested by squenson (the IF function, then filter the 0's)
I don't know what went wrong with the 1st method.
I have 3 columns of data, here are the examples:
A | B | C
CN.S08 AXLE AXLE FRONT BALANCER
CN.J13 AXLE AXLE FRONT KNUCKLE
CN.X02 AXLE AXLE FRONT NUT
CN.C15 AXLE AXLE REAR COMPLETE
CN.C11 AXLE AXLE SHAFT
CN.L08 AXLE AXLE SHAFT
CN.N04 AXLE AXLE SHAFT
Objective is to filter out duplicate datas in column C (axle shaft), ie only the row containing CN.C11 AXLE AXLE SHAFT will appear once, instead of multiple times.
I guess the filter thinks that although row CN.C11 & CN.L08 contain the same value in column C (axle shaft), but value in column A (CN.C11, CNL08) is different. Thus it spits out the same value as before.
Henry
OOo 2.4.X on Ms Windows XP
Re: Remove duplicates from calc
First question when removing duplicates: What makes two records duplicates?
Two equal values in field "Family Name"? Certainly not.
Two equal combinations in fields "Family Name" AND "Forname"? May be.
Two equal combinations in fields "Family Name" AND "Forname" AND "Bith Date"? Not necessarily.
In case of text values, case sensitively?
Second question when removing duplicates: Which one is the duplicate to keep?
Doe John 1941-12-31 Carpenter [added some years ago]
Doe John 1941-12-31 Retired [added recently]
Doe John 1981-12-31 Student [added some years ago]
Doe John 1981-12-31 Dr. med. [added more recently]
...obviously the 4 records refer to the same 2 persons, but no software can decide which one is to be removed.
Two equal values in field "Family Name"? Certainly not.
Two equal combinations in fields "Family Name" AND "Forname"? May be.
Two equal combinations in fields "Family Name" AND "Forname" AND "Bith Date"? Not necessarily.
In case of text values, case sensitively?
Second question when removing duplicates: Which one is the duplicate to keep?
Doe John 1941-12-31 Carpenter [added some years ago]
Doe John 1941-12-31 Retired [added recently]
Doe John 1981-12-31 Student [added some years ago]
Doe John 1981-12-31 Dr. med. [added more recently]
...obviously the 4 records refer to the same 2 persons, but no software can decide which one is to be removed.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from calc
It depends on the data but if you're cleaning up a dirty data set with duplicates, weird outliners, etc you' start to recognize them.
I'm not sure that I'd want to use Calc to do the work but it's a common problem.
I'm not sure that I'd want to use Calc to do the work but it's a common problem.
LibreOffice 7.3.7. 2; Ubuntu 22.04
Re: Remove duplicates from calc
Anyway,
... returns the frequency of combinations in this row's A, B and C within rows 1 to 99.
Adjust to your needs, copy down and filter this column by ">1".
Then decide about the records to stay.
Code: Select all
=SUMPRODUCT($A1=$A$1:$A$99;$B1=$B$1:$B$99;$C1=$C$1:$C$99)
Adjust to your needs, copy down and filter this column by ">1".
Then decide about the records to stay.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Remove/recalcalculate duplicates in multiple rows/columns
Find and automatically recalculate unique text values or remove duplicate values in multiple rows or columns.
Uses if and countif functions.
Example spreadsheet illustrates:
1) multiple rows of text
2) duplicate values in cells of each row
3) desire to strip out duplicate values in a given row.
Uses vvjoin macro for concatenation but this is not absolutely necessary.
Basic concept may be modified easily to deal with columns and/or to present results in different ways.
Uses if and countif functions.
Example spreadsheet illustrates:
1) multiple rows of text
2) duplicate values in cells of each row
3) desire to strip out duplicate values in a given row.
Uses vvjoin macro for concatenation but this is not absolutely necessary.
Basic concept may be modified easily to deal with columns and/or to present results in different ways.
- Attachments
-
- remove duplicate example.ods
- (14.97 KiB) Downloaded 7413 times
Re: Remove duplicates from calc
Sorry to bring up this old topic, is there anyway while using the standard filter to paste special instead of copying the cells. In the current spreadsheet the data i am meant to filter out is created by formulas within in the cells. If there are any other way of doing it please mention
Many Thanks
kehs
Many Thanks
kehs
OpenOffice 3.1 on Windows
Re: Remove duplicates from calc
Database, database, database. Always use a database. Never ever use a spreadsheet if you want to organize your data tidily.kehs wrote:Sorry to bring up this old topic, is there anyway while using the standard filter to paste special instead of copying the cells. In the current spreadsheet the data i am meant to filter out is created by formulas within in the cells. If there are any other way of doing it please mention
Many Thanks
kehs
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from calc
Hahaha will learn that, but as of right now is there anyway to paste special with the standard filter?
OpenOffice 3.1 on Windows
Re: Remove duplicates from calc
Your question is rather unclear. I believe you want to filter to another range (with an output target in the standard filter's extra options) and want to copy values only rather than formulas.
No, there is no way to do this other than using a database or writing sophisticated macro stuff.
No, there is no way to do this other than using a database or writing sophisticated macro stuff.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from calc
Stop, wait a minute!
Filter in place, copy and paste-special numbers, text, times and anything else but no formulas.
Is this what you want?
Filter in place, copy and paste-special numbers, text, times and anything else but no formulas.
Is this what you want?
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from calc
Coloumn A: This would consist of the dates copied from a database.
Column B: Would contain formula modifying the dates in coloumn A to give a speicifc format and layout.
What i am trying to accomplish is remove any duplicates in Column B and count the total number of dates LEFT once the duplicates are removed.
Anyhelp would be appriciated
Column B: Would contain formula modifying the dates in coloumn A to give a speicifc format and layout.
What i am trying to accomplish is remove any duplicates in Column B and count the total number of dates LEFT once the duplicates are removed.
Anyhelp would be appriciated
OpenOffice 3.1 on Windows
Re: Remove duplicates from calc
You copy data from a database into a spreadsheet and wonder why you lose most of the database functionality.Coloumn A: This would consist of the dates copied from a database.
In the database(should work with any database):
SELECT DISTINCT "Dates" FROM "Tables"
SELECT COUNT(*) FROM (SELECT DISTINCT "Dates" FROM "Tables")
Apply your "specific number format" to A?Column B: Would contain formula modifying the dates in coloumn A to give a speicifc format and layout.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
- GopherBaroque
- Posts: 2
- Joined: Fri Oct 01, 2010 12:14 am
Data-Filter is a great way to remove duplicates from calc
Hello, I want to thank squenson for the initial solution:
which worked great on my little list. Acknowledging that a spreadsheet is not really a database, I had a long list of items that were provided in a spreadsheet, with a small number of unique entries arranged in a large number of combinations. It was easy to cut and paste cells to make a single long column of all intances and the above technique was an efficient one-shot filter that quickly gave me the list of unique entries to copy over as input to my analysis of the combinations.Select the entire range containing data to filter, then click on the menu Data > Filter > Standard Filter and:
1. Use a condition that is always TRUE, like field1 = Not empty
2. Click on the button more, select Remove Duplicate, select Copy to and put the address of an empty cell
The whole range (without duplicate) will be analyzed and copied at that new address.
OpenOffice 3.1 on Windows XP, 7
Re: Remove duplicates from calc
is possible copy the result to clipboard with one blank row before all row similar of this
cells whit duplicates
1
2
3
1
4
5
6
cell copied
1
2
3
4
5
6
some one please help me
sorry for my english
cells whit duplicates
1
2
3
1
4
5
6
cell copied
1
2
3
4
5
6
some one please help me
sorry for my english
OpenOffice 3.1 on Windows XP
- kingfisher
- Volunteer
- Posts: 2123
- Joined: Tue Nov 20, 2007 10:53 am
Re: Remove duplicates from calc
You can use the menu selection Data > Filter > Standard to copy the range without duplicates to another location (say the next empty column).
Example attached.
Example attached.
Apache OpenOffice 4.1.9 on Linux
Re: Remove duplicates from Calc
5 years later, squenson's generous help is still helping people. I'm particularly impressed by the reply at http://forum.openoffice.org/en/forum/vi ... 379#p35891. Brilliant! I registered just to register my thanks.
LibO 5.1.1.3 (Downgraded from 5.2.2.2 due to Impress crashing upon any animation)
Re: Remove duplicates from Calc
OK, there are some pretty elegant solutions to the matter, but mine is a little different than the original problem of the topic! I know it's an old one but still I'm posting my request for help, so I beg you be generous
So the situation is like this: I've got a 5 column data set A:E containing:
A:C are: month, day and hour. D is "some string" and E is "someone's name" - they all look like this:
A | B | C | D | E |
Jan | 3 |15:58:32|3ef56| from=<George>|
Jan | 3 |15:59:28|3ef56| to=<Peter> |
... and it goes like this for like say 1000 rows. So obviously the data has to be sorted by month, day, hour and some number in column D. The thing is that D values need to come in couples in every two sequential rows and for every data set I know that the E value for one of the duplicate D value rows must be lets say "George" for the sake of this example. After sorting the data sometimes there is a third row containing the same value in D column, like so:
A | B | C | D | E |
Jan | 3 |15:58:32|3ef56| from=<don't_even_ask>|
Jan | 3 |16:52:31|33f56| from=<George> |
Jan | 3 |16:53:20|33f56| to=<Peter> |
.... and this is a problem. In this particular example is clear that the first row should go. Obviously Calc is not capable of making this decision, but it sure can point out when there are more than two occurrences of the value in D. The best outcome would be if this is the case all the rows to be say colored in RED. I don' even know if that is possible, it was just a thought. It would help identifying the case and remove the rows by hand.
Also, sometimes there is this case:
A | B | C | D | E |
Jan | 3 |15:58:32|3ef56| to=<George>|
... and there is no row with the D value matching the one here - the D value is unique for the entire data set. Those rows I'd like to remove immediately.
Any ideas you OO wizards?
I would appreciate all the help I can get
So the situation is like this: I've got a 5 column data set A:E containing:
A:C are: month, day and hour. D is "some string" and E is "someone's name" - they all look like this:
A | B | C | D | E |
Jan | 3 |15:58:32|3ef56| from=<George>|
Jan | 3 |15:59:28|3ef56| to=<Peter> |
... and it goes like this for like say 1000 rows. So obviously the data has to be sorted by month, day, hour and some number in column D. The thing is that D values need to come in couples in every two sequential rows and for every data set I know that the E value for one of the duplicate D value rows must be lets say "George" for the sake of this example. After sorting the data sometimes there is a third row containing the same value in D column, like so:
A | B | C | D | E |
Jan | 3 |15:58:32|3ef56| from=<don't_even_ask>|
Jan | 3 |16:52:31|33f56| from=<George> |
Jan | 3 |16:53:20|33f56| to=<Peter> |
.... and this is a problem. In this particular example is clear that the first row should go. Obviously Calc is not capable of making this decision, but it sure can point out when there are more than two occurrences of the value in D. The best outcome would be if this is the case all the rows to be say colored in RED. I don' even know if that is possible, it was just a thought. It would help identifying the case and remove the rows by hand.
Also, sometimes there is this case:
A | B | C | D | E |
Jan | 3 |15:58:32|3ef56| to=<George>|
... and there is no row with the D value matching the one here - the D value is unique for the entire data set. Those rows I'd like to remove immediately.
Any ideas you OO wizards?
I would appreciate all the help I can get
OpenOffice.org 3.4.1 on Windows XP Pro and on Windows 7
Re: Remove duplicates from Calc
Jan | 3 |15:58:32|3ef56| from=<George>| to=<Peter>
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from Calc
What exactly do you mean by that?Villeroy wrote:Jan | 3 |15:58:32|3ef56| from=<George>| to=<Peter>
OpenOffice.org 3.4.1 on Windows XP Pro and on Windows 7
Re: Remove duplicates from Calc
Your data are redundant. So you have pairs of records where the same information may fit in one record. If the one hour difference between "from" and "to" matters, the "from" and the "to" records belong into separate tables so you can look up the "from" for each "to" (or vice versa).
Apart from that, it is a huge mistake to do such things in a spreadsheet. A database would be by far less error prone, much more convenient to maintain but more difficult to set up.
Apart from that, it is a huge mistake to do such things in a spreadsheet. A database would be by far less error prone, much more convenient to maintain but more difficult to set up.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from Calc
Yes I've thought about merging the records before but then the data does not answer the purpose, so yes the hour difference matters and the data format should stay as shown.Villeroy wrote:Your data are redundant. So you have pairs of records where the same information may fit in one record. If the one hour difference between "from" and "to" matters, the "from" and the "to" records belong into separate tables so you can look up the "from" for each "to" (or vice versa).
And yes, if the database still existed a relatively simple (or maybe not that simple ) request would be sufficient to do the job. Unfortunately what I've showed here is an shredded excerpt(there is more garbage to it that need to be removed before it gets to this state) from logs of a none existing database. So I don't have much to choose from, I have to do the job with what I've got.Villeroy wrote: Apart from that, it is a huge mistake to do such things in a spreadsheet. A database would be by far less error prone, much more convenient to maintain but more difficult to set up.
So I sense the answer is somewhere in that part of your reply:
Villeroy wrote:....the "from" and the "to" records belong into separate tables so you can look up the "from" for each "to" (or vice versa).
... but could you be more specific about it please?
OpenOffice.org 3.4.1 on Windows XP Pro and on Windows 7
Re: Remove duplicates from Calc
I have tried this:
=IF(COUNTIF($D$1:$D$748;D1)>1;"Duplicate";""
and it finds all the rows that do have duplicate value in D so in my case the ones that don't are not marked with "Duplicate" and are easy to spot. So this is good, but is there a way to delete those rows right away and not by hand?
********************************
After that I tried this one
=IF(COUNTIF($D$1:$D$748;D1)>2;"Duplicate";""
....and it does the job to find if there are more than one match for the current checked value. And of course marks them with "Duplicate" in the formula column. So this is also good start, but it would be even better if there is a way to make Calc to mark the whole row in color for example. Does anyone knows if this can be done?
Thanks to Villeroy who helped me on my other topic too! Cheers buddy
=IF(COUNTIF($D$1:$D$748;D1)>1;"Duplicate";""
and it finds all the rows that do have duplicate value in D so in my case the ones that don't are not marked with "Duplicate" and are easy to spot. So this is good, but is there a way to delete those rows right away and not by hand?
********************************
After that I tried this one
=IF(COUNTIF($D$1:$D$748;D1)>2;"Duplicate";""
....and it does the job to find if there are more than one match for the current checked value. And of course marks them with "Duplicate" in the formula column. So this is also good start, but it would be even better if there is a way to make Calc to mark the whole row in color for example. Does anyone knows if this can be done?
Thanks to Villeroy who helped me on my other topic too! Cheers buddy
OpenOffice.org 3.4.1 on Windows XP Pro and on Windows 7
Re: Remove duplicates from Calc
That feature is called "conditional formatting".
With any cell in row 1 being the active cell:
menu:Format>Conditional ...
Condition #1
<Formula Is>
COUNTIF($D$1:$D$748;$D2)>1
and assign some colored cell style.
For many shades of color you can put style names in a list, say X1:X5 and use:
Condition #1
<Formula Is>
STYLE(INDEX($X$1:$X$5;COUNTIF($D$1:$D$748;$D2)))
If D2 occurs 3 times in D1:D748, it reads the 3rd style name from list X1:X5 and function STYLE will apply this style.
With any cell in row 1 being the active cell:
menu:Format>Conditional ...
Condition #1
<Formula Is>
COUNTIF($D$1:$D$748;$D2)>1
and assign some colored cell style.
For many shades of color you can put style names in a list, say X1:X5 and use:
Condition #1
<Formula Is>
STYLE(INDEX($X$1:$X$5;COUNTIF($D$1:$D$748;$D2)))
If D2 occurs 3 times in D1:D748, it reads the 3rd style name from list X1:X5 and function STYLE will apply this style.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Remove duplicates from Calc
This one works fine unfortunately it affects the active cell only. And I can not expand this conditional formatting for all the cells in the column. If I select all the cells in the column the conditional formatting does not affect none of them.Villeroy wrote: With any cell in row 1 being the active cell:
menu:Format>Conditional ...
Condition #1
<Formula Is>
COUNTIF($D$1:$D$748;$D2)>1
and assign some colored cell style.
OpenOffice.org 3.4.1 on Windows XP Pro and on Windows 7
Re: Remove duplicates from Calc
It works with all the cells you have selected and you can copy/paste-special the formatting.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice