SQL Command for finding duplicates in 4 different tables

dBase, Calc, CSV, MS ACCESS, MySQL, PostgrSQL, OTHER
Post Reply
olga
Posts: 4
Joined: Mon Nov 29, 2010 6:18 pm

SQL Command for finding duplicates in 4 different tables

Post by olga »

Hello,

I need help with the following issue:

I want to remove duplicates emails from 4 different tables ( All 4 tables have the same columns: category, compnay's name, email address and website).What is the SQL command for this action, is there a way of deleting the duplicates automatically?

I was using this for checking the duplicates in 2 tables:
SELECT "Email" FROM "Table1", "Table2" WHERE "Table1"."Email" = "Table2"."Email"

it does notwork for 4 tables...Please help me!
Thank you very much,$

Olga
Open Office 3.2, XP
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: SQL Command for finding duplicates in 4 different tables

Post by rudolfo »

The SQL language has an operation UNION that does exactly this. Builds a collection of records from multiple tables but ignores any 2nd or 3rd appearance of the same record.

Code: Select all

SELECT category, compnay_name, email_address, website FROM table_1
UNION
SELECT category, compnay_name, email_address, website FROM table_2
UNION
SELECT category, compnay_name, email_address, website FROM table_3
UNION
SELECT category, compnay_name, email_address, website FROM table_4
Leave the individual tables alone and access the data through a query that uses this kind of statement.

If you are really eager to use as less storage as possible, you can find duplicates with the following query:

Code: Select all

SELECT category, compnay_name, email_address, website, count(*)
  FROM (SELECT category, compnay_name, email_address, website FROM table_1
        UNION ALL
        SELECT category, compnay_name, email_address, website FROM table_2
        UNION ALL
        SELECT category, compnay_name, email_address, website FROM table_3
        UNION ALL
        SELECT category, compnay_name, email_address, website FROM table_4)
 GROUP BY category, compnay_name, email_address, website
 HAVING count(*) > 1
First you build a collection of all the records in all 4 tables (the ALL with the UNION means not to filter any duplicates) then you see which combination of "category, compnay_name, email_address, website" appears more then only one time in this overall collection.
This gives you an overview but unless you process this with only 3 tables or only 2 tables you won't be able to say, if the duplicate barney@exmaple.org is in table_1 and table_2 or if it is in table_2 and table_4.

Instead of cleaning you would rather like to build a new table with the combined/merged data of the 4 existing ones.

Code: Select all

CREATE TABLE good_table AS
SELECT category, compnay_name, email_address, website FROM table_1
UNION
 ...
UNION
SELECT category, compnay_name, email_address, website FROM table_4
Of course if you have several other fields in your tables as well, the data in that tables would be lost this way.
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.
Post Reply