[Solved] SQL Query: Conditional to avoid overlap?
[Solved] SQL Query: Conditional to avoid overlap?
My apologies if this has already been asked, I searched but wasn't able to find anything relevant. Since I'm new to SQL I may not be using the right search terms. On to the task at hand...
I have a database of mercury emissions from European countries with about 13K records (all in a single table, 'Data'). Countries have reported their emissions according to each economic sector annually; sometimes under one program, sometimes under another, sometimes under both (creating some overlap).
The relevant columns are 'YEAR', 'COUNTRY', 'REPORT' (for the reporting program), 'SECTOR' and 'NUMBER' (the quantity of emissions for that country, from that sector, in that year).
I would like to query the database to return a list of all records from the first reporting program ('N09') and, when there is no record from that reporting program for a particular COUNTRY, SECTOR, and YEAR, I want to include the record from the second reporting program ('N02'). It's important, however, to avoid having multiple records with the same 'COUNTRY', 'YEAR' and 'SECTOR'. It seems clear that there needs to be a conditional in here (IF no N09, THEN N02), but since my SQL knowledge is quite limited, I'm not sure how to go about this. CASE WHEN, maybe? MINUS? Any ideas?
I hope that this is clear, please let me know if it isn't and I'll do my best to clarify or provide examples. Thanks in advance for your help!
I have a database of mercury emissions from European countries with about 13K records (all in a single table, 'Data'). Countries have reported their emissions according to each economic sector annually; sometimes under one program, sometimes under another, sometimes under both (creating some overlap).
The relevant columns are 'YEAR', 'COUNTRY', 'REPORT' (for the reporting program), 'SECTOR' and 'NUMBER' (the quantity of emissions for that country, from that sector, in that year).
I would like to query the database to return a list of all records from the first reporting program ('N09') and, when there is no record from that reporting program for a particular COUNTRY, SECTOR, and YEAR, I want to include the record from the second reporting program ('N02'). It's important, however, to avoid having multiple records with the same 'COUNTRY', 'YEAR' and 'SECTOR'. It seems clear that there needs to be a conditional in here (IF no N09, THEN N02), but since my SQL knowledge is quite limited, I'm not sure how to go about this. CASE WHEN, maybe? MINUS? Any ideas?
I hope that this is clear, please let me know if it isn't and I'll do my best to clarify or provide examples. Thanks in advance for your help!
Last edited by Hagar Delest on Wed Mar 31, 2010 9:47 pm, edited 1 time in total.
Reason: tagged [Solved].
Reason: tagged [Solved].
OOo 3.2 - Win XP Pro
Re: SQL Query: Conditional to avoid overlap?
Look for the key word DISTINCT in your SQL manual.
OOo 3.2.1 on ubuntu 10.4, MsWXP, MsW2k
Re: SQL Query: Conditional to avoid overlap?
Your challenge is that REPORT='N02 and REPORT='N09' are both at different rows. You can in those cases not work simple with an IF. The suggestion of DISTINCT is also not solving the issue for you if you want to see the REPORT value also in the query result.
When thinking about SQL try to consider sets of data. I see in your question two sets of data:
- A set of data based on filter REPORT='N09'
- A set of data based on filter REPORT='N02' in case REPORT='N09' does not exists.
I can give you a nice efficient query, but I think a more expensive query will be better understandable. The conditions for both sets of data are taken together by the 'or' in the where-clause of the query. You can do the following:Note the a and b above this are table-aliases to identify the same table with different roles in the query
When thinking about SQL try to consider sets of data. I see in your question two sets of data:
- A set of data based on filter REPORT='N09'
- A set of data based on filter REPORT='N02' in case REPORT='N09' does not exists.
I can give you a nice efficient query, but I think a more expensive query will be better understandable. The conditions for both sets of data are taken together by the 'or' in the where-clause of the query. You can do the following:
Code: Select all
select a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR" and a."NUMBER"
from "Data" a
where a."REPORT" = 'N09"
or a."REPORT"='N02' and not exists(select 1 from "Data" b where a."YEAR" = b."YEAR" and a."COUNTRY" = b."COUNTRY" and a."SECTOR" = b."SECTOR")It's Microsoft marketing that tells you computers are qualified for non-technicians
W11 22H2 (build 22621), LO 7.4.2.3(x64)
W11 22H2 (build 22621), LO 7.4.2.3(x64)
Re: SQL Query: Conditional to avoid overlap?
Thanks for the guidance, eremmel, but we're not quite there yet. Your understanding of the challenge (as evidenced in the above quote) is spot on, but unfortunately the code is crashing Base.eremmel wrote: When thinking about SQL try to consider sets of data. I see in your question two sets of data:
- A set of data based on filter REPORT='N09'
- A set of data based on filter REPORT='N02' in case REPORT='N09' does not exists.
Just to be precise, here is the code which I'm currently running:
Code: Select all
SELECT a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR", a."NUMBER"
FROM "Data" a
WHERE a."REPORT" = 'N09'
OR a."REPORT" = 'N02'
AND NOT EXISTS ( SELECT 1 FROM "Data" b WHERE a."YEAR" = b."YEAR" AND a."COUNTRY" = b."COUNTRY" AND a."SECTOR" = b."SECTOR" )Also, I'd be eager to see the more efficient version of the code in the hopes of increasing my (rather limited) knowledge.
OOo 3.2 - Win XP Pro
Re: SQL Query: Conditional to avoid overlap?
First I noticed that I missed on condition in the "SELECT 1 FROM ...", but that will unlikely effect the crashes. For a correct working query you should add to this select at the end: AND b."REPORT" = 'N09'. Because we need a "REPORT" = 'N02' when there is no "REPORT" = 'N09' for a YEAR/COUNTRY/SECTOR combination.
Do you mean by 'crash' that the Base application crashes and that you need to start the Base application?
I tested a similar construct as I proposed to you without any issues under OOo 3.1.1. I'll test later today on 3.2.
You might play with your query to learn what is breaking it:
Remove the 'OR ... AND NOT EXISTS ...' part of the query and test. Works? -> add the 'OR ...' part without 'AND NOT ...' and test. Works, than we need to think about it.
The other solution I was referring to might only work in Direct SQL mode (enable via menu: Edit->'Run SQL Command Directly'), at least this is true for 3.1.1. It retrieves both sets and clue them via a 'UNION ALL'. The 'NOT EXISTS' part is transformed into an OUTER JOIN (most professional databases do that automaticly to improve execution). Here is the code (again not validated)Note: You can add an one 'ORDER BY' at the end of the above statement.
Do you mean by 'crash' that the Base application crashes and that you need to start the Base application?
I tested a similar construct as I proposed to you without any issues under OOo 3.1.1. I'll test later today on 3.2.
You might play with your query to learn what is breaking it:
Remove the 'OR ... AND NOT EXISTS ...' part of the query and test. Works? -> add the 'OR ...' part without 'AND NOT ...' and test. Works, than we need to think about it.
The other solution I was referring to might only work in Direct SQL mode (enable via menu: Edit->'Run SQL Command Directly'), at least this is true for 3.1.1. It retrieves both sets and clue them via a 'UNION ALL'. The 'NOT EXISTS' part is transformed into an OUTER JOIN (most professional databases do that automaticly to improve execution). Here is the code (again not validated)
Code: Select all
SELECT a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR", a."NUMBER"
FROM "Data" a
WHERE a."REPORT" = 'N09'
UNION ALL
SELECT a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR", a."NUMBER"
FROM "Data" a
LEFT OUTER JOIN "Data" b
ON a."YEAR" = b."YEAR" AND a."COUNTRY" = b."COUNTRY" AND a."SECTOR" = b."SECTOR" AND b."REPORT" = 'N09'
WHERE a."REPORT" = 'N02'
AND b."YEAR" IS NULL| Edit: Testing with OOo 3.2 did worked fine as well |
It's Microsoft marketing that tells you computers are qualified for non-technicians
W11 22H2 (build 22621), LO 7.4.2.3(x64)
W11 22H2 (build 22621), LO 7.4.2.3(x64)
Re: SQL Query: Conditional to avoid overlap?
Thanks again for your response, eremmel. Regarding the crash, when I run the query, Base seems to hesitate for a few seconds before stopping to respond ('Not responding' in the title bar and in Windows Task Manager). If I look at the running processes, I see that soffice.bin is using about 50% of the processor resources. I click on the red X, Windows tells me that the program is not responding, I confirm that I want to close it, and it closes quickly. When I go to reopen, I see that the .odb.lck file is still in the directory, I open the original database file, OOo does its recovery operation, and I'm back into the program.
When I run the following reduced code, everything works fine:
It would seem that the problem starts with 'AND NOT EXISTS'.
Similarly, when I run the second set of code that you provided (with the 'UNION ALL') OOo crashes in the same way described above.
Thank you again for all of your help so far, do you have any further ideas on what may be causing this? I can provide you with the data if you'd like to try it yourself. Also, I'll try it out on my home computer this weekend to see if there is any difference.
Thanks!
When I run the following reduced code, everything works fine:
Code: Select all
SELECT a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR", a."NUMBER"
FROM "Data" a
WHERE a."REPORT" = 'N09'
OR a."REPORT" = 'N02' Similarly, when I run the second set of code that you provided (with the 'UNION ALL') OOo crashes in the same way described above.
Thank you again for all of your help so far, do you have any further ideas on what may be causing this? I can provide you with the data if you'd like to try it yourself. Also, I'll try it out on my home computer this weekend to see if there is any difference.
Thanks!
OOo 3.2 - Win XP Pro
Re: SQL Query: Conditional to avoid overlap?
I think that it is working hard to get you the data you ask for. I'm not sure how large your database is, but normally it should not take that long when a few thousand rows are in place.
If you have no issue with the content of your data you can post it here and I will have a look.
You might try to add an extra filter and look for data for only one year.
If you have no issue with the content of your data you can post it here and I will have a look.
You might try to add an extra filter and look for data for only one year.
It's Microsoft marketing that tells you computers are qualified for non-technicians
W11 22H2 (build 22621), LO 7.4.2.3(x64)
W11 22H2 (build 22621), LO 7.4.2.3(x64)
Re: SQL Query: Conditional to avoid overlap?
Thanks for the suggestions eremmel. I've now tried the database on my home computer (Windows 7, OOo 3.1) and the crashing problem persists.
Since the data is publicly available and the results will be published publicly, I don't have a problem sharing it. The only issue is that it is too large for the forum, you can download it from this link. If you could have a look, it would be greatly appreciated.
In the meantime, I'm going to try to get it into a MS Access (I know) database and see if it will work there.
Thanks!
Since the data is publicly available and the results will be published publicly, I don't have a problem sharing it. The only issue is that it is too large for the forum, you can download it from this link. If you could have a look, it would be greatly appreciated.
In the meantime, I'm going to try to get it into a MS Access (I know) database and see if it will work there.
Thanks!
OOo 3.2 - Win XP Pro
Re: SQL Query: Conditional to avoid overlap?
Thanks for sharing the database. I did some testing and found some interesting results.
Lets call the query with the 'not exists' part Query A (qA) and the query with the 'union all' and 'left join' Query B (qB).
Normally queries become faster when you give the database an smart access path to the table. So I created an index on the table on the column YEAR, COUNTRY, SECTOR, REPORT. This improved qB a lot: it ran in about 20 seconds (1). qA ran with index in 7 minutes (2). I removed the index and tested qB again. It took 19.5 minutes to complete (3). You might test qA without an index. I guess it will run in about the same time See graph I had never expected that the internal database is so slow on an left join on a 10k rows, but any way an index solves it. This is how you create an index:
1 open menu Tools->'SQL...'
2 type in the create index command in 'command to execute' and press 'Execute': After this you can not save your Base file, because it does not 'register' this action as a change, so change something to a query, save the query and your Base document.
This is the query I used to get all the numbers (slight improved one as posted before):
Lets call the query with the 'not exists' part Query A (qA) and the query with the 'union all' and 'left join' Query B (qB).
Normally queries become faster when you give the database an smart access path to the table. So I created an index on the table on the column YEAR, COUNTRY, SECTOR, REPORT. This improved qB a lot: it ran in about 20 seconds (1). qA ran with index in 7 minutes (2). I removed the index and tested qB again. It took 19.5 minutes to complete (3). You might test qA without an index. I guess it will run in about the same time See graph I had never expected that the internal database is so slow on an left join on a 10k rows, but any way an index solves it. This is how you create an index:
1 open menu Tools->'SQL...'
2 type in the create index command in 'command to execute' and press 'Execute':
Code: Select all
create index REPORT on "Data" ("YEAR", COUNTRY, SECTOR, REPORT)This is the query I used to get all the numbers (slight improved one as posted before):
Code: Select all
SELECT a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR", a."NUMBER"
FROM "Data" a
WHERE a."REPORT" = 'N09'
UNION ALL
SELECT a."YEAR", a."COUNTRY", a."REPORT", a."SECTOR", a."NUMBER"
FROM "Data" a
LEFT OUTER JOIN "Data" b
ON a."YEAR" = b."YEAR" AND a."COUNTRY" = b."COUNTRY" AND a."SECTOR" = b."SECTOR"
AND a."REPORT" = 'N02' AND b."REPORT" = 'N09'
WHERE b."REPORT" IS NULL
ORDER BY 1,2,3,4| Edit: When I take the following actions: 1 add an extra integer column YCS to the table Data, 2 give this column for each YEAR, COUNTRY, SECTOR a unique number 3 make an index on YCS, REPORT 4 change the ON clause of qB and replace YEAR,COUNTRY, SECTOR with YCS then the query runs in 2 seconds. This shows the importance of simple relations on join conditions as well next to the right index |
It's Microsoft marketing that tells you computers are qualified for non-technicians
W11 22H2 (build 22621), LO 7.4.2.3(x64)
W11 22H2 (build 22621), LO 7.4.2.3(x64)
Re: SQL Query: Conditional to avoid overlap?
Aha! Fascinating results, it shows how much I don't know about databases (but it makes perfect sense). I feel like I owe you not only for your time but also for the kilowatt-hours that your processor burnt up!
I'm going to plug it in right now and give it a try. Many, many thanks!
I'm going to plug it in right now and give it a try. Many, many thanks!
OOo 3.2 - Win XP Pro
[Solved] SQL Query: Conditional to avoid overlap?
It worked!
What beautiful data! Thank you, eremmel, for all of your hard work!
It is greatly appreciated! 
OOo 3.2 - Win XP Pro
Re: [Solved] SQL Query: Conditional to avoid overlap?
I have a similar issue but I can't seem to find the right solution even after reviewing this thread.
I have a table ("Billing") that contains records of therapy sessions for billing insurance companies. The columns are "Pt-ID", "Clinician", "Code", "Date", "Start Time", "End Time". My problem is that sometimes the therapists report their times "in error" causing overlaps. I am trying to create a query that catches those overlaps.
To make matters more complicated, there are two "overlap-scenarios".
The first: If one "Clinician" reported session-times that overlap, even by one minute, with another session-time reported by that same "Clinician". ------Looking for: 1 Clinician, 1 Date, 2+ Records with overlapping time-frames.
The second: If two Clinicians reported session-times that overlap, even by one minute, on the same Patient ("Pt-ID")
------Looking for: 2+ Clinicians, 1 Pt-Id, 1 Date, 2+ Records with overlapping time-frames.
Do any possible structures come to mind that would return these results? I would greatly appreciate the help.
I have a table ("Billing") that contains records of therapy sessions for billing insurance companies. The columns are "Pt-ID", "Clinician", "Code", "Date", "Start Time", "End Time". My problem is that sometimes the therapists report their times "in error" causing overlaps. I am trying to create a query that catches those overlaps.
To make matters more complicated, there are two "overlap-scenarios".
The first: If one "Clinician" reported session-times that overlap, even by one minute, with another session-time reported by that same "Clinician". ------Looking for: 1 Clinician, 1 Date, 2+ Records with overlapping time-frames.
The second: If two Clinicians reported session-times that overlap, even by one minute, on the same Patient ("Pt-ID")
------Looking for: 2+ Clinicians, 1 Pt-Id, 1 Date, 2+ Records with overlapping time-frames.
Do any possible structures come to mind that would return these results? I would greatly appreciate the help.
OpenOffice 3.2 on Windows 7