Finding and updating records in two tables

Discuss the database features
Post Reply
pntos
Posts: 4
Joined: Thu May 22, 2014 7:28 am

Finding and updating records in two tables

Post by pntos »

I have a Customer database in OpenOffice 4.1 It states HSQL database engine. The main table "CustomerMaster" contains all the customers contact details with about 9 fields. I have another "Unsent" table which has a list of email addresses in a "EMail" field that were no longer valid for some customers. I want to compare the two tables "EMail" fields, and if equal, update the "CustomerMaster" "EMail" field with a 'Rejected' text string.

I am new with SQL only managed to find the dulplicates between the two tables with the following

SELECT "CompanyMaster".*, "Unsent"."EMail" FROM "CompanyMaster", "Unsent" WHERE "CompanyMaster"."EMail" = "Unsent"."EMail"

Whenever I use the UPDATE statement, it always gives me a syntax error. Had been trying to find answers for the last one week on the net, but to no avail. I think this is a simple operation, but just dont know where goes wrong.

Alternatively, if updating a HSQL database is not possible, how do I just compare the two tables and just display the records in the "CustomerMaster" table that are 'not equal' to the "EMail" in the "Unsent" table?

I also tried using the SELECT DISTINCT "CustomerMaster".* statement but it gives me a 'OutOfMemoryError for java heap space' error. My CustomerMaster table has 150,000 records while the Unsent table has 10,000 records of email that I need to remove.

Hope someone can kindly give me some pointers. Thank you
Apache OpenOffice 4.1 Non Embedded HSQL database
User avatar
MTP
Volunteer
Posts: 1620
Joined: Mon Sep 10, 2012 7:31 pm
Location: Midwest USA

Re: Finding and updating records in two tables

Post by MTP »

Base only allows SELECT statements in the Query window. To execute an UPDATE statement you have to go to the Tools->SQL window.

I don't think SELECT DISTINCT can be used on *, you have to type out the one field name to use DISTINCT on (and I think only one DISTINCT field per SELECT statement).
OpenOffice 4.1.1 on Windows 10, HSQLDB 1.8 split database
User avatar
DACM
Volunteer
Posts: 1138
Joined: Tue Nov 03, 2009 7:24 am

Re: Finding and updating records in two tables

Post by DACM »

Welcome pntos,

Thank you for mentioning your office version and database engine. You should consider adding the engine information ("Embedded HSQL database") to your 'User Control Panel Profile (tab) > Signature' as well.
pntos wrote:I want to compare the two tables "EMail" fields, and if equal, update the "CustomerMaster" "EMail" field with a 'Rejected' text string.
Well first things first...
pntos wrote:I also tried using the SELECT DISTINCT "CustomerMaster".* statement but it gives me a 'OutOfMemoryError for java heap space' error. My CustomerMaster table has 150,000 records while the Unsent table has 10,000 records of email that I need to remove.
As you indicated, you're using the bundled HSQL relational database, embedded within the ODB container file (zip-archive), as generated by default with Base. The HSQLDB engine is written in Java, so it runs in a virtual environment, which is limited to 64MB by default. Since HSQLDB compiles queries in RAM for maximum performance, the size of the query result-set is limited by available RAM within the Java Virtual Environment (JRE). If we assume a typical record size of 1KB, we should expect to run out of 64MB of RAM with as little as 50,000 rows/results at any stage in the query-processing (sub-queries, etc.). Larger results will require more RAM...unless utilizing 'disk-based' query results as available with the upgraded HSQLDB 2.x engine (among others). We can increase the Java Heap memory for all Base sessions using a setting in OpenOffice/LibreOffice as outlined below. In your case, with 150,000 rows available, you should plan to accommodate similar query tables (even intermediate result sets) during query processing. That means you'll need to allocate approximately 192MB or more of Java Heap memory. Today's computers come with a minimum of 4GB of RAM, so most folks have a GB or two free on average, depending on multitasking. Keep in mind that Base was originally developed in an era of computing history when computers averaged less than 1GB of total RAM. But that's no excuse for what the developers did next...

The default 'embedded database' format engineered by the original Base developers will choke on large records, tables, and queries such as this. Without going into the details, they constricted the memory cache available to HSQLDB to a tiny fraction of the recommended HSQL defaults (merely 6MB compared to the default 48MB, as I recall). We can increase this cache memory setting as well, but that still wouldn't address the biggest problem of all...

The 'embedded database' file format (ODB) as employed by Base has proven unstable and prone to file-corruption. I think it's just a matter of poor implementation by the developers.

So with all these factors working against you, your 'embedded' database data doesn't stand a chance of survival while running sizable queries on 150,000 rows. The fix is to perform a little file-surgery, while perhaps overriding the artificially-reduced Base-defaults for your extracted HSQL database. But in your case, I'm not even sure that HSQLDB 2.x would solve your issue without resorting to Stored Procedures and/or perhaps a Trigger -- involving some procedural SQL coding.

As a minimum, you'll need to perform the following (to properly accommodate your 150K+ record count):
  • 1. Close Base and make a backup of your ODB file
    2. Reopen the original in Base
    3. Select Tools > Options > Java/Advanced > Parameters (button) > Java start parameter: -Xmx256m > Assign
    4. Shutdown Base/OpenOffice (including Quickstarter in the Windows Task Tray as applicable)
    5. Reopen the original ODB in Base
    6. Select Tools > SQL... to open the SQL console in Base
    7. Type the following SQL command...followed by pressing the Execute button
pntos wrote:...how do I just compare the two tables and just display the records in the "CustomerMaster" table that are 'not equal' to the "EMail" in the "Unsent" table?
You can certainly compare the tables using a simple SELECT query.

But I'm not currently convinced that you can perform an UPDATE based on multi-table join criteria with HSQLDB! I'm a bit surprised. Actually, I can accomplish the UPDATE despite some reported errors, but the HSQL 1.8 engine seems to ignore the INNER JOIN and/or WHERE clause criteria...such that every record in the table erroneously receives the UPDATE. I'll do some more testing perhaps with HSQLDB 2.3.2 and post my results...but I wanted to wait and see if anyone else in the community was able to solve this dilemma by any means (syntax, views, stored procedure, etc.).
AOO 4.1.x; LO 4.2.x; Windows 7/8 64-bit
Warning: Avoid embedded databases --> Solution: Adopt a portable 'split database' folder
Soli Deo gloria
pntos
Posts: 4
Joined: Thu May 22, 2014 7:28 am

Re: Finding and updating records in two tables

Post by pntos »

MTP wrote:Base only allows SELECT statements in the Query window. To execute an UPDATE statement you have to go to the Tools->SQL window.

I don't think SELECT DISTINCT can be used on *, you have to type out the one field name to use DISTINCT on (and I think only one DISTINCT field per SELECT statement).
Yes you are right. I found out the hard way.
Apache OpenOffice 4.1 Non Embedded HSQL database
pntos
Posts: 4
Joined: Thu May 22, 2014 7:28 am

Re: Finding and updating records in two tables

Post by pntos »

Thank you Sir, for the detailed explanation on the memory issues. It does give me a better understanding of what lies ahead if I continue along that path for a solution.

I have since 'split' the database from embedded, and also increased the heap size of Java to 1GB, but have not tried the "Checkpoint Defrag" command.

If let's say the files or number of records are too big for processing, I can split out the table into smaller category tables. And instead of updating the 'EMail' records, what SQL statement should I write to filter out those records that are 'Not equal' in the 'EMail' fields of both tables? Realised HSQL do not have a 'Not Equal' condition.

Thanks again. Updated my signature too.
DACM wrote:Welcome pntos,

Thank you for mentioning your office version and database engine. You should consider adding the engine information ("Embedded HSQL database") to your 'User Control Panel Profile (tab) > Signature' as well.
pntos wrote:I want to compare the two tables "EMail" fields, and if equal, update the "CustomerMaster" "EMail" field with a 'Rejected' text string.
Well first things first...
pntos wrote:I also tried using the SELECT DISTINCT "CustomerMaster".* statement but it gives me a 'OutOfMemoryError for java heap space' error. My CustomerMaster table has 150,000 records while the Unsent table has 10,000 records of email that I need to remove.
As you indicated, you're using the bundled HSQL relational database, embedded within the ODB container file (zip-archive), as generated by default with Base. The HSQLDB engine is written in Java, so it runs in a virtual environment, which is limited to 64MB by default. Since HSQLDB compiles queries in RAM for maximum performance, the size of the query result-set is limited by available RAM within the Java Virtual Environment (JRE). If we assume a typical record size of 1KB, we should expect to run out of 64MB of RAM with as little as 50,000 rows/results at any stage in the query-processing (sub-queries, etc.). Larger results will require more RAM...unless utilizing 'disk-based' query results as available with the upgraded HSQLDB 2.x engine (among others). We can increase the Java Heap memory for all Base sessions using a setting in OpenOffice/LibreOffice as outlined below. In your case, with 150,000 rows available, you should plan to accommodate similar query tables (even intermediate result sets) during query processing. That means you'll need to allocate approximately 192MB or more of Java Heap memory. Today's computers come with a minimum of 4GB of RAM, so most folks have a GB or two free on average, depending on multitasking. Keep in mind that Base was originally developed in an era of computing history when computers averaged less than 1GB of total RAM. But that's no excuse for what the developers did next...

The default 'embedded database' format engineered by the original Base developers will choke on large records, tables, and queries such as this. Without going into the details, they constricted the memory cache available to HSQLDB to a tiny fraction of the recommended HSQL defaults (merely 6MB compared to the default 48MB, as I recall). We can increase this cache memory setting as well, but that still wouldn't address the biggest problem of all...

The 'embedded database' file format (ODB) as employed by Base has proven unstable and prone to file-corruption. I think it's just a matter of poor implementation by the developers.

So with all these factors working against you, your 'embedded' database data doesn't stand a chance of survival while running sizable queries on 150,000 rows. The fix is to perform a little file-surgery, while perhaps overriding the artificially-reduced Base-defaults for your extracted HSQL database. But in your case, I'm not even sure that HSQLDB 2.x would solve your issue without resorting to Stored Procedures and/or perhaps a Trigger -- involving some procedural SQL coding.

As a minimum, you'll need to perform the following (to properly accommodate your 150K+ record count):
  • 1. Close Base and make a backup of your ODB file
    2. Reopen the original in Base
    3. Select Tools > Options > Java/Advanced > Parameters (button) > Java start parameter: -Xmx256m > Assign
    4. Shutdown Base/OpenOffice (including Quickstarter in the Windows Task Tray as applicable)
    5. Reopen the original ODB in Base
    6. Select Tools > SQL... to open the SQL console in Base
    7. Type the following SQL command...followed by pressing the Execute button
pntos wrote:...how do I just compare the two tables and just display the records in the "CustomerMaster" table that are 'not equal' to the "EMail" in the "Unsent" table?
You can certainly compare the tables using a simple SELECT query.

But I'm not currently convinced that you can perform an UPDATE based on multi-table join criteria with HSQLDB! I'm a bit surprised. Actually, I can accomplish the UPDATE despite some reported errors, but the HSQL 1.8 engine seems to ignore the INNER JOIN and/or WHERE clause criteria...such that every record in the table erroneously receives the UPDATE. I'll do some more testing perhaps with HSQLDB 2.3.2 and post my results...but I wanted to wait and see if anyone else in the community was able to solve this dilemma by any means (syntax, views, stored procedure, etc.).
Apache OpenOffice 4.1 Non Embedded HSQL database
User avatar
DACM
Volunteer
Posts: 1138
Joined: Tue Nov 03, 2009 7:24 am

UPDATE based on two table criteria/joins

Post by DACM »

pntos wrote:I have since 'split' the database from embedded, and also increased the heap size of Java to 1GB, but have not tried the "Checkpoint Defrag" command.
No problem. That's just a little house-keeping command designed to reduce the size of the database to reflect only the current data.
pntos wrote:If let's say the files or number of records are too big for processing, I can split out the table into smaller category tables.
Don't worry about the number of disk-based records (table rows) with a relatively modern database engine like HSQLDB 1.8 (circa 2005), until you start pushing billions of rows per table. Of course, billion record databases require careful consideration of table structures, data-types, database engine limits, maximum file size limits imposed by the operating system, disk space, etc. I don't recall the limit for HSQLDB 1.8, but a quick upgrade to the latest iteration of HSQLDB 2.3.x (circa 2014) will support up to 270 billion records per database. So you're well within the limits with merely 150,000 records in a split database configuration.

To be clear, I was describing the maximum result-set size accumulated during any phase of query processing in RAM (based on the Java Heap allocation). The query optimizer first seeks to minimize the number of records processed based on JOINS and CLAUSES for maximum performance. But the real query optimizer is often us humans, as we minimize the number of fields in play, prefer indexed fields, avoid large data-types, and such. Left/Right/Natural INNER Joins shouldn't pose a problem in your 1GB RAM Java virtual environment up through 1 million results per query (maximum intermediate or final query-table records). OUTER Joins are always a little scary with large source tables because the query results may be several times the size of the source tables.
pntos wrote:...instead of updating the 'EMail' records, what SQL statement should I write to filter out those records that are 'Not equal' in the 'EMail' fields of both tables? Realized HSQL do not have a 'Not Equal' condition.
Short of a definitive answer on UPDATE based on multi-table criteria/joins with HSQLDB...I offer the following SELECT:
  • Have you tried something like (worked for me in testing with HSQLDB 1.8):

    Code: Select all

    SELECT DISTINCT "CompanyMaster".* 
    FROM "CompanyMaster" 
    WHERE "CompanyMaster"."EMail"
    NOT IN (SELECT "Unsent"."EMail" FROM "Unsent")
    This would be a relatively slow query due to the comparisons required and lack of matching indexed fields. But then I don't know your actual table structures as necessary to suggest optimizations or additional indexing.
pntos wrote:Updated my signature too. [Apache OpenOffice 4.1 Embedded HSQL database]
I know I'm throwing terminology at you with a fire hose, but you're HSQLDB files are no longer 'embedded' within the ODB file. So you're now running a 'split HSQL 1.8 database' as opposed to an 'embedded HSQL database.' Sorry for any confusion.
AOO 4.1.x; LO 4.2.x; Windows 7/8 64-bit
Warning: Avoid embedded databases --> Solution: Adopt a portable 'split database' folder
Soli Deo gloria
pntos
Posts: 4
Joined: Thu May 22, 2014 7:28 am

Re: Finding and updating records in two tables

Post by pntos »

Hi Sir,

I tried this:

Short of a definitive answer on UPDATE based on multi-table criteria/joins with HSQLDB...I offer the following SELECT:

Have you tried something like (worked for me in testing with HSQLDB 1.8):
CODE: SELECT ALL EXPAND VIEW
SELECT DISTINCT "CompanyMaster".*
FROM "CompanyMaster"
WHERE "CompanyMaster"."EMail"
NOT IN (SELECT "Unsent"."EMail" FROM "Unsent")

This would be a relatively slow query due to the comparisons required and lack of matching indexed fields. But then I don't know your actual table structures as necessary to suggest optimizations or additional indexing.


But it yields an empty table.

Also tried this :

Code: Select all

SELECT DISTINCT "CompanyMaster".* FROM "Unsent" AS "Unsent", "CompanyMaster" AS "CompanyMaster" WHERE NOT "Unsent"."EMail" IN ( SELECT "CompanyMaster"."EMail" FROM "CompanyMaster" )
Probably went into a loop and heap memory problems again.
Apache OpenOffice 4.1 Non Embedded HSQL database
User avatar
MTP
Volunteer
Posts: 1620
Joined: Mon Sep 10, 2012 7:31 pm
Location: Midwest USA

Re: Finding and updating records in two tables

Post by MTP »

DACM wrote: I'm not currently convinced that you can perform an UPDATE based on multi-table join criteria with HSQLDB!
DACM - maybe this thread has some answers? Using UPDATE with multiple tables
OpenOffice 4.1.1 on Windows 10, HSQLDB 1.8 split database
Post Reply