MTP wrote: I added a new [record] to the broken database. Only later did I copy the files. I'm afraid this may have created new (bad) files.
That may be true, but it's not your fault. After a power loss, it's best to make a copy of the split-database folder before restarting the RDBMS (database server app). But that's an extraordinary measure.
MTP wrote: ...checkpoint. I was wondering if never shutting down the server was what made the system susceptible to data loss of this kind.
Running a database server application (RDBMS) 24/7 shouldn't be a problem unless other software anomalies crop up over time -- such as incomplete garbage collection by the operating system, framework, or virtualization software. However, in the absence of a regular (daily) CHECKPOINT, the layers of protection are reduced. Given the normal file scheme employed by all the name-brand database systems (Oracle, MS SQL Server, DB2, PostgreSQL, MySQL, etc.), an open file-handle to the .log file (or its equivalent) does increase the chances of acute file corruption (session data loss). This dynamic .log file contains the session data since the last CHECKPOINT or SHUTDOWN. The design handles an instantaneous power-cut quite well, but anything becomes possible with a brown-out or electrical surge, including loss of the session data since the last CHECKPOINT. That's where a laptop-based server or a UPS (battery backup) comes in handy; you can even trigger a script based on battery level to issue SHUTDOWN utilizing sqltool.jar (as included with HSQLDB and with the Base templates and associated tutorials).
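A minimal sketch of such a script file, assuming a server-mode database reachable at localhost under the default SA account (the URL, paths, and credentials are placeholders to adjust for your own setup):

Code:
-- shutdown.sql : dispatched by a battery-monitor script via SqlTool, e.g.
--   java -jar /path/to/sqltool.jar --inlineRc=url=jdbc:hsqldb:hsql://localhost/mydb,user=SA,password= /path/to/shutdown.sql
SHUTDOWN;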
Routine tasks such as running CHECKPOINT (daily) or the occasional SHUTDOWN (weekly or monthly) of the database and engine are prudent. These commands can be issued manually (Tools > SQL), or automated by a scheduled script or macro.
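For reference, the two statements as you would enter them in the Tools > SQL console:

Code:
CHECKPOINT;  -- daily: folds the open .log session data into the .data file
SHUTDOWN;    -- weekly/monthly: incorporates the .log, then closes the database cleanly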
Either of these commands will increase the layers of protection by incorporating the session changes (the open .log file) into the relatively static .data and .backup files. The static nature of these files reduces the possibility of file corruption from power fluctuations, as the durability of the hard-drive subsystem becomes the limiting factor.
Another reason to issue the CHECKPOINT command is that third-party backup software may ignore open (locked) files by default, so the .log file may not be backed up in all cases. And because the .log file is dynamic in a running database, a third-party backup may not be useful for data recovery unless preceded by a CHECKPOINT. Since modern backup software monitors file changes in real time, simply issuing a CHECKPOINT at any time on a running database will trigger an automatic backup of the database (complete with off-site redundancy in the case of cloud backup services).
Now, HSQLDB 2.x is designed to accommodate much larger databases, perhaps running in high-concurrency environments (many active users). In my experience on a relatively slow 2009 laptop (see chart below), CHECKPOINT can process about 35 MB per second. So most Base users would experience a sub-second delay for CHECKPOINT processing (virtually instantaneous). On the other hand, a 1 GB database would require about 30 seconds of "down-time," as database transactions are paused during a CHECKPOINT. Now imagine the impact with a multi-terabyte database (8 TB limit for normal data with HSQLDB 2.3). Under such extreme conditions, full CHECKPOINT processing is no longer the best option. That's why HSQLDB 2.3 offers a "hot-backup" feature, designed to back up large databases in high-concurrency environments where performance would otherwise become unacceptable during lengthy CHECKPOINT processing. Incremental backup (effectively an incremental CHECKPOINT) features have also been added to accommodate these extreme environments.
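As a sketch, incremental mode is enabled with a single statement from the Tools > SQL console (standard HSQLDB 2.x syntax):

Code:
-- maintain the .backup file incrementally during normal operation,
-- so a CHECKPOINT no longer pauses to copy the entire .data file
SET FILES BACKUP INCREMENT TRUE;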
With proper use of the LOB data types (BLOB/CLOB, which HSQLDB stores in a separate .lobs file), I wouldn't imagine that we have very many Base users with .data files in the gigabyte range, much less terabytes. So with a modest database of under 100 MB (.data file size), there's little reason to pursue anything beyond CHECKPOINT on a running database, while employing daily or real-time backup software with history/version features such as we get for free with cloud-synced folders. In summary, I would rely on CHECKPOINT on running databases in less demanding environments, as it provides the best trade-off between down-time (mere seconds) and completeness.
Sliderule and I investigated the performance of SHUTDOWN / CHECKPOINT with HSQLDB a few years ago, with the following results:
DACM wrote:
Here are the results as reported by SQL Workbench/J:
Code:
SIZE (# of records)       SHUTDOWN        SHUTDOWN COMPACT
0.83 MB (  5,000 rows)    0.07 seconds     0.34 seconds
1.67 MB ( 10,000 rows)    0.10 seconds     0.56 seconds
4.13 MB ( 25,000 rows)    0.15 seconds     1.25 seconds
8.25 MB ( 50,000 rows)    0.27 seconds     2.34 seconds
16.5 MB (100,000 rows)    0.48 seconds     4.66 seconds
33.0 MB (200,000 rows)    0.95 seconds    11.0 seconds
66.0 MB (400,000 rows)    1.79 seconds    20.7 seconds
132 MB  (800,000 rows)    3.66 seconds    42.0 seconds
These tests were run on a laptop with a mid-range mobile CPU and a mechanical hard drive. For comparison, that CPU is one-third the speed of an i7 mobile CPU, and a fraction of the speed of the fastest consumer CPU today. The hard-drive subsystem is likewise a fraction of the speed of a modern solid-state or hybrid drive.
MTP wrote: So if I make the logsize smaller, it will automatically make checkpoints more often?
You can reduce it to 1 MB, but if relatively little data is written, even that threshold may not trigger a CHECKPOINT very often. I can't recommend relying on this file-size trigger, in general.
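For reference, the property in question is set like this (standard HSQLDB 2.x syntax; the value is in megabytes):

Code:
-- perform an automatic CHECKPOINT whenever the .log file exceeds 1 MB
SET FILES LOG SIZE 1;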
The long-term reliability of any database is predicated on the diligence of the database administration. That administration can be automated through scheduled scripts utilizing SQLTool to dispatch SQL commands, or by a Form macro (autorun or push button). In light of Murphy's Law, multiple layers of protection are prudent...
Prudent layers of protection for critical data (perhaps in order of importance):
Layer 1: A stable computing environment with sufficient RAM
Layer 2: Battery backup power (implies built-in surge protection and a degree of line-conditioning...simply use a laptop or a UPS)
Layer 3: A transactional RDBMS with ACID properties and sufficient backup and recovery automation (HSQLDB, H2, PostgreSQL, MySQL, Firebird, etc.)
Layer 4: Regular CHECKPOINT (or VACUUM) of the running database
4a. Issue this command at least daily, perhaps using a scheduled script or Form macro automation (autorun or push button)
4b. Specialized "hot-backups" and/or "incremental backup" measures may be necessary with very large databases (GB+) in high-concurrency environments where CHECKPOINT performance proves unacceptable
4c. HSQLDB 2.x does not maintain a .backup file by default, so if you don't see this file in your database folder, issue the following command using the Tools > SQL console in Base to initiate "full" backup upon CHECKPOINT or SHUTDOWN. Note the "... INCREMENT FALSE" (below), which enables "full" backup as opposed to "incremental" backup. With very large databases (gigabyte .data file size), "incremental" backups become necessary to minimize CHECKPOINT processing delays in demanding environments. Either backup mode is sufficient.
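Code:
-- maintain a full copy of the .data file (as .backup) upon each CHECKPOINT or SHUTDOWN
SET FILES BACKUP INCREMENT FALSE;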
Layer 5: Employ additional backup layers with file-history (versioning) features
5a. These features are provided in real time (automated instant backup) by most cloud-synced folders such as Dropbox, Google Drive, or similar. This type of off-site folder synchronization is super easy, but privacy issues emerge; client-side encryption such as provided by Boxcryptor, Wuala, and SpiderOak becomes highly desirable. Each time you issue a CHECKPOINT or SHUTDOWN command, the changed files are backed up immediately to your local machine and to the cloud, with access to previous versions of each file.
5b. Some backup software can provide similar file history to internal or external drives, including NAS or even FTP sites.
5c. HSQLDB 2.3.x offers "incremental" and "online" (hot) backup facilities. "Incremental" backup is designed to enhance CHECKPOINT or SHUTDOWN performance with very large databases (gigabyte .data file size), as mentioned previously; we can switch between "incremental" and "full" backup at any time. "Online" hot-backup adds the ability to initiate a full backup of a running database as a single compressed file (the default option), such as for archiving purposes. When combined with "incremental" backup, "online" hot-backup is an efficient way to back up a large database in a demanding (high-concurrency, multi-user) environment. The associated backup command can be run from a batch file using the provided SQLTool on a schedule (leveraging a script or the operating system), as sketched below. This built-in backup option can be a good choice, particularly when 5a/5b above (cloud folders or third-party backup software) are not an option. But keep in mind that options 5a and 5b maintain a backup of the entire split-database folder, including front-end components (Base .odb) plus any subfolders used for external image or document storage. And 5a (or even 5b) provides an off-site copy utilizing personal encryption as necessary. So when 5a/5b can be employed, they provide a more thorough solution, invoked simply by running CHECKPOINT or SHUTDOWN -- which can be automated by batch/script or a push-button macro on a Form.
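A minimal sketch of that statement (the 'archive/' directory is a placeholder; the trailing slash lets HSQLDB generate a time-stamped archive name):

Code:
-- back up the running database as a single compressed archive, e.g. mydb-20240101120000.tar.gz
-- BLOCKING pauses transactions only while the archive is written
BACKUP DATABASE TO 'archive/' BLOCKING;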
Layer 6: Periodic SHUTDOWN [COMPACT] of the RDBMS (perhaps weekly) and server hardware (perhaps monthly)
Layer 7: Employ RDBMS-based clustering
MTP wrote: I have contacted our contractor who does daily backups of our servers... Hopefully that will lead to recovery.
Let's hope so.