LO Base - Working with tables of differing structure

Discuss the database features
Post Reply
User avatar
kbellis
Posts: 58
Joined: Sat May 28, 2011 6:25 pm
Location: Ellsworth, ME

LO Base - Working with tables of differing structure

Post by kbellis »

Hello,
I'm a beginner with databases and LibreOffice (never strayed much away from OpenOffice Calc until recently), and having stalled out in an earlier attempt to try and learn LO Base, am making another go at it. The help I received earlier from Villeroy and others convinced me that this type of problem was clearly not suited to a spreadsheet. A recent Base-related thread ([Solved] Calculated fields in Database) was somewhat helpful, but wasn't exactly addressing the issues being studied, namely:

LibreOffice Base - Working with tables of differing structures
Problem Example: D:\Documents\PWS\Barometer Calibration\Data\test1.odb
Barometric data sources originate from different places containing structures that vary enough to make things not quite so simple to work with in LO Base.

Table 1 = "E4351 metric 20190107-10" (D:\Documents\PWS\Barometer Calibration\Data\E4351 metric 20190107-10.csv)
Attribute: Date_Time referenced to UTC time zone
Record interval: about 30 minutes

Table 2 = "F3015 metric 20190106-09" (D:\Documents\PWS\Barometer Calibration\Data\F3015 metric 20190106-09.csv)
Attribute: Date_Time referenced to EST time zone
Record interval: about 5 minutes

Both databases are included in the attached Data.zip

Before querying pressure differences between Table 1 and Table 2, I would like to perform an operation that will 1) eliminate time offsets; i.e., get both tables on the same time zone (with no preference to UTC over local time zone); and 2) average barometric pressure attributes over the common interval of one hour for both tables and thereby, for example, allowing for a cardinality of (1..1) in this initial exercise. Later, once I've gotten things straightened out on what I'm supposed to be doing, this will be applied to more tables than two, e.g., a dozen or so, each containing thousands of records.

How can this be accomplished in LibreOffice Base? (LO 6.2.2.2)

Thank you very much for any replies.

Kind regards,

Kelly

--
V. Kelly Bellis, PLS
17 Union Street
Ellsworth, ME 04605
Attachments
Data.zip
(8.99 KiB) Downloaded 211 times
LO6222(x64) • AOO416 • Windows 10 Pro, 64-bit
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: LO Base - Working with tables of differing structure

Post by Villeroy »

First of all, we need to know the type of database your Base document is connected with. "Embedded HSQL" is deprecated. Among several other problems, it does not support time zones; all times are local system times. HSQL2 does support time zones but I did not use that feature yet.
Let's ignore the zone problem for now.

########################################################################
First of all, you need a text table linked to the text file you are going to import.

Code: Select all

CREATE TEXT TABLE "Import1"(
"Station_ID"	CHAR(5) NOT NULL,
"Date_Time"	CHAR(20) NOT NULL,
"altimeter_set_1"DECIMAL(8,2) NOT NULL,
"air_temp_set_1"DECIMAL(4,2) NOT NULL,
"relative_humidity_set_1"DECIMAL(3,2)NOT NULL,
"pressure_set_1d"DECIMAL(8,2) NOT NULL,
"sea_level_pressure_set_1d" DECIMAL(8,2) NOT NULL
);
SET TABLE "Import1" SOURCE "Import.csv;ignore_first=true";
menu:View>Refresh Tables
This creates a table "Import1" and links it to a file "Import.csv" in the same directory as the database document with "embedded HSQL". import.csv has one leading row of column labels to be ignored and default settings otherwise (comma delimited, point decimals).
The Date_Time field is imported as a character sequence of 20 characters for now.

########################################################################
In order to save your existing database tables, lets create a similar table for testing:

Code: Select all

CREATE TABLE "Import_Test"(
"Station_ID"	CHAR(5) NOT NULL,
"Date_Time" TIMESTAMP NOT NULL,
"TZ"	CHAR(3) NOT NULL,
"altimeter_set_1"DECIMAL(8,2) NOT NULL,
"air_temp_set_1"DECIMAL(4,2) NOT NULL,
"relative_humidity_set_1" DECIMAL(3,2) NOT NULL,
"pressure_set_1d"DECIMAL(8,2) NOT NULL,
"sea_level_pressure_set_1d" DECIMAL(8,2) NOT NULL
);
menu:View>Refresh Tables
It has a time stamp column plus one more column for the zone info. We can't simply copy the text table into the data table.

########################################################################
Now create a view in SQL view. A view is similar to a query. However, the view is saved in the visible scope of the database. Keep in mind that "the database" (HSQL) is something completely different than Base.
Go to the tables section
menu:Insert>View (simple)
Paste the following query definition:

Code: Select all

SELECT 
"Station_ID",
CAST(SUBSTR("Date_Time",7,4)||'-'||SUBSTR("Date_Time",1,2)||'-'||SUBSTR("Date_Time",4,2)||SUBSTR("Date_Time",11,6)||':00.000' AS TIMESTAMP) AS "Date_Time",
SUBSTR("Date_Time",18,3) AS "TZ",
"altimeter_set_1",
"air_temp_set_1",
"relative_humidity_set_1",
"pressure_set_1d",
"sea_level_pressure_set_1d"
FROM "Import1"
Save it under some name, say "Import1_View"
This converts the 20 characters of the date-time string into a valid ISO timestamp and puts the last 3 characters into the additional TZ field.
01/09/2019 22:59 UTC is regrouped to 2019-01-09 22:59:00.000 and the CAST function can convert that string into a time stamp.

Now it should be possible to copy the view "Import1_View" and paste it onto the data table "Import_Test".
 Edit: fixed CHAR(4) for the first field which was one char too short 
Last edited by Villeroy on Sat Apr 13, 2019 4:09 pm, edited 1 time in total.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: LO Base - Working with tables of differing structure

Post by Villeroy »

OK, the above fails because the column labels of the csv files are invalid. They contain unquoted line breaks which causes 2 rows of column labels.

Code: Select all

Station_ID,Date_Time,altimeter_set_1,air_temp_set_1,relative_humidity_set_1,pressure_set_1d,sea_level_pressure_set_1d
,,Pascals,Celsius,%,Pascals,Pascals
Export this stuff with less verbouse column labels or put them in quotes. They will be ignored anyway.
If you export the stuff from some database in your local network, it could be possible to link this database directly which would safe a lot of work and unnecessary complexity.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
kbellis
Posts: 58
Joined: Sat May 28, 2011 6:25 pm
Location: Ellsworth, ME

Re: LO Base - Working with tables of differing structure

Post by kbellis »

Villeroy wrote:First of all, we need to know the type of database your Base document is connected with. "Embedded HSQL" is deprecated. Among several other problems, it does not support time zones; all times are local system times. HSQL2 does support time zones but I did not use that feature yet.
Let's ignore the zone problem for now.
Genesis RE: D:\Documents\PWS\Barometer Calibration\Data\test1.odb
The genesis was predicated upon having one folder where I've kept *.csv, *.txt, *.ods, and a few .png files. This was the same location as where the test1.odb was saved. Then I chose to connect to an existing database using the Text connection type.
Connection using Text.PNG
This was followed by all of the *.csv files showing up as tables automatically in test1.odb, a great feature I think!
differing table structures.PNG
So before moving off of the time zone problem, does the Text type of database connection support time zones?

This is only a question for further understanding. It isn't super critical to fix in this instance as the time zone is optionally determined when downloading the data from mesowest.utah.edu, so it will be better if I simply download all stations that will be analyzed in only one time zone. Given the built-in problems of mucking with daylight savings, probably GMT will be best.
Time zone can be determined at the time of data download.PNG
Time zone can be determined at the time of data download.PNG (5.73 KiB) Viewed 3853 times
Villeroy wrote:########################################################################
First of all, you need a text table linked to the text file you are going to import.
Before I go too much further in studying your response, which is very, very much appreciated, and working through it a few times, am I correct in thinking that what I had outlined above under the heading of 'Genesis', was wrong to begin with, and that I should have instead started out by creating a new db choosing to embed the db in the .odb?
LO6222(x64) • AOO416 • Windows 10 Pro, 64-bit
User avatar
kbellis
Posts: 58
Joined: Sat May 28, 2011 6:25 pm
Location: Ellsworth, ME

Re: LO Base - Working with tables of differing structure

Post by kbellis »

Sorry - can only include three images at a time per post, and had to wait before adding another; "You cannot make another post so soon after your last.".

Meant to include this one last:
embedded options when creating new db.PNG
LO6222(x64) • AOO416 • Windows 10 Pro, 64-bit
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: LO Base - Working with tables of differing structure

Post by Villeroy »

OK, not knowing the database connection is like not knowing which database software we are talking about. You have 2 Base documents, one is connected to a directory of text files and the other one has an embedded HSQLDB; and now you want to collect (copy) text data in embedded HSQL, right?

######## in order to finish my first attempt with a working example #######################
The attached pair of files contains one of your csv files with valid column headers but untouched otherwise and a Base document with embedded HSQL. The database has a text table linked to the csv ("Import1"), a view that selects all the fields from the text file splitting the time in time stamp and zone ("Import1_View") and a true database table which is prepared to take new data from the text file through the view ("Import1_Test").
When you open the text table "Import1" you can see all fields in the csv files. The times are imported as 20 characters.
When you open the view, you can see all csv data with proper time stamps.
The regular table "Import1_Test" is empty.
You can copy the icon of the view, select the "Import_Test" table which has the exact same structure, paste and confirm the dialog with [Finish...] because all fields are of the right types in right order. Notice that you can not do this twice because I put a unique index on the Station_ID and the time stamp. This way you won't be able to import duplicates.

This is how I use import some of our bank account data that come as csv with nasty German dates, comma decimals and 14 fields of remarks instead of one. A little script closes the database connection to the HSQL database, copies the current csv file from the download directory to the right place and then I copy a view icon over a table icon. The actual data conversion and field order is done in plain SQL by the view definition. This is much better than anything involving Calc.
##########################################################################

To be honest, I have no clear idea how to copy from the text table source. Correct time stamp values are vital for your observations. The capabilities of the text driver are even more limited than the capabilities of embedded HSQL.

Nevertheless, this part should work well:

Code: Select all

SELECT <several other columns>,
SUBSTR("Date_Time",7,4)||'-'||SUBSTR("Date_Time",1,2)||'-'||SUBSTR("Date_Time",4,2)||SUBSTR("Date_Time",11,6)||':00.000' AS "ISO_Stamp",
<more columns>
FROM "some_text_table"
This generates a text that represents a time stamp value in ISO notation YYYY-MM-DD HH:SS:00.000. When you copy around these strings, they are accepted as valid input to time stamp fields. However, is it possible to copy a text database's query into a HSQL table?
Attachments
topic97704.odb
(3.81 KiB) Downloaded 204 times
Import.csv
(9.42 KiB) Downloaded 212 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: LO Base - Working with tables of differing structure

Post by Villeroy »

Villeroy wrote:Nevertheless, this part should work well:

Code: Select all Expand view
SELECT <several other columns>,
SUBSTR("Date_Time",7,4)||'-'||SUBSTR("Date_Time",1,2)||'-'||SUBSTR("Date_Time",4,2)||SUBSTR("Date_Time",11,6)||':00.000' AS "ISO_Stamp",
<more columns>
FROM "some_text_table"
Oh, no. It does not work at all. There are some nasty deficiencies that have not been fixed until today. Function name SUBSTR has to be written as SUBSTRING. And the concatenation operator || that works with HSQL does not work with the text driver. The text driver accepts function CONCAT(a,b,c,d,...). The query that converts a US time stamp into an ISO stamp looks like this:

Code: Select all

SELECT "Station_ID", 
CONCAT( SUBSTRING ( "Date_Time", 7, 4 ), '-', SUBSTRING ( "Date_Time", 1, 2 ), '-', SUBSTRING ( "Date_Time", 4, 2 ), SUBSTRING ( "Date_Time", 11, 6 ), ':00' ) "Date_Time", 
SUBSTRING ( "Date_Time", 18, 3 ) "TZ", 
"altimeter_set_1", "air_temp_set_1", "relative_humidity_set_1", "pressure_set_1d", "sea_level_pressure_set_1d" 
FROM "Import"
I can copy this into a spreadsheet and get all numeric values right, including the date-times.
we concatenate the 4 year digits at position #7 with '-' and with the 2 month digits at position #1 and with '-' and with the 2 day digits at position #4 and the hours:minutes with leading space at position #11 and append a trailing seconds part ':00'.

And finally I found a not too clumsy way how to copy this query into the table of another database:
Copy the query icon.
Right-click>Paste-Special on the other database's table icon.
Choose HTML as transfer format.
The name of the table name should appear on top of the import dialog, option "Append data" should be preselected and you should check "first line as column names". Then click the [Create...] button. All the data are copied into their respective columns without type validations.

This is an old list of availlable functions for the simple database drivers (spreadsheet, text, dBase, address books): http://www.openoffice.org/dba/specifica ... tions.html from my bookmarks. I don't know if and where this is documented for LibreOffice.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
kbellis
Posts: 58
Joined: Sat May 28, 2011 6:25 pm
Location: Ellsworth, ME

Re: LO Base - Working with tables of differing structure

Post by kbellis »

Villeroy wrote:OK, not knowing the database connection is like not knowing which database software we are talking about. You have 2 Base documents, one is connected to a directory of text files and the other one has an embedded HSQLDB; and now you want to collect (copy) text data in embedded HSQL, right?
No. Sorry - my bad for any confusion; I only brought up the second instance of HSQLDB for the screen grab as it seems to be the only way to compel embedding, and the screen grab was just for illustrative purposes in my efforts to make sure I was following you correctly. So just to be clear, your first read was correct: I only have one Base document (test1.odb), and it is connected to a directory of text files (D:\Documents\PWS\Barometer Calibration\Data\).

I understand that the type of database connection enters into decision making when working with tables of differing structure. My question remains: Should I have instead of connecting to a directory of text files, started out by creating a new db choosing in order to embed the db in the .odb? And if that answer be yes, should it have been the HSQLDB instead or Firebird? Besides being a wonderful musical suite by a 27-year old Stravinsky, what is LO's Firebird?
LO6222(x64) • AOO416 • Windows 10 Pro, 64-bit
User avatar
kbellis
Posts: 58
Joined: Sat May 28, 2011 6:25 pm
Location: Ellsworth, ME

Re: LO Base - Working with tables of differing structure

Post by kbellis »

Villeroy wrote:OK, the above fails because the column labels of the csv files are invalid. They contain unquoted line breaks which causes 2 rows of column labels.

Code: Select all

Station_ID,Date_Time,altimeter_set_1,air_temp_set_1,relative_humidity_set_1,pressure_set_1d,sea_level_pressure_set_1d
,,Pascals,Celsius,%,Pascals,Pascals
Export this stuff with less verbouse column labels or put them in quotes. They will be ignored anyway.
If you export the stuff from some database in your local network, it could be possible to link this database directly which would safe a lot of work and unnecessary complexity.
Just to flesh this out a bit further
These data come from personal weather stations which have submitted their data to be ingested and critically examined by MADIS and have met certain quality thresholds. The data can be downloaded through MesoWest, as mentioned above; however, I don't have much control in its structure; e.g., separate field for Time_Zone, or how units might be presented in column headers. But maybe that could be changed; though would be unexpected after writing today to atmos-mesowest@lists.utah.edu - who knows?
RE: MesoWest and Synoptic data downloads

Downloaded data contains double rows for column headers; e.g.,
second line with units.PNG
second line with units.PNG (5.85 KiB) Viewed 3815 times
Wherein the second row contains units of measure; e.g., Pascals, Celsius, %

Unfortunately, this creates an issue that must be dealt with in SQL queries, and in order to get around that issue, the second row must be deleted. Consider making an alternate method available wherein there is only one row containing attributes (column headers), and attribute_strings_include_units.

Another wrangling issue that we're dealing with is the Date_Time values have appended to them time zone information. Consider making it optional to provide Time_Zone as an attribute, next to, but separate and apart from Date_Time.
separate column for TZ.PNG
separate column for TZ.PNG (3.67 KiB) Viewed 3815 times
Thank you for any reply and for any considerations.

Kind regards,

Kelly
--
V. Kelly Bellis, PLS
17 Union Street
Ellsworth, ME 04605
LO6222(x64) • AOO416 • Windows 10 Pro, 64-bit
UnklDonald418
Volunteer
Posts: 1547
Joined: Wed Jun 24, 2015 12:56 am
Location: Colorado, USA

Re: LO Base - Working with tables of differing structure

Post by UnklDonald418 »

Besides being a wonderful musical suite by a 27-year old Stravinsky, what is LO's Firebird?
Firebird in this context is an open source database engine.
As distributed Base is a front end for a HSQL database engine. Base can be used with a number of database engines, Firebird being one.
As distributed Base is an Embedded configuration where everything, tables, queries, forms, reports and the database engine is bundled into a single zip archive. If you use 7zip you can open an .odb file directly and inspect all the files and directories stored in it, otherwise you might need to rename it with a zip extension. It is a handy format for learning to use databases, but it turns out that for many applications it isn't very stable. The Embedded database engine is HSQL version 1.8 which was released in 2007 but the big issue is that if something goes wrong with the zip process the archive can become corrupt and unreadable. That issue comes up quite often, so when using an Embedded database be sure to do frequent backups.
A few years back the developers of LO announced that they wanted to replace HSQL with Firebird. Apparently, it turned out to be more difficult than they anticipated.
The data loss problem can be circumvented by using a JDBC connection (also referred to as a Split database) where the tables and the database engine are stored outside of the .odb file. With the database engine stored externally a more recent version of the HSQL engine can easily be used, although there have been some compatibility issues reported beginning with version 2.4.0.
[Wizard] Create a new 'split' HSQL 2.x database
If your problem has been solved, please edit this topic's initial post and add "[Solved]" to the beginning of the subject line
Apache OpenOffice 4.1.14 & LibreOffice 7.6.2.1 (x86_64) - Windows 10 Professional- Windows 11
User avatar
kbellis
Posts: 58
Joined: Sat May 28, 2011 6:25 pm
Location: Ellsworth, ME

Re: LO Base - Working with tables of differing structure

Post by kbellis »

Thanks Don for the reply.
UnklDonald418 wrote: As distributed Base is a front end for a HSQL database engine. Base can be used with a number of database engines, Firebird being one.
As distributed Base is an Embedded configuration where everything, tables, queries, forms, reports and the database engine is bundled into a single zip archive.


Regarding embedding in general; 1) as you've pointed out, and from what I've read and heard of issues when dealing with large files and with LibreOffice taking an extended period of time to complete a task, and/ or crashing during certain operations including data entry; e.g. video 15. So it seems that going the embedded route will inevitably create size-related issues - is this correct? If this is true, is there a size threshold best not to exceed when embedding.

2) My earlier question, asked twice and maybe not clearly enough, but not answered, does the type of connection that gets used by Base impact the availability of functions and operations for making SQL queries? Like what I think I understood Villeroy to point out how date_time fields can be worked with in Base depends on which connection type gets used; e.g., Text. Embedding issues notwithstanding, what connection type affords the greatest array of querying tools? - followup - I'm still working through http://www.openoffice.org/dba/specifica ... tions.html; however, am not clear to what extent this may be applicable to LibreOffice.
UnklDonald418 wrote: The data loss problem can be circumvented by using a JDBC connection (also referred to as a Split database) where the tables and the database engine are stored outside of the .odb file. With the database engine stored externally a more recent version of the HSQL engine can easily be used, although there have been some compatibility issues reported beginning with version 2.4.0.
[Wizard] Create a new 'split' HSQL 2.x database
More for me to learn; thanks for the link. Can split databases be created with LibreOffice alone?
LO6222(x64) • AOO416 • Windows 10 Pro, 64-bit
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: LO Base - Working with tables of differing structure

Post by Villeroy »

I understood "tables of differing structure" as tables with different types (as imported from non-standard text) and with different order of columns. Differing table structures can be fixed by means of queries or views which arrange columns in the right order and convert non-standard text to appropriate data types.
Availlable functions are documented for your respective database engine.
I can offer a very simple Python macro which can do the following to your embedded HSQL:
-- Extract and reconnect the embedded HSQLDB so it works as a stand-alone HSQLDB with the same driver of version 1.8 as the embedded version. In many cases this will meet the expectations while offering a higher level of data safety.
-- Extract and reconnect the embedded HSQLDB so it works as a stand-alone HSQLDB with a later driver version.
-- Reconnect the Base document to an external database after the files had been moved (to another system).
-- Reconnect back to an embedded HSQLDB if things go wrong.
-- Delete an embedded HSQLDB if you don't need that anymore.

viewtopic.php?f=21&t=86071&p=401918
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Post Reply