Contingency tables (cross tables) from m-n-relations

Forum rules
No question in this section please
For any question related to a topic, create a new thread in the relevant section.
Post Reply
User avatar
Villeroy
Volunteer
Posts: 31264
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Contingency tables (cross tables) from m-n-relations

Post by Villeroy »

This tool helps to deal with fairly difficult stuff. Not for newbees.

Statistical applications and Calc charts can not handle normalised row sets very well. They expect one separate column for each variable.
We want to convert this...

Code: Select all

H1	H2	Val
A	X	1
B	X	2
A	Y	3
B	Y	4
C	Y	5
... into this...

Code: Select all

_	X	Y
A	1	3
B	2	4
C 		5
... when we want to plot charts or feed statistical application.
I used this tool sucessfully with 18000 values and 20 changing column fields.

The attached database tool contains a most simple Python macro, some example data in 3 related tables constituting a many-to-many relation and a form which helps to generate the required SQL to select cross tables from the relation.

Such a query requires that certain parts of the SQL string are repetetive.
You select the row headers (above A, B, C) and join them with one sub-query for each filtered category set. Then you select one column of filtered values from each of the sub-queries (above X, Y).

The general pattern for 3 column fields goes like this:

Code: Select all

SELECT "row_fields"
, "col_field {X}"."Value" AS "Column Header {text X}"
, "col_field {Y}"."Value" AS "Column Header {text Y}"
, "col_field {Z}"."Value" AS "Column Header {text Z}"
FROM "row_field_table"
LEFT JOIN (SELECT "Value","row_id" FROM "value_table" WHERE "Value" = {X} ) AS "col_{X} ON "row_field_table"."ID" = "col_{X}"."foreign_key"
LEFT JOIN (SELECT "Value","row_id" FROM "value_table" WHERE "Value" = {Y} ) AS "col_{Y} ON "row_field_table"."ID" = "col_{Y}"."foreign_key"
LEFT JOIN (SELECT "Value","row_id" FROM "value_table" WHERE "Value" = {Z} ) AS "col_{Z} ON "row_field_table"."ID" = "col_{Z}"."foreign_key"
the 3 field selectors and the 3 joins are identical except for the variables marked as {X}, {Y}, {Z}
I moved the field separating commas to the beginning of the following line so the 3 repetetive SELECT lines are identical.

Once you managed to create a satisfactory test query with 2 or 3 column fields on your own data, you create a list box query which selects a descriptive text and the primary key from the second detail table which constitutes your column fields. Something simple like SELECT "Name", "ID" FROM "My Column Item Table". Name that query "qListbox". "qListbox" is the dummy row set of my input form and it is the source of the list box on my form. When you select and unselect the column items from the list box (yes, it is a multi-select box), an event macro dumps SQL into a text box that is derived from a skeleton in another list box.

The skeleton based on the above pattern is this:

Code: Select all

SELECT "row_fields"
, "col_field {1}"."Value" AS "{0}"
FROM "row_field_table"
LEFT JOIN (SELECT "Value","row_id" FROM "value_table" WHERE "Value" = {1} ) AS "col_{1} ON "row_field_table"."ID" = "col_{1}"."foreign_key"
The placeholder {1} will be substituted by the hidden key value of the list box. The column alias "{0}" (in quotes) will be substituted by the visible list box text.
Copy the resulting SELECT statement, run it and debug by changing the SQL skeleton. The statement can be used in views if you built it in the native SQL dialect of your database. MySQL for instance requires a different syntax which is no problem since the tool simply generates text.

Installation
-- Save this demo to a trusted directory (Tools>Options>Security>Macro Security) if you want to play with my example
 Edit: You can forget all of the following installation steps if you can copy your detail table that provides the column header strings into my example document and modify "qListbox" accordingly so it pulls the header labels and the primary key from that table. Then use the generator form in my document with your own query skeleton and the proper column information. 
-- Create the above mentioned list box query "qListbox"
-- Create a working test query and a skeleton for the query with placeholders {1} and {0] in separate lines.
-- If you don't know how to embed Python code in a Base document, extract the embedded Python from the document archive to %user_profile%/Scripts/python/ so the macro appears globally under "My Macros".
-- Copy my form into your document and open it for editing.
-- Adjust the list box event so it points to the new global macro instead of searching for the embedded macro. If your list box source is not named "qListbox", change the query name AND change the logical form's source. There has to be some arbitrary form source in order to fill a list box with database data. I have chosen the same "qListbox" as dummy source of the logical form. Just do not link any of the form controls to your database fields.

This is the simple Python code which selects the single lines that do not contain placeholders from "txtSource", combines them with multiple instances of the lines with {0} and {1} and dumps them into "txtSQL":

Code: Select all

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

def composeSQL(ev):
    oSource = ev.Source
    oModel = oSource.getModel()
    oForm = oModel.Parent
    src = oForm.getByName("txtSource").getString()
    txt = oForm.getByName('txtSQL')
    aLines = src.splitlines()
    aIndices = oSource.SelectedItemsPos #selected list indices
    aLabels = oSource.SelectedItems #selected strings
    aValues = oModel.SelectedValues #selected bound values
    sql = ''
    for l in aLines:
        if ('{0}' in l) or ('{1}' in l):
            n = 0
            s = l
            while n < len(aLabels):
                if (n > 0) and l.lstrip().startswith('WHERE'):
                    s = l.replace('WHERE', 'AND', 1)
                lbl = aLabels[n]
                val = aValues[n]
                sql += s.format(lbl, val)+'\n'
                n += 1
        else:
            sql += l +'\n'

    txt.setString(sql)
The alternative script organizer for Python helped me to embedded the Python module into the database document.
Without such tool you simply embed it as Scripts/python/pyDBA/XTableSQL.py in the zip archive that constitutes the document and add the following to META-INF/manifest.xml

Code: Select all

<manifest:file-entry manifest:media-type="" manifest:full-path="Scripts/python/XTableSQL.py"/>
<manifest:file-entry manifest:media-type="application/binary" manifest:full-path="Scripts/python/"/>
<manifest:file-entry manifest:media-type="application/binary" manifest:full-path="Scripts/"/>
Attachments
XTableQueries.odb
Python based SQL generator on a form with some example data.
(16.51 KiB) Downloaded 560 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
MrProgrammer
Moderator
Posts: 4883
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Contingency tables (cross tables) from m-n-relations

Post by MrProgrammer »

Villeroy wrote:We want to convert this {NormalizedTable} into this {ContingencyTale}
I perform this transformation using Data → Pivot Table, formerly Data → Data Pilot. First I add a column (D in this example) which contains the formula =ROW().
R\C A     B     C       D
1   H1    H2    Val     Row
2   A     X     Data1     2
3   B     X     Data2     3
4   A     Y     Data3     4
5   B     Y     Data4     5
6   C     Y     Data5     6
Then in the pivot table dialog I would drag H1 to Row Fields, H2 to Column Fields, Row to Data Fields, More, Uncheck all six options at bottom, OK. In some situations it may be useful to use Results to → - new sheet -, but I did not do that here.
Pivot table dialog
Pivot table dialog
This produces:
After creating pivot table
After creating pivot table
The final step is to select the data cells in the pivot table, B10:C12 in this example, Edit → Find & Replace → More options, select Current selection only and Regular Expressions, Search for .+, Replace with =C&, Replace All. I recommend unchecking Current selection only and Regular expressions, before clicking Close. Column C is specified in the Replace with box because it contains the data values in the source data, Data1, Data2, …. B10 contains the formula =C2, C10 contains =C4, etc.
After changing .+ → =C&amp;
After changing .+ → =C&
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
User avatar
Villeroy
Volunteer
Posts: 31264
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Contingency tables (cross tables) from m-n-relations

Post by Villeroy »

Pivoting the row number and turning it into a reference is so clever :!:
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Post Reply