Edit: Nota bene: This post is a translation of this personal Web page in my so poor English. If somebody want to write to me a private message for communicate to me a best translation (or expression), it is important to know that I have realy nothing against it... |
In the suveys such as those that can be done using spreadsheets, it is common to use merged cells for the column headers.
This avoids repetitions and it also makes it possible to present things more simply, especially for novice or inexperienced users who, because this type of presentations is very visual, can easily set up these structures by means of the mouse.
On the other hand, in order to establish a strictly tabular database as much recoverable by a programming language as by a DBMS (R), this clearly complicates the naming of data columns. One can then envisage translating the names of the strict columns by using a separator such as the point ("."), insofar as it is understood that it is not used within the cells themselves. So, for the example that precedes, the second and fourth columns, respectively, may be called "Foot.Diameter" and "Body.Diameter.Inside".
In any case, we are dealing with a tree structure:
If you have a lot of spreadsheet with structures of this type, it can be very long, or even particularly tedious, to "read" in the perspective of a data port. We can so dream of a macro that would allow the analysis of this tree structure, leaving to the user only the work of its define by means of the merged cells, followed by a simple selection of the set to define the limits, charging the algorithm to make more profits from this analysis...
Because once the tree is decrypted, it can be declared in many things that are very funny and much more significant than merged cells... As already mentioned above, this may be limited to a list of cell names separated by A separator but also be transformed into a TreeView control, or even last but not least, in XML code... So the preceding example can already be translated as follows:
Code: Select all
<?xml version="1.0"?>
<SELECTION>
<CELL>Notes</CELL>
<CELL>
Foot
<CELL>Diameter</CELL>
<CELL>Length</CELL>
</CELL>
<CELL>
Body
<CELL>
Diameter
<CELL>Inside</CELL>
<CELL>Outside</CELL>
</CELL>
<CELL>Length</CELL>
</CELL>
<CELL>
Mouth
<CELL>Width</CELL>
<CELL>Height</CELL>
</CELL>
</SELECTION>
The sequential reading of the cells is very similar to that of the file system directories [French, sorry], but maybe a little more complicated to define, since where an instruction like "getFolderContents" directly returns a full URLs array of each file of a given directory, it is here necessary to define, not only the left and right edges of the cell to analyze the contents located just below it, but also to enumerate the childrenw one by one... This also implies the imaginary projection of a merged cell, just above the selection and measuring exactly its width. This is the mother of all the others. In all cases, since the depth of the branches is not fixed, recursive programming is used, the "Sub" procedure naturally calling itself.
When setting up the merged cells, a rule must be scrupulously observed: all cell layers must reach the last line of the selection, it being understood that no "child" cell is located beneath it. For the example presented above, it can thus be seen that the cells "Diameter" and "Length" which lie under the cell "Foot" are spread over two lines.
But the algorithm of the macro works even if it is the cell "Foot" which spreads on two lines, leaving then only one line to the cells "Diameter" and "Length". In short, this does not change anything as long as you respect the full height filling of the selection.
Similarly, it is essential that the contents of all merged cells lie in the topmost and left-most content, since this is where the text content is retrieved. Finally, there is no problem with the algorithm of reading a cell merged in width over several columns of the spreadsheet while it does not have a child cell. This is the case for the "Notes" cell (see attached file).
It should be noted that the tree is stored directly in an XML structure, using the DOM available in the Basic of the software suite. Thus, the two types of display in the dialog box ("TreeView" and "Separate fields") are only interpretations of the XML document created ex-nihilo. It is this same DOM that can be found in JavaScript or PHP (to name only mainstream languages) with an extremely similar syntax. To learn the functioning and to integrate its use can therefore give us glimpses of fruits on other trees than our office suite, especially if we project an inter-connectivity of data whose sequel would be the first part of data entry...
To analyze and understand the code or for subsequent retouching, the XML document is saved by the appropriate button in the dialog box. In this regard, I did not bother to visualize the XML file, in a "textField" since web browsers do this very well. So once the file is written, it is requested a visualization within a browser. This is adjustable by the "browser" constant present at the beginning of the macro.
It should also be noted that special attention has been paid to keeping the "columnNumber" attribute in the <CELL> tags, even if this is not useful for treeView control or display in separate fields. It is not (only) to make a demonstration of the possibilities of the DOM... It is well because the purpose of this little joke is, later, to make a selection on the columns of numbers contained in the Spreadsheet to introduce them in the same XML structure. Knowing that the column of the spreadsheet is noted here, it will then become possible to refer to it for the data’s incorporation.
So if the purpose of this macro is not very explicit for the neophytes in these matters, it seems to me, on the other hand, obvious to anyone who knows the advantage and the power of the XML files. It makes it possible to envisage the processing of complex files, particularly to retrieve data that have been entered rather anarchically into spreadsheet, with the obvious objective of standardizing them in databases or, as I do for my profession (organbuilder), in the establishment of a description language, based on XML.
Finally, all this does not belong exactly to the spontaneous generation ... The French forum having provided me the main sources listed here (of course in French, I am sorry):
- "Obtenir les cellules d’une cellule fusionnée" (Getting cells from a merged cell).
- "Contrôle d’arborescence (TreeView) avec listener" (TreeView control with listener).
- "Import de données XML dans un ODT" (Importing XML data into an ODT ).