Page 1 of 1

[Solved] Pull data from local XML file with FILTERXML

Posted: Sat Jun 23, 2018 8:55 pm
by Wolfhart
Hello,

I know that the function FILTERXML can be used to pull data from an online xml source, like so:

Code: Select all

=FILTERXML(WEBSERVICE("http://www.example.com/data.xml"),"/xpath/to/data")
Now, I want to pull data from a local xml file, but cannot manage to figure out how to do that. My question, in short, is this: With what do I have to replace

Code: Select all

WEBSERVICE("http://www.example.com/data.xml")
in the above formula if the file data.xml from which I want to pull data is a local file that is stored in the same directory as the spreadsheet?

Thanks in advance for your help!

Re: pull data from local XML file with FILTERXML

Posted: Sat Jun 23, 2018 10:26 pm
by Villeroy
file:///path/data.xml

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 6:18 am
by Wolfhart
So you mean that the formula should be the following:

Code: Select all

FILTERXML(file:///path/data.xml,"/xpath/to/data")
That doesn't work, it produces "Err:507". I also tried

Code: Select all

FILTERXML(WEBSERVICE("file:///path/data.xml"),"/xpath/to/data")
but that doesn't work either, it produces "#VALUE!".

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 4:04 pm
by Villeroy
No idea. I tried with the libreoffice configuration and with the example from the help file.

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 6:00 pm
by MrProgrammer
Wolfhart wrote:Now, I want to pull data from a local xml file, but cannot manage to figure out how to do that. My question, in short, is this: With what do I have to replace WEBSERVICE("http://www.example.com/data.xml") in the [FILTERXML functioon] if the file data.xml from which I want to pull data is a local file that is stored in the same directory as the spreadsheet?
I run OpenOffice, not LibreOffice, so I cannot do any testing, but I can offer some ideas.

I believe the WEBSERVICE function only supports the HTTP: and HTTPS: protocols, and in particular does not support FILE: so one cannot use WEBSERVICE to retrieve local data unless you create a local webserver to access it. You can do that, of course, by installing and configuring Apache HTTP Server on your machine or in your network. However this could be more involved that you'd want. Apache HTTP Server has their own support pages.

FILTERXML needs the XML string for the first operand.
• You could open your local file with Writer, copy the data to the clipboard, paste that data into a cell, then specify the cell address as the first FILTERXML operand. Re-copy/paste if the data changes. You might run into limits concerning the amount of data in a single cell. In that case use multiple cells and concatenate them for FILTERXML.
• You could use formula =T('file://{YourFileName}'#$Sheet1.A1) to load the data into a cell in the sheet with the FILTERXML formula. This is good if the XML data is a single line, because all the XML will be loaded into a single cell. Specify the cell address for FILTERXML.
• For a single line XML file you could use Insert → Sheet from file → Separated by → {uncheck everything} → Column type → Text → OK → ✓ Link → OK. Specify the new sheet and cell address A1 for FILTERXML.
• If the XML file has multiple lines you could open it with Writer and condense it into a single line. You might run into limits concerning the number of characters in a single paragraph.
* If the XML file has multiple lines you can use the previous two ideas, but for the first you'll need to fill the formula down as needed to load all the lines. In either case you'll need to concatenate the cells for FILTERXML. This will be tricky if you don't know in advance the line count of the XML file.

If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the title. You can select the green checkmark icon at the same time.

[Tutorial] Ten concepts that every Calc user should know

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 7:05 pm
by Wolfhart
Thank you, Villeroy and MrProgrammer, for your replies! But I can't get any of the proposed solutions to work.

@Villeroy: The solution proposed in the attachment, which is to indicate the path to the xml file in a cell and then refer to that cell in the FILTERXML formula, does not work in my case. It produces "#VALUE!".

@MrProgrammer: The data in the xml file will change regularly, so the first and third solutions you propose are not an option for me. And the second solution you propose, with the T function, produces the following behavior: When I put the formula

Code: Select all

=T('file://{MyFileName}'#$Sheet1.A1)
into a cell, LibreOffice replaces "file://" with "smb://" (without asking me! why?) and the formula then produces "#REF!".

I am puzzled by the fact that this is so difficult. If it is easy to pull xml data from an online source, why is it not easy to pull it from a local file?

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 7:11 pm
by RoryOF
Is the local file on the local network?

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 7:18 pm
by Villeroy
LibreOffice should handle smb: however, I would test this functionality with a local file, just to eliminate one possible source of error. And with no special chars in the path, no spaces, quotes, braces etc.

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 7:30 pm
by Wolfhart
I don't understand exactly what smb is and what it has to do with my problem. LibreOffice replaces "file" with "smb" when I try to use the T function (as MrProgrammer suggested), and I don't understand why it does that. The xml file from which I want to pull data is in the same folder as the spreadsheet.

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 7:38 pm
by RoryOF
SMB is often an indicator that there is a network file access involved. However, LibreOffice may generalise all accesses to appear as network file accesses - I cannot say as I don't use Calc at that level, and LibreOffice not at all.

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 8:20 pm
by Villeroy
smb: (aka Samba) is the protocol which makes it possible to read files from other machines that offer the smb service, typically Windows machines but you can also share file between Linux computers via smb

Avoiding all the file access trouble, as suggested by MrProgrammer and with the most simplified XML in A1

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<Foo>1</Foo>
=FILTERXML(A1;"Foo") gives #NA which is another error value at least. I would expect value 1.

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 8:52 pm
by Wolfhart
Villeroy wrote:=FILTERXML(A1;"Foo")
It has to be

Code: Select all

=FILTERXML(A1;"/Foo")
(i.e., with a slash before "Foo"). Like that it works. But it doesn't solve my problem. I have noticed already that I can put the xml data in a cell and then refer to that cell in FILTERXML. But how can I automatically load the data in my xml file into the cell? Again, the xml file will be updated regularly, and so I want to load it automatically (i.e., I don't want to have to copy the data manually into the cell every time).

Re: Pull data from local XML file with FILTERXML

Posted: Sun Jun 24, 2018 11:55 pm
by Lupp
I did a bit of testing with two examples provided by w3cschools. One was http://www.w3schools.com/xml/simple.xml, the otherone I copied from their tutorial, and pasted it into a plain text file which I stored to my own domain: http://psilosoph.de/testFilterXML/books.xml.

In both cases I was able to get the content into a Calc sheet (one cell per .xml file) with a call to WEBSERVICE(url), but only in the first case FILTERXML() could work with it as expected. Since I suspected the line (paragraph) breaks in the second example to prevent success, I removed that with calls to SUBSTITIUTE() and succeeded.

This induced the idea to experimentally abandon the usage of WEBSERVICE() and to replace it with a little user function reading the source file and removing Asc(10) and Asc(13) from it. This may have degraded efficiency, but it increased reliability. FILTERXML() accepted the output of my 'emulateXpathService()' as its input and returned the expected results. The relevant fact in this context is that the function accepted web-URL and local filepaths as well during my tests. A solution for local files using WEBSERVICE() would most likely require to run a local web server providing the services WEBSERVICE() relies on.

For your own testing I provide the code:

Code: Select all

Function emulateXpathService(Optional pFilePath As String) As String
emulateXpathService = ":fail:"
If IsMissing(pFilePath) Then pFilePath = ""
pFilePath = ConvertToURL(pFilePath)
If FileExists(pFilePath) Then
  Dim args(0) As New com.sun.star.beans.PropertyValue
  args(0).Name  = "Hidden"
  args(0).Value = True
  docSource     = StarDesktop.LoadComponentFromURL(ConvertToURL(pFilePath),"_blank",0,args())
  On Local Error Goto noTextProperty
    xmlString     = docSource.Text.String
    docSource.Close(True)
    xmlString     = Join(Split(xmlString,Chr(13)),"")
    xmlString     = Join(Split(xmlString,Chr(10)),"")
    
    emulateXpathService = xmlString
  noTextProperty:
End If
End Function
I am interested in the improvements you will suggest.

Re: Pull data from local XML file with FILTERXML

Posted: Mon Jun 25, 2018 6:23 pm
by Wolfhart
Thank you very much for your reply, Lupp! But I have a very basic question: Where do I need to save the code you provide in order to be able to call the function in my spreadsheet? (I have never used a user-defined function in Calc before.)

Re: Pull data from local XML file with FILTERXML

Posted: Tue Jun 26, 2018 2:49 am
by Lupp
There are a few issues first.
-0- None of the formulae under discussion here will work with any AOO up to the current V4.1.5. In AOO version 4.1.5 StarDesktop.LoadComponentFromURL() also refused to open an XML file with Writer when I tested.
-1- Both the functions WEBSERVICE() and FILTERXML() were first time implemented with LibreOffice V4.2. Your signature shows V4.1.3.
-2- FILTERXML() was buggy when used under array-evaluation. The bug was fixed only for V5.2.2 and higher. (See: https://bugs.documentfoundation.org/sho ... i?id=88257.)
-3- As far as I can judge the function is still buggy concerning empty lines in the layout of the source (XML-file). Such lines should simply be ignored but cause errors.
-4- I couldn't test under any Linux.
-5- My demo was created under LibO V6.0.4.2 and worked as expected/described under Win 10.

Using the custom function I proposed WEBSERVICE() is dispensable for files. (I don't know enough about services on the web to go beyond files.) FILTERXML(), however, does substantial work parsing the source.

The code itself can be placed in a Basic module of the document needing it. This is the case with the attached demo. To get the macro to work then you need to permit macro execution when prompted while the file is loaded. You will only be prompted for the permission if the 'Macro Security' is set to 'Medium'. (Never choose 'Low'!) The code (in fact the module containing it) can also be moved with the help of the 'Organizer' to the 'Standard' library of the local LibO to be available from any document.

Re: Pull data from local XML file with FILTERXML

Posted: Tue Jun 26, 2018 6:46 pm
by Wolfhart
Thanks a lot for your help, Lupp! Your function works on my computer in the file you provide. I use LibreOffice 5.1.6.2 on Ubuntu. (My signature was out of date, sorry!) However, I did not yet manage to make it work in my case. I suppose that this is because the path to the xml file contains a space. So I have a couple of questions:

1) Is there a way to make the function work with file paths that contain spaces? I tried putting the file path in quotes, but that didn't help (I always get the result ":fail:").
2) Alternatively, is there a way to specify the path in a relative way (i.e., relative to the location of the spreadsheet)?
3) I am still puzzled by the fact that it needs a custom function to do the job. All I want to do, after all, is load the data from a simple text file (xml files are simple text files after all) into the spreadsheet. Is there no built-in function to do that?

Re: Pull data from local XML file with FILTERXML

Posted: Tue Jun 26, 2018 10:30 pm
by Lupp
The path is processed by 'ConvertToURL' anyway. This method should accept spaces in the path string. I tried it on my system and it worked as expected. Are you Sure you are using the correct path delimiter of your system?

I attach a version slightly reworked to ease debugging. Open it and replace the strings in A7, A23 with tzhe path-names you actally use.

Open the Basic module for editing and debugging and execute the function 'emulateXpathServiceDebug' step by step till you get an error. Quit the error and trigger a recalculation of your sheet. Now you can execute the user code step by step again and observe what happens. ...
(Of course you will need to experiment and to learn a bit if you to date never used the IDE with its watch window ...)

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 12:25 am
by Villeroy
Spaces in paths make simple programs complicated. Either you remove the spaces in your paths or you replace any spaces (hex code 20) in URLs with %20.
https://en.wikipedia.org/wiki/Percent-encoding

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 12:32 pm
by Lupp
In the case under discussion the bit of code I provided takes a pathname for a file as the OS shows it and passes it to the 'ConvertToURL()' method which returns the correct file-URL. The replacement of spaces by "%20" is part of the conversion.

(Of course, I also never create folders or files with spaces or other non-name characters in their names. However, Win has such a space even in "Program Files" where I cannot change it. Not to blame Win unjustly: If I remember correctly, the nonsense originated in the UNIX world decades ago.)

BTW @Villeroy: Do you know what is necessary to tell 'StarDesktop.LoadComponentFromURL()' that it should open a plain text file with any extension (in this case "xml") with Writer in AOO. I tried the argument

Code: Select all

  args(0).Name  = "FilterName"
  args(0).Value = "swriter"
(also slightly modified in different ways, "swriter:Text" e.g.) but did not succeed. AOO returns the intended document object as 'Null'. LibO opens the file as intended also without additional precautions, but accepts the argument described above.

=== Editing 2018-06-27 ===
In LibO the above shown argument definition (as well as similar ones) is simply ignored.
In AOO 4.1.5 neither the above shown attempt nor settings as descibed in Andrew Pitonyaks "Macro Document" nor the "generic_Text" as the value for a FilterName argument suggested by Villeroy in his post below worked for me. The simple FilterName-value "Text" did.
Of course this only helps to load a file with extension ".xml" (e.g.) with Writer, and not to parse it for data based on the Xpath syntax as the FILTERXML() function of LibO does. I won't try to write user code based on the XXPathAPI interface to implement a similar function for AOO. (It should be possible, however.)

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 1:36 pm
by Villeroy
FilterName "generic_Text" seems to do the trick:

Code: Select all

Sub Main
url = "file:///tmp/DELAS/37-259-0-364-KBR-72-1-6.xml"

Dim args(0) as new com.sun.star.beans.PropertyValue
args(0).Name  = "FilterName"
args(0).Value = "generic_Text"
doc = StarDesktop.loadComponentFromURL(url, "_blank", 0, args())
End Sub
The attached zip contains 2 Python macros dumping all Information about installed file filters and the detection factory into a new spreadsheet.
The output is a textual representation of Python variables, e.g. array of various types:

Code: Select all

((com.sun.star.beans.PropertyValue){ Name = (string)"x-default", Handle = (long)0x0, Value = (any){ (string)"PGM - Portable Graymap" }, State = (com.sun.star.beans.PropertyState)DIRECT_VALUE },)
With LibreOffice there are some rows with values in extra columns. I never really adjusted the code to LibreOffice except for the syntax of Python3 which is slightly different than AOOs Python2.
Nevertheless, this simple tool is helpful. You find the filter names in column E ("Type") of the FilterFactory output. The code works with both AOO and LO.

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 8:54 pm
by Wolfhart
You were right, Lupp, the problem was not the space in the file path, but rather a stupid typo of mine. Sorry for the false alarm!
Now, with the typo corrected, your function works beautifully. Thank you very much!

I take it from your discussion with Villeroy that the answer to the third of my above questions, about there being a built-in function to load a plain text file, is negative. What about the second question, about indicating the file path in a relative way? Is that possible? It would make the spreadsheet more portable, that's why I ask.

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 9:17 pm
by Villeroy
How to open a file with the plain text filter regardless of content and name suffix:
menu:File>Open...
File type: Text(*.txt) in the first block of text file types in order to open the text with Writer.
File type: Text(*.txt, *.csv) in the second block of file types in order to open the text with Calc.
Then navigate to the directory and enter the file name into the name box regardless of the visible file types. Even though all *.xml and *.html files are hidden you can still select a file by entering the name.

Using the FilterName "generic_Text" in a file-open macro does the same.

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 9:30 pm
by Wolfhart
@Villeroy: I am sorry, apparently I didn't express myself clearly. With my third question, I meant the following: Is there a built-in function that I can use in a cell of a spreadsheet that will automatically load the contents of a text file (stored locally) into that cell? I am assuming here that the contents of the text file will change regularly and that I do not want to load the data manually every time. I take it from the preceding discussion that the answer is negative (since, otherwise, Lupp's custom-built function would probably not be necessary).

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 11:10 pm
by Lupp
Wolfhart wrote:@Villeroy: ...Is there a built-in function that I can use in a cell of a spreadsheet that will automatically load the contents of a text file (stored locally) into that cell? ...
I'm not Villeroy. Nonetheless I will answer: No.
The function WEBSERVICE() is the only candidate but cannot work with a file.

My question in return:
Did you (Wolfhart) succeed meanwhile applying the user function I provided?

Anyway, there is another issue:
To use a single cell for the import of a complete xml-file requires to accept a limit of 65535 (2^16 - 1) characters contained in that file. 1
[This is also the limit if the WEBSERVICE() is used.]

Re: Pull data from local XML file with FILTERXML

Posted: Wed Jun 27, 2018 11:27 pm
by Wolfhart
Lupp wrote:My question in return:
Did you (Wolfhart) succeed meanwhile applying the user function I provided?
Yes, I did! I said so in the post of a couple of hours ago. Again, thank you very much for your help!

Re: [SOLVED] Pull data from local XML file with FILTERXML

Posted: Thu Jun 28, 2018 12:26 am
by Lupp
(Had missed the post you pointed me to.)

Concerning the second question:
Generally paths used in references an links are shown as full absolute paths. If you enable the respective option under Tools > Options > Load/Save > Save the paths are nonetheless saved as relative to the location of the document's file.
Since the file to load is represented by the text content of a cell in your case this does not apply here.
If the relative folder path of a resource file regarding the location of the document is as simple as ./ (e.g.) you can get the completed absolute path of that resource with a few lines of user code again based on the URL property of ThisComponent (which must already exist, of course).
Since one instance of LibreOffice may run documents loaded from different paths at the same time, and even documents that were never saved so far, and therefore have no location in the file system, there isn't something like a "current directory" concerning the documents. The current directory returned by the Basic Function CurDir() is the directory from which soffice.exe was loaded.

Re: [SOLVED] Pull data from local XML file with FILTERXML

Posted: Thu Jun 28, 2018 9:58 am
by Lupp
One additional remark concerning the code I provided:
I did it this way because I could write that code from scratch without any "research". (Then came some side-questions only concerning AOO and the 'FilterName' issue.)

In fact opening a .xml file with Writer to get its content is most likely not the best way, much less the only one.
An appropriate way should be to use the uno-service 'SimpleFileAccess'. A elaborate example how to do so, and also how to use the service to Parse xml you find under 5.38 in Andrew Pitonyal's "Useful Macro Information" obtainable from http://www.pitonyak.org/oo.php. (The links to www.oooforum.org there are dead, however.)

Re: [Solved] Pull data from local XML file with FILTERXML

Posted: Thu Jun 28, 2018 2:59 pm
by Wolfhart
I understand. Thanks again, Lupp!

Re: [Solved] Pull data from local XML file with FILTERXML

Posted: Fri Dec 13, 2019 9:44 pm
by jscatharina
when I use the function "=filterxml(webservice("http://viacep.com.br/ws"/95010080/xml/""))" I get the message #nome?