Converting XML Feed to CSV

Discuss the spreadsheet application

Converting XML Feed to CSV

Postby nguyenmm » Mon Dec 03, 2018 2:20 pm

Hi friends,

I'm struggling here a bit.

I have an 8mb xml feed that I'd like to convert to CSV.

The feed looks like this:

<channel>
<title>W3Schools Home Page</title>
<link>https://www.w3schools.com</link>
<description>Free web building tutorials</description>
<item>
<title>RSS Tutorial</title>
<link>https://www.w3schools.com/xml/xml_rss.asp</link>
<description>New RSS tutorial on W3Schools</description>
</item>
<item>
<title>XML Tutorial</title>
<link>https://www.w3schools.com/xml</link>
<description>New XML tutorial on W3Schools</description>
</item>
</channel>

What are the steps to import the xml into calc and then get into columns eg:

Title, Link, Description etc?

I have tried to apply the same techniques with videos shown for Excel but can't find anything options relating to Calc.
Please help. Cheers
OpenOffice 4.1.5 on Windows 10
nguyenmm
 
Posts: 1
Joined: Mon Dec 03, 2018 2:16 pm

Re: Converting XML Feed to CSV

Postby Villeroy » Mon Dec 03, 2018 4:00 pm

It is not possible unless you are a programmer. A programmer would not even need Calc nor Excel.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, OpenOffice 4.x & LibreOffice 6.x
User avatar
Villeroy
Volunteer
 
Posts: 26401
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Converting XML Feed to CSV

Postby Lupp » Mon Dec 03, 2018 4:24 pm

I'm irritated again.
Basically XML describes trees. It also has means to embed tables into these trees. The given example is a tree that cannot be represented by a table, imo, except tables in table cells are allowed. Text processors (and html editors) can do it. Spreadsheets (and csv) can't. In what way am I wrong?
Code: Select all   Expand viewCollapse view
<channel>
  <title>W3Schools Home Page
  </title>
  <link>https://www.w3schools.com
  </link>
  <description>Free web building tutorials
  </description>
  <item>
    <title>RSS Tutorial
    </title>
    <link>https://www.w3schools.com/xml/xml_rss.asp
    </link>
    <description>New RSS tutorial on W3Schools
    </description>
  </item>
  <item>
    <title>XML Tutorial
    </title>
    <link>https://www.w3schools.com/xml
    </link>
    <description>New XML tutorial on W3Schools
    </description>
  </item>
</channel>
On Windows 10: LibreOffice 6.2 and older versions, PortableOpenOffice 4.1.5 and older, StarOffice 5.2
---
Lupp from München
User avatar
Lupp
Volunteer
 
Posts: 2344
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Converting XML Feed to CSV

Postby Villeroy » Mon Dec 03, 2018 4:29 pm

Right. There is one channel with one title, one link, one description but multiple items. This requires at least 2 normalized tables for items and channels.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, OpenOffice 4.x & LibreOffice 6.x
User avatar
Villeroy
Volunteer
 
Posts: 26401
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Converting XML Feed to CSV

Postby John_Ha » Mon Dec 03, 2018 4:42 pm

This is the sort of thing for which regular expressions are excellent. You may need to split the file into several smaller parts to make processing faster.

A regular expressions search with <[^>]+> will strip all tags leaving your example as

Code: Select all   Expand viewCollapse view
W3Schools Home Page
https://www.w3schools.com
Free web building tutorials

RSS Tutorial
https://www.w3schools.com/xml/xml_rss.asp
New RSS tutorial on W3Schools


XML Tutorial
https://www.w3schools.com/xml
New XML tutorial on W3Schools

It should now be pretty easy to structure it so it can be put into csv form with another regular expression search which recognises the web addresses (Google for many examples) and deletes the previous and the following line feeds, and replaces them by, say, %.

 Edit: Not required. Use the Alternate Find and Replace add-on to search for empty paragraphs and replace them by £. Now replace end of paragraphs by %. With a little thought you will quickly find searches which get the three items onto a single line separated by % used as the separator. 

You now have lines like

Code: Select all   Expand viewCollapse view
W3Schools Home Page%https://www.w3schools.com%Free web building tutorials%
%
RSS Tutorial%https://www.w3schools.com/xml/xml_rss.asp%New RSS tutorial on W3Schools%
%
%
XML Tutorial%https://www.w3schools.com/xml%New XML tutorial on W3Schools%

Name the file fred.csv and open it with Calc. Sort on any column to gather all the blank rows together and remove them. Save as a .ods.

See attached file.

Clipboard01.gif

See [Tutorial] How to record a macro (and Regular Expressions) for more information.

Showing that a problem has been solved helps others searching so, if your problem is now solved, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.
Attachments
three lines.ods
(9.48 KiB) Downloaded 6 times
AOO 4.1.6, Windows 7 Home 64 bit

See the Writer Manual, the Writer FAQ, the Writer Tutorials and the Writer guide.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
John_Ha
Volunteer
 
Posts: 6468
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Converting XML Feed to CSV

Postby Lupp » Mon Dec 03, 2018 6:33 pm

Applaudung to this ingenious solution I would nonetheless insist on keeping the different levels apart.
The XML file/string/stream presentd by the OP is not convertible to a table.
We can, however define one level (I would suggest the second one) to be the mother of our table.

For getting XML streams and for extracting content based on the Xpath syntax Calc in LibreOffice V4.2 and higher provides the functions WEBSERVICE() and FILTERXML() for compatibility with MS Excel.

Against what I would find reasonable XPATH actually allows to extract something like "columns by name" independent of the node level.
See https://www.w3schools.com/xml/xml_xpath.asp, https://en.wikipedia.org/wiki/XPath, or better https://de.wikipedia.org/wiki/XPath.

Concerning the approach by John_Ha I would also point another time to the coming LibreOffice V6.2.0 and its new REGEX() function. Presently you can test with V6.2.0.0Beta1 which is still a Dev version and installs in parallel.

The attached example using FILTERXML() will only run under LibO V4.2 or higher, as already mentioned.
Attachments
aoo96107_FilerXml_Xpath_demo1.ods
(19.92 KiB) Downloaded 4 times
On Windows 10: LibreOffice 6.2 and older versions, PortableOpenOffice 4.1.5 and older, StarOffice 5.2
---
Lupp from München
User avatar
Lupp
Volunteer
 
Posts: 2344
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Converting XML Feed to CSV

Postby John_Ha » Mon Dec 03, 2018 7:23 pm

Another simpler method is to put all the lines into a spreadsheet as below and use the spreadsheet sort.

Code: Select all   Expand viewCollapse view
W3Schools Home Page
https://www.w3schools.com
Free web building tutorials

RSS Tutorial
https://www.w3schools.com/xml/xml_rss.asp
New RSS tutorial on W3Schools


XML Tutorial
https://www.w3schools.com/xml
New XML tutorial on W3Schools

Now delete all the blank rows. Adding a working column B which numbers all rows. Add another C which records if the line is blank or not. Sort on C then B. This places all blank rows together. Delete the blank rows. The remaining rows are still in the original order.

Clipboard01.gif

Now just move the 2nd cell A2 to B1 and the third cell A3 to C1. Repeat for all rows.

As always the quality of the result will depend on the quality of the data. This method assumes we always have three lines separated by one or more blank lines

Showing that a problem has been solved helps others searching so, if your problem is now solved, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.
AOO 4.1.6, Windows 7 Home 64 bit

See the Writer Manual, the Writer FAQ, the Writer Tutorials and the Writer guide.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
John_Ha
Volunteer
 
Posts: 6468
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Converting XML Feed to CSV

Postby MrProgrammer » Tue Dec 04, 2018 7:39 pm

nguyenmm wrote:I have an 8mb xml feed that I'd like to convert to CSV.
Calc was not designed for XML. You are using the wrong tool so the task will be difficult, like driving screws with a hammer.

Villeroy wrote:A programmer would not even need Calc nor Excel.
A programmer would use XSLT, which was designed for XML manipulation. A non-programmer would use a tool like QXmlEdit.

Villeroy wrote:This requires at least 2 normalized tables for items and channels.
But we could use an unnormalized table. In this case the OP would want to filter the table by Key=Idescription.
201812041033.ods
(34.66 KiB) Downloaded 6 times

Also see [Solved] XML file to Calc columns.

If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the title. You can select the green checkmark icon at the same time.

[Tutorial] Ten concepts that every Calc user should know
Mr. Programmer
AOO 4.1.5 Build 9789 on MacOS 10.11.6.   The locale for any menus or Calc formulas in my posts is English (USA).
User avatar
MrProgrammer
Volunteer
 
Posts: 3617
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Converting XML Feed to CSV

Postby John_Ha » Tue Dec 04, 2018 7:42 pm

MrProgrammer wrote:the task will be difficult, like driving screws with a hammer.

As long as the hammer is heavy enough that works quite well for me :roll:
As for spreading jam - I agree a knife performs better than a hammer.
AOO 4.1.6, Windows 7 Home 64 bit

See the Writer Manual, the Writer FAQ, the Writer Tutorials and the Writer guide.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
John_Ha
Volunteer
 
Posts: 6468
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK


Return to Calc

Who is online

Users browsing this forum: No registered users and 18 guests