[Solved] Converting XML Feed to CSV

Discuss the spreadsheet application
Post Reply
nguyenmm
Posts: 1
Joined: Mon Dec 03, 2018 2:16 pm

[Solved] Converting XML Feed to CSV

Post by nguyenmm »

Hi friends,

I'm struggling here a bit.

I have an 8mb xml feed that I'd like to convert to CSV.

The feed looks like this:

<channel>
<title>W3Schools Home Page</title>
<link>https://www.w3schools.com</link>
<description>Free web building tutorials</description>
<item>
<title>RSS Tutorial</title>
<link>https://www.w3schools.com/xml/xml_rss.asp</link>
<description>New RSS tutorial on W3Schools</description>
</item>
<item>
<title>XML Tutorial</title>
<link>https://www.w3schools.com/xml</link>
<description>New XML tutorial on W3Schools</description>
</item>
</channel>

What are the steps to import the xml into calc and then get into columns eg:

Title, Link, Description etc?

I have tried to apply the same techniques with videos shown for Excel but can't find anything options relating to Calc.
Please help. Cheers
Last edited by MrProgrammer on Sun Jan 03, 2021 10:03 pm, edited 1 time in total.
Reason: Tagged ✓ [Solved]
OpenOffice 4.1.5 on Windows 10
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Converting XML Feed to CSV

Post by Villeroy »

It is not possible unless you are a programmer. A programmer would not even need Calc nor Excel.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Converting XML Feed to CSV

Post by Lupp »

I'm irritated again.
Basically XML describes trees. It also has means to embed tables into these trees. The given example is a tree that cannot be represented by a table, imo, except tables in table cells are allowed. Text processors (and html editors) can do it. Spreadsheets (and csv) can't. In what way am I wrong?

Code: Select all

<channel>
  <title>W3Schools Home Page
  </title>
  <link>https://www.w3schools.com
  </link>
  <description>Free web building tutorials
  </description>
  <item>
    <title>RSS Tutorial
    </title>
    <link>https://www.w3schools.com/xml/xml_rss.asp
    </link>
    <description>New RSS tutorial on W3Schools
    </description>
  </item>
  <item>
    <title>XML Tutorial
    </title>
    <link>https://www.w3schools.com/xml
    </link>
    <description>New XML tutorial on W3Schools
    </description>
  </item>
</channel>
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Converting XML Feed to CSV

Post by Villeroy »

Right. There is one channel with one title, one link, one description but multiple items. This requires at least 2 normalized tables for items and channels.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Converting XML Feed to CSV

Post by John_Ha »

This is the sort of thing for which regular expressions are excellent. You may need to split the file into several smaller parts to make processing faster.

A regular expressions search with <[^>]+> will strip all tags leaving your example as

Code: Select all

W3Schools Home Page
https://www.w3schools.com
Free web building tutorials

RSS Tutorial
https://www.w3schools.com/xml/xml_rss.asp
New RSS tutorial on W3Schools


XML Tutorial
https://www.w3schools.com/xml
New XML tutorial on W3Schools
It should now be pretty easy to structure it so it can be put into csv form with another regular expression search which recognises the web addresses (Google for many examples) and deletes the previous and the following line feeds, and replaces them by, say, %.
 Edit: Not required. Use the Alternate Find and Replace add-on to search for empty paragraphs and replace them by £. Now replace end of paragraphs by %. With a little thought you will quickly find searches which get the three items onto a single line separated by % used as the separator. 
You now have lines like

Code: Select all

W3Schools Home Page%https://www.w3schools.com%Free web building tutorials%
%
RSS Tutorial%https://www.w3schools.com/xml/xml_rss.asp%New RSS tutorial on W3Schools%
%
%
XML Tutorial%https://www.w3schools.com/xml%New XML tutorial on W3Schools%
Name the file fred.csv and open it with Calc. Sort on any column to gather all the blank rows together and remove them. Save as a .ods.

See attached file.
Clipboard01.gif
See [Tutorial] How to record a macro (and Regular Expressions) for more information.

Showing that a problem has been solved helps others searching so, if your problem is now solved, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.
Attachments
three lines.ods
(9.48 KiB) Downloaded 89 times
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Converting XML Feed to CSV

Post by Lupp »

Applaudung to this ingenious solution I would nonetheless insist on keeping the different levels apart.
The XML file/string/stream presentd by the OP is not convertible to a table.
We can, however define one level (I would suggest the second one) to be the mother of our table.

For getting XML streams and for extracting content based on the Xpath syntax Calc in LibreOffice V4.2 and higher provides the functions WEBSERVICE() and FILTERXML() for compatibility with MS Excel.

Against what I would find reasonable XPATH actually allows to extract something like "columns by name" independent of the node level.
See https://www.w3schools.com/xml/xml_xpath.asp, https://en.wikipedia.org/wiki/XPath, or better https://de.wikipedia.org/wiki/XPath.

Concerning the approach by John_Ha I would also point another time to the coming LibreOffice V6.2.0 and its new REGEX() function. Presently you can test with V6.2.0.0Beta1 which is still a Dev version and installs in parallel.

The attached example using FILTERXML() will only run under LibO V4.2 or higher, as already mentioned.
Attachments
aoo96107_FilerXml_Xpath_demo1.ods
(19.92 KiB) Downloaded 87 times
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Converting XML Feed to CSV

Post by John_Ha »

Another simpler method is to put all the lines into a spreadsheet as below and use the spreadsheet sort.

Code: Select all

W3Schools Home Page
https://www.w3schools.com
Free web building tutorials

RSS Tutorial
https://www.w3schools.com/xml/xml_rss.asp
New RSS tutorial on W3Schools


XML Tutorial
https://www.w3schools.com/xml
New XML tutorial on W3Schools
Now delete all the blank rows. Adding a working column B which numbers all rows. Add another C which records if the line is blank or not. Sort on C then B. This places all blank rows together. Delete the blank rows. The remaining rows are still in the original order.
Clipboard01.gif
Now just move the 2nd cell A2 to B1 and the third cell A3 to C1. Repeat for all rows.

As always the quality of the result will depend on the quality of the data. This method assumes we always have three lines separated by one or more blank lines

Showing that a problem has been solved helps others searching so, if your problem is now solved, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
User avatar
MrProgrammer
Moderator
Posts: 4895
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Converting XML Feed to CSV

Post by MrProgrammer »

nguyenmm wrote:I have an 8mb xml feed that I'd like to convert to CSV.
Calc was not designed for XML. You are using the wrong tool so the task will be difficult, like driving screws with a hammer.
Villeroy wrote:A programmer would not even need Calc nor Excel.
A programmer would use XSLT, which was designed for XML manipulation. A non-programmer would use a tool like QXmlEdit.
Villeroy wrote:This requires at least 2 normalized tables for items and channels.
But we could use an unnormalized table. In this case the OP would want to filter the table by Key=Idescription.
201812041033.ods
(34.66 KiB) Downloaded 81 times
Also see [Solved] XML file to Calc columns.

If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the title. You can select the green checkmark icon at the same time.

[Tutorial] Ten concepts that every Calc user should know
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Converting XML Feed to CSV

Post by John_Ha »

MrProgrammer wrote:the task will be difficult, like driving screws with a hammer.
As long as the hammer is heavy enough that works quite well for me :roll:
As for spreading jam - I agree a knife performs better than a hammer.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Post Reply