[Solved] Extract data from other random data

Discuss the spreadsheet application
Post Reply
Pikwic
Posts: 9
Joined: Mon May 31, 2021 5:46 pm

[Solved] Extract data from other random data

Post by Pikwic »

Hi,

I have random data like this:
{"WordOne":{"ferfe":"WordOne","mw":"6.565008|22.347848|6"},"gergerg":{"erger":"regergerg","id":"9460248|08|0|3|a5||"},"rferfe":{"erfe":"erferf","mw":"sdcsdc|sdcsd|dcsdcs|csdcsd"}}}
I try to extract only :
6.565008
and
22.347848
Let only the first and second data that follow
"WordOne","mw":"
and are separated by
|
Do you know how to do it ?
Last edited by Pikwic on Tue Jun 01, 2021 3:09 pm, edited 2 times in total.
OpenOffice 4.1.7 on MacOS 11.2
User avatar
karolus
Volunteer
Posts: 1158
Joined: Sat Jul 02, 2011 9:47 am

Re: Extract data from other random data

Post by karolus »

Hallo

This is .json why dont you use some json-parser?

Code: Select all

#! /usr/bin/env python3

import json
content = json.loads('{"WordOne":{"ferfe":"WordOne","mw":"6.565008|22.347848|6"},'
                  '"gergerg":{"erger":"regergerg","id":"9460248|08|0|3|a5||"},'
                  '"rferfe":{"erfe":"erferf","mw":"sdcsdc|sdcsd|dcsdcs|csdcsd"}}')

a, b = content['WordOne']['mw'].split('|')[0:2]
print(f'{a}, {b}')
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Extract data from other random data

Post by Lupp »

Let me assume you want to extract the "first and second decimal number using the point as the decimal separator, containing no thousands separators and no spaces and having no exponential scaling factor".
What you want (if I'm right) is well described by the RegularExpression

Code: Select all

[\+\-]?\d+(\.\d+)?
If you actualy want a mandatory decimal point or in another way something slightly different from what I described, you can adapt the expression accordingly.
Using the above RegEx, the extractable numbers are
6.565008;22.347848;6;9460248;08;0;3;5 (Third and additional numbers for free.)
How did I get them?
You may look into the attached example, but you won't see the results using Apache OpenOffice (any version).
The function REGEX() used for the purpose only is implemented in LibreOffice Version 6.2 or higher.
aoo105376extractNumberFromArbitraryStrings.ods
(12.16 KiB) Downloaded 98 times
The RegEx

Code: Select all

(?<=("|\|))[\+\-]?\d+(\.\d+)?(?=("|\|))
should even more preciesly select what you described.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
Pikwic
Posts: 9
Joined: Mon May 31, 2021 5:46 pm

Re: Extract data from other random data

Post by Pikwic »

Thanks for your answers.

@karolus, I prefer to use a regex, with find / replace.

@Lupp, thanks, but it would only take the first and second data following the expression :
"WordOne","mw":"
To take my example :
{"WordOne":{"ferfe":"WordOne","mw":"6.565008|22.347848|6"},"gergerg":{"erger":"regergerg","id":"9460248|08|0|3|a5||"},"rferfe":{"erfe":"erferf","mw":"sdcsdc|sdcsd|dcsdcs|csdcsd"}}}
Only :
6.565008;22.347848
I should be able to do this with find / replace.

Thanks !
OpenOffice 4.1.7 on MacOS 11.2
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Extract data from other random data

Post by Lupp »

Of course, you will need to adapt any RegEx to your specific use-case. Surely you wouldn't ask for automated support if there was only one example, and I wouldn't expect one example and a few words to actually tell everything needed to find a correct solution.

However, Find & Replace with a RegEx for the search and an appropriate replacement string would always work "in situ". With other words: It will destroy the original data. Therefore I prefer a solution based oin the REGEX() function.

If you insist on F&R, and you are sure your very strange problem is sufficiently explained (what I can't believe), you can use

Code: Select all

(^.*"WordOne","mw":")(\d+\.\d+)(\|)(\d+\.\d+)(.*$)
as the search string and

Code: Select all

$2;$4
as the replacement string.
I wouldn't suggest to use such RegEx related specialties without understanding them.
To play with examples, you should better use the REGEX() function.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
karolus
Volunteer
Posts: 1158
Joined: Sat Jul 02, 2011 9:47 am

Re: Extract data from other random data

Post by karolus »

Hallo

so you have one Problem more… https://xkcd.com/1171/
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Extract data from other random data

Post by Lupp »

:/)
Yes, Mr Munroe is in the know. See also
https://xkcd.com/353/
https://xkcd.com/1987/
And if we still need some explanation by those really in the know and in the humor.
https://www.explainxkcd.com/wiki/index.php/353:_Python

To be serious for another minute: You surely know that LibreOffice comes with a small and somehow raw programming language called (LibO-) Basic, and may Basic be poor as it is, it nonetheless has a Split function.
My reason not to offer a solution by user code (Basic) was threefold (with appendix):
(1) Where ready made tools exist, I try to avoid custom programming.
(2) The syntactical concept of "Take the first two substrings behind 'SomeLongString' and separated by '|'." didn't seem trustworthy to me (the final thing?).
(3)RegEx aren't exactly simple, but they are powerful. It pays to know a bit about them independent of what software implementing them you use. My preferred tutorial: https://www.regular-expressions.info/tutorial.html.

(4)(appendix) There's a contributor here who might get sick reading Basic code.
I attach the sheet I played with when considering the question and (now) your comments.
aoo105376extractNumberFromArbitraryStringsPlay.ods
(14.94 KiB) Downloaded 96 times
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
Pikwic
Posts: 9
Joined: Mon May 31, 2021 5:46 pm

Re: Extract data from other random data

Post by Pikwic »

Thanks Lupp, it works !
OpenOffice 4.1.7 on MacOS 11.2
Post Reply