[Solved] "Find regexp substring" function

Discuss the spreadsheet application

[Solved] "Find regexp substring" function

Postby mclaudt » Tue May 11, 2010 11:19 pm

In Excel there is no direct regexp-related cell function, so if you want to get some regexp substring from, for example, cell A1, you should write your own function in VB:

Code: Select all   Expand viewCollapse view
Function ReFind(FindIn, FindWhat As String, Optional IgnoreCase As Boolean = False)
    Dim i As Long
    Dim matchCount As Integer
    Dim RE As Object, allMatches As Object, aMatch As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = FindWhat
    RE.IgnoreCase = IgnoreCase
    RE.Global = True
    Set allMatches = RE.Execute(FindIn)
    matchCount = allMatches.Count
    If matchCount >= 1 Then
        ReDim rslt(0 To allMatches.Count - 1)
        For i = 0 To allMatches.Count - 1
            rslt(i) = allMatches(i).Value
        Next i
        ReFind = rslt
    Else
        ReFind = ""
    End If
End Function


and then assign your cell =ReFind(A1;"^\d\d")

In OO Basic I didn't find any information concerning regexp find/replace function, there is only one mention about global find/replace method, accessible via menu.

What is the simplest way to get regexp substring from some cell in OOCalc?
Last edited by mclaudt on Thu May 13, 2010 12:29 am, edited 1 time in total.
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby mriosv » Tue May 11, 2010 11:22 pm

Menu/Help/OOo help- Index: regular expressions
LibreOffice 3.5.4 AOo-3.4 on Win 7 Ultimate
mriosv
Volunteer
 
Posts: 651
Joined: Mon Mar 09, 2009 1:12 am
Location: Galiza (España)

Re: "find regexp substring" function

Postby mclaudt » Tue May 11, 2010 11:35 pm

Menu/Help/OOo help- Index: regular expressions


ORLY?? Thanks a lot :bravo:

I'm going to use regexp manipulation within a cell, so regexp-oriented function such a MATCH LOOKUP SEARCH don't help.

In A1 I have "Giant Black Monolith h=50, w=20" and I want put in A2 something like =MySuperFunction(A1;"h=\d\d") and get "h=50".
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby mriosv » Tue May 11, 2010 11:58 pm

SEARCH() support regular expressions, then you can use to extract the substring with MID()
=MID(A1;SEARCH("h=[:digit:]{2}";A1;1) ;4)
LibreOffice 3.5.4 AOo-3.4 on Win 7 Ultimate
mriosv
Volunteer
 
Posts: 651
Joined: Mon Mar 09, 2009 1:12 am
Location: Galiza (España)

Re: "find regexp substring" function

Postby mclaudt » Wed May 12, 2010 12:06 am

mriosv wrote:SEARCH() support regular expressions, then you can use to extract the substring with MID()
=MID(A1;SEARCH("h=[:digit:]{2}";A1;1) ;4)


Yes, but in this example I have to know exactly the length of regexp substring! In case of "h[:space:]*=[:space:]*\d\d" I don't know this length cause it is not fixed, it can be "h=50" and "h = 50".

This task seems to be quite obvious, hope there is no need to learn Python or JavaScript scripting in OO to do that.
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby mclaudt » Wed May 12, 2010 12:12 am

OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby mriosv » Wed May 12, 2010 2:18 am

Find enclosed a solution using array formulas.

A1: "Giant Black Monolith h = 50, w=20"

String searched: "h[:space:]*=[:space:]*[:digit:]{2}"

A2: minimun length of searched string
A2: =MIN(IF(ISNUMBER(SEARCH("h[:space:]*=[:space:]*[:digit:]{2}";LEFT(MID(A1;SEARCH("h[:space:]*=[:space:]*[:digit:]{2}";A1;1);999);ROW(B1:B999));1));ROW(B1:B999);999))

A3: shorter searched string
A3: =MID(A1;SEARCH("h[:space:]*=[:space:]*[:digit:]{2}";A1;1);MIN(IF(ISNUMBER(SEARCH("h[:space:]*=[:space:]*[:digit:]{2}";LEFT(MID(A1;SEARCH("h[:space:]*=[:space:]*[:digit:]{2}";A1;1);999);ROW(B1:B999));1));ROW(B1:B999);999)))

A2 and A3 are arrays, so to introduce use Ctrl+Shift+Enter.
Attachments
SearchedStringWithRegularExpressions.ods
(9.57 KiB) Downloaded 424 times
LibreOffice 3.5.4 AOo-3.4 on Win 7 Ultimate
mriosv
Volunteer
 
Posts: 651
Joined: Mon Mar 09, 2009 1:12 am
Location: Galiza (España)

Re: "find regexp substring" function

Postby mclaudt » Wed May 12, 2010 6:32 am

mriosv wrote:Find enclosed a solution using array formulas.


Thanks, you have really monstrous skills :) It works fine for this fixed example.

But I'm still finding global solution for this task. Cause there is giant overhead, instead of direct addressing to the length of substring found, we have to produce 1000 similar calculation to define that length :crazy:

And also If I want use lookahead and lookbehind in regexp, this trick of defining the length simply will not work.

There must be a general solution, without constructing such a crutch.
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby mclaudt » Wed May 12, 2010 6:52 am

Here is the same question, without any success

http://www.oooforum.org/forum/viewtopic.phtml?t=91213
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby acknak » Wed May 12, 2010 5:14 pm

It might help if you gave us a sample data set to work with, showing the input data and what you need to get out of it.

It's often the case that a specific problem is much easier than solving the general one. For example, if the substring always falls at the beginning or end of the input string.

Also, are you sure that OOo BASIC has no regular expression search function(s)? I seem to remember that it does, but it's been a long time since I looked at it.

If BASIC doesn't support regexps, Python certainly does.

You probably should ask in the programming area of this forum for help with either of those.

This sort of problem would be very easy if only SUBSTITUTE supported regular expressions. Why the OOo developers went out of their way to add regexps to the search functions and never did it for SUBSTITUTE, I can only guess. Here's a request to add regexp support: Issue 106099: support regular expressions (regexps) in SUBSTITUTE function, maybe some more votes will get the developers to at least open the issue. You can register there and add your vote (up to two) or comment.
AOO4/LO5 • Linux • Fedora 23
User avatar
acknak
Moderator
 
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: "find regexp substring" function

Postby mclaudt » Wed May 12, 2010 6:04 pm

Thanks for your help, I voted for that issue and I also аррeаl folks for voting here.


acknak wrote:It might help if you gave us a sample data set to work with, showing the input data and what you need to get out of it.

This is a column of semi-unstructured definitions such as

Code: Select all   Expand viewCollapse view
Giant Black Monolith h=50, w=20
Small Green Monolith h=10-11,5, w=2.0
Monolith Pro (H=50); w=20~25
Medium Black Monolith w=20


and I want to get accurate table with 'Name', 'Height', 'Width' headers, so there is no way except using all power of regexp without monstrous detours.

Now I'm looking for user-defined functions with Python. But, anyway, any comments are welcome.
Last edited by mclaudt on Wed May 12, 2010 6:22 pm, edited 1 time in total.
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby mclaudt » Wed May 12, 2010 6:20 pm

acknak wrote:This sort of problem would be very easy if only SUBSTITUTE supported regular expressions.


By the way, even SUBSTITUTE support for regexp will not solve the problem.
It would be nice if simply SUBSTRING (it supports regexp today!) returns substring found, not silly character position that is totally unusable!
Last edited by mclaudt on Wed May 12, 2010 7:30 pm, edited 1 time in total.
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: "find regexp substring" function

Postby acknak » Wed May 12, 2010 7:04 pm

By the way, even SUBSTITUTE support for regexp will not solve the problem.

I expect problems do exist that it wouldn't fix, but I haven't seen one yet.

=SUBSTITUTE(A1; ".*([hH]=[0-9][0-9]*).*"; "$1") would give you the height values (if SUBSTITUTE supported regexps!)
=SUBSTITUTE(A1; ".*([wW]=[0-9][0-9]*).*"; "$1") would give you the widths

Obviously, you could do that right now in a few clicks using the find/replace dialog; there's just no way to do it as a formula.
AOO4/LO5 • Linux • Fedora 23
User avatar
acknak
Moderator
 
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: "find regexp substring" function

Postby mclaudt » Thu May 13, 2010 12:27 am

It seems that I find solution.

Code: Select all   Expand viewCollapse view
Function regex(a,b,c)

' Attention - made by novice, so can contain bugs

' a - string or cell to search in
' b - regexp string or cell containing regexp string
' c - back-reference number - analogy to \n in regexp syntax

 
' prepare regexp search options
oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
oOptions.searchString = b
oTextSearch.setOptions(oOptions)

' search first substring
oFound = oTextSearch.searchForward(a, 0, Len(a))

If  oFound.subRegExpressions=0 then
  regex = "No result in that back-reference level"
  Exit Function
Else
  nStart = oFound.startOffset()
  nEnd = oFound.endOffset()
  regex = Mid(a, nStart(c) + 1, nEnd(c) - nStart(c))
End If

End Function


Based on code from this this thread
See also http://api.openoffice.org/docs/common/r ... esult.html
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm

Re: [Solved]"find regexp substring" function

Postby mclaudt » Thu May 13, 2010 1:05 am

Noticed that this script works about 3-4 times slower than VBScript analog. It it possible that there is also some overhead.
OpenOffice 3.1 on openSUSE 11.0
mclaudt
 
Posts: 13
Joined: Tue May 11, 2010 11:15 pm


Return to Calc

Who is online

Users browsing this forum: No registered users and 16 guests