Does LO have a RegEx object?

Creating a macro - Writing a Script - Using the API (OpenOffice Basic, Python, BeanShell, JavaScript)
Post Reply
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Does LO have a RegEx object?

Post by eeigor »

 Edit: Split from Calc cell characters because this is a new question and the first question is solved. 
@Villeroy, thank you. With that it is more or less clear.
Another question. Does LO have a regex object with its own methods and properties?
In Excel, I would write like this (demo code).

Code: Select all

Dim re, sPattern$, cell, oCellRange, i%
Dim matches  'match collection
<...>
Set re = CreateObject("VBScript.RegExp")  'uses the external lib
re.Pattern = sPattern
For Each cell in oCellRange
    Set matches = re.Execute(cell)
    ' Highlight all matches in the current cell in red.
    For i = 0 To matches.Count - 1
        If matches(i).Length > 0 Then
            With cell.Characters(matches(i).FirstIndex + 1, matches(i).Length)
                .Font.Color = vbRed
            End With
        End If
    Next i
Next cell
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Calc cell characters

Post by Villeroy »

You can use Python or Java instead of silly Basic.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Calc cell characters

Post by eeigor »

Well, thanks. Python module re can do much more than VBScript.RegExp. I will not close the topic yet. Let's wait a bit...
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
User avatar
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Calc cell characters

Post by Lupp »

Why should somebody resort to

Code: Select all

re = CreateObject("VBScript.RegExp")  'uses the external lib
instead of simply applying the REGEX() function available in LibreOffice Calc since V 6.2?
Well, there is a little problem if the function needs to be called from user code with an omitted third parameter, but I surely would rather find a workaround in that case than go to VB Script.
REGEX() uses the same ICU engine as the F&R tool in LibO. Significant advantage!

Tell me your actual problem, and I'll look for a solution inside LibO.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Calc cell characters

Post by eeigor »

If you want to search within the string, then you need com.sun.star.util.TextSearch.

Code: Select all

Sub getMktValue()
  Dim oDoc as Object
  Dim oSheet as Object
  Dim oCell as Object

  oDoc = ThisComponent
  oSheet = oDoc.Sheets.getByName("Income")
  'regex test code'
  oCell = oSheet.getCellByPosition(0, 1)
  stk = oCell.getString()

  oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
  oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
  oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
  oOptions.searchString = "\((.*)\)"  'text string in parentheses
  oTextSearch.setOptions(oOptions)
  oFound = oTextSearch.searchForward(stk, 0, Len(stk))

  sFound = mid(stk, oFound.startOffset(0) + 1, oFound.endOffset(0) - oFound.startOffset(0))
  MsgBox sFound  'substring w/o parentheses

  sFound = mid(stk, oFound.startOffset(1) + 1, oFound.endOffset(1) - oFound.startOffset(1))
  MsgBox sFound
End Sub
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
User avatar
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Calc cell characters

Post by Lupp »

Fine. This will often be better than to call REGEX with a FunctionAccess object.
You know a way. For what the VBScript thing?
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Calc cell characters

Post by eeigor »

Lupp wrote:Why should somebody resort to re = CreateObject("VBScript.RegExp")
@Lupp I have Linux. Excel VBA solution is converting under LO Calc.
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
User avatar
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Calc cell characters

Post by Lupp »

eeigor wrote:I have Linux. Excel VBA solution is converting under LO Calc.
Didn't understand this.

Concerning what I suppose you actually want to achieve, the attached example is showing how I would do it.
Since it uses the REGEX() function, it cannot work in AOO or in LibO below version 6.2.
aooEeigorTask.ods
(16.69 KiB) Downloaded 324 times
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Calc cell characters

Post by Villeroy »

This Python macro seems to do the trick on the first sheet:

Code: Select all

import re

def highlight_Embraced():
    from com.sun.star.sheet.CellFlags import STRING
    regex = '\((.*?)\)'
    doc = XSCRIPTCONTEXT.getDocument()
    sh = doc.Sheets.getByIndex(0)
    rgtxt = sh.queryContentCells(STRING)
    dsc = sh.createSearchDescriptor()    
    dsc.SearchRegularExpression = True
    dsc.SearchType = 1
    dsc.setSearchString(regex)
    rgs = rgtxt.findAll(dsc)
#    doc.CurrentController.select(rgs)
    enum = rgs.Cells.createEnumeration()
    while enum.hasMoreElements():
        cell = enum.nextElement()
        txt = cell.getText()
        s = cell.getString()
        rgx = re.compile(regex)
        i = rgx.finditer(s)
        for m in i:
#            print(m.span(1))
            a = m.span(1)[0]
            b = m.span(1)[1] - a
            c = txt.createTextCursor()
            c.collapseToStart()
            c.goRight(a, False)
            c.goRight(b, True)
            c.CharColor = int('0xFF0000', 16)
#            print(c.getString())
First I query all cell ranges having constant strings since we do not want to format any formulas (=> XSheetCellRanges).
Within the string cells I find all cells having any pair of braces (=> XSheetCellRanges). Undocumented dsc.SearchType = 1 seems to search within values instead of formulas which would obsolete queryContentCells, but I keep it anyway in this example code. It is a very fast method to boil down the amount of cells.
Within the found sheet cell ranges I enumerate every single cell and do the regex matching based on the precompiled, non-greedy expression \((.*?)\) and query the iteration of all matches within the cell string.
The inner braces of the expression denote a group, the first group.
n.span(1) returns a pair of 2 numbers (start, end) denoting the match positions of the first group.
a=n.span(1)[0] is the start
b=n.span(1)[1] -a is the char count
At this point you have to create a new text cursor for every match, collapseToStart, jump a chars to the right and expand another b chars to the right in order to get the wanted text range.

P.S.: Now the formatted cells could be queried with queryContentCells(com.sun.star.sheet.CellFlags.FORMATTED) which is a subset of c.s.s.CellFlags.STRING. Formatted string cells are those having formatted text portions, hyperlink fields or line breaks.
 Edit: Added an installer document with a paramterized version of the above code. Download the document to a trusted directory, open it and push the button to install the Python module. 
Attachments
highlightSubstrings.odt
(24.88 KiB) Downloaded 326 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Calc cell characters

Post by Lupp »

Have difficulties to see the problem.
Took the time to make a more realistic example with 3001 text-cells, and having a preset regex efficiently searching for content enclosed within unnested parentheses, The behaviour of preset attributes is also demonstrated in an example.

Nothing is diffcult or requires external means. The API services, the LibO Calc function REGEX(), and a tiny bit of "silly" Basic do it all.
aooEeigorTaskRealistic.ods
(77.36 KiB) Downloaded 354 times
Yes. Silly Basic has a short and wide bridge to the API. No elaborate functionality of any "professional" programming system is required. The many shortcomings and doubtable peculiarities of Basic simply don't matter.
In simple cases of the kind Basic may be used as the born reference language for programming in LibO.

There is one assurance needed: The so-called NUL character Chr(0) must not occur in any of the processed texts.
If you cannot assure this you may replace it where I use it ba another non-character (like VT?). The RegEx search may not accept them for the "any-point", but can search fo them if explicitly inserted into a workstring.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Calc cell characters

Post by Villeroy »

Well, I download your example clear the formattings and push the button but nothing happens.
Then I call pyCalc -> highlightSubstring -> highlight_Embraced_ActiveSheet and the formatting is ready in a fraction of a second.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Calc cell characters

Post by Lupp »

Very strange. For me the routine ran flawlessly, and also in a fraction of a second.
Did you notice that (in my case) the range of application is the current selection?
...
Well. Just tried it again, and it may have needed "two fractions of a second" with A:C selected.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Calc cell characters

Post by Villeroy »

Ah, OK. Current selection. Let's go.

BASIC runtime error.
An exception occurred
Type: com.sun.star.container.NoSuchElementException
Message: .

Line 14 on first module with callFunction("REGEX", ...)

Oh, I see. LO 6.0 is not new enough.

My 25 lines of Python code run with AOO (Python 2.7) and with any version of LO
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Calc cell characters

Post by eeigor »

I was offline. Many thanks to everyone. This material is quite enough. But I am still working with it. And I need time to figure out the code.
I hope the idea itself was not entirely useless.
REGEX: MATCH, REPLACE... lacked SELECT (has already) :idea: SPLIT?
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
nickGiard
Posts: 16
Joined: Fri Nov 10, 2017 5:57 pm

Re: Does LO have a RegEx object?

Post by nickGiard »

Very interesting service com.sun.star.util.TextSearch and his SearchOptions.
For me oTextSearch.searchForward(sStr, 0, Len(sStr)) need of a String, I suppose a Basic string out of Cell.String o other strings.
But what on SearchOptions.searchString ?? How about regular expressions ? In str "aa 212 bb c 31.5" how find all words, and all numbers ?? How flower of re syntax I must use in Basic and in API TextSearch ?? Please give me a link !!
Please submit examples
str "aa 212 bb c 31.5 Auto 3mà "
how searchString for characters, for numbers, for spaces, ...
Thank you in advance.

Obviously I can use Python and re module, bat I would like use OO API
LibreOffice 6.3 on Windows 10 64bit
User avatar
karolus
Volunteer
Posts: 1159
Joined: Sat Jul 02, 2011 9:47 am

Re: Does LO have a RegEx object?

Post by karolus »

nickGiard wrote: Obviously I can use Python and re module, bat I would like use OO API
WHY??
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Does LO have a RegEx object?

Post by eeigor »

@nickGiard: Your question is correct.
oTextSearch.searchForward(sStr, nStart, Len(sStr))

This is an alternative to the REGEX function. For the given regexp, you must perform a search in a loop, remembering the start and end positions of each occurrence (match). Knowing the positions of the match, it can be extracted from the original string.
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
User avatar
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Does LO have a RegEx object?

Post by Lupp »

There isn't a "RegEx object". But...
A TextSearch service knows the method .setOptions(optionsStructure).
setOptions2() may also work .
What you can directly pass to the .searchForward() or .searchBackward() only is the starting offset and the length of the substring to search. Everything else (the given string, the searchString, a probable replaceString, the algorithm to use ...) must be passed via the options structure. The SearchOptions2 structure also supports wildcards now. Be careful. The value enumerations for the two currently available versions of searchOptions/2 are different.

That's not exactly fun. The way used here to pass parameters to an API objcet reminds me of the "buffers" in Win 3 times. Terrible! Should be replaced for decades now. Due to the now long history of DivisonStarOffice, SunStarOffice, OpenOffice.org... there are inconsitstencies. It's like in real life.

Anyway, the TextSearch service may be very useful in special cases, because the search results when using RegEx give access to the groups inside the overall finding. It may not be easy to handle.
Another funny thing I noticed recently: A TextSearch service having already set an options structure succesfully will show Null for the structure in the IDE nonetheless.

Start with https://api.libreoffice.org/docs/idl/re ... earch.html to find the details.
Last edited by Lupp on Sat Aug 07, 2021 10:38 pm, edited 1 time in total.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
nickGiard
Posts: 16
Joined: Fri Nov 10, 2017 5:57 pm

Re: Does LO have a RegEx object?

Post by nickGiard »

Hi, Karolus, don't you know that for programmers the hardest tings are more interesting ??? With Python is so much simpler ...., and if OO has new API libraries, why don't use them (if are usable) ?
Thank you eeigor, you have gave me an important information for a search in a loop, but how about searchString ?? do you have examples?
Thank you in advance.
LibreOffice 6.3 on Windows 10 64bit
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Does LO have a RegEx object?

Post by eeigor »

eeigor wrote:
nickGiard wrote:Thank you eeigor, you have gave me an important information for a search in a loop, but how about searchString ?? do you have examples?
This work was done with the support of @Lupp (thanks to him too).

Something like that (the vMatchOffsets array will contain everything you are looking for). Highlights

Code: Select all

	oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
	oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
	oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
	oOptions.searchString = sPattern  'your regex goes here
	oTextSearch.setOptions(oOptions)

	Dim vMatchOffsets()
	nCellFlags = com.sun.star.sheet.CellFlags.STRING

	For Each oRange In oRanges
		For Each oCell In oRange.queryContentCells(nCellFlags).Cells
			'With com.sun.star.table.CellContentType
				'If oCell.Type = .TEXT Then
				sText = oCell.String
				'End If
			'End With
			nTotalMatchCount = nTotalMatchCount _
			 + RegExSearch(sText, oTextSearch, nOccurrence, vMatchOffsets)
		Next oCell
	Next oRange

Code: Select all

'	sText:
'		A string where the regular expression is to be applied.
'	oTextSearch:
'		An object to search in its content.
'	nOccurrence:
'		Number to indicate which match of regex pattern is to be found (if 0 find all matches).
'	vMatchOffsets [ByRef]: 
'		An array of arrays that will be contain the start/end offsets (positions)
'		of each matching substring.
'	Returnes: vMatchOffsets() and the number of matching substrings.
Function RegExSearch&(sText$, oTextSearch As Object, nOccurrence%, ByRef vMatchOffsets)
	Dim aSearchResult As Object  'com.sun.star.util.SearchResult
	Dim i%, bFound As Boolean

	vMatchOffsets = Empty

	' NOTE: It's experimentally revealed that the searchForward method skips
	'		zero-length strings (⁇), e.g. find "x*" in "abc" -> No match found.
	aSearchResult = oTextSearch.searchForward(sText, 0, Len(sText))
	Do While aSearchResult.subRegExpressions > 0
		With aSearchResult
		'	Dim s$  'string that was found
		'	s = Mid(sText, .startOffset(0) + 1, .endOffset(0) - .startOffset(0))
		'	Print s; .subRegExpressions; .startoffset(0); .endOffset(0)  'full match only

			If nOccurrence > 0 Then
				' Check if nOccurrence is equal to the specified match number.
				If i = nOccurrence - 1 Then
					ReDim vMatchOffsets(0)
					vMatchOffsets(0) = Array(.startOffset(0), .endOffset(0))
					bFound = True
					Exit Do
				End If
			End If

			ReDim Preserve vMatchOffsets(i)
			vMatchOffsets(i) = Array(.startOffset(0), .endOffset(0))
			i = i + 1
			If .endOffset(0) = Len(sText) Then Exit Do

			aSearchResult = oTextSearch.searchForward(sText, .endOffset(0), Len(sText))
		End With
	Loop

	If nOccurrence > 0 And Not bFound Then
		vMatchOffsets = Empty
	End If

	If Not IsEmpty(vMatchOffsets) Then
		RegExSearch = UBound(vMatchOffsets) + 1
	End If
End Function
The use of this function can be very diverse, including expanding the capabilities of the regular REGEX function (which, by the way, only works in the LO 6.2+ version).
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
nickGiard
Posts: 16
Joined: Fri Nov 10, 2017 5:57 pm

Re: Does LO have a RegEx object?

Post by nickGiard »

Thank you eeigor and @Lupp very much, this is a very good and general macro, and works on a range were for every cell contains String you apply oTextSearch with his sPattern and save the result in Array. The TOP would be to have a REM with various sPattern for example, because I don't know where look for.
For me the problem is very more simple: i have a string (obviously from cell.String o for me cell.Formula) and I want read the first integer Row of the first cell address in the formula (es. = "[€ " & TESTO(SOMMA(O7:O30); "#.##0,00" ) & " ]". Now I use re Python with pattern=r'\d+', but I'd like to use API of OO.
Thank you for your effort !!!
LibreOffice 6.3 on Windows 10 64bit
eeigor
Posts: 214
Joined: Sun Apr 12, 2020 10:56 pm

Re: Does LO have a RegEx object?

Post by eeigor »

NOTE: In the REGEX function if Occurrence is 0, Text is returned unmodified. But in the RegExSearch UDF if nOccurrence is 0, all matches’ positions will be found. Alas, there are no RegEx object (in LO) & its appropriate methods to do this job for us.
Ubuntu 18.04 LTS • LibreOffice 7.5.3.2 Community
Post Reply