Page 1 of 1

Plans for improved regex support for search/replace?

Posted: Thu Jan 15, 2015 10:39 pm
by TheJesster
So at work I found myself repeatedly copying large chunks of data from one app into Excel, then grabbing only the column of data I wanted to fix and copying that into a third app that would allow me to use a regex for a search and replace operation, then copying the fixed data back into Excel, and from there, the whole blob back into the original app.

So, having used OpenOffice for personal use at home, I wondered if Calc could save me a step by allowing me to do the regex search/replace directly without the need of the third app. A little searching showed that it did indeed support regex's for search/replace. Awesome. So I installed OO on my work machine, only to find that it only supports a very basic form of the regex standard. None of the special grouping features described on this page (http://www.regular-expressions.info/refadv.html) are supported. This is rather surprising as more complete regex libraries are available for most major languages and I would have thought OO would have just used one of them rather than implementing their own regex engine.

Are there any plans to improve the regex support in OO? Specifically, I'd like to use lookahead and lookbehind groups:
(?<=...) / (?=...)

Re: Plans for improved regex support for search/replace?

Posted: Fri Jan 16, 2015 12:27 am
by RusselB
We have no idea on any plans regarding OpenOffice. You can submit suggestions using the same method that is used for reporting bugs. See http://openoffice.apache.org/bug-tracking.html for more information.
Additionally there is more information about the regex that OpenOffice does allow at https://wiki.openoffice.org/wiki/Docume ... _in_Writer
While the address does specify Writer, the options available for Calc are also included.

Re: Plans for improved regex support for search/replace?

Posted: Sat Jan 17, 2015 9:51 am
by hanya
TheJesster wrote:Are there any plans to improve the regex support in OO? Specifically, I'd like to use lookahead and lookbehind groups:
(?<=...) / (?=...)
Since Apache OpenOffice 3.4 and LibreOffice 4.0, the regular expression engine is from ICU Regular Expressions [1].
You can use most of operators listed in the page except for operations restricted by office functions such as paragraph break. Look-ahead and look-behind operations are supported by the current engine.

[1]: http://userguide.icu-project.org/strings/regexp

Re: Plans for improved regex support for search/replace?

Posted: Sun Jan 18, 2015 3:48 am
by acknak
Note: the regex engine was switched to ICU with OO 4.0; earlier versions supported only the simpler patterns. The OO documentation and online help has not caught up with the change.

Re: Plans for improved regex support for search/replace?

Posted: Mon Jan 19, 2015 6:16 pm
by TheJesster
Ok, so I was going by the older documentation which didn't mention support for special groups.
However, that said, there seems to be a bug in Calc as it doesn't seem to support lookaheads or lookbehinds. Consider the following scenarios:

1: I have a bunch of strings that look something like this:
"Thing-1!!!/Thing-2!!!/Thing-3/Thing-4!!!"

I want to make sure every numbered Thing ends with three exclamation marks, and as you can see above, they are missing from Thing-3. So I attempt a regex Search/Replace like this:
Search for
(?<=-\d)/
Replace with
!!!/

So that should find all forward slashes that are immediately preceded by a hyphen and a single digit, then replace only the slash with three exclamation marks and a slash.

Tried this exact scenario in OpenOffice Writer and it worked. However using the exact same search string and regex in OpenOffice Calc, I get "Search key not found."

Scenario 2:

"Thing-1!!!/Thing-2!!!/Thing-3/Thing-4!!!/Thing-5"

In this case I have a Thing at the end of the string that's missing its exclamation points so it does not have a forward slash to replace. So in this situation I want a regex that will match no actual characters, but simply a location within the string to insert my exclamation points at, and I want it to work for both Thing-3 and Thing-5. So I tried this:

Search for
(?<=-\d)(?=[/"])
Replace with
!!!

With this I get a "Search key not found" in both Writer and Calc. However, this does work in Notepad++. I'm guessing this is simply a limitation of the ICU Regex engine that it won't match 0 characters at a specific location in the string.

Re: Plans for improved regex support for search/replace?

Posted: Mon Jan 19, 2015 7:16 pm
by acknak
TheJesster wrote:... So that should find all forward slashes that are immediately preceded by a hyphen and a single digit, then replace only the slash with three exclamation marks and a slash. ...
Your example seems to work fine for me.

Maybe you can attach a spreadsheet with some sample data that doesn't work for you?

Re: Plans for improved regex support for search/replace?

Posted: Mon Jan 19, 2015 8:43 pm
by TheJesster
Ok, this is driving me nuts, as I've got it working now, but can't pin down the circumstances that caused it to fail.

I have figured this much out, if you have only one cell selected and check the Current selection only box on the find/replace dialog, then try to do a replace all, you'll get a "Search key not found" and nothing will be replaced even if there is a match in the selected cell. This seems to happen for any search (not just regex). And this probably accounts for some of my failed attempts, but I'm sure I had it fail on me with multiple rows selected as well.

Incidentally, I am using the "Current selection only" option for all my attempts.