Plans for improved regex support for search/replace?

Request For Enhancement, User feedback candidates for wiki
Post Reply
TheJesster
Posts: 3
Joined: Thu Jan 15, 2015 10:21 pm

Plans for improved regex support for search/replace?

Post by TheJesster »

So at work I found myself repeatedly copying large chunks of data from one app into Excel, then grabbing only the column of data I wanted to fix and copying that into a third app that would allow me to use a regex for a search and replace operation, then copying the fixed data back into Excel, and from there, the whole blob back into the original app.

So, having used OpenOffice for personal use at home, I wondered if Calc could save me a step by allowing me to do the regex search/replace directly without the need of the third app. A little searching showed that it did indeed support regex's for search/replace. Awesome. So I installed OO on my work machine, only to find that it only supports a very basic form of the regex standard. None of the special grouping features described on this page (http://www.regular-expressions.info/refadv.html) are supported. This is rather surprising as more complete regex libraries are available for most major languages and I would have thought OO would have just used one of them rather than implementing their own regex engine.

Are there any plans to improve the regex support in OO? Specifically, I'd like to use lookahead and lookbehind groups:
(?<=...) / (?=...)
OpenOffice 4.1.1 on Windows 7
User avatar
RusselB
Moderator
Posts: 6646
Joined: Fri Jan 03, 2014 7:31 am
Location: Sarnia, ON

Re: Plans for improved regex support for search/replace?

Post by RusselB »

We have no idea on any plans regarding OpenOffice. You can submit suggestions using the same method that is used for reporting bugs. See http://openoffice.apache.org/bug-tracking.html for more information.
Additionally there is more information about the regex that OpenOffice does allow at https://wiki.openoffice.org/wiki/Docume ... _in_Writer
While the address does specify Writer, the options available for Calc are also included.
OpenOffice 4.1.7, LibreOffice 7.0.1.2 on Windows 7 Pro, Ultimate & Windows 10 Home (2004)
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
hanya
Volunteer
Posts: 885
Joined: Fri Nov 23, 2007 9:27 am
Location: Japan

Re: Plans for improved regex support for search/replace?

Post by hanya »

TheJesster wrote:Are there any plans to improve the regex support in OO? Specifically, I'd like to use lookahead and lookbehind groups:
(?<=...) / (?=...)
Since Apache OpenOffice 3.4 and LibreOffice 4.0, the regular expression engine is from ICU Regular Expressions [1].
You can use most of operators listed in the page except for operations restricted by office functions such as paragraph break. Look-ahead and look-behind operations are supported by the current engine.

[1]: http://userguide.icu-project.org/strings/regexp
Please, edit this thread's initial post and add "[Solved]" to the subject line if your problem has been solved.
Apache OpenOffice 4-dev on Xubuntu 14.04
User avatar
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Plans for improved regex support for search/replace?

Post by acknak »

Note: the regex engine was switched to ICU with OO 4.0; earlier versions supported only the simpler patterns. The OO documentation and online help has not caught up with the change.
AOO4/LO5 • Linux • Fedora 23
TheJesster
Posts: 3
Joined: Thu Jan 15, 2015 10:21 pm

Re: Plans for improved regex support for search/replace?

Post by TheJesster »

Ok, so I was going by the older documentation which didn't mention support for special groups.
However, that said, there seems to be a bug in Calc as it doesn't seem to support lookaheads or lookbehinds. Consider the following scenarios:

1: I have a bunch of strings that look something like this:
"Thing-1!!!/Thing-2!!!/Thing-3/Thing-4!!!"

I want to make sure every numbered Thing ends with three exclamation marks, and as you can see above, they are missing from Thing-3. So I attempt a regex Search/Replace like this:
Search for
(?<=-\d)/
Replace with
!!!/

So that should find all forward slashes that are immediately preceded by a hyphen and a single digit, then replace only the slash with three exclamation marks and a slash.

Tried this exact scenario in OpenOffice Writer and it worked. However using the exact same search string and regex in OpenOffice Calc, I get "Search key not found."

Scenario 2:

"Thing-1!!!/Thing-2!!!/Thing-3/Thing-4!!!/Thing-5"

In this case I have a Thing at the end of the string that's missing its exclamation points so it does not have a forward slash to replace. So in this situation I want a regex that will match no actual characters, but simply a location within the string to insert my exclamation points at, and I want it to work for both Thing-3 and Thing-5. So I tried this:

Search for
(?<=-\d)(?=[/"])
Replace with
!!!

With this I get a "Search key not found" in both Writer and Calc. However, this does work in Notepad++. I'm guessing this is simply a limitation of the ICU Regex engine that it won't match 0 characters at a specific location in the string.
OpenOffice 4.1.1 on Windows 7
User avatar
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Plans for improved regex support for search/replace?

Post by acknak »

TheJesster wrote:... So that should find all forward slashes that are immediately preceded by a hyphen and a single digit, then replace only the slash with three exclamation marks and a slash. ...
Your example seems to work fine for me.

Maybe you can attach a spreadsheet with some sample data that doesn't work for you?
AOO4/LO5 • Linux • Fedora 23
TheJesster
Posts: 3
Joined: Thu Jan 15, 2015 10:21 pm

Re: Plans for improved regex support for search/replace?

Post by TheJesster »

Ok, this is driving me nuts, as I've got it working now, but can't pin down the circumstances that caused it to fail.

I have figured this much out, if you have only one cell selected and check the Current selection only box on the find/replace dialog, then try to do a replace all, you'll get a "Search key not found" and nothing will be replaced even if there is a match in the selected cell. This seems to happen for any search (not just regex). And this probably accounts for some of my failed attempts, but I'm sure I had it fail on me with multiple rows selected as well.

Incidentally, I am using the "Current selection only" option for all my attempts.
OpenOffice 4.1.1 on Windows 7
Post Reply