OR sequence in Find & Replace regular expression

Postby huw » Wed Dec 19, 2007 1:24 pm

Code:
^([:space:]|,|\.|[:lower:])
works to find the specified characters at the beginning of a cell. I'll call this expression A.

Code:
([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)
works to find certain other character combinations or locations. I'll call this expression B.

Combining them as A|B:
Code:
^([:space:]|,|\.|[:lower:])|([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)
works fine, yet B|A:
Code:
([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)|^([:space:]|,|\.|[:lower:])
does not! Expression A is not evaluated. Does anyone have an idea why?
huw
Volunteer
 
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: OR sequence in Find & Replace regular expression

Postby FPeters » Wed Dec 19, 2007 4:20 pm

Could you give an example string that is not working? It's rather hard to
make your way through these regexps.

-f
FPeters
Volunteer
 
Posts: 20
Joined: Sun Oct 07, 2007 9:28 pm
Location: Hamburg

Re: OR sequence in Find & Replace regular expression

Postby huw » Wed Dec 19, 2007 5:05 pm

Sorry!

Expression A selects any cell beginning with either a space, comma, period, or lowercase letter.

Expression B selects any cell containing two consecutive spaces, OR a comma or period with an alphanumeric character directly on each side, OR ending with a space, comma, or period.
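For illustration, the two descriptions above can be mapped clause by clause onto a small Python sketch. Python's `re` has no POSIX named classes, so this assumes ASCII-only stand-ins (`\s` for [:space:], `a-z` for [:lower:], `A-Za-z0-9` for [:alnum:]); the rule names and the `which_rules` helper are hypothetical, chosen only to report which clause fires for a given cell.

```python
import re

# Assumed ASCII stand-ins for the OOo POSIX classes:
#   [:space:] -> \s,  [:lower:] -> a-z,  [:alnum:] -> A-Za-z0-9
A = r"^[\s,.a-z]"  # starts with a space, comma, period, or lowercase letter

B_PARTS = {
    "two spaces": r"\s{2}",                              # two consecutive whitespace characters
    "sep between alnum": r"[A-Za-z0-9][.,][A-Za-z0-9]",  # comma/period with alnum on each side
    "ends with sep": r"[\s,.]$",                         # ends with a space, comma, or period
}


def which_rules(cell: str) -> list[str]:
    """Return the names of the rules (A, or a clause of B) that match this cell."""
    hits = ["A"] if re.search(A, cell) else []
    hits += [name for name, pat in B_PARTS.items() if re.search(pat, cell)]
    return hits


for cell in [" E", "E  e", "E,e", "E.", "End"]:
    print(repr(cell), which_rules(cell))
```

A cell matched by none of the rules, such as "End", yields an empty list.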

A test set:

Code:
E
,E
.E
ee
E  e
E,e
E.e
E
E,
E.
End


Note the first "E" in that list should have a leading space and the second "E" a trailing space, but the forum software strips them, so you'll have to add them yourself.

All but "End" are selected by A|B; the mystery is that B|A misses the cells that only A should catch (lines 1 to 4).

Remember to enable both the Regular expressions and Match case options.
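As a point of comparison, here is a minimal Python sketch that runs both A|B and B|A over the test set above, with the leading space on line 1 and the trailing space on line 8 put back. It assumes the same ASCII stand-ins for the POSIX classes. In a conventional backtracking engine like Python's `re`, the order of the alternatives can change which branch reports the match, but not whether a cell matches at all.

```python
import re

# Test set from the post, with the leading space (line 1) and the
# trailing space (line 8) restored.
CELLS = [" E", ",E", ".E", "ee", "E  e", "E,e", "E.e", "E ", "E,", "E.", "End"]

# Assumed ASCII stand-ins for the POSIX classes:
# [:space:] -> \s, [:lower:] -> [a-z], [:alnum:] -> [A-Za-z0-9]
A = r"^(\s|,|\.|[a-z])"
B = r"(\s{2})|([A-Za-z0-9](\.|,)[A-Za-z0-9])|((\s|,|\.)$)"

a_or_b = re.compile(A + "|" + B)
b_or_a = re.compile(B + "|" + A)  # here "^" sits in the middle of the pattern

for n, cell in enumerate(CELLS, start=1):
    ab = bool(a_or_b.search(cell))
    ba = bool(b_or_a.search(cell))
    print(f"{n:2d}: {cell!r:8} A|B={ab}  B|A={ba}")
```

Both columns come out True for every cell except "End", in either order.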

Re: OR sequence in Find & Replace regular expression

Postby acknak » Wed Dec 19, 2007 9:06 pm

My guess is the "^" anchor that starts your A is the root of the problem. In A|B, the "^" is at the beginning of the pattern; in B|A, it falls in the middle.

In my (limited, I admit) understanding and experience, OOo's regexp search treats the anchors rather specially and gets confused when they don't appear at the start of an expression.
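For contrast, a conventional engine treats `^` as belonging only to the alternative it appears in, so it is legal anywhere in an alternation. A tiny check, with Python's `re` standing in for a conventional engine (this says nothing about how OOo's own engine behaves):

```python
import re

print(bool(re.search(r"^a|x", "ab")))  # True: "a" at the start of the string
print(bool(re.search(r"x|^a", "ab")))  # True: swapping the branches changes nothing
print(bool(re.search(r"x|^a", "ba")))  # False: "a" is present, but not at the start
```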

PS: If it's any help, Perl gives the same result either way: all lines from your test match except #1, #8 and #11. I deleted all leading spaces from your test sample, except one from the start of line #1.
AOO4/LO5 • Linux • Fedora 23
acknak
Moderator
 
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: OR sequence in Find & Replace regular expression

Postby huw » Thu Dec 20, 2007 11:21 am

acknak wrote: ...Perl gives the same result either way: all lines from your test match except #1, #8 and #11.

Thanks. It looks like another quirk of OOo's regex implementation.

I've only ever used regex in OOo, never in a respected implementation, so forgive me if I ask why lines #1 & #8 didn't match in Perl: #1 has a leading space that should be caught by expression A, and #8 has a trailing space that should be caught by expression B.

Note that only line #1 should have a leading space.

Re: OR sequence in Find & Replace regular expression

Postby acknak » Thu Dec 20, 2007 7:08 pm

... why lines #1 & #8 didn't match in Perl...

Because I'm an idiot, in a hurry ;-)

I forgot to fix the named classes in your pattern, so they weren't working properly, and I somehow lost the trailing space on #8.
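To see what "not working properly" means here, the sketch below uses Python's `re` as a stand-in: outside a bracket expression, `[:space:]` is not a named class at all but an ordinary set of the literal characters `:`, `s`, `p`, `a`, `c` and `e`. Python's `re` does not support the `[[:space:]]` form either, so `\s` is shown as the working equivalent.

```python
import re

bare = re.compile(r"[:space:]")  # a set of ':', 's', 'p', 'a', 'c', 'e'; NOT whitespace

print(bool(bare.search(" ")))       # False: a real space is not in that set
print(bool(bare.search("cat")))     # True: 'c' and 'a' are in the set
print(bool(re.search(r"\s", " ")))  # True: \s is Python's whitespace class
```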

Once I actually pay attention to what I'm doing, it works much better (only #11 does not match).

Just for kicks, here is what your pattern would look like in Perl, along with the output when run against your sample:
Code:
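#!/usr/bin/perl -n
# The -n switch wraps this script in a loop that reads each input line into $_.

chomp;    # drop the newline so it is neither printed nor matched as [[:space:]]

# The four alternatives, in order: ends with a space, comma, or period (B);
# starts with a space, comma, period, or lowercase letter (A); two consecutive
# whitespace characters (B); a comma or period between two alphanumerics (B).
# The named classes appear inside [...] brackets, where Perl expects them.
my $match =
/
   [[:space:],.]$
|
  ^[[:space:],.[:lower:]]
|
   [[:space:]]{2}
|
  ([[:alnum:]][.,][[:alnum:]])
/ox;

# Print the line number, the quoted line, and either <prematch'match'postmatch> or "no match".
printf("%2d: %-8s %s\n", $., "'$_'", $match ? "matched <$`'$&'$'>" : "no match");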

Code:
$ perl abx e
 1: ' E'     matched <' 'E>
 2: ',E'     matched <','E>
 3: '.E'     matched <'.'E>
 4: 'ee'     matched <'e'e>
 5: 'E  e'   matched <E'  'e>
 6: 'E,e'    matched <'E,e'>
 7: 'E.e'    matched <'E.e'>
 8: 'E '     matched <E' '>
 9: 'E,'     matched <E','>
10: 'E.'     matched <E'.'>
11: 'End'    no match

Perhaps this will help make it clear why the named classes are meant to appear inside "[classes]" and not to stand on their own.
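For readers without Perl, here is a rough Python port of the script above. It assumes the same ASCII stand-ins for the POSIX classes (`\s`, `a-z`, `A-Za-z0-9`); the `describe` helper is hypothetical. `re.VERBOSE` plays the role of Perl's `/x` flag, and slicing around `m.span()` reproduces Perl's prematch, match and postmatch variables.

```python
import re

# Same four alternatives, in the same order as the Perl pattern.
PATTERN = re.compile(r"""
      [\s,.]$                      # ends with a space, comma, or period
    | ^[\s,.a-z]                   # starts with a space, comma, period, or lowercase letter
    | \s{2}                        # two consecutive whitespace characters
    | [A-Za-z0-9][.,][A-Za-z0-9]   # comma or period between two alphanumerics
""", re.VERBOSE)


def describe(lines):
    """Yield one report line per input, in the same layout as the Perl printf."""
    for n, line in enumerate(lines, start=1):
        m = PATTERN.search(line)
        if m:
            s, e = m.span()
            result = f"matched <{line[:s]}'{line[s:e]}'{line[e:]}>"
        else:
            result = "no match"
        yield "%2d: %-8s %s" % (n, f"'{line}'", result)


CELLS = [" E", ",E", ".E", "ee", "E  e", "E,e", "E.e", "E ", "E,", "E.", "End"]
for report in describe(CELLS):
    print(report)
```

Run against the eleven test cells, it prints the same lines as the Perl output above.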

Re: OR sequence in Find & Replace regular expression

Postby huw » Fri Jan 04, 2008 12:52 pm

This is now issue 84828.