OR sequence in Find & Replace regular expression
Posted: Wed Dec 19, 2007 1:24 pm
by huw
Code: Select all
^([:space:]|,|\.|[:lower:])
works to find the specified characters at the beginning of a cell. I'll call this expression A.
Code: Select all
([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)
works to find certain other character combinations or locations. I'll call this expression B.
Combining them as A|B, thus
Code: Select all
^([:space:]|,|\.|[:lower:])|([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)
works fine, yet B|A
Code: Select all
([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)|^([:space:]|,|\.|[:lower:])
does not! Expression B is not evaluated - does anyone have an idea why?
Re: OR sequence in Find & Replace regular expression
Posted: Wed Dec 19, 2007 4:20 pm
by FPeters
Could you give an example string that is not working? It's rather hard to make your way through these regexps.
-f
Re: OR sequence in Find & Replace regular expression
Posted: Wed Dec 19, 2007 5:05 pm
by huw
Sorry!
Expression A selects any cell beginning with a space, comma, period, or lowercase letter.
Expression B selects any cell containing two consecutive spaces, OR a comma or period with an alphanumeric character directly on each side, OR ending with a space, comma, or period.
A test set:
Code: Select all
E
,E
.E
ee
E  e
E,e
E.e
E
E,
E.
End
Note the first "E" in that list should have a leading space, but it is being stripped by the forum software so you'll have to add it yourself.
All but "End" are selected by A|B, but the mystery is that B|A misses A.
Remember to turn on Regular expressions and Case sensitivity.
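For anyone who wants an engine-independent reference for the test set, the two rules can be restated as plain string checks. This is only a sketch in Python, not a description of how OOo evaluates anything: the function names are made up here, and "space", "lowercase" and "alphanumeric" are taken to mean whatever Python's str.isspace(), str.islower() and str.isalnum() accept, which is an assumption about what the named classes cover. The test strings have the stripped leading, trailing, and doubled spaces put back.

```python
def rule_a(cell: str) -> bool:
    """A: cell begins with a space, comma, period, or lowercase letter."""
    if not cell:
        return False
    first = cell[0]
    return first.isspace() or first in ",." or first.islower()


def rule_b(cell: str) -> bool:
    """B: two consecutive spaces, OR a comma/period with an alphanumeric
    character directly on each side, OR a trailing space, comma, or period."""
    if not cell:
        return False
    double_space = any(x.isspace() and y.isspace() for x, y in zip(cell, cell[1:]))
    punct_between = any(
        left.isalnum() and mid in ",." and right.isalnum()
        for left, mid, right in zip(cell, cell[1:], cell[2:])
    )
    last = cell[-1]
    trailing = last.isspace() or last in ",."
    return double_space or punct_between or trailing


# Test set with the spaces the forum software stripped put back in (assumption).
CELLS = [" E", ",E", ".E", "ee", "E  e", "E,e", "E.e", "E ", "E,", "E.", "End"]

if __name__ == "__main__":
    for number, cell in enumerate(CELLS, start=1):
        print(f"{number:2d}: {cell!r:8} A={rule_a(cell)} B={rule_b(cell)}")
```

Whatever a given regex engine reports for A|B or B|A can then be compared against rule_a(cell) or rule_b(cell) for each cell.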
Re: OR sequence in Find & Replace regular expression
Posted: Wed Dec 19, 2007 9:06 pm
by acknak
My guess is the "^" anchor that starts your A is the root of the problem. In A|B, the "^" is at the beginning of the pattern; in B|A, it falls in the middle.
In my (limited, I admit) understanding and experience, OOo's regexp search treats the anchors rather specially and gets confused when they don't appear at the start of an expression.
PS: If it's any help, Perl gives the same result either way: all lines from your test match except #1, #8 and #11. I deleted all leading spaces from your test sample, except one from the start of line #1.
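One way to check that a conventional engine does not care about the order of the alternatives is to try both orders side by side. The sketch below uses Python's re module rather than Perl or OOo. Since re has no POSIX named classes, \s, [a-z] and [A-Za-z0-9] stand in for [:space:], [:lower:] and [:alnum:] (an ASCII-only assumption), and the sample strings assume the leading, trailing, and doubled spaces are present.

```python
import re

# ASCII stand-ins for the named classes (assumption: re has no [:space:] etc.).
A = r"^(\s|,|\.|[a-z])"
B = r"(\s{2})|([A-Za-z0-9](\.|,)[A-Za-z0-9])|((\s|,|\.)$)"

A_THEN_B = re.compile(f"{A}|{B}")
B_THEN_A = re.compile(f"{B}|{A}")

# Test set with the stripped spaces restored (assumption).
CELLS = [" E", ",E", ".E", "ee", "E  e", "E,e", "E.e", "E ", "E,", "E.", "End"]


def matching_lines(pattern):
    """Return the 1-based numbers of the cells in which the pattern is found."""
    return [n for n, cell in enumerate(CELLS, start=1) if pattern.search(cell)]


if __name__ == "__main__":
    print("A|B:", matching_lines(A_THEN_B))
    print("B|A:", matching_lines(B_THEN_A))
```

In this engine the "^" applies only to the alternative it is written in, wherever that alternative sits, so both orders should report the same cells. If another engine gives two different lists, its anchor handling is the first place to look.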
Re: OR sequence in Find & Replace regular expression
Posted: Thu Dec 20, 2007 11:21 am
by huw
acknak wrote: My guess is the "^" anchor that starts your A is the root of the problem. In A|B, the "^" is at the beginning of the pattern; in B|A, it falls in the middle.
In my (limited, I admit) understanding and experience, OOo's regexp search treats the anchors rather specially and gets confused when they don't appear at the start of an expression.
PS: If it's any help, Perl gives the same result either way: all lines from your test match except #1, #8 and #11. I deleted all leading spaces from your test sample, except one from the start of line #1.
Thanks. It looks like another quirk of OOo's regex implementation.
I've only ever used regex in OOo, never in a respected implementation, so forgive me if I ask why lines #1 & #8 didn't match in Perl - #1 has a leading space that should be caught by expression A, #8 has a trailing space that should be caught by expression B.
Note only line #1 should have a leading space.
Re: OR sequence in Find & Replace regular expression
Posted: Thu Dec 20, 2007 7:08 pm
by acknak
... why lines #1 & #8 didn't match in Perl...
Because I'm an idiot, in a hurry.
I forgot to fix the named classes in your pattern, so they weren't working properly, and I somehow lost the trailing space on #8.
Once I actually pay attention to what I'm doing, it works much better (only #11 does not match).
Just for kicks, here is what your pattern would look like in Perl, along with the output when run against your sample:
Code: Select all
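#!/usr/bin/perl -n
# -n wraps the script in a loop that reads the input one line at a time.
chomp;
my $match =
    /
        [[:space:],.]$                  # ends with a space, comma, or period
      |
        ^[[:space:],.[:lower:]]         # begins with a space, comma, period, or lowercase letter
      |
        [[:space:]]{2}                  # two consecutive spaces
      |
        ([[:alnum:]][.,][[:alnum:]])    # comma or period between two alphanumerics
    /ox;
# $`, $& and $' hold the text before, inside, and after the match.
printf("%2d: %-8s %s\n", $., "'$_'", $match ? "matched <$`'$&'$'>" : "no match");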
Code: Select all
$ perl abx e
1: ' E' matched <' 'E>
2: ',E' matched <','E>
3: '.E' matched <'.'E>
4: 'ee' matched <'e'e>
5: 'E  e' matched <E'  'e>
6: 'E,e' matched <'E,e'>
7: 'E.e' matched <'E.e'>
8: 'E ' matched <E' '>
9: 'E,' matched <E','>
10: 'E.' matched <E'.'>
11: 'End' no match
Perhaps this will help make it clear why the named classes are meant to appear inside "[classes]" and not to stand on their own.
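To illustrate the point about brackets: in an engine that follows the usual bracket syntax, a bare [:space:] is just a character set listing the literal characters ':', 's', 'p', 'a', 'c' and 'e'. Python's re module, which has no POSIX classes at all, reads it that way, so it makes a convenient demonstration. This is a sketch only; \s is used as a stand-in for [[:space:]] (an assumption, not an exact equivalent), the helper name is made up, and none of this describes how OOo itself treats the bare form.

```python
import re

BARE = re.compile(r"[:space:]")   # parsed as the set {':', 's', 'p', 'a', 'c', 'e'}
WHITESPACE = re.compile(r"\s")    # stand-in for [[:space:]] (assumption)


def first_hit(pattern, text):
    """Return the first substring the pattern finds in text, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None


if __name__ == "__main__":
    for text in ["E e", "case", " E", "E:"]:
        print(
            repr(text),
            "bare:", repr(first_hit(BARE, text)),
            "whitespace:", repr(first_hit(WHITESPACE, text)),
        )
```

The bare form finds letters such as "e" or "c" and ignores an actual space, which is the kind of "not working properly" result a pattern copied over without the extra brackets would produce.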
Re: OR sequence in Find & Replace regular expression
Posted: Fri Jan 04, 2008 12:52 pm
by huw