Regular expressions and empty paragraph

franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Regular expressions and empty paragraph

Post by franklekens »

I'm trying to find my way in the somewhat peculiar regular expressions of Writer's find dialogue. I gather I can find an empty paragraph by searching for $. But is it totally impossible to search for a combination of an empty paragraph with something else?

Specifically, I'd like to search for empty paragraphs followed by a tab (\t). These are in my text, but "find" refuses to find them.
(Currently *all* my paragraphs start with a tab, and I'd like to remove only the tabs that follow an empty paragraph, so that only paragraphs from the second onward are indented, and never the first paragraph after a blank line. And yes, I know that indenting paragraphs with a tab character is fairly primitive and that paragraph styles are the better tool, but for reasons beyond my control I can't use that option right now. Some clients prefer it this way.)

Many thanks for any useful tip.
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
RoryOF
Moderator
Posts: 34612
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: regular expressions and empty paragraph

Post by RoryOF »

Look at the extension AltSearch (Alternative dialog Find & Replace for Writer); I don't know if it will solve your problem, but it offers far more facilities than OOo's Find and Replace.
http://extensions.services.openoffice.o ... /AltSearch
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
floris v
Volunteer
Posts: 4430
Joined: Wed Nov 28, 2007 1:21 pm
Location: Netherlands

Re: Regular expressions and empty paragraph

Post by floris v »

franklekens wrote:I gather I can find an empty paragraph by searching for $.
That's not correct. $ finds the end of any paragraph, not only of a paragraph without text. To find an empty paragraph, use ^$. OOo's Find doesn't search beyond paragraph endings; use AltSearch for that, as RoryOF already pointed out.
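
A minimal sketch of this per-paragraph behaviour, written in Python rather than in Writer. It assumes each paragraph is searched as a separate string; Python's `re` module is not the engine Writer uses, and the function name `matching_paragraphs` and the sample text are made up for illustration.

```python
import re

def matching_paragraphs(paragraphs, pattern):
    """Return the indexes of the paragraphs in which `pattern` matches.

    Each paragraph is searched as a separate string, so no pattern can
    match across a paragraph boundary.
    """
    rx = re.compile(pattern)
    return [i for i, text in enumerate(paragraphs) if rx.search(text)]

# Hypothetical sample document: every text paragraph starts with a tab.
paragraphs = ["\tFirst paragraph.", "", "\tAfter a blank line.", "\tNext one."]

print(matching_paragraphs(paragraphs, r"^$"))    # the empty paragraph: [1]
print(matching_paragraphs(paragraphs, r"^\t"))   # paragraphs starting with a tab: [0, 2, 3]
print(matching_paragraphs(paragraphs, r"^$\t"))  # cannot match inside one paragraph: []
```

In this model ^$\t can never match, because the empty paragraph and the tab that follows it sit in two different strings.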
OpenOffice 4.1.11 on Ubuntu; LibreOffice 6.4 on Linux Mint, LibreOffice 7.6.2.1 on Ubuntu
If your problem has been solved or your question has been answered, please edit the first post in this thread and add [Solved] to the title bar.
Nederlandstalig forum
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Regular expressions and empty paragraph

Post by JohnV »

franklekens wrote:Currently *all* my paragraphs start with a tab
To find the paragraphs that contain nothing but a tab, search for ^\t$
franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Re: Regular expressions and empty paragraph

Post by franklekens »

Thanks for the tip, an interesting extension.
But Alternative Search doesn't seem to find any ^$\t either, although there clearly are many such instances.
RoryOF
Moderator
Posts: 34612
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Regular expressions and empty paragraph

Post by RoryOF »

Are you sure these are empty _Paragraphs_? If you turn on View > Nonprinting Characters, they should be marked with a backwards P symbol. If they show a left-pointing hooked arrow instead, they are line breaks and need a different search term.

View > Nonprinting Characters and the equivalent backwards P button on the toolbar are both toggles.
franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Re: Regular expressions and empty paragraph

Post by franklekens »

Yes, I know about SHIFT+ENTER. They're really hard returns, i.e. empty paragraphs. It doesn't seem to work, and if it should I can't see what I'm doing wrong.
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Regular expressions and empty paragraph

Post by JohnV »

You said *all* your paragraphs started with a tab. Does that include the "empty" ones?
franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Re: Regular expressions and empty paragraph

Post by franklekens »

Good one, I hadn't thought of that.
But when I look: no, the empty paragraphs are really empty. Every paragraph containing text starts with a TAB.

By now it's not really a problem that has to be solved for this text. I have to go through the text manually to edit it anyway, so I'll remove the superfluous tabs as I go along.

It just bugs me that none of the search options allows me to search for ^$\t. It's not such a wild combination, is it?
Can other people make it work?
RoryOF
Moderator
Posts: 34612
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Regular expressions and empty paragraph

Post by RoryOF »

If there are no or few other tabs in the file, search for \t to remove the tabs and ^$ for the empty paragraphs. If there are a few other tabs, you might have to replace these beforehand with a marker (I usually use %%%%) and subsequently replace the marker with a tab using Find and Replace.

Try this:

Search for ^\t Replace with nothing.
Search for ^$ Replace with nothing.

Edit: That won't work correctly. Okay, getting it now:

Search for ^$ Replace with %%%%
Search for %%%%\t Replace with nothing.
In case there were any empty paras not followed by tabs, now Search for %%%% Replace with nothing.

That should do the job! 
franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Re: Regular expressions and empty paragraph

Post by franklekens »

Intermediate step with a dummy replacement -- of course. Done that before, simply didn't think of it now. Silly.

Thanks for the tip.
Still annoying that search won't find this. It's illogical, captain.
franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Re: Regular expressions and empty paragraph

Post by franklekens »

Scrap that. It doesn't work but something else does.
To recapitulate: I wanted to keep all tabs in front of paragraphs, except for the tabs that occurred immediately after an empty paragraph (blank line).
Replacing empty paragraphs by %%%%, and then getting rid of %%%%\t works. But I couldn't then replace %%%% by ^$, because that's not interpreted as a regular expression in the replace field, for some reason. They ended up as ^$ signs in the text.

But I see Alternative Search has another feature: you don't have to use ^$ for empty paragraphs, you can just search for hard returns. \p is a hard return, \p\p is an empty paragraph (obviously).

So simply replacing \p\p\t by \p\p seems to do the trick.
I haven't manually checked my entire text for undesired results, but I think it works.
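
For comparison, a rough Python equivalent of that replacement on plain text. This is a sketch under assumptions, not AltSearch itself: the text is taken to be one string in which "\n" stands in for a hard return, and the function name `strip_tab_after_blank` and the sample text are invented here.

```python
import re

def strip_tab_after_blank(text):
    """Remove the tab that opens a paragraph directly after an empty paragraph."""
    # "\n\n" stands in for two hard returns (\p\p); the tab that follows is dropped.
    return re.sub(r"\n\n\t", "\n\n", text)

# Hypothetical sample: four text paragraphs, with one blank line after the second.
sample = "\tFirst.\n\tSecond.\n\n\tAfter blank.\n\tNext."

print(repr(strip_tab_after_blank(sample)))
```

Only the tab right after the blank line is removed; the tabs on the other paragraphs, including the very first one, are left alone.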
franklekens
Posts: 133
Joined: Thu Oct 30, 2008 11:05 am
Location: Amsterdam

Re: Regular expressions and empty paragraph

Post by franklekens »

No, it does have undesired effects. Weird things happen, I'm going to quit this nonsense.
RoryOF
Moderator
Posts: 34612
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Regular expressions and empty paragraph

Post by RoryOF »

franklekens wrote:But I couldn't then replace %%%% by ^$, because that's not interpreted as a regular expression in the replace field, for some reason.
You can replace %%%% by \n instead, as in the replace field \n gives a paragraph mark.
floris v
Volunteer
Posts: 4430
Joined: Wed Nov 28, 2007 1:21 pm
Location: Netherlands

Re: Regular expressions and empty paragraph

Post by floris v »

Why do you even want tabs at the start of a paragraph? If it is for indenting the first line, there are better ways to do that. You can simply change the paragraph style, Indents and Spacing tab, and choose a behaviour for the First line.

Regular expressions are tricky, and the regular expressions in OOo are a lot worse than regular expressions in general. $ spots a paragraph break, but only in the search box. Why \n spots a line break in the search box and inserts a paragraph break in the replace with box is one of the great mysteries of this software.
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Regular expressions and empty paragraph

Post by John_Ha »

Also see [Tutorial] How to record a macro (and Regular Expressions) for more on regular expressions.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Lupp
Volunteer
Posts: 3549
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Regular expressions and empty paragraph

Post by Lupp »

User "franklekens" seems to assume that Writer searches its text as one chunk: a single string that contains all the textual content and spans all the paragraphs.

That's not the case. To begin with, neither 'Tab' nor 'LineFeed' nor 'NewParagraph' is actually a character; each may need to be treated like a character in some respects and differently in others.

There are a few more cases where something having a code (ASCII or UniCode) isn't "just a character", and there may be needs to represent aspects of the layout using codes additionally occurring inside the string giving the plain text. In addition there are very relevant properties of a text document that are not represented as characters or codes at all - and some of them searchable nonetheless. Very few visitors to this forum will know the details.

Concerning the use of F&R with regular expressions (RegEx), it's essential to be aware that the search proceeds one paragraph at a time. The only exception (actually an as-if exception) is the search for "$" on its own, without any accompanying elements, where it acts as a paragraph break. Only the "$" is reminiscent of RegEx here.

Searching for RegEx spanning parts of more than one paragraph is not supported.
franklekens wrote:...But I couldn't then replace %%%% by ^$, because that's not interpreted as a regular expression in the replace field, for some reason.
The relevant reason is that the replace string simply is not a RegEx, and cannot be. After all, regular expressions are made for describing a (type of) syntax that defines acceptable strings. ... (See https://en.wikipedia.org/wiki/Regular_language.) That would not prevent a special character from serving as a mnemonic in the replace string, much as it does in a RegEx, but tradition (for better or worse) has decided to use the $ sign in replace strings only in a completely different sense. If you want to insert a paragraph break with F&R, you need to use \n. Yes, I also think that's silly. And yes, to date there is no way to insert a simple line break with F&R, though we may tend to think \n should do it.
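
The same distinction between search pattern and replace string can be seen in other tools. Below is a small Python sketch, offered as an analogy only: Python's `re` module is not Writer's engine, and the variable names are made up. The first argument to `re.sub` is a regular expression, while the second is a replacement string with its own, much smaller syntax.

```python
import re

# Hypothetical paragraph that holds only the marker.
marked = "%%%%"

# "^" and "$" have no special meaning in a replacement string,
# so they are inserted as two literal characters.
as_literal = re.sub(r"%%%%", "^$", marked)

# "\n" is an ordinary newline character and is inserted as such.
as_newline = re.sub(r"%%%%", "\n", marked)

print(repr(as_literal))
print(repr(as_newline))
```

Here too the anchors end up as literal ^$ characters in the text, which matches what franklekens saw in Writer.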

Some of the restrictions imposed by the way AOO/LibO use their ICU RegEx engines are overcome by AltSearch. But nothing in this world is perfect, and least of all in the world of software. Users aren't an exception proving the rule.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München