Regular expressions and empty paragraph
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Regular expressions and empty paragraph
I'm trying to find my way in the somewhat peculiar regular expressions of Writer's find dialogue. I gather I can find an empty paragraph by searching for $. But is it totally impossible to search for a combination of an empty paragraph with something else?
Specifically, I'd like to search for empty paragraphs followed by a tab (\t). These are in my text, but "find" refuses to find them.
(Currently *all* my paragraphs start with a tab, and I'd like to remove only the tabs that follow an empty paragraph, so that only paragraphs from the second onward are indented, and never the first paragraph after a blank line. And yes, I know indenting paragraphs by means of a tab character is fairly primitive and it's better to use paragraph styling for that, but for reasons beyond my control I can't make use of that option right now. Some clients prefer it this way.)
Many thanks for any useful tip.
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
Re: regular expressions and empty paragraph
Look at the extension AltSearch (Alternative dialog Find & Replace for Writer); I don't know if it will solve your problem, but it offers far more facilities than OOo's Find and Replace.
http://extensions.services.openoffice.o ... /AltSearch
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Regular expressions and empty paragraph
franklekens wrote: I gather I can find an empty paragraph by searching for $.
That's not correct. $ will find the end of a paragraph, but not necessarily of a paragraph without any text. For that, use ^$. OOo Find doesn't search beyond paragraph endings; use AltSearch for that, as RoryOF already pointed out.
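A rough way to picture this outside Writer: the Python sketch below applies a pattern to each paragraph separately, which is how the replies in this thread describe Writer's search. Python's `re` module is not the ICU engine that Writer uses, and the `paragraphs` list and the `count_matching` helper are made up for illustration, so treat it as an analogy only.

```python
import re

# Hypothetical document: each list item stands for one Writer paragraph.
paragraphs = ["\tFirst paragraph.", "", "\tSecond paragraph.", "", "\tThird."]

def count_matching(pattern, paras):
    """Count paragraphs in which the pattern matches, testing each one in isolation."""
    regex = re.compile(pattern)
    return sum(1 for p in paras if regex.search(p))

end_hits = count_matching(r"$", paragraphs)       # end of every paragraph
empty_hits = count_matching(r"^$", paragraphs)    # only paragraphs without text
cross_hits = count_matching(r"^$\t", paragraphs)  # would have to span two paragraphs

print(end_hits, empty_hits, cross_hits)
```

In this model `$` matches in every paragraph, `^$` only in the empty ones, and `^$\t` nowhere, because nothing inside a single paragraph can follow its own end.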
OpenOffice 4.1.11 on Ubuntu; LibreOffice 6.4 on Linux Mint, LibreOffice 7.6.2.1 on Ubuntu
If your problem has been solved or your question has been answered, please edit the first post in this thread and add [Solved] to the title bar.
Nederlandstalig forum
Re: Regular expressions and empty paragraph
franklekens wrote: Currently *all* my paragraphs start with a tab
To search for the otherwise empty ones, use ^\t$
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Re: Regular expressions and empty paragraph
Thanks for the tip, an interesting extension.
But Alternative Search too doesn't seem to find any ^$\t when clearly there are many such instances.
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
Re: Regular expressions and empty paragraph
Are you sure these are empty _paragraphs_? If you turn on View > Nonprinting Characters, they should be marked with a backwards P symbol. If they show a left-pointing hooked arrow instead, they need a different search term.
View > Nonprinting Characters and the equivalent backwards P button on the toolbar are toggles.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Re: Regular expressions and empty paragraph
Yes, I know about SHIFT+ENTER. They're really hard returns, i.e. empty paragraphs. It doesn't seem to work, and if it should I can't see what I'm doing wrong.
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
Re: Regular expressions and empty paragraph
You said *all* your paragraphs started with a tab. Does that include the "empty" ones?
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Re: Regular expressions and empty paragraph
Good one, I hadn't thought of that.
But when I look: no, the empty paragraphs are really empty. Every paragraph containing text starts with a TAB.
By now it's not really a problem that has to be solved for this text. I have to go through the text manually to edit it anyway, so I'll remove the superfluous tabs as I go along.
It just bugs me that none of the search options allows me to search for ^$\t. It's not such a wild combination, is it?
Can other people make it work?
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
Re: Regular expressions and empty paragraph
If there are no or few other tabs in the file, search for \t to remove the tabs and ^$ for the empty paragraphs. If there are a few other tabs, you might have to replace these beforehand with a marker (I usually use %%%%) and subsequently replace the marker with a tab using Find and Replace.
Try this:
Search for ^\t Replace with nothing.
Search for ^$ Replace with nothing.
Edit: The above won't work correctly. Okay, getting it now:
Search for ^$ Replace with %%%%
Search for %%%%\t Replace with nothing.
In case there were any empty paragraphs not followed by tabs, now Search for %%%% Replace with nothing.
That should do the job!
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Re: Regular expressions and empty paragraph
Intermediate step with a dummy replacement -- of course. Done that before, simply didn't think of it now. Silly.
Thanks for the tip.
Still annoying that search won't find this. It's illogical, captain.
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Re: Regular expressions and empty paragraph
Scrap that. It doesn't work but something else does.
To recapitulate: I wanted to keep all tabs in front of paragraphs, except for the tabs that occurred immediately after an empty paragraph (blank line).
Replacing empty paragraphs by %%%%, and then getting rid of %%%%\t works. But I couldn't then replace %%%% by ^$, because that's not interpreted as a regular expression in the replace field, for some reason. They ended up as ^$ signs in the text.
But I see Alternative Search has another feature: you don't have to use ^$ for empty paragraphs, you can just search for hard returns. \p is a hard return, \p\p is an empty paragraph (obviously).
So simply replacing \p\p\t by \p\p seems to do the trick.
I haven't manually checked my entire text for undesired results, but I think it works.
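For comparison, the same idea as replacing \p\p\t by \p\p can be sketched in Python on a plain-text export of the document. This assumes every paragraph ends with exactly one newline, so that two newlines in a row mean an empty paragraph; the function name is hypothetical, and the pattern is Python `re` syntax, not AltSearch syntax.

```python
import re

def strip_tab_after_blank(text):
    """Remove a tab that directly follows an empty paragraph (two newlines in a row)."""
    # The lookbehind requires "\n\n" before the tab but leaves both newlines
    # in place, so only the tab itself is deleted.
    return re.sub(r"(?<=\n\n)\t", "", text)

sample = "\tOne.\n\tTwo.\n\n\tThree.\n\tFour.\n"
result = strip_tab_after_blank(sample)
print(repr(result))
```

Only the tab in front of "Three." is removed here; tabs in front of paragraphs that follow a non-empty paragraph stay where they are.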
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
- franklekens
- Posts: 133
- Joined: Thu Oct 30, 2008 11:05 am
- Location: Amsterdam
Re: Regular expressions and empty paragraph
No, it does have undesired effects. Weird things happen, I'm going to quit this nonsense.
--
Frank Lekens
OOo 4.4.0 on Ms Windows 8.1
(compatibility: MS Office 2013)
Re: Regular expressions and empty paragraph
franklekens wrote: But I couldn't then replace %%%% by ^$, because that's not interpreted as a regular expression in the replace field, for some reason.
You can replace %%%% by \n instead, as in the replace field \n gives a paragraph mark.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Regular expressions and empty paragraph
Why do you even want tabs at the start of a paragraph? If it is for indenting the first line, there are better ways to do that. You can simply change the paragraph style, Indents and Spacing tab, and choose a behaviour for the First line.
Regular expressions are tricky, and the regular expressions in OOo are a lot worse than regular expressions in general. $ spots a paragraph break, but only in the search box. Why \n spots a line break in the search box and inserts a paragraph break in the replace with box is one of the great mysteries of this software.
OpenOffice 4.1.11 on Ubuntu; LibreOffice 6.4 on Linux Mint, LibreOffice 7.6.2.1 on Ubuntu
If your problem has been solved or your question has been answered, please edit the first post in this thread and add [Solved] to the title bar.
Nederlandstalig forum
Re: Regular expressions and empty paragraph
Also see [Tutorial] How to record a macro (and Regular Expressions) for more on regular expressions.
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Re: Regular expressions and empty paragraph
User "franklekens" seemingly assumes Writer searches its text as one chunk consisting of every textual content and spanning all the paragraphs like a sequence of "characters" forming a single string.
That's not the case. We may start by noticing that neither 'Tab' nor 'LineFeed' nor 'NewParagraph' is actually a character; they may need to be treated like characters in some respects and differently in others.
There are a few more cases where something that has a code (ASCII or Unicode) isn't "just a character", and it may be necessary to represent aspects of the layout using codes that additionally occur inside the string giving the plain text. In addition, there are very relevant properties of a text document that are not represented as characters or codes at all, and some of them are searchable nonetheless. Very few visitors to this forum will know the details.
Concerning the use of F&R with regular expressions (RegEx), it's essential to be aware that the search is done one paragraph at a time. The only exception (actually an as-if exception) is the search for "$" in the role of a paragraph break without any accompanying elements. Only the "$" is reminiscent of RegEx here.
Searching for RegEx spanning parts of more than one paragraph is not supported.
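Because each paragraph is examined on its own, a script that wants to act on "a tab after an empty paragraph" has to remember the previous paragraph itself. A minimal Python sketch of that idea follows; it works on a plain list of strings rather than through Writer's API, and the list and the function name are assumptions for illustration.

```python
def drop_tab_after_empty(paragraphs):
    """Return a copy in which a leading tab is removed from any paragraph that follows an empty one."""
    cleaned = []
    previous_empty = False
    for para in paragraphs:
        if previous_empty and para.startswith("\t"):
            para = para[1:]  # drop just the one leading tab
        cleaned.append(para)
        previous_empty = (para == "")
    return cleaned

doc = ["\tOne.", "\tTwo.", "", "\tThree.", "\tFour."]
cleaned_doc = drop_tab_after_empty(doc)
print(cleaned_doc)
```

The state carried between iterations (`previous_empty`) is exactly the piece of information a single-paragraph regular expression cannot see.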
franklekens wrote: ...But I couldn't then replace %%%% by ^$, because that's not interpreted as a regular expression in the replace field, for some reason.
The relevant one of these "some reasons" is that the replace string simply is not a RegEx, and cannot be. After all, regular expressions are made for describing a (type of) syntax that defines acceptable strings. ... (See https://en.wikipedia.org/wiki/Regular_language.) That would not prevent a special character from being used as a mnemonic in the replace string, much as it is in a RegEx, but tradition (which may be bad or less bad) has decided to use the $ sign in replace strings only in a completely different sense. If you want to insert a paragraph break with F&R you need to use \n. Yes, I also think that's silly. And yes, to date there is no way to insert a simple line break with F&R, though we may tend to think \n should do it.
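The same split between search pattern and replacement string exists in other regex tools. In Python's `re` module, for instance (which is not the ICU engine that AOO/LibO use), `^` and `$` in the replacement are inserted as literal characters, much like the ^$ signs that ended up in the text. The sample string below is made up, and a Python newline is only a stand-in for a Writer paragraph break.

```python
import re

text = "one%%%%two"

# In the replacement template, "^" and "$" have no special meaning in Python,
# so they are copied into the result unchanged.
literal = re.sub(r"%%%%", "^$", text)

# An escape such as "\n" in the replacement template does insert a newline.
with_break = re.sub(r"%%%%", r"\n", text)

print(repr(literal), repr(with_break))
```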
Some of the restrictions that come with the way AOO/LibO use their ICU RegEx engines are overcome by AltSearch. But nothing in this world is perfect, and still less in the world of software. Users aren't an exception proving the rule.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München