[Tutorial] How do I remove end_of_paragraph marks?

John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Post by John_Ha »

If you copy text from a PDF, and sometimes from other sources such as the web, you will find that every line has become its own "single line paragraph" and the text does not flow. Go View > Nonprinting Characters (View > Formatting Marks in LO) to see the end_of_paragraph markers. They are pilcrows and appear as ¶. You can hide them again later if you wish.

You can remove the unnecessary end_of_paragraph marks by running a few Find and Replace searches using regular expressions. You will almost certainly have to clean up the text afterwards, but most of the work will be done.

Note: If you copy from an email you will often find that each line ends with a newline character (a line break), which is shown as a backwards-facing arrow, rather than with an end_of_paragraph mark.
Edit: The OOoFBTools add-on is excellent and highly recommended.

The searches below could be improved. OOoFBTools does a better job.
You need four searches:

1. Find all genuine end_of_paragraph marks (lines ending in a full stop, question mark or exclamation mark) and protect them by adding the marker QAZWSX after the punctuation
2. Replace every end_of_paragraph mark with a space (the genuine ones are remembered by their QAZWSX marker)
3. Put back the protected genuine end_of_paragraph marks by replacing each QAZWSX marker from search 1 with a paragraph break
4. Delete the space left at the beginning of each line.

Go Edit > Find & Replace. Click More Options. Tick Regular expressions. Run the following four searches, where sp means a single space character (type a space, not the letters "sp"). I do not think there is a limit to how much text you can process at a time - the attached files have just over 7,000 words, which is two chapters of Vanity Fair.

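Search 1 (protect the genuine paragraph ends)
Find   : (\.|\?|!)$
Replace: $0QAZWSX
Click Replace All

Search 2 (replace every paragraph end with a space)
Find   : $
Replace: sp
Click Replace All

Search 3 (restore the protected paragraph ends)
Find   : QAZWSX
Replace: \n
Click Replace All

Search 4 (delete the leading space on each line)
Find   : ^sp
Replace: leave the field blank
Click Replace All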
You will now probably need to clean up the text as there will be some errors.
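
For anyone who wants to try the logic of the four searches outside Writer, here is a minimal Python sketch. It is only an illustration, not part of the original method: it assumes the copied text is a plain string with one "paragraph" per line, and the function name join_broken_paragraphs is made up for this example. Python's re module is not the same regex engine as Writer's, so treat it as a way to see what each pass does rather than an exact equivalent.

```python
import re


def join_broken_paragraphs(text: str, marker: str = "QAZWSX") -> str:
    """Mimic the four Find & Replace passes on plain text (one paragraph per line)."""
    # Search 1: protect genuine paragraph ends (line ends in . ? or !)
    text = re.sub(r"([.?!])$", r"\1" + marker, text, flags=re.MULTILINE)
    # Search 2: replace every paragraph end with a space
    text = text.replace("\n", " ")
    # Search 3: turn each protected marker back into a paragraph end
    text = text.replace(marker, "\n")
    # Search 4: delete the single space left at the start of each line
    text = re.sub(r"^ ", "", text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    # Made-up sample text, five short "paragraphs" as they might come from a PDF
    sample = (
        "The coach drove up to the\n"
        "great iron gate on a sunny\n"
        "morning in June.\n"
        "Was anyone\n"
        "watching?"
    )
    print(join_broken_paragraphs(sample))
```

The result ends with a paragraph break because the last marker is also turned into one, and lines that end in other punctuation (a closing quotation mark, say) get joined to the next line, which is the kind of error you then clean up by hand.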

It is also helpful to add an extra end_of_paragraph marker (by pressing Enter) after any headings or lines you know should not be run together - see image. You can fine-tune the searches to cope with quotation marks, colons, right brackets etc. appearing at the end of lines.
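
As one possible way of fine-tuning, the sketch below widens the "genuine end" test so that a line also counts when the final punctuation is a colon or is followed by closing quotation marks or right brackets. The pattern and the names GENUINE_END and is_genuine_end are my own assumptions for illustration. It is written and checked in Python only, so it would need testing before being used in Writer's Find field.

```python
import re

# Hypothetical widened version of Search 1: the line ends in . ? ! or :
# optionally followed by closing quotation marks or right brackets.
GENUINE_END = re.compile(r"[.?!:][\"'”’)\]]*$")


def is_genuine_end(line: str) -> bool:
    """Return True if the line looks like the real end of a paragraph."""
    return GENUINE_END.search(line) is not None


if __name__ == "__main__":
    # Made-up sample lines to show which ones the pattern accepts
    for line in ['"Is that so?"', "(see below.)", "He said:", "drove up to the"]:
        print(repr(line), is_genuine_end(line))
```

Treating a colon as a paragraph end is a judgment call. It suits lines that introduce a list or a quotation, but it will wrongly split a sentence that just happens to break after a colon.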
Text from Start of Vanity Fair.PDF copied into Writer
Note the unnecessary end_of_paragraph marks which need to be deleted
See Start of Vanity Fair.PDF where every line is a paragraph and the paragraph markers need to be removed. See Start of Vanity Fair.odt which was created from the PDF by adding end_of_paragraph markers after the headings and running the searches.
Attachments
Start of Vanity Fair.pdf (89.21 KiB)
Start of Vanity Fair.odt (45.2 KiB) - Text from PDF after searches to remove end_of_paragraph marks
Last edited by John_Ha on Sat Nov 14, 2020 6:47 pm, edited 3 times in total.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Hagar Delest
Moderator
Posts: 32665
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Post by Hagar Delest »

LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Post by esperantisto »

And this extension is even better help: OOoFBTools. Choose Join broken lines/paragraphs (for automatic operation on the entire text) or Process ends of lines/paragraphs (to manually process a selection). No need to reinvent the wheel :-)
AOO 4.2.0 (of 2015) / LO 7.x / Win 7 / openSUSE Linux Leap 15.4 (64-bit)
John_Ha

Post by John_Ha »

esperantisto wrote: And this extension is even better help: OOoFBTools.
That's very nice though I don't think many would find it with a name of OOo FBTools and description of "The cross platform OpenOffice.org extension OOo FBTools used to convert to and processing eBooks in FictionBook2 format." I am pleased to have flushed it out as it looks very powerful.

I went OOoFBTools > Join broken lines of a paragraph..., with the settings as below. It produced a virtually identical result to the searches I used above.

I will include it in the final tutorial.
Attachments
Clipboard01.png