[Tutorial] How do I remove end_of_paragraph marks?

Postby John_Ha » Mon Jan 13, 2020 4:56 pm

If you copy text from a PDF, and sometimes from other sources such as web pages, you will often find that every line has become its own "single line paragraph", so the text does not flow. To see the end_of_paragraph marks, go View > Nonprinting Characters (View > Formatting Marks in LO). They are pilcrows and appear as ¶. You can hide them again later if you wish.

You can remove the unnecessary end_of_paragraph marks by running a few Find and Replace searches using regular expressions. You will almost certainly have to clean up the text afterwards, but most of the work will be done.

Note: If you copy from an email you will often find that each line ends with a newline (line break) character instead, which is shown as a backwards-facing arrow rather than a ¶.

Edit: The OOoFBTools add-on is excellent and highly recommended. The searches below could be improved, and OOoFBTools does a better job.

You need four searches:

1. Find all genuine end_of_paragraph marks (lines ending in a full stop, question mark or exclamation mark) and protect them by adding the marker QAZWSX.
2. Replace all remaining, unnecessary end_of_paragraph marks with a space.
3. Put back the protected genuine end_of_paragraph marks from search 1.
4. Delete any spaces at the beginning of lines.

Go Edit > Find and Replace. Click More Options and tick Regular expressions. Run the following four searches, where sp means a single space character. I do not think there is a limit to how much text you can process at a time: the attached files have just over 7,000 words, which is two chapters of Vanity Fair.

Code:
Search 1
Find   : (\.|\?|!)$
Replace: $0QAZWSX
Click replace all

Search 2
Find   : $
Replace: sp
Click replace all

Search 3
Find   : QAZWSX
Replace: \n
Click replace all

Search 4
Find   : ^sp
Replace: leave the field blank
Click replace all
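
For anyone who wants to see what the four searches do outside Writer, here is a minimal Python sketch that applies the same steps to a plain-text string. It is only an analogy: it assumes each end_of_paragraph mark is a "\n" in the string, the function name join_broken_lines is made up for illustration, and Python's regular expressions are not the same engine as the one in Writer.

```python
import re

MARKER = "QAZWSX"  # placeholder from the tutorial; it must not occur in the text itself


def join_broken_lines(text: str) -> str:
    """Apply the four searches to plain text, treating '\n' as the paragraph mark."""
    # Search 1: protect genuine paragraph ends (lines ending in . ? or !)
    text = re.sub(r"([.?!])$", r"\g<1>" + MARKER, text, flags=re.MULTILINE)
    # Search 2: replace every remaining paragraph mark with a space
    text = text.replace("\n", " ")
    # Search 3: turn the protected markers back into paragraph marks
    text = text.replace(MARKER, "\n")
    # Search 4: delete the single space left at the start of each line
    text = re.sub(r"^ ", "", text, flags=re.MULTILINE)
    return text


sample = "This is a line\nbroken by the PDF.\nIs this a new\nparagraph? Yes."
print(join_broken_lines(sample))
```

The question mark in the middle of the last line is left alone because search 1 only looks at the end of each line. A line that genuinely ends a paragraph without one of the three marks (a heading, say) would still be joined to the next line, which is why some manual clean up is needed.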

You will now probably need to clean up the text as there will be some errors.

It is also helpful to add an extra end_of_paragraph marker (by pressing Enter) after any headings or lines you know should not be run together - see image. You can fine-tune the searches to cope with quotation marks, colons, right brackets, etc. appearing at the end of lines.
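
As one possible way of fine-tuning search 1, the Python sketch below treats a line as a genuine paragraph end if it finishes with a full stop, question mark, exclamation mark or colon, optionally followed by closing quotation marks or right brackets. The character sets and the name protect_paragraph_ends are my own assumptions for illustration, not something tested in Writer, and whether a colon should count as a paragraph end depends on your text.

```python
import re

# Hypothetical extension of search 1: end punctuation (. ? ! :) followed by
# zero or more closing quotes or right brackets, at the end of a line.
GENUINE_END = re.compile(r"[.?!:][\"'\u201d\u2019)\]]*$", re.MULTILINE)


def protect_paragraph_ends(text: str, marker: str = "QAZWSX") -> str:
    """Append the marker after every line ending the pattern accepts."""
    return GENUINE_END.sub(lambda m: m.group(0) + marker, text)


sample = 'He said "stop."\nand then\nleft (at last).'
print(protect_paragraph_ends(sample))
```

Only the first and third lines of the sample receive the marker; the middle line has no end punctuation, so it would be joined to the following line by the later searches.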

Clipboard02.png
Text from Start of Vanity Fair.PDF copied into Writer
Note the unnecessary end_of_paragraph marks which need to be deleted

See Start of Vanity Fair.PDF and Start of Vanity Fair.odt created from it by adding end_of_paragraph markers after the headings and running the searches.
Attachments
Start of Vanity Fair.pdf
Start of Vanity Fair.odt
Text from PDF after searches to remove end_of_paragraph marks
Last edited by John_Ha on Fri Jun 05, 2020 7:57 pm, edited 2 times in total.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
John_Ha
Volunteer
 
Posts: 7839
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: [Tutorial] How do I remove end_of_paragraph marks?

Postby Hagar Delest » Mon Jan 13, 2020 11:27 pm

AOO 4.1.7 on Xubuntu 20.04 and 4.1.5 on Windows 10 (with winPenPack port).
Hagar Delest
Moderator
 
Posts: 29006
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: [Tutorial] How do I remove end_of_paragraph marks?

Postby esperantisto » Tue Jan 14, 2020 4:21 pm

And this extension is an even better help: OOoFBTools. Choose Join broken lines/paragraphs (for automatic operation on the entire text) or Process ends of lines/paragraphs (to manually process a selection). No need to reinvent the wheel :-)
AOO 4.2.0 / LibO 6.x/7.x / Win 7 / openSUSE Linux Leap 15.1 (64-bit)
esperantisto
Volunteer
 
Posts: 524
Joined: Mon Oct 08, 2007 1:31 am

Re: [Tutorial] How do I remove end_of_paragraph marks?

Postby John_Ha » Tue Jan 14, 2020 6:07 pm

esperantisto wrote:And this extension is even better help: OOoFBTools.

That's very nice though I don't think many would find it with a name of OOo FBTools and description of "The cross platform OpenOffice.org extension OOo FBTools used to convert to and processing eBooks in FictionBook2 format." I am pleased to have flushed it out as it looks very powerful.

I went OOoFBTools > Join broken lines of a paragraph..., with the settings as below. It produced a virtually identical result to the searches I used above.

I will include it in the final tutorial.
Attachments
Clipboard01.png

