Request for comments: Combine paragraph fragments

MrProgrammer
Moderator
Posts: 5432
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Post by MrProgrammer »

I am planning to create a tutorial based on the following material. I realize that there are other topics about combining lines from OCR'd text; this tutorial will only discuss Writer's AutoCorrect method.

This tutorial discusses using the AutoCorrect feature of Writer to combine single line paragraph fragments into paragraphs. Sometimes the text in a Writer document was originally created with short lines which form partial sentences. This typically happens when the text comes from an Optical Character Recognition application, or when it's copied from a PDF or from a website. Here is an example of text which needs to be combined. Reading the text we can see that there are two sentences. View → Nonprinting characters shows that the lines end in a ¶. That symbol shows the end of a Writer paragraph. But these eight lines are not genuine paragraphs, just paragraph fragments.
[Image: Text before conversion (202502041020A.png)]

If there is documentation about combining these with AutoCorrect, I can't find it, so I've prepared this guide. Writer seems to treat a line as the end of an actual paragraph when it ends with a punctuation symbol: period, question mark, exclamation mark, perhaps others. So lines which don't end in punctuation are joined together by AutoCorrect. This feature is designed for the paragraph structure of many languages; detailed editing for specific languages or special cases may be necessary and is your responsibility.
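
To make the rule described above concrete, here is a minimal Python sketch that works on plain text outside Writer. It only approximates the behaviour described in this post and is not Writer's actual algorithm; the function name `join_fragments` and the set of terminating punctuation marks are assumptions chosen for illustration.

```python
# Assumed set of characters that mark a genuine paragraph end.
TERMINATORS = (".", "?", "!")


def join_fragments(lines):
    """Merge consecutive lines into paragraphs.

    A paragraph is closed only after a line whose last non-blank
    character is one of TERMINATORS. Empty lines are ignored.
    """
    paragraphs = []
    current = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore empty lines
        current.append(text)
        if text.endswith(TERMINATORS):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Trailing fragment that never reached closing punctuation.
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    "This sentence was split",
    "across three short",
    "lines by OCR.",
    "Is this a second",
    "paragraph?",
]
print(join_fragments(sample))
```

With this sample the five fragments collapse into two paragraphs, one per sentence. Real OCR text will contain cases this simple rule gets wrong, for example a line that happens to end in an abbreviation.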

To join short paragraph fragments use these steps:
  • Set option [M] in Format → AutoCorrect → AutoCorrect Options → Options → Combine single line paragraphs. Use the Edit button to change the length setting. I suggest starting with "if length greater than" set to 0%.
  • Ensure all other AutoCorrect options are disabled. In Options, uncheck every box in the [M] column. This prevents any AutoCorrect changes except single line paragraph conversions. Also, in Localized Options, uncheck every box in the [M] column. Click OK. You may want to note which boxes were checked so you can restore those settings after single line paragraph conversion is complete.
  • Using the Format → Styles and Formatting dialog, apply the Default paragraph style (leftmost icon) to the paragraphs which you want to convert. Paragraphs which are set to a different style will not be converted. If you have opened a document which wasn't created by OpenOffice, say a plain text file or a simple document from an OCR application, all of your paragraphs will probably already have the Default style. You are more likely to have success with this feature if you specify the correct language for your text: Styles and Formatting → Paragraph Styles (leftmost icon) → Default → Right-click → Modify → Font → Western text font → Language.
  • Select the lines which contain the paragraph fragments you want to convert, or select nothing to process every paragraph with the Default style. You can use Edit → Find & Replace to help locate specific text in your document if you don't want to perform single line paragraph conversion on everything. The entire paragraph will be converted if any text in it is selected and it has the Default paragraph style. When you select any text, paragraphs without selected text are not converted. By converting only selected lines of text you can, for example, use different language settings for portions of your document.
  • Use Format → AutoCorrect → Apply.
The Apply action will combine the single line paragraph fragments. If you don't like the result, immediately use Edit → Undo. You can try various values for the "if length greater than" setting. The conversion will change the style of the converted paragraphs from Default to Text Body. You can set a different paragraph style now if you like. Here is an example of the conversion for the text above:
[Image: Text after conversion (202502041020B.png)]

This tutorial only covers the AutoCorrect method for combining single line paragraphs. A search of the Writer forum for the terms OCR, remove paragraph breaks, or combine lines will locate topics discussing ways to do the conversion with Find & Replace, with extensions, or with macros. They could offer more control of the process at the expense of being more difficult to use.
Last edited by MrProgrammer on Thu Feb 27, 2025 1:34 am, edited 1 time in total.
Reason: Tutorial created; Request-for-comments topic locked
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
JeJe
Volunteer
Posts: 3132
Joined: Wed Mar 09, 2016 2:40 pm

Re: Request for comments: Combine paragraph fragments

Post by JeJe »

You mess up your Autocorrect options, or have to untick and tick them all back, and possibly more importantly, your document's paragraph styles/direct formatting. And it joins paragraphs ending in ! or ?

In your example text, better criteria for joining paragraphs might be that a paragraph ends in a space or that the next paragraph begins with a lower case letter.
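
A rough sketch of those two criteria, written in Python against plain text rather than as a Writer macro. The function names `should_join` and `join_by_heuristic` are hypothetical, and the sketch assumes each fragment is available as a separate string with its trailing space preserved.

```python
def should_join(fragment, next_fragment):
    """Return True when `fragment` looks unfinished: it ends in a space,
    or the fragment that follows starts with a lower-case letter."""
    ends_in_space = fragment.endswith(" ")
    next_is_lower = next_fragment[:1].islower()
    return ends_in_space or next_is_lower


def join_by_heuristic(fragments):
    """Join fragments into paragraphs using should_join()."""
    paragraphs = []
    for fragment in fragments:
        if paragraphs and should_join(paragraphs[-1], fragment):
            # Keep exactly one space at the join.
            paragraphs[-1] = paragraphs[-1].rstrip(" ") + " " + fragment
        else:
            paragraphs.append(fragment)
    return paragraphs


fragments = [
    "The quick brown ",
    "fox jumps over",
    "the lazy dog.",
    "Next paragraph starts here.",
]
print(join_by_heuristic(fragments))
```

Here the first three fragments are joined (the first by the trailing-space test, the third by the lower-case test) and the fourth starts a new paragraph. Unlike a rule based on closing punctuation, this leaves paragraphs ending in ! or ? separate as long as the next one starts with a capital letter.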
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Lupp
Volunteer
Posts: 3756
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Request for comments: Combine paragraph fragments

Post by Lupp »

I don't use AutoCorrect much, and I am therefore not sufficiently familiar with the tool to comment on the RFC as expected by the author. Specifically, I don't know how to save the current settings, disable them for the described procedure, and re-establish the previous settings later.
However, I am rather familiar with the usage of regular expressions, and made an example showing and explaining how to perform the task under discussion with their help.
You will find the example attached.
Attachment: ForMrProgrammer_AOO_4_1_7.odt (82.3 KiB)
On Windows 10: LibreOffice 25.8.4 and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München

Re: Request for comments: Combine paragraph fragments

Post by JeJe »

Using the criterion that a paragraph ends in a space (we could also, or instead, test whether the next paragraph begins with a lower case letter):

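Code:

Sub JoinParagraphsEndingInSpace
	Dim sd As Object, ret As Object, tc As Object
	Dim i As Long

	' As a regular expression, "$" matches the end of every paragraph
	sd = ThisComponent.createSearchDescriptor()
	sd.SearchString = "$"
	sd.SearchRegularExpression = True

	ret = ThisComponent.findAll(sd)
	If ret.Count > 0 Then
		' Process the matches from the last one to the first
		For i = ret.Count - 1 To 0 Step -1
			' Select the character just before the paragraph end
			tc = ret(i).Text.createTextCursorByRange(ret(i).Start)
			tc.goLeft(1, True)
			' A trailing space marks a fragment: remove the paragraph break
			If tc.String = " " Then ret(i).String = ""
		Next i
	End If
End Sub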

Hagar Delest
Moderator
Posts: 33633
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Request for comments: Combine paragraph fragments

Post by Hagar Delest »

There is also this macro, which may still work just fine: Convert ASCII text files by deleting extra paragraph breaks.
LibreOffice 25.2 on Linux Mint Debian Edition (LMDE 7 Gigi) and 25.2 portable on Windows 11.

Re: Request for comments: Combine paragraph fragments

Post by Lupp »

I assumed that @MrProgrammer definitely was interested in solutions NOT depending on macros.
His request for comments was partially answered by @JeJe: "You mess up your Autocorrect options, or have to untick and tick them all back ... ", and I would have said much the same if JeJe had not already posted it. Shifting to a macro-based solution, however, would also "mess up", or more precisely bloat, the user profile.
Therefore I missed the point of the "RFC" on the one hand but, on the other hand, offered an alternative that needs neither macros nor changed settings.

One more suggestion: Users who often work with OCR-ed text should probably prepare a second user profile and create a shortcut to the shared soffice.exe that includes a corresponding "-env:UserInstallation=..." argument, instead of tampering with their single profile. This suggestion also applies if a macro solution is chosen.

I hope @MrProgrammer will get more responses actually hitting the "RFC".

Re: Request for comments: Combine paragraph fragments

Post by JeJe »

MrProgrammer wrote (Wed Feb 05, 2025 11:32 pm): "... with extensions, or with macros. They could offer more control of the process at the expense of being more difficult to use."
This is just the kind of problem extensions and macros were invented to solve... to avoid the tediousness of going into Autocorrect settings and unticking several boxes and so on.

Hagar Delest's macro is complicated and possibly difficult to use - but mine just involves running it.

MrProgrammer is more than capable of writing his own code. But for other users again, IMO, an extension with options to join paragraphs depending on choices like 'paragraphs begin with a lower case letter' would be the only sensible solution... select text and press the appropriate button to do it...