How to un-ragged right align text

taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

How to un-ragged right align text

Post by taylorkh2 »

Word processing is not my thing, so please pardon me if this is a rather basic question. I think I reached my word processing peak with WordStar on CP/M :lol: I have a legal document which I am attempting to put into Writer format so that I can maintain it in the future. I scanned the document and converted it to text with the Tesseract OCR tool. The results were FANTASTIC. Near-perfect conversion; I was amazed. I have gone through the main 25-page body, underlining, bolding, indenting, etc.

The original document was formatted such that the lines of text were stretched from margin to margin, sort of like a column in a newspaper. This is the opposite of what I recall being referred to as "ragged right." The OCR-processed text is all left-justified with a ragged right margin. I am trying to format it like the original document, mainly as an aid in proofreading. Can someone point me to the formatting setting I need to adjust? I have not found such a thing, but I suspect it is there somewhere.

TIA,

Ken

p.s. Another blast from the past... When the company where I worked MANY years ago switched from WordPerfect to MS Word, the professional typists raised a howl as they lost the "reveal codes" feature in WordPerfect. Not being in that field, I always considered reveal codes to be sort of a cumbersome pain. Over the years I did come to appreciate that capability, especially when I had to examine a file in a hex editor to debug formatting issues, such as why a document which appeared to be one page printed out over 5 pages in random chunks.

Is there any sort of tool in LibreOffice which would allow me to examine my OCR text files and see what is in them, such as line feeds, carriage returns, padding spaces, etc.?
LibreOffice 5.3.6.1 on CentOS 7.6
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

/View /Nonprinting characters will give you the closest OO or LibO has to viewing formatting codes. As you are working on an OCRed document, also turn on /View /Text boundaries (may be elsewhere for LibO). If your text shows up in groups of words in surrounding boxes, your OCR has attempted to maintain the formatting of the original document and used frames to group the words. If the words are not in boxed groups, then right-click on a paragraph and choose Alignment : Justify from the popup. However, the last line of each paragraph will not (by default) justify.

If each line is terminated by a backwards P character (a pilcrow) after you have turned on /View /Nonprinting characters, then each line is effectively the last line of a paragraph, and anything I advise will destroy the reproduction of the original formatting.

Each of the above switches is a toggle; using them again will turn off the altered display.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: How to un-ragged right align text

Post by John_Ha »

See below.
Icons for View > Non printing characters ..., and Right Justify
As a new poster you will find much useful information in the Writer FAQ, the Writer Tutorials, the up-to-date Writer Guide and the Writer Manual. May I suggest you bookmark the pages.

Press F1 to access the Help screen and search for your problem.

The chapter headings in the manual are:

1 - Introducing Writer
2 - Setting up Writer
3 - Working with Text
4 - Formatting Pages
5 - Printing, Exporting, Faxing and E-Mailing
6 - Introduction to Styles
7 - Working with Styles
8 - Working with Graphics
9 - Working with Tables
10 - Working with Templates
11 - Using Mail Merge
12 - Tables of Contents, Indexes and Bibliographies
13 - Working with Master Documents
14 - Working with Fields
15 - Using Forms in Writer
16 - Customizing Writer – Keyboard shortcuts.

When a pop-up window opens, click the Help button for extensive help on that function - it is often more comprehensive than the manual.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

Thanks John_Ha,

I have looked through the help, searching for "ragged" in help and online, figuring that if I found out how to switch "ragged right" on, I could probably figure out how to switch it off. But no luck. Probably a throwback to WordStar... I worry about CONTENT first and apply such formatting as I want afterwards. Modern word processors seem to work the opposite way: first select the template for how you want the final document to look, and then put some text into it.

And back to help... RoryOF recommended View; Nonprinting characters. In my version of LO that is View; Formatting Marks. I abandoned MS Office many years ago after discovering that simply sending an MS Office document as an ATTACHMENT to an Outlook email caused Outlook to write metadata about the email INTO THE ATTACHMENT HEADER. I ripped MS Office from my home PC and replaced it with StarOffice. I have used StarOffice, OpenOffice and now LibreOffice. Unfortunately, things such as View; Nonprinting... change all too frequently. I use LO Calc mostly. I found recently that the menu steps to split a spreadsheet had moved and were not where they are listed in help. Oh well...

Thanks RoryOF,

As mentioned above, the actual menu selections in my version of LO are a little different. However, once I switched this feature on I observed that the end of each line in the OCRed file (after saving in .odt format) is shown as a paragraph delimiter. If I replace all Pilcrow-Pilcrow pairs with ~, then replace all remaining Pilcrows with a space, and then all ~ with a Pilcrow, I should convert all my multi-line chunks of text into paragraphs. That will give me a nicer-looking document, I think.

Back to the alignment question... I started a new document from scratch and typed in a paragraph of text. If I highlight the block of text and right click I do not see an option to Align. I do see Paragraph and when I select that I am taken to a dialog with 9 tabs. Tab 2 is Alignment. I tried the various options and justified seems to be the magic word. That did the trick. Now let me put these newly learned skills to use and see what sort of mess I can make :mrgreen:

Thank you both for your prompt responses and advice.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

I would normally use %%%% rather than a ~ for double pilcrow replacement. My advice is based on OpenOffice, which my forum .sig will show is my usual suite. LibreOffice versions move many menu items, which is one reason I don't use it. The other is that OO does what I want with great stability; I'm a great believer in the old adage of "don't change horses mid-stream".
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
jrkrideau
Volunteer
Posts: 3816
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: How to un-ragged right align text

Post by jrkrideau »

Am I missing something completely? Would not the first step be for Ken (the OP) to set his Style or Styles to Alignment -> Justified before anything else?
LibreOffice 7.3.7. 2; Ubuntu 22.04
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

After an OCR, a document is often single line paragraphs.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

Update...

I have poked around in my document attempting to find and replace the paragraph separators. So far, no luck. %%%% does not find anything. Help; Regular expressions is not very helpful in this matter.

I opened the Writer format file in a hex editor. I cannot find ANYTHING resembling the text in the original document in the text panel of the editor. Very strange. Is LO encrypting the contents of the document???

As that was getting me nowhere, I went back to the original OCR text file. I found the hex character 0A used as the "paragraph" delimiter at the end of each scanned line. I replaced 0A0A with ~, then any remaining 0A with nothing, then ~ with 0A0A (corrected from my prior post). I then loaded the file into LO Writer and saved it as an .odt file. I changed to the desired Liberation Serif 10.5 pt font, selected one paragraph and went through the steps to "justify" the paragraph, and to my delight ALL paragraphs were now stretched across the width of the page.
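Ken's three-step byte substitution can be sketched in Python as below. This is a minimal sketch, not the tool he used: the function name `join_ocr_lines` is hypothetical, it uses %%%% as the temporary marker (as RoryOF suggests) instead of ~, and it replaces each single line break with a space, on the assumption that the OCR lines do not already end in a space; replacing them with nothing would fuse the last word of one line onto the first word of the next.

```python
MARKER = b"%%%%"  # temporary placeholder; must not already occur in the text


def join_ocr_lines(raw: bytes) -> bytes:
    """Merge OCR line breaks into paragraphs, keeping blank-line paragraph breaks.

    Hypothetical helper. Assumes paragraphs are separated by one empty line
    (0A 0A) and that every other 0A is just the end of a scanned line.
    """
    # Treat CR LF as LF and drop trailing line breaks at the end of the file.
    raw = raw.replace(b"\r\n", b"\n").rstrip(b"\n")
    if MARKER in raw:
        raise ValueError("marker already occurs in the text; pick another one")
    raw = raw.replace(b"\n\n", MARKER)            # 1. protect real paragraph breaks
    raw = raw.replace(b"\n", b" ")                # 2. turn line-end breaks into spaces
    return raw.replace(MARKER, b"\n\n") + b"\n"   # 3. restore the paragraph breaks
```

To apply it to a file, read and write bytes, for example `Path("joined.txt").write_bytes(join_ocr_lines(Path("scan.txt").read_bytes()))` with `pathlib.Path`; both file names are placeholders. Working on bytes rather than decoded text keeps the operation identical to the hex-editor edit, whatever encoding the OCR tool wrote.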

Unfortunately I would have to do all the indenting, bolding, underlining etc. over again. So... I really need to figure out how to identify a paragraph delimiter in an odt format file.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

Use Find and Replace. Find ^$, Replace %%%%; drop down More Options and tick Regular Expressions. Press the Replace All button.

Then Find $, Replace <space char> (press the space bar instead of typing <space char>), and press Replace All (Regular Expressions is, I assume, still selected).

Then Find %%%%, Replace \n, and press Replace All (Regular Expressions selected).

These are the settings for OpenOffice. I think they are unchanged for LibreOffice.
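To see what the three passes do to the paragraph structure, here is a small Python model that treats the document as a list of paragraph strings. It is only a model, built on these assumptions: in Writer's regular expressions ^$ matches an empty paragraph, $ on its own matches a paragraph mark, and \n in the Replace box inserts a paragraph break. The function name `three_pass_reflow` is hypothetical. The second test case also shows the risk that comes with this method: with no empty paragraphs, everything collapses into one paragraph.

```python
MARKER = "%%%%"  # unique marker, as in the Find & Replace steps


def three_pass_reflow(paragraphs: list[str]) -> list[str]:
    """Hypothetical model of the three Find & Replace passes on a paragraph list."""
    # Pass 1: Find ^$ (an empty paragraph), Replace with %%%%.
    marked = [MARKER if p == "" else p for p in paragraphs]
    # Pass 2: Find $ (each paragraph mark), Replace with a space.
    # With every mark gone, the document is a single paragraph.
    merged = " ".join(marked)
    # Pass 3: Find %%%%, Replace with \n (a new paragraph break).
    # Writer would likely keep the spaces around each marker; the model strips them.
    return [p.strip() for p in merged.split(MARKER)]
```

For example, `three_pass_reflow(["The trust shall", "be administered.", "", "Section two", "begins here."])` gives two paragraphs, while the same lines without the empty entry give one.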

The ODT format file is a zip archive. Open it with WinZip/7-Zip or another archiver. The text content of the file is in content.xml, which, with great care, you can extract, edit, reinsert and then close the archive.
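The extract, edit, reinsert cycle can also be scripted, which sidesteps one pitfall: the ODF packaging rules expect the `mimetype` entry to be the first file in the archive and stored uncompressed, which a generic re-zip of an extracted folder may not respect. Below is a minimal Python sketch using the standard `zipfile` module; the function name `edit_odt_content` and the `edit` callback are assumptions for illustration. Always work on a copy of the document.

```python
import zipfile
from typing import Callable


def edit_odt_content(src: str, dst: str, edit: Callable[[bytes], bytes]) -> None:
    """Hypothetical helper: copy the ODF package src to dst, passing content.xml through edit()."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # ODF expects 'mimetype' first in the archive and stored uncompressed.
        zout.writestr(zipfile.ZipInfo("mimetype"), zin.read("mimetype"),
                      compress_type=zipfile.ZIP_STORED)
        for info in zin.infolist():
            if info.filename == "mimetype":
                continue  # already written above
            data = zin.read(info.filename)
            if info.filename == "content.xml":
                data = edit(data)  # the XML of the document body
            # A fresh ZipInfo keeps the name and timestamp; members are deflated.
            zout.writestr(zipfile.ZipInfo(info.filename, info.date_time), data,
                          compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `edit_odt_content("trust.odt", "trust-edited.odt", lambda xml: xml.replace(b"old wording", b"new wording"))` (file names are placeholders) is only safe for plain text that does not straddle an XML tag; anything structural is better done in Writer itself.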
Edit: The paragraph delimiter is the pilcrow (backwards P). In the advice I give above, %%%% is inserted as a marker in place of an empty line (a solitary pilcrow), which will often mark the end of a preceding series of lines comprising a paragraph in the original document. The OCR process often reads and marks every line as a paragraph, and one's best hope of maintaining the correct paragraph structure is the empty line the OCR may give separating the intended paragraphs. That is why in my initial posting I did not give the more complex instructions for turning the assemblies of individual lines back into their paragraph structure, as it might have produced a file of one big paragraph.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

Thanks again RoryOF,

I searched around and found this LO extension https://extensions.libreoffice.org/exte ... for-writer and installed it in LO in a virtual machine. The extension refers to the paragraph delimiter as regex \p. I applied some various find and replace magic to my document and it is looking good (except for centering which I seem to have lost). That will not take long to fix. I had used TAB to do the indents for numbering rather than try and use the built in format tools. I think I will leave well enough alone in that area as section R. (3) (a) appears as it did in the original printed document.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

Many of the reformatting options needed on an OCRed document depend very much on the format of the original document and on the options chosen in or by default of the OCR engine. My OCR reformatting is normally of full books, and I don't care about preserving their formatting, so I set for the OCR to produce a plain text file. Then I reformat to my liking.

The extension you point to above is normally referred to as AltSearch. Very powerful, but tricky to drive; on book-length replacements it can be extremely slow (such as overnight!).
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

My experience with OCR processing is rather limited. I recall that about 30 years ago at my place of work the document management folks wanted to scan and OCR some documents for archival purposes. Being in the IT field, I was tasked with helping them set up their scanner etc. I do recall that the scanner was attached to a 486 PC with a SCSI card and cable. It was FAST compared to my home scanner, which was connected with (I think) a parallel printer port cable. And my PC was a Pentium II!

They got to work scanning and OCRing their documents. I happened to look at WHAT they were scanning. It turned out that most of the documents were government regulatory documents which could be downloaded from on-line. It took some doing but I finally convinced them that if the downloaded document was of the same revision it was just as good as their scanned paper document. The good old days :D

Thanks again for your assistance.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

I had occasion recently to scan about 150 images for a presentation (not all to be used, but a selection made). My normal scanners are on all-in-one printers, standing on plinths. I did not relish standing beside such a scanner to position the book, then dashing back to the computer to mark the scan area and start the scan, 150 times.

I rooted in my box for rooting in, found a PCI SCSI card and cable, took my long-disused (15-18 years?) Epson GT7000 scanner down from its shelf, plugged it all together and was able to do the entire scanning job seated comfortably at my desk. Linux found the card and the scanner; the XSane software was able to drive it. SCSI scanners are fast!
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

I thought the LO document might be XML. I recall that MS Office has used that format for quite a while. I tried opening my project document with an XML editor... The editor program barfed with something to the effect of "conversion from encoding failed." Oh well.

My scanner is in fact an ancient Brother MFC 240c scanner, copier, color ink jet printer and fax machine. I purchased it a long time ago for almost nothing. I needed to replace another ink jet printer which had died. Brother's Linux support is quite good. I have run the thing on Ubuntu 8.04, Ubuntu 9.10, CentOS 6 and now CentOS 7. I did find an issue with my new Dell Precision Workstation. The print functions worked fine but it would not scan. I spent a couple of weeks working with a fellow from New Zealand on the linuxquestions.org forums. Long story short - the thing would not work on a USB 3 controller - even if plugged into a USB 2 port. My solution was to purchase a "High Speed" USB 2 card on evilbay for about $4 US and install it in the workstation. Scans great now.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
robleyd
Moderator
Posts: 5055
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia

Re: How to un-ragged right align text

Post by robleyd »

LO and AOO documents are a zipped structure of XML files; as a former IT person you might have noticed that the files start with PK when viewed in a hex editor.

You can uncompress the files from the command line with unzip.
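As a quick check of what robleyd describes, this Python sketch reads the first four bytes of the file (the zip local-file-header signature, which a hex editor shows as PK followed by two control bytes) and then lists the package contents with the standard `zipfile` module, much as `unzip -l` does on the command line. The function name `inspect_package` is hypothetical. It also suggests why the XML editor failed: it was being handed compressed zip data, not XML.

```python
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # "PK" plus bytes 03 04 at the start of a zip file


def inspect_package(path: str) -> tuple[str, list[str]]:
    """Hypothetical helper: return (mimetype, member names) for an ODF package."""
    with open(path, "rb") as fh:
        if fh.read(4) != ZIP_SIGNATURE:
            raise ValueError(f"{path} does not start with a zip signature")
    with zipfile.ZipFile(path) as z:
        # ODF packages carry their document type in a 'mimetype' member,
        # e.g. application/vnd.oasis.opendocument.text for Writer files.
        mimetype = z.read("mimetype").decode("ascii").strip()
        return mimetype, z.namelist()
```

The member list of a real .odt should include content.xml, which holds the document text.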
Cheers
David
OS - Slackware 15 64 bit
Apache OpenOffice 4.1.15
LibreOffice 24.2.1.2; SlackBuild for 24.2.1 by Eric Hameleers
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

Thanks David,

I had not looked further with the hex editor than to try and find a "paragraph" which I could identify and find the end marker. Zipped XML - that is a neat idea. I recall making an MS Word document containing just the text "two words", and the darned thing was HUGE.

I love your Tux with a mini-gun. Fantastic. I recall a video made many years ago by Mike Dillon, the late founder of Dillon Precision in Arizona, USA. The company makes ammunition reloading equipment and also does some aviation work. They leased something like 20 square miles (a bunch of hectares!) of desert and invited folks with legally owned machine guns to bring them out for a shoot. Needing something challenging to shoot AT, they constructed several radio-controlled airplanes from Styrofoam. The planes were delta wing and about 4 feet (a little more than a meter) long.

The planes were launched and flew back and forth in front of the firing line about 300 yards (roughly 275 meters) away. Probably 50+ machine guns of all sorts opened fire, and the planes flew back and forth seemingly impervious to the gunfire. In actuality, most of the area of the planes was foam, which could be penetrated without damage. Only a small area, represented by the engine and controls, was vulnerable. As the line of hand-held and tripod-mounted machine guns could not knock down the planes, it was time to escalate.

A quad 50 opened fire. This was a World War II anti-aircraft weapon with four .50 caliber (12.7 mm) Browning machine guns mounted on a turret. It pours out 1,800 rounds per minute of BIG bullets. It finally knocked down one plane. Bring on the modern technology: a mini-gun. Actually, the mini-gun traces its history back to the hand-cranked Gatling gun, which was patented in 1862. I believe that Michael Dillon owned one of only two mini-guns in private collections. The mini-gun cut loose and hosed the last two planes out of the sky in short order.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to un-ragged right align text

Post by RoryOF »

If an OCRed text has no blank lines between paragraphs, removing the end-of-line paragraph markers destroys all the paragraph markers of the file and turns it into one big paragraph. In a small text of two or three pages, this paragraph information can be edited back in by hand, referring to the original document, but that becomes impractical for book-length texts.

How such information might be preserved for a major text depends very much on the exact form of the OCR result. Sometimes, before the paragraph removal process, it is possible to find that each original text paragraph starts with an indent if the OCR preserves this. Then one can search for this indent, insert a blank line (empty paragraph) before it, and proceed to the paragraph removal using the %%%% (or other unique marker) substitution I outlined earlier.
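The indent approach could look like this in Python. It is a sketch that assumes the OCR kept each paragraph's first-line indent as a leading tab (pass a different `indent` string if it used spaces); the function name `mark_indented_starts` is hypothetical. Its output, with an empty line before every indented line, is the blank-line-separated form that the %%%% substitution relies on.

```python
def mark_indented_starts(lines: list[str], indent: str = "\t") -> list[str]:
    """Hypothetical helper: insert an empty line before each indented line."""
    out: list[str] = []
    for line in lines:
        # An indented line is taken to open a new paragraph; add an empty line
        # before it unless it is the first line or an empty line is already there.
        if line.startswith(indent) and out and out[-1] != "":
            out.append("")
        out.append(line)
    return out
```

Any unindented line that the OCR wrongly split off still needs checking by eye; the script only finds paragraph starts that kept their indent.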
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

The text file resulting from OCR did in fact have empty lines between paragraphs. Each "empty" line consisted of a paragraph delimiter. Unfortunately indents were not carried over to the text file. Perhaps a setting in the OCR software would do this. However, the accuracy of the recognition was so good that I did not go back and try any additional OCR runs. The most prevalent error was mistaking a period for a comma. Perhaps 6 or 8 in 25 pages. Not bad.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
jrkrideau
Volunteer
Posts: 3816
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: How to un-ragged right align text

Post by jrkrideau »

You should be able to set indents in the paragraph Style (Styles) you are using, under the Indents and Spacing tab.
LibreOffice 7.3.7. 2; Ubuntu 22.04
taylorkh2
Posts: 9
Joined: Wed Feb 27, 2019 3:48 pm
Location: North Carolina, USA

Re: How to un-ragged right align text

Post by taylorkh2 »

Thanks jrkrideau,

For the purposes of this document I simply used tabs to indent the start of each paragraph or sub-paragraph. Some indents are several levels deep:


     IV
         A)
               1)
                    a)
I am not sure how consistent the original document was so I am reluctant to globally apply indent, numbering and bullet formatting.
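Before applying numbering globally, the consistency question could be checked with a short script. This Python sketch assumes the pattern in the outline above (a Roman numeral at no tab, A) at one tab, 1) at two, a) at three), that the indents are tab characters, and that each numbered paragraph begins with its label; the level patterns and the function name `check_outline` are assumptions for illustration. It reports the line number and label of any numbered item whose depth does not match its label style.

```python
import re

# Assumed label style for each tab depth, following the outline above.
LEVEL_PATTERNS = [
    re.compile(r"[IVXLCDM]+"),  # depth 0: Roman numeral, e.g. IV
    re.compile(r"[A-Z]\)"),     # depth 1: capital letter, e.g. A)
    re.compile(r"[0-9]+\)"),    # depth 2: number, e.g. 1)
    re.compile(r"[a-z]\)"),     # depth 3: lower-case letter, e.g. a)
]


def check_outline(lines: list[str]) -> list[tuple[int, str]]:
    """Hypothetical helper: return (line number, label) for misplaced numbered items."""
    problems = []
    for n, line in enumerate(lines, start=1):
        body = line.lstrip("\t")
        if not body.strip():
            continue  # blank line
        label = body.split()[0]
        if not any(p.fullmatch(label) for p in LEVEL_PATTERNS):
            continue  # ordinary text, not a numbered item
        depth = len(line) - len(body)  # number of leading tabs
        if depth >= len(LEVEL_PATTERNS) or not LEVEL_PATTERNS[depth].fullmatch(label):
            problems.append((n, label))
    return problems
```

An empty result suggests the tab depths are consistent enough for a list style with outline numbering to reproduce them.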

The document I am working on is a revocable trust. In all likelihood it will never be republished: that would establish a new trust with a new trust date and require all items in the trust, such as real estate, investment accounts etc., to be transferred to the new trust. Minor changes will be handled with amendments, I think. When I asked the original attorney for an electronic copy many years ago, his office sent me a collection of rather poor scan-to-PDF files. As I had a scanner and OCR software available in Linux, I decided to see what I could do.

Ken
LibreOffice 5.3.6.1 on CentOS 7.6
jrkrideau
Volunteer
Posts: 3816
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: How to un-ragged right align text

Post by jrkrideau »

Ah, I see what you mean now.

This is not something I would do, but I believe that you can use list styles with numbering to do this. Perhaps someone who knows what they are doing could intervene here?

It should reduce the work considerably once set up.
 Edit: Here is a first crack at the idea. 
Attachments
indents.odt
(20.49 KiB)
LibreOffice 7.3.7. 2; Ubuntu 22.04