[Solved] Text from a .pdf doesn't fill the width of the page

videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Post by videobruce »

Below I copied and pasted a portion of the text from a .pdf that was laid out in a narrow 'booklet' type format, so it doesn't fill the width of a regular 8.5 x 11" page.
In the first two lines, which do fill the width, I deleted the paragraph marks to show the comparison, but the text pasted from this pdf is 8 pages long.

Viewing non-printing characters only shows paragraph breaks. There is no "Paste Special" available when I do the paste. The example is in the attachment.
Can the person I am suing sue me? Yes! If the person you are suing (the defendant) wants to sue you, s/he may file a small claims counterclaim against you. In Small Claims Court, a counterclaim can only be for the amount
that can be sued for in the court. The defendant will have to pay a
$3-5 filing fee plus the cost of mailing to file a counterclaim.
How will I know if the defendant files a counterclaim?
The Court will send you a notice or you will be told on the trial
date. If the defendant files a counterclaim, s/he must do so:
• Within 5 days of getting the notice of your claim, or
• On the day of the trial.
If the defendant sues me, will my case be postponed?
Maybe, it depends on when the defendant filed the counterclaim.
If the defendant filed it more than 5 days after getting the notice of
claim, but before the trial date, the judge must grant your request
to postpone the trial.
If the defendant files the counterclaim on the day of the trial, you
may ask the judge to postpone the case so you can have time to
prepare. But, the judge can say no.
Attachments
_pdf example width not filled.odt
(13.43 KiB) Downloaded 133 times
Last edited by videobruce on Mon Jan 13, 2020 7:51 am, edited 2 times in total.
OpenOffice v4.13
Win 7 & XP Pro
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Copying text from a .pdf doesn't fill the width of the p

Post by John_Ha »

1. View > Non printing characters
2. Delete the extra end of paragraph marks.

Showing that a problem has been solved helps others searching so, if your problem is now solved, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

As stated I did (for the example) delete paragraphs, but I'm not about to do that a thousand times in the entire document on every line on every eight pages.
musikai
Volunteer
Posts: 294
Joined: Wed Nov 11, 2015 12:19 am

Re: Copying text from a .pdf doesn't fill the width of the p

Post by musikai »

Go to search and replace and, in the advanced options further down, check "regular expressions".
In the search field type:

$
In the replace field type nothing, a space, or whatever you like, and click "replace all".

viewtopic.php?f=5&t=56451
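
For anyone who would rather do the same join outside Writer, here is a minimal Python sketch of what that search and replace does. It assumes the copied text has already been saved as plain text; the function name `join_all_lines` is made up for illustration and is not part of OpenOffice.

```python
def join_all_lines(text: str) -> str:
    """Join every line of `text` into one paragraph, separated by single spaces.

    This mirrors replacing each paragraph end with a space; empty lines
    are dropped so that no double spaces appear.
    """
    lines = [line.strip() for line in text.splitlines()]
    return " ".join(line for line in lines if line)


sample = (
    "that can be sued for in the court. The defendant will have to pay a\n"
    "$3-5 filing fee plus the cost of mailing to file a counterclaim.\n"
)
print(join_all_lines(sample))
```

Note that this joins everything, including headings and bullet points, just as a global "replace all" would.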
Win7 Pro, Lubuntu 15.10, LO 4.4.7, OO 4.1.3
Free Project: LibreOffice Songbook Architect (LOSA)
http://struckkai.blogspot.de/2015/04/li ... itect.html
RoryOF
Moderator
Posts: 34611
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Copying text from a .pdf doesn't fill the width of the p

Post by RoryOF »

Note that OpenOffice does not like paragraphs longer than 64K characters. You should be OK with 8 pages - normally 64K characters is about 15 pages.

If the character count shows up as more than 64K, you should select an area containing less than that, do the Find and Replace as above specified, and choose "Current Selection only" in addition to Regular Expressions on the More Options dropdown. Then select another area and repeat until your entire document is converted.
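
If the joining is done in a script rather than in Writer, the same precaution can be sketched as below. This is only an illustration: `join_in_chunks` and its `max_chars` default of 60,000 are assumptions chosen to stay under the 64K figure mentioned above, not values taken from OpenOffice itself.

```python
def join_in_chunks(lines, max_chars=60_000):
    """Join lines with spaces, starting a new paragraph before max_chars is exceeded.

    Returns a list of paragraphs. A single input line that is already longer
    than max_chars is kept whole as its own paragraph.
    """
    paragraphs = []
    current = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        candidate = line if not current else current + " " + line
        if current and len(candidate) > max_chars:
            # Adding this line would make the paragraph too long: close it.
            paragraphs.append(current)
            current = line
        else:
            current = candidate
    if current:
        paragraphs.append(current)
    return paragraphs


# Tiny limit used here only to make the splitting visible.
print(join_in_chunks(["aaa", "bbb", "ccc"], max_chars=7))
```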
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

When I first did a search for this, I included the term "word wrap", which I thought was the problem.
What causes this in the 1st place? Why aren't some other characters present to represent the odd width restriction (or whatever it is called)?

From that booklet-style .pdf I did a 'copy as rich text' and it copied better, though still with some page-width issues.

Here is a link to the booklet I'm talking about:
http://www.nycourts.gov/courthelp/pdfs/ ... ndbook.pdf
RoryOF wrote:Note that OpenOffice does not like paragraphs longer than 64K characters.
A single paragraph, or the actual full document? This wouldn't be just one paragraph.
Last edited by videobruce on Sat Jan 11, 2020 11:32 pm, edited 2 times in total.
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

musikai wrote:go to search and replace, in the advanced options further down check "regular expressions".
in the search field type:

$
in the replace field type nothing or empty space or what you like and click "replace all"
No results.

BTW, it's 'Find & Replace' and 'More options' ;)
Hagar Delest
Moderator
Posts: 32653
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Copying text from a .pdf doesn't fill the width of the p

Post by Hagar Delest »

Try that macro: Convert ASCII text files by deleting extra paragraph breaks.

Please add [Solved] at the beginning of the title in your first post (top of the topic) with the *EDIT button if your issue has been fixed.
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

That Code was supposed to open what is in that previous link I quoted?? I hit the expand all, but nothing was there. That process is more complicated than manually deleting the paragraphs.
RoryOF
Moderator
Posts: 34611
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Copying text from a .pdf doesn't fill the width of the p

Post by RoryOF »

Your sample file in your first post worked perfectly for me when I tried the Find and Replace as set out above.

Just be aware that gigantic paragraphs, as can occur when globally replacing paragraph ends with spaces, can cause problems in OO.
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Copying text from a .pdf doesn't fill the width of the p

Post by John_Ha »

videobruce wrote:As stated I did (for the example) delete paragraphs, but I'm not about to do that a thousand times in the entire document on every line on every eight pages.
If you want the text to flow you are going to have to do it.
videobruce wrote:When I first did a search for this, I included the term "word wrap", which I thought was the problem.
What causes this in the 1st place?
It is the definition of how a PDF file works. If you don't like it, that's tough because there is nothing you can do about it. You can like it or lump it - your choice.
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

I needed the text, but in a standard width format that doesn't eat up unnecessary pages.
Thanks for your help.
steven8
Posts: 7
Joined: Sun Nov 26, 2017 8:48 am

Re: Copying text from a .pdf doesn't fill the width of the p

Post by steven8 »

Open the file in Adobe reader and save as text. You won't get all the PDF formatting like you do with copy/paste. You will have to do some work with it in Open Office, but you won't have all those returns at the end of every line.
OpenOffice 4.1.4 on Windows 7
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

Save the selected text as 'text' or save the entire pdf file as text?? I use PDF-XChange Editor and the only 2 other 'save' options are PP and something called a "PDF/A document".
If you mean saving specific pages or paragraphs, the only 'Copy' option is 'Copy as Rich text', which I stated I tried, but it doesn't always get rid of all the formatting on every page. But it's better than just Copy.
I never liked pdf files no matter what 'Viewer' or 'Editor' I've used. :x
musikai
Volunteer
Posts: 294
Joined: Wed Nov 11, 2015 12:19 am

Re: Copying text from a .pdf doesn't fill the width of the p

Post by musikai »

videobruce wrote:
musikai wrote:go to search and replace, in the advanced options further down check "regular expressions".
in the search field type:

$
in the replace field type nothing or empty space or what you like and click "replace all"
No results.

BTW, it's 'Find & Replace' and 'More options' ;)
What do you mean with "no results"?

It works correctly on your sample file and also on the whole text when imported from the PDF you mentioned.
Here is the whole text without formatting:
SmallClaimsHandbook.odt
(41.96 KiB) Downloaded 122 times
But you can also use an online converter like https://www.ilovepdf.com/pdf_to_word
and get a much better result with formatting and images.
Here's the result. I just removed most images (so it will be less than the fileupload limit of 127kb here) and saved as odt.
musikai
Volunteer
Posts: 294
Joined: Wed Nov 11, 2015 12:19 am

Re: Copying text from a .pdf doesn't fill the width of the p

Post by musikai »

videobruce wrote: I use PDF-XChange Editor and the only 2 other 'save' options are PP and something called a "PDF/A document".
In my free PDF-XChange Editor under "save" there is a txt-option and even a docx-option.
But the result of the docx is bad and cluttered with "demo"-graphic-objects.
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

musikai wrote: What do you mean with "no results"?
Because I didn't understand what this procedure was and the drop down was blank at this end. There was no 'macro' code shown.
musikai wrote:In my free PDF-XChange Editor under "save" there is a txt-option and even a docx-option.
But the result of the docx is bad and cluttered with "demo"-graphic-objects.
What version do you have? I have v6.

Thanks for the 2 conversions, but that on-line version is nothing but an odt version of that 'narrow' format booklet.
Anyway, this is what I was looking for (just the needed portions of that pdf); see the attachment.
Attachments
Small Claims Court edited.odt
(39.7 KiB) Downloaded 129 times
RoryOF
Moderator
Posts: 34611
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Copying text from a .pdf doesn't fill the width of the p

Post by RoryOF »

If you regularly need to do such conversions, i.e. from .pdf files to full width .odt files, you would be well advised to master the conversion technique set out earlier in this thread, which, I emphasise, has worked both for myself and musikai on your sample file. If it didn't work for you, you have omitted some step - try it again.
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: Copying text from a .pdf doesn't fill the width of the p

Post by videobruce »

No, that was most unusual, to say the least. I never had something with that many '1/3rd width' pages that I needed to save & print out.
Again, thanks for everyone's help.
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Copying text from a .pdf doesn't fill the width of the p

Post by John_Ha »

PDF is a format designed to preserve exactly what the document looks like when viewed on any device. A PDF is not designed to be edited.

As a direct consequence each individual line in a PDF is implemented as a single paragraph - essentially there is no concept of "a paragraph of more than one line" in a PDF file.

Hence when you copy from a PDF, or when you export the text from a PDF, you get short, single line paragraphs and each ends in an end-of-paragraph marker.

See [Tutorial] How do I view or edit a PDF file with OpenOffice?

RoryOF
Moderator
Posts: 34611
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Copying text from a .pdf doesn't fill the width of the p

Post by RoryOF »

Whether you need to do it regularly or not, I still think the technique worth mastering. It is often necessary to do this to plain text books downloaded from Gutenberg and other sites.
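
As a rough illustration of that technique on such plain-text files, the Python sketch below joins the hard-wrapped lines inside each paragraph but keeps the paragraphs apart. It assumes paragraphs are separated by blank lines, which is common in these files but not guaranteed; `unwrap_paragraphs` is a hypothetical helper name.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join wrapped lines within each paragraph, keeping blank-line paragraph breaks."""
    # Split wherever one or more blank (or whitespace-only) lines occur.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Within a block, collapse all runs of whitespace (including line breaks)
    # into single spaces.
    unwrapped = [" ".join(block.split()) for block in blocks]
    return "\n\n".join(block for block in unwrapped if block)


wrapped = "This paragraph was\nwrapped at a fixed\nwidth.\n\nSecond paragraph\nhere.\n"
print(unwrap_paragraphs(wrapped))
```

Text copied from a PDF usually has no blank lines between paragraphs, so this approach suits plain-text books better than PDF pastes.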
videobruce
Posts: 63
Joined: Fri Feb 16, 2018 6:18 pm
Location: New York State, the Empire State

Re: [Solved] text from a .pdf doesn't fill the width of the

Post by videobruce »

John_Ha wrote:As a direct consequence each individual line in a PDF is implemented as a single paragraph - essentially there is no concept of "a paragraph of more than one line" in a PDF file.
Interesting, I guess that kinda explains everything. :super:
musikai
Volunteer
Posts: 294
Joined: Wed Nov 11, 2015 12:19 am

Re: [Solved] text from a .pdf doesn't fill the width of the

Post by musikai »

videobruce wrote:
As a direct consequence each individual line in a PDF is implemented as a single paragraph - essentially there is no concept of "a paragraph of more than one line" in a PDF file.
Interesting, I guess that kinda explains everything. :super:
Yes, but there are conversion tools that are more "intelligent" and are capable of solving this. See the result of the online conversion tool mentioned here
viewtopic.php?f=7&t=100719&p=485119#p485049
that produces multiline paragraphs and preserves formatted headings etc.
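
How such converters decide where a paragraph ends is not described in this thread, but one simple heuristic can be sketched in Python. The rules below are assumptions for illustration only: a line break is kept when the next line starts with a bullet, when the previous line ends with a question mark (treated as a heading), or when the previous line is noticeably shorter than the widest line. The names `reflow`, `bullet` and `short_ratio` are made up.

```python
def reflow(lines, bullet="•", short_ratio=0.8):
    """Guess paragraph boundaries in hard-wrapped text and join the rest.

    Returns a list of paragraphs. This is a heuristic and will misjudge
    some lines, for example a full-width line that happens to end a paragraph.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return []
    width = max(len(ln) for ln in lines)  # widest line approximates the column width
    paragraphs = []
    current = lines[0]
    for prev, line in zip(lines, lines[1:]):
        prev_is_short = len(prev) < short_ratio * width
        if line.startswith(bullet) or prev.endswith("?") or prev_is_short:
            # Treat this as a real paragraph break.
            paragraphs.append(current)
            current = line
        else:
            # Treat this as a wrap inside a paragraph.
            current += " " + line
    paragraphs.append(current)
    return paragraphs


for paragraph in reflow([
    "The Court will send you a notice or you will be told on the trial",
    "date. If the defendant files a counterclaim, s/he must do so:",
    "• Within 5 days of getting the notice of your claim, or",
    "• On the day of the trial.",
]):
    print(paragraph)
```

The result still needs a read-through afterwards, since a heuristic like this cannot know the author's intent.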