How to OCR-scan to PDF?

helices
Posts: 3
Joined: Fri Aug 14, 2015 3:38 pm

How to OCR-scan to PDF?

Post by helices »

I'm weaning off of M$ Office & Adobe Acrobat

OS: Windows 7 64 bit

Can I scan directly from OpenOffice ? If so, please, point me to docs.

Can OpenOffice directly OCR scanned media? If so, please, point me to docs.

Can OpenOffice directly create PDF files? If so, please, point me to docs.

What I want to do is OCR-scan documents directly to PDF. Can OpenOffice do this?

Please, advise. Thank you.
Last edited by helices on Mon Aug 17, 2015 8:27 pm, edited 1 time in total.
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to OCR-scan to PDF?

Post by Villeroy »

Putting a picture into a PDF file does not require any office suite at all.
You scan, open the picture and print it to PDF. There are many programs out there which generate PDF from anything you can send to a printer from any application.

http://www.bullzip.com/products/pdf/info.php

Whether OpenOffice or LibreOffice can trigger your scanning program depends on your scanner's driver software. Under Mac/Linux this is no problem. Anyway, generating PDF from pictures has nothing to do with any particular office suite.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to OCR-scan to PDF?

Post by RoryOF »

Frequently software supplied with scanners provides for direct scanning to PDF. Worth re-examining the software disk supplied with the scanner.
Edit: OCR programs are highly specialised. If one does not come with the scanner, it has to be acquired separately. On Windows I have used OmniPage and ReadIris; both of these produce .RTF/.DOC output which can be read and edited by Open-/Libre-Office. On Xubuntu I use Tesseract, fed from OCRFeeder.
Open-/Libre-Office does not OCR graphic or scanner input. 
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
helices
Posts: 3
Joined: Fri Aug 14, 2015 3:38 pm

Re: How to OCR-scan to PDF?

Post by helices »

First off, I intended "OCR" to indicate text documents rather than images

Whether or not the 10 year old scanner came with software to "scan to PDF" does not answer my question, does it?

OCR scanning directly to PDF is currently done via CS4 Adobe Acrobat - I need to find another way

Er, I just noticed that I left "LibreOffice" in my original post - I meant this to be "OpenOffice"

I'm new to both OpenOffice and LibreOffice; so, I'm asking this question to both

What I want to do is OCR-scan documents directly to PDF. Can LibreOffice do this?
~ helices
OpenOffice 4.x on Windows 7
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to OCR-scan to PDF?

Post by RoryOF »

If your scanner works with TWAIN drivers, it should be able to scan pictures/graphics into Open-/Libre-Office - this usually works OK, but there have been reported problems. Neither OpenOffice nor LibreOffice can OCR directly - you must use an OCR utility for this, then (if desired) feed the text output into Open-/Libre-Office for layout and correction.

If you are content for your text page to be output as an illustration in a PDF file, then you can scan directly into Open-/Libre-Office and use File - Export as PDF. Your file will be readable as a PDF, but remember that it will always be a picture, not text.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
keme
Volunteer
Posts: 3704
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: How to OCR-scan to PDF?

Post by keme »

Mostly your questions have been answered, but perhaps in a roundabout manner. I'll try to rephrase what has already been stated, and fill in a few blanks ...

When you have TWAIN compatibility (which you have with most scanners), you can scan from OpenOffice/LibreOffice. Menu sequence in Writer: Insert - Picture - Scan ...
This will be a picture, not text.

There are several advantages to having the picture content converted to text. Alas, OpenOffice applications do not do OCR out of the box.

Scanner drivers or OCR software may come with an OCR plugin for various software. If they made such a plugin for OpenOffice, it may be possible to seamlessly import text from a scanned image into Writer; in other words, the plugin would provide OCR capability to OpenOffice.

Other scanners come with scanning software which can be set to "push" OCR output to applications such as Writer. Almost as elegant and efficient as a plugin, and probably easier to implement.

The "nitty gritty" procedure is to scan to image, then OCR from image to text, then import the text into your wordprocessor.
Proofreading is required for most OCR situations, so even if it were available, I wouldn't recommend using automated scan-OCR-PDF. AFAIK the single step solution is not available with the apps in question here.
Apache OO 4.1.12 and LibreOffice 7.5, mostly on Ms Windows 10
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to OCR-scan to PDF?

Post by RoryOF »

It came to me that the OP may be misunderstanding OCR. If you already know what OCR is, read no further.

OCR stands for Optical Character Recognition. A scanner is effectively a special type of camera; it takes a photograph of what is on its glass. Whether what is on the glass is a picture or a page of text, the output is still a photograph. The computer can only handle this as a photograph, so if you put it into a text editor, it is put in as a photograph. When you have a photograph, any text that is on it cannot easily be changed (one has to use special editors such as Photoshop). If one wishes to alter the text on the photograph of a page of text (the scan), perhaps to lay it out in a different fashion, one must pass it through an OCR program, which "reads" the text and puts it into a file as if it had been typed in; this file can then be passed to the text editor - in this case, Writer - as if it were input from the keyboard.

OCR accuracy depends very much on the quality of the original. OCR (in general) reads only type (not handwriting, whether joined up or block letters). The better the original from which you are working, the better the accuracy. Good quality print on clean paper works best. Uneven old typewriting from fabric ribbon days on flimsy paper, which may be crumpled and discoloured, is not good. Even worse are carbon copies on flimsy from the foregoing. After OCRing it is essential to proofread the resulting text before reprinting from it. Good OCR is about 98% accurate - that is two characters wrong in 100, or two spelling mistakes in about 15 words. Where figures are involved, such as in budgets, balance sheets etc, double/treble checking is essential or the errors could be expensive!
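
To make the 98% figure concrete, here is a rough Python sketch of the arithmetic. It assumes every character is misread independently with the same probability, which real OCR errors do not strictly obey; the 2000-character page and the 6-letter word are illustrative figures only, not measurements.

```python
def expected_errors(char_count, accuracy):
    """Expected number of wrong characters at a given per-character accuracy."""
    return char_count * (1.0 - accuracy)


def word_correct_probability(word_length, accuracy):
    """Chance that every character of a word is right (independence assumed)."""
    return accuracy ** word_length


page_chars = 2000  # assumed size of one typed page, for illustration only
print("wrong characters per 100:", round(expected_errors(100, 0.98)))
print("wrong characters per page:", round(expected_errors(page_chars, 0.98)))
print("chance a 6-letter word is clean:", round(word_correct_probability(6, 0.98), 3))
```

Under these assumptions roughly one 6-letter word in nine contains an error, which is why proofreading matters even when the accuracy figure sounds high.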
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to OCR-scan to PDF?

Post by Villeroy »

The topic is "How to OCR-scan to PDF?" PDF is not an editable format. If the target is PDF, you can leave out the whole OCR.
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: How to OCR-scan to PDF?

Post by RoryOF »

Well, yes and no, Villeroy. If the scan is direct to PDF it is merely a series of images, but ideally a text in a PDF should be searchable, which would require that the PDF be built from text, not merely the images of the original sheets.
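
One possible way to get such a searchable PDF without Acrobat is the Tesseract engine mentioned earlier in this thread: its command-line tool accepts a `pdf` output config that writes the page image together with a recognised text layer. A minimal Python sketch, assuming Tesseract is installed and on the PATH; the file names `scan.tif` and `scan_ocr` and both function names are placeholders, not anything taken from this thread.

```python
import shutil
import subprocess


def searchable_pdf_command(image_path, output_base, lang="eng"):
    """Build the Tesseract call that writes <output_base>.pdf with a text layer."""
    # CLI shape: tesseract IMAGE OUTPUTBASE [-l LANG] [CONFIG]
    # The trailing "pdf" config selects PDF output instead of plain text.
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the PDF it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH (assumed to be installed)")
    subprocess.run(searchable_pdf_command(image_path, output_base, lang), check=True)
    return output_base + ".pdf"


# Placeholder file names; this line only prints the command that would be run.
print(" ".join(searchable_pdf_command("scan.tif", "scan_ocr")))
```

The recognised text is still unproofread OCR output, so the search results are only as good as the recognition.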
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
helices
Posts: 3
Joined: Fri Aug 14, 2015 3:38 pm

Re: How to OCR-scan to PDF?

Post by helices »

Yes, RoryOF, that is what I'm doing now with Adobe Acrobat CS4 & have been doing for much of the last 10 years

In fact, the OCR accuracy is so good, there are very rarely issues

I do this with all business, medical & legal documents received as paper

I know that MS Office products do not play in my status quo; but, I'm looking to wean myself completely from MS Office & ideally from Adobe as well

If either OpenOffice or LibreOffice could do this, that would be my primary selection criterion between the 2
~ helices
OpenOffice 4.x on Windows 7
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to OCR-scan to PDF?

Post by Villeroy »

On Linux and Mac systems it is comparatively easy to write your own little program which starts your preferred scanning software, passes the resulting picture to your preferred OCR software and the resulting text file to your preferred text editor.
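
A rough sketch of such a glue program in Python, for Linux. It assumes `scanimage` (from SANE), `tesseract` and LibreOffice's `soffice` launcher are all installed; the `--resolution` option depends on the scanner backend, and the function names and the `page` file name are made up for illustration.

```python
import shlex
import subprocess
from pathlib import Path


def scan_command(resolution_dpi=300):
    # scanimage writes the image to stdout; --resolution is backend-dependent.
    return ["scanimage", "--format=tiff", "--resolution", str(resolution_dpi)]


def ocr_command(image_path, output_base, lang="eng"):
    # With no output config, Tesseract writes <output_base>.txt.
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def editor_command(text_path):
    # Opens the text file in Writer (assumes LibreOffice's soffice launcher).
    return ["soffice", "--writer", str(text_path)]


def scan_ocr_edit(work_dir, name="page"):
    """Scan one page, OCR it to text, then open the text in Writer."""
    work = Path(work_dir)
    image = work / f"{name}.tif"
    text = work / f"{name}.txt"
    with open(image, "wb") as out:
        subprocess.run(scan_command(), stdout=out, check=True)
    subprocess.run(ocr_command(image, work / name), check=True)
    subprocess.Popen(editor_command(text))  # do not wait for the editor to close
    return text


# Only print the three commands; call scan_ocr_edit("some/dir") to run them.
for cmd in (scan_command(), ocr_command("page.tif", "page"), editor_command("page.txt")):
    print(shlex.join(cmd))
```

The command builders are kept separate from the function that runs them so that any one tool can be swapped for your preferred scanner, OCR or editor program.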