[Solved] Scanned text document


Postby domrivard » Sun Jan 06, 2008 2:55 am

I would like to know how to transfer a text document from my scanner to OpenOffice Writer.
Last edited by Hagar Delest on Tue Jun 10, 2008 2:42 pm, edited 2 times in total.
Reason: tagged the thread as Solved.
domrivard
 
Posts: 1
Joined: Tue Dec 18, 2007 8:32 pm

Re: scanned text document

Postby TerryE » Sun Jan 06, 2008 3:06 am

[I corrected the typo in the title]

Your scanner software will be able to save scanned pages as GIFs or PNGs, which can be included in OOo documents as images, but I suspect that's not what you are looking for. OOo does not include OCR functionality, though there are some pretty good freeware packages that you can pick up, and most scanners come with a basic OCR package. Just "recognise" the document using the package and save it as RTF. OOo can open RTF files.
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
TerryE
Volunteer
 
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: scanned text document

Postby Bhikkhu Pesala » Tue Jan 15, 2008 9:20 pm

Try Softi FreeOCR.

This is very easy to use, and a light download.
Idiot Compassion
LibreOffice 6.0.4 on Windows 10
Bhikkhu Pesala
 
Posts: 1253
Joined: Mon Oct 08, 2007 1:27 am

Re: scanned text document

Postby huw » Wed Jan 16, 2008 11:48 am

I used http://simpleocr.com/ a while back and it did what I needed.
huw
Volunteer
 
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: scanned text document

Postby Dipa » Sun Jan 20, 2008 1:30 am

I used to have a Canon scanner which I could use to scan a document and then I could copy and paste the scanned pdf file into Open Office. The Canon scanner worked fine till one day it started freezing up when I tried to add the second page to the file. So, I bought a new Epson CX 7400 All in One.

Now the problem is I can't copy and paste the pdf file into Open Office writer in order to edit it.
I have read that what I need is an OCR software program. So, I downloaded SimpleOCR and found that it didn't recognize the text with any useful accuracy. I would have to retype the OCR output, so why bother using SimpleOCR?

I just read on here about Softi FreeOCR and I downloaded that. I never got as far as installing it: the download needed some program to open it, which became long and involved, and I was asked to agree to allow pop-ups in order to use a free unpacker called ACE something or other.

There has got to be an easier way. Please tell me that there is a way that I can scan and then copy and paste into Open Office without going out and getting a degree in computer engineering first.

XP is the operating system I am using.

Simple language responses would be most appreciated.
thanks,
Dipa
Dipa
 
Posts: 11
Joined: Sun Jan 20, 2008 1:16 am

Re: scanned text document

Postby Hagar Delest » Sun Jan 20, 2008 1:40 am

Dipa wrote:I used to have a Canon scanner which I could use to scan a document and then I could copy and paste the scanned pdf file into Open Office.

The easiest way is to save the scanned picture in a graphical format like .png or .jpg, depending on the content. PDF should not be used for pictures only; that is not its purpose, and integration in OOo documents can be difficult (especially under Linux).
AOO 4.1.6 on Xubuntu 19.04 and 4.1.5 on Windows 7 (with winPenPack port).
Hagar Delest
Moderator
 
Posts: 28558
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: scanned text document

Postby Dipa » Sun Jan 20, 2008 2:31 am

I think maybe I wasn't clear enough in my post. I am not scanning pictures. I have no interest in pictures.
I am scanning a document and my wish is to be able to save the file into Open Office so that I can edit the text. What seems to be the case with this Epson All in One is that it will not allow me to save the file as a text file. So, I am checking with this list to determine what options are available. The free OCR program couldn't read the text accurately. Is there anyone on this list who scans documents, saves the file as text in Open Office, and is able to edit the document? What sort of software did you have to buy? What type of scanner did you have to buy? I am willing to buy whatever I need. I have just been told that All in One scanners are now made to scan pictures, with little to no thought given to scanning documents that can then be copied and pasted into Open Office and changed as needed.
Dipa
 
Posts: 11
Joined: Sun Jan 20, 2008 1:16 am

Re: scanned text document

Postby sybille » Sun Jan 20, 2008 6:09 am

I do what you describe in Linux using a program called gscan2pdf.
For character recognition, it uses tesseract-ocr. I find that tesseract-ocr works quite well, provided that I scan at 300 or 600 dpi.

The program Bhikkhu Pesala suggested, Softi FreeOCR, also uses tesseract-ocr, so I think it would be worth a try if you can get it installed. Maybe you could try downloading the installer again - it should be a regular exe file without any compression (like ACE).

Hagar's suggestion to use an image of the page would make sense if you didn't need to edit the text. I scan to image and paste into Writer when I want to keep formatting, and I use tesseract-ocr with gscan2pdf when I need to change the text.
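
For anyone who would rather call tesseract-ocr directly than go through gscan2pdf, the step can be sketched in Python. This is only a sketch under assumptions: it assumes the tesseract command-line program is installed and on the PATH, the helper names are invented for illustration, and the file names in the note below are placeholders.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base):
    """Return the command that asks tesseract to OCR one scanned image.

    tesseract writes the recognised text to output_base + ".txt".
    """
    return ["tesseract", image_path, output_base]


def ocr_image(image_path, output_base):
    """Run tesseract on a scanned page and return the path of the text file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on the PATH")
    # check=True raises an error if tesseract reports a failure.
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    return output_base + ".txt"
```

Calling ocr_image("page.png", "page") on a page scanned at 300 dpi should leave the recognised text in page.txt, which Writer can then open for editing.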
If your problem has been solved, please edit this thread's initial post and add "[Solved]" to the subject line. Thanks!
-------
About Ubuntu Linux
Zotero, for research and bibliography management with OOo.
OOo 2.4.X on Ubuntu 8.x + None needed :)
sybille
Volunteer
 
Posts: 122
Joined: Sat Jan 05, 2008 12:21 pm
Location: France

Re: scanned text document

Postby Hagar Delest » Sun Jan 20, 2008 11:52 am

Dipa wrote:I used to have a Canon scanner which I could use to scan a document and then I could copy and paste the scanned pdf file into Open Office. The Canon scanner worked fine till one day it started freezing up when I tried to add the second page to the file. So, I bought a new Epson CX 7400 All in One.

Now the problem is I can't copy and paste the pdf file into Open Office writer in order to edit it.

You were not so clear. OK, you talked about using an OCR application. But it seems that you were used to inserting such scanned files in OOo. So the first issue seemed to be the insertion of PDF files in OOo. IMHO, I'm not sure it's the best way, even to reproduce a text - as a picture - in a Writer document.

So to keep your former method, use a graphical format to have the exact layout. Else, investigate the OCR process.
Hagar Delest
Moderator
 
Posts: 28558
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: scanned text document

Postby Dipa » Sun Jan 20, 2008 3:34 pm

To Sybille,
When I try to download Softi FreeOCR, the dialog box asks what to open it with. I have no idea what to open it with. I downloaded ACE something or other, thinking it was what I needed to open the Softi FreeOCR download with.
I got stuck at that point with requests to agree to allow pop-up ads from ACE to sell me things. Do you have any ideas of what I can use to open the download with? I used uTorrent to open Open Office when I downloaded it.
Dipa
 
Posts: 11
Joined: Sun Jan 20, 2008 1:16 am

Re: scanned text document

Postby sybille » Sun Jan 20, 2008 4:05 pm

I don't use Windows at all, so I'm not sure how to help.

I did download the Softi FreeOCR installer from the site posted and it was an exe file. I then ran the file using wine on Linux and it put up a window saying that MS's .NET framework was not installed, which is the case. There were no messages about ACE compression.

Do you have the option to save the installer file to disk and then open it rather than opening it from your browser? If that doesn't help, maybe you could try downloading the file with a different browser. For example, if you're using Internet Explorer, you could try Firefox instead.

Or if you're feeling really adventurous, you could try out gscan2pdf with tesseract-ocr using a Linux LiveCD such as Ubuntu. It won't touch your Windows installation. A quick Google suggests that the Epson CX 7400 is supported for scanning and printing, although you might have to do some configuring for it. Maybe all of that is too much hassle, but I'm just mentioning it as an option since it would let you try out the OCR engine for free.
sybille
Volunteer
 
Posts: 122
Joined: Sat Jan 05, 2008 12:21 pm
Location: France

Re: scanned text document

Postby Dipa » Sun Jan 20, 2008 6:12 pm

I can't open the file. I can't save it to disk.
Dipa
 
Posts: 11
Joined: Sun Jan 20, 2008 1:16 am

Re: scanned text document

Postby sybille » Sun Jan 20, 2008 6:35 pm

I don't know, but this seems like a browser problem. What browser are you using?

Can you download the file from the following link (which I just copied from the Softi FreeOCR web site)?
http://www.softi.co.uk/focr22_setup.exe
Try right-clicking on the link. What choices do you have?

Have you tried using a different browser?
sybille
Volunteer
 
Posts: 122
Joined: Sat Jan 05, 2008 12:21 pm
Location: France

Re: scanned text document

Postby Dipa » Sun Jan 20, 2008 11:56 pm

Thank you Sybille,
The link that you sent me worked. I first had to download .NET Framework 2.0, but after I did that I was able to install the Free OCR. I already scanned a document and there were just a couple of errors. I saved the file and was able to make all the changes I wanted to in Open Office.

Thank you so much for helping me out. I really appreciate the time you took to help me.

Dipa
Dipa
 
Posts: 11
Joined: Sun Jan 20, 2008 1:16 am

Re: scanned text document

Postby Weatherlawyer » Sat Apr 05, 2008 7:12 pm

Dipa wrote:I am not scanning pictures. I am scanning a document and my wish is to be able to save the file into open office so that I can edit the text.

Is there anyone on this list who scans documents and then saves the file as text in open office and are able to edit the document? What sort of software did you have to buy? What type of scanner did you have to buy? I am willing to buy whatever I need.


I do this quite often.

I just open the PDF in Foxit reader rather than Adobe. If the PDF isn't DRMed as can happen with the latest Adobe crippleware, the Foxit programme can save text directly to any office suite.

Click the "A" icon in the toolbar (between the "hand" and "camera" icons) and select the area of text you want. You will be stymied at pictures and page ends. But once you get the hang of it you can convert a document in minutes.

I have only used the freeware version of Foxit. I gather the full version is even better. I would recommend it without ever having seen it let alone tried it. The free version is excellent. But getting pictured text to copy over is a problem. The stuff I am on about is something in the original file that was photocopied and thus the photocopy is a picture image in the PDF.
Foxit reader intro

With some PDFs you get a freebie with the first copy and with others the letters all run into one and sometimes contain erroneous characters too. Older PDFs seem free from this. I am no tech expert though, just speaking from my limited experience.
Weatherlawyer
 
Posts: 76
Joined: Thu Jan 24, 2008 12:18 am

Re: scanned text document

Postby Weatherlawyer » Sat Apr 05, 2008 7:35 pm

Dipa wrote: I first had to download Net Framework 2.0 but after I did that I was able to install the Free OCR.


I couldn't get the Microsftware to download in Firefox so I used my AOL browser. Thinking about it now I realise that I never had JavaScript enabled in Ffx.
Weatherlawyer
 
Posts: 76
Joined: Thu Jan 24, 2008 12:18 am

Re: [Solved] Scanned text document

Postby y_b_nrml » Sat Mar 26, 2011 12:07 am

I know this is marked solved and hasn't had activity in a while, but I just wanted to add something that I'm surprised has not been mentioned, since there seems to be a lot of difficulty with finding an OCR converter.

A lot of all in one printers include a version of PaperPort on the installation CD. It is also available for purchase as a separate program, if you wish to spend the money.

You need to configure it to send documents to OO.

I have version 10 on Windows XP so other versions may be slightly different, but you go to the Tools menu, then "New Program Link..." and follow the wizard that opens up. You have to restart PaperPort for an icon link to OO to show up in your Send To toolbar.

Once you do this, you scan your document as a document, not a photo, using PaperPort. A thumbnail image of the document will show up on your PaperPort desktop. Then, you can open the scanned document for editing in OO by dragging and dropping the thumbnail image to the OO icon in the toolbar. Very easy and works quite well for me.
OpenOffice 3.1 on Windows XP SP3
y_b_nrml
 
Posts: 3
Joined: Tue Dec 15, 2009 8:40 pm

Re: [Solved] Scanned text document

Postby amauriced » Thu May 12, 2011 1:28 am

For future reference on this thread, I tried Simple OCR, and it worked very well. I had to do some editing and paragraph spacing, but that was a long way from having to retype. I didn't even have to configure my scanner (HP Officejet G85 all-in-one). I also tried SoftscanWiz, which worked perfectly to scan to .pdf, but the only successful conversion from the .pdf file was with MyMorph, http://docmorph.nlm.nih.gov/docmorph/mymorph.htm, and I was only able to convert it to a text document sans any formatting. Otherwise, OO treated the document as an image. :D
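
Some of the editing and paragraph spacing mentioned above can be automated once the OCR output is plain text. Here is a minimal sketch in Python, assuming the OCR program saved text in which every printed line ends with a hard line break and paragraphs are separated by blank lines. The function name is made up for illustration and is not part of any OCR package.

```python
import re


def reflow_ocr_text(text):
    """Join hard-wrapped OCR lines into paragraphs.

    Blank lines are treated as paragraph breaks. A word split with a
    hyphen at the end of a line is joined back together.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Re-join words hyphenated across a line break: "docu-\nment" -> "document".
        block = re.sub(r"(\w)-\n(\w)", r"\1\2", block)
        # Replace the remaining line breaks with single spaces.
        lines = [line.strip() for line in block.splitlines()]
        paragraphs.append(" ".join(line for line in lines if line))
    return "\n\n".join(paragraphs)
```

One caveat: the hyphen rule also joins genuine compounds that happen to break at a line end (for example "all-in-" followed by "one"), so the result still needs a read-through before it is pasted into Writer.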
Open Office 3.1.1 on Windows XP
amauriced
 
Posts: 2
Joined: Thu Oct 08, 2009 10:45 pm

Re: [Solved] Scanned text document

Postby Annalee » Sun Aug 25, 2019 4:30 pm

I have found a program called OCRFeeder (you can install it from Ubuntu Software Centre). It will take an image or PDF file and do an OCR scan on it. You can then clean it up and export it to a .odt file.
Last edited by RusselB on Sun Aug 25, 2019 5:30 pm, edited 1 time in total.
Reason: Website address for non-forum related software removed.
OpenOffice 3.1 on Windows Vista
Annalee
Banned
 
Posts: 1
Joined: Sun Aug 25, 2019 4:26 pm

Re: [Solved] Scanned text document

Postby RoryOF » Sun Aug 25, 2019 5:48 pm

I use gImageReader, which accepts input from a scanner or from PDF files and lets you adjust the areas to be scanned, etc. It can use Tesseract, which gives a high success rate. Both are available from the Ubuntu Software Center.
Apache OpenOffice 4.1.7 on Xubuntu 18.04.3 (mostly 64 bit version) and very infrequently on Win2K/XP
RoryOF
Moderator
 
Posts: 29560
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

