Entering Japanese Text Using OCR into a PDF

White Phoenix
Posts: 257
Joined: Tue Jan 01, 2008 7:10 am

Entering Japanese Text Using OCR into a PDF

Post by White Phoenix »

I have a Wacom tablet that I have been using on Google Translate to handwrite Japanese entries from subtitles and programs to create text files with the original Japanese phrase then the romanization and translation. I do this by copying the text from the windows into text and OpenOffice files. I found out that by using OCR I can enter handwritten Japanese characters into a PDF file and then transfer that into a plain text file. Since OpenOffice can create PDF files, could I do the same thing with OpenOffice? If so, then there isn’t any need to get a PDF editor, I would only need the OCR tool.
Apache OpenOffice 4.1.11 on Windows 7 Professional. 4.1.11 on Linux Mint 18.3 with Cinnamon.
Zizi64
Volunteer
Posts: 11362
Joined: Wed May 26, 2010 7:55 am
Location: Budapest, Hungary

Re: Entering Japanese Text Using OCR into a PDF

Post by Zizi64 »

If you want to type Japanese characters easily, you need suitable locale settings in LibreOffice, a full-featured Unicode font that contains the Japanese characters, and a Japanese keyboard.

Or you can configure and use the AutoCorrect feature of LibreOffice (you still need a full-featured Unicode font that contains the Japanese characters).
Just add the katakana or hiragana characters to the AutoCorrect replacement table under their names, wrapped in some identifier characters (a "code") to distinguish them from normal character sequences.
LibreOffice can substitute such codes inside a word. Apache OpenOffice can substitute whole "words" only (characters between two spaces). But you have upgraded to LibreOffice, haven't you? (Please update your signature in this forum.)

For example:
:_ka: --> カ
:_ta: --> タ
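
To make the substitution idea concrete, here is a minimal Python sketch of the same kind of code-to-kana replacement, done outside the office suite. It is not how AutoCorrect is implemented; the table contents, the `:_na:` entry, and the function name `expand_codes` are assumptions for illustration only.

```python
import re

# Hypothetical replacement table in the ":_name:" style shown above.
# Only :_ka: and :_ta: come from the post; :_na: is added for the demo.
KANA_CODES = {
    ":_ka:": "カ",
    ":_ta:": "タ",
    ":_na:": "ナ",
}

# A code is a colon, an underscore, lowercase letters, and a closing colon.
_CODE_PATTERN = re.compile(r":_[a-z]+:")


def expand_codes(text, table=KANA_CODES):
    """Replace every known code in text; unknown codes are left untouched."""
    return _CODE_PATTERN.sub(lambda m: table.get(m.group(0), m.group(0)), text)


print(expand_codes(":_ka::_ta::_ka::_na:"))  # prints カタカナ
```

Because the codes are matched by pattern and not by word boundaries, several codes can sit next to each other inside one word, which is the behaviour described for LibreOffice above.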
Tibor Kovacs, Hungary; LO7.5.8 /Win7-10 x64Prof.
PortableApps/winPenPack: LO3.3.0-7.6.2;AOO4.1.14
Please, edit the initial post in the topic: add the word [Solved] at the beginning of the subject line - if your problem has been solved.
Mountaineer
Posts: 318
Joined: Sun Sep 06, 2020 8:27 am

Re: Entering Japanese Text Using OCR into a PDF

Post by Mountaineer »

White Phoenix wrote:... I found out that by using OCR I can enter handwritten Japanese characters into a PDF file and then transfer that into a plain text file. Since OpenOffice can create PDF files, could I do the same thing with OpenOffice? If so, then there isn’t any need to get a PDF editor, I would only need the OCR tool.
???

Your OCR creates computer-readable character codes from your "Wacom image". These can be written to a PDF, but OCR tools can usually also write Word .doc files. Maybe yours can also write .odt; check for yourself.

If you wrote to .doc/.odt, you can open the file directly in Writer. If you wrote to .pdf, you need a PDF viewer; a PDF editor is not necessary (only if you want to change the PDF). Since PDF viewing is now possible in most or all browsers, you already have this software. Don't open the PDF with OpenOffice (possible, but not the best idea).

J.
OpenOffice 3.1 on Windows Vista
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: Entering Japanese Text Using OCR into a PDF

Post by esperantisto »

Zizi64 wrote:…and a japanese keyboard.
Well, a conventional QWERTY keyboard is fine, just enable a Japanese input method.
AOO 4.2.0 (of 2015) / LO 7.x / Win 7 / openSUSE Linux Leap 15.4 (64-bit)
White Phoenix
Posts: 257
Joined: Tue Jan 01, 2008 7:10 am

Re: Entering Japanese Text Using OCR into a PDF

Post by White Phoenix »

esperantisto wrote:
Zizi64 wrote:…and a japanese keyboard.
Well, a conventional QWERTY keyboard is fine, just enable a Japanese input method.
Exactly, but I still find handwriting quicker than typing Japanese. Besides I need the practice writing kanji.

The idea is to turn what I draw into actual font characters, not just an image of my handwriting.

Actually, I have set up WinCompose to enter Japanese kana into a text file, but kanji are a lot more complex. Although I do pretty well now drawing kanji so Google Translate can understand them. I only have problems when they don’t match because of the fonts. So it’s still easier for me to handwrite kana and kanji than to use a keyboard. As for fonts, I have fonts for hentaigana as well as the regular katakana and hiragana. Sokou Mincho is my current favorite Japanese font.

No, I only installed LibreOffice to convert the old Works files. The only other files I had that were a problem, were MIDI files and they have been taken care of. Animated GIF files are the only old formats I have left to convert to animated PNG files. I still have no real need to replace OpenOffice with LibreOffice. Now if someone were to develop a replacement for Works, I would buy that, if it was Windows 7 or Linux compatible.

That being stated, I would need to create the PDF files first with the Japanese text, which is why I thought I would need an editor. I suppose I could make screenshots of the source material for the OCR to convert to text, but that only works if I have the original Japanese in an image or program. I am sometimes copying Japanese from books, so that is why I am looking for a way to use the Wacom tablet as the input device for Japanese characters. In any event, if I can use OpenOffice to write the kana and kanji into the document or text file, then that would actually be ideal. I will try that, but I still need an OCR tool.
Apache OpenOffice 4.1.11 on Windows 7 Professional. 4.1.11 on Linux Mint 18.3 with Cinnamon.
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Entering Japanese Text Using OCR into a PDF

Post by RoryOF »

If you are using Linux, gImageReader or VietOCR are good front ends to Tesseract for OCRing text in Roman type from scans or PDF files. I find Tesseract very accurate for good scans or PDFs. I do not know if it can handle Japanese script.
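
On the Japanese question: Tesseract does ship Japanese language data (`jpn`, plus `jpn_vert` for vertical text), which has to be installed separately. Below is a minimal Python sketch that calls the `tesseract` command-line tool directly, assuming it is on the PATH and the `jpn` data is installed. The helper names and file names are hypothetical, and accuracy on handwriting (as opposed to printed text) may well be poor.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="jpn"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_to_text_file(image_path, output_base, lang="jpn"):
    """Run Tesseract on an image and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base name by default.
    return output_base + ".txt"


# Example call (hypothetical file names):
# ocr_to_text_file("handwriting.png", "handwriting")
```

The resulting plain text file can then be opened or pasted into Writer, so no PDF step is needed in between.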
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
White Phoenix
Posts: 257
Joined: Tue Jan 01, 2008 7:10 am

Re: Entering Japanese Text Using OCR into a PDF

Post by White Phoenix »

Well, the Linux side isn’t quite set up to my liking, but it is functional. It seems there was something I used on Linux to write kana at least, but I don’t think it could copy-paste into a text file. I will need to check it out though. Anyway, Linux might be the best bet to find what I need, and most everything available is right there in the Software Manager. Oh, I remember: I was waiting until all of my drives were re-organized before finishing installing all my programs on Linux.
Apache OpenOffice 4.1.11 on Windows 7 Professional. 4.1.11 on Linux Mint 18.3 with Cinnamon.