Creating and mass populating Custom Dictionary

Postby Paddy » Mon Aug 16, 2010 1:06 am

I want to create a botanical dictionary and mass populate it. I have read Bruce Byfield's article, but his description of what the dictionary file looks like does not seem to apply to OOo 3.2 Writer. I have tried creating a custom dictionary in Writer, but that way one can only add one word at a time. I also tried creating the dictionary in Writer and then editing it in a text editor, but when I returned to Writer it no longer recognized the file. What am I doing wrong? Or is there no way to do what I need?
OpenOffice 3.2 on Windows XP
Paddy

Re: Creating and mass populating Custom Dictionary

Postby franx » Mon Aug 16, 2010 6:17 pm

If you want to create an extension dictionary (.oxt) for OOo:

I've attached a short example, dict-en_US_private.oxt [renamed to .zip],
made from the word list in the attached 1_word_collection.odt.

(1)
Collect the words in Writer (see sample 1_word_collection.odt) or directly in a text editor [UTF-8].
(One word per paragraph, no spaces.)
Sort the words alphanumerically (and copy them to a text editor [UTF-8]).
Insert the number of words in the first line (see en_US_private.dic) and save as *.dic.
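For a long word list, step (1) can also be scripted. This is a Python sketch, not part of the original method: it assumes the words are already in a UTF-8 text file, one per line. It drops blank lines and duplicates, rejects entries containing spaces, sorts by Unicode code point (a simplification of "alphanumeric" order, not locale-aware collation), and writes the word count as the first line. The function names are hypothetical.

```python
from pathlib import Path


def build_dic_lines(words):
    """Return the lines of a .dic file: the word count, then the sorted words."""
    # Trim surrounding whitespace; drop blank lines and duplicates.
    cleaned = {w.strip() for w in words if w.strip()}
    # One word per line, no spaces: reject entries with inner whitespace.
    bad = sorted(w for w in cleaned if any(ch.isspace() for ch in w))
    if bad:
        raise ValueError(f"entries containing spaces: {bad}")
    # sorted() orders by Unicode code point, not by locale-aware collation.
    ordered = sorted(cleaned)
    return [str(len(ordered))] + ordered


def write_dic(src, dest):
    """Read a UTF-8 word list from src and write a .dic file to dest."""
    # "utf-8-sig" also accepts a byte-order mark, as some editors add one.
    words = Path(src).read_text(encoding="utf-8-sig").splitlines()
    lines = build_dic_lines(words)
    # Write bytes so the line endings stay "\n" on every platform.
    Path(dest).write_bytes(("\n".join(lines) + "\n").encode("utf-8"))
    return len(lines) - 1  # number of words written
```

Usage, with hypothetical file names: `write_dic("botanical_words.txt", "en_US_private.dic")` returns the number of words written.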

(2)
I've created a folder dict-en_US_private for all the necessary files and added en_US_private.dic to it.
Then add an *.aff file.
I've copied en_US.aff from the English extension dictionaries and renamed it to en_US_private.aff.
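Hunspell reads the .dic file in the encoding named on the SET line of the .aff file, so a UTF-8 word list paired with an .aff that declares a different encoding can turn accented letters into garbage. The sketch below (not from the post; the function names are hypothetical) reads SET from the copied .aff and encodes the word list to match. It assumes ISO8859-1 when no SET line is present, which is my understanding of Hunspell's default; check the SET line in your own copy of en_US.aff.

```python
def aff_encoding(aff_bytes):
    """Return the encoding named on the SET line of an .aff file."""
    for raw in aff_bytes.splitlines():
        # The SET line itself is plain ASCII; other bytes are ignored here.
        parts = raw.decode("ascii", errors="ignore").split()
        if len(parts) >= 2 and parts[0] == "SET":
            return parts[1]
    # Assumed default when the .aff has no SET line.
    return "ISO8859-1"


def encode_dic_for_aff(dic_text, aff_bytes):
    """Encode .dic text in the encoding declared by the .aff file.

    Raises UnicodeEncodeError if a word cannot be represented in that
    encoding, and LookupError if Python does not know the encoding name.
    """
    return dic_text.encode(aff_encoding(aff_bytes))
```

Write the returned bytes with `Path(...).write_bytes(...)` so that no further re-encoding happens. If a word raises UnicodeEncodeError, either change SET in the .aff to UTF-8 or leave that word out.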

(3)
Create dictionaries.xcu and customize the sample to your file names (see unzipped dict-en_US_private.oxt).
<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" oor:name="Linguistic" oor:package="org.openoffice.Office">
    <node oor:name="ServiceManager">
        <node oor:name="Dictionaries">
            <node oor:name="HunSpellDic_en-US_private" oor:op="fuse">
                <prop oor:name="Locations" oor:type="oor:string-list">
                    <value>%origin%/en_US_private.aff %origin%/en_US_private.dic</value>
                </prop>
                <prop oor:name="Format" oor:type="xs:string">
                    <value>DICT_SPELL</value>
                </prop>
                <prop oor:name="Locales" oor:type="oor:string-list">
                    <value>en-US</value>
                </prop>
            </node>
        </node>
    </node>
</oor:component-data>
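If you build several private dictionaries, dictionaries.xcu can be generated instead of edited by hand. This is a Python sketch; the template reproduces the sample above, and the parameters base, locale and node_name are introduced here for illustration. The Locations value is a whitespace-separated list, so the sketch refuses file names that contain spaces.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same structure as the sample dictionaries.xcu.
XCU_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" oor:name="Linguistic" oor:package="org.openoffice.Office">
  <node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
      <node oor:name="{node_name}" oor:op="fuse">
        <prop oor:name="Locations" oor:type="oor:string-list">
          <value>%origin%/{base}.aff %origin%/{base}.dic</value>
        </prop>
        <prop oor:name="Format" oor:type="xs:string">
          <value>DICT_SPELL</value>
        </prop>
        <prop oor:name="Locales" oor:type="oor:string-list">
          <value>{locale}</value>
        </prop>
      </node>
    </node>
  </node>
</oor:component-data>
"""


def make_xcu(base, locale, node_name):
    """Return dictionaries.xcu text for <base>.aff / <base>.dic and one locale."""
    # Locations is split on whitespace, so the base name must not contain any.
    if any(ch.isspace() for ch in base):
        raise ValueError("base file name must not contain whitespace")
    attr = {'"': "&quot;"}  # extra escaping for values inside attributes
    text = XCU_TEMPLATE.format(
        node_name=escape(node_name, attr),
        base=escape(base),
        locale=escape(locale),
    )
    # Fail early if the result is not well-formed XML.
    ET.fromstring(text.encode("utf-8"))
    return text
```

For the example extension, `make_xcu("en_US_private", "en-US", "HunSpellDic_en-US_private")` reproduces the sample; save the string as dictionaries.xcu in UTF-8.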


Create description.xml and customize the sample to your file names and date.
<?xml version="1.0" encoding="UTF-8"?>
<description xmlns="http://openoffice.org/extensions/description/2006" xmlns:d="http://openoffice.org/extensions/description/2006"  xmlns:xlink="http://www.w3.org/1999/xlink">
    <version value="2010.08.16" />
    <identifier value="en_US_private" />
    <display-name>
        <name lang="en">en_US_private spelling dictionary</name>
    </display-name>
    <platform value="all" />
    <dependencies>
        <OpenOffice.org-minimal-version value="3.0" d:name="OpenOffice.org 3.0" />
    </dependencies>
</description>
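The same approach works for description.xml. This Python sketch follows the sample's convention of using the date as the version string (YYYY.MM.DD) and keeps its 3.0 minimal-version dependency unchanged. The identifier and display name are parameters introduced for illustration, and the date defaults to today.

```python
import datetime
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same content as the sample description.xml, with placeholders.
DESC_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<description xmlns="http://openoffice.org/extensions/description/2006" xmlns:d="http://openoffice.org/extensions/description/2006" xmlns:xlink="http://www.w3.org/1999/xlink">
    <version value="{version}" />
    <identifier value="{identifier}" />
    <display-name>
        <name lang="en">{name}</name>
    </display-name>
    <platform value="all" />
    <dependencies>
        <OpenOffice.org-minimal-version value="3.0" d:name="OpenOffice.org 3.0" />
    </dependencies>
</description>
"""


def make_description(identifier, name, day=None):
    """Return description.xml text, using the date as the version (YYYY.MM.DD)."""
    day = day or datetime.date.today()
    text = DESC_TEMPLATE.format(
        version=day.strftime("%Y.%m.%d"),
        identifier=escape(identifier, {'"': "&quot;"}),
        name=escape(name),
    )
    ET.fromstring(text.encode("utf-8"))  # raises ParseError if malformed
    return text
```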


Copy the META-INF subfolder containing manifest.xml.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
    <manifest:file-entry manifest:media-type="application/vnd.sun.star.configuration-data"
        manifest:full-path="dictionaries.xcu"/>
</manifest:manifest>


Add license.txt and a README (see sample).

(4)
In a final step, I've created an archive file dict-en_US_private.zip with:
en_US_private.dic
en_US_private.aff
dictionaries.xcu
description.xml
license.txt
README_extension_owner.txt
manifest.xml [in folder META-INF]

Then I renamed it to dict-en_US_private.oxt.
Installed and tested with OOo 3.2.1 / OOo 3.3 beta (on WinXP).
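The packaging step depends on one detail that is easy to get wrong: the files must sit at the root of the archive, with manifest.xml under META-INF/, not inside a top-level dict-en_US_private/ folder. This Python sketch uses the standard zipfile module as an alternative to a zip tool (the post does not use it). Writing straight to the .oxt name makes the rename unnecessary. plan_archive and build_oxt are hypothetical helpers.

```python
import zipfile
from pathlib import Path

# Entries the archive must contain, following the file list above.
REQUIRED = ("META-INF/manifest.xml", "dictionaries.xcu", "description.xml")


def plan_archive(src_dir, files):
    """Map archive names (relative to src_dir, '/'-separated) to source paths."""
    src = Path(src_dir)
    plan = {Path(f).relative_to(src).as_posix(): Path(f) for f in files}
    missing = [name for name in REQUIRED if name not in plan]
    if missing:
        raise ValueError(f"missing from {src}: {missing}")
    return plan


def build_oxt(src_dir, oxt_path):
    """Zip the contents of src_dir (not the folder itself) into oxt_path."""
    src = Path(src_dir).resolve()
    out = Path(oxt_path).resolve()
    if src in out.parents:
        raise ValueError("write the .oxt outside the source folder")
    files = sorted(p for p in src.rglob("*") if p.is_file())
    plan = plan_archive(src, files)
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, path in plan.items():
            zf.write(path, arcname)
    return sorted(plan)
```

Usage: `build_oxt("dict-en_US_private", "dict-en_US_private.oxt")` returns the archive entry names. Then install the .oxt through the Extension Manager.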

Play with the extension (but back up your user profile first) ... ;)
See also → Extension Dictionaries
<http://wiki.services.openoffice.org/wiki/Extension_Dictionaries>
Attachments
dict-en_US_private.zip (22.66 KiB)
1_word_collection.odt (16.03 KiB)
LibreOffice 4.0.4 · WinXP
franx
Volunteer

Re: Creating and mass populating Custom Dictionary

Postby franx » Mon Aug 16, 2010 7:15 pm

... alternative options with the user-defined dictionaries →

(1) OOo Dictionary Importer/Exporter →
Re: What is the Encoding of User-Defined Dictionaries?

(2) OOo 3.3 –
From → Feature Freeze Testing 3.3 – Component : Word Processing (Writer)
[...]
106032 : Change of file format for new created user-dictionaries
Description : The file format of dictionaries created by the user now defaults to a flat UTF-8 text file, so the content can easily be viewed in regular editors. However, be careful when editing it; see issue 106032 for details.
Feature Announcement : http://sw.openoffice.org/servlets/ReadM ... &msgNo=324
[...]

Sample: OOo-dev 3.3 (user-defined dictionary, created/opened/edited with a text editor)
OOoUserDict1
lang: <none>
type: positive
---
Eĥoŝanĝo
Příliš
bávččas
coṇcoṇ
daño
gør
jagħmilli
kwik
rănește
szkło
tägelîch
tükörfúrógép
yishą́ągo
ægithales
čuovžža
ē-tàng
можу
мшистым
Ὀδυσσέα
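Because the 3.3 format is plain UTF-8 text, a word list can also be converted into a user-defined dictionary with a script. This Python sketch reuses the header lines exactly as they appear in the sample (no language, positive list) and sorts by code point, which is the order the sample words happen to follow. Other `lang:` or `type:` values are not covered. The function names are hypothetical, and given issue 106032, try it on a copy first.

```python
# Header lines copied from the OOo 3.3 sample.
HEADER = ["OOoUserDict1", "lang: <none>", "type: positive", "---"]


def make_user_dict(words):
    """Return user-dictionary text: the sample header, then sorted unique words."""
    entries = sorted({w.strip() for w in words if w.strip()})
    return "\n".join(HEADER + entries) + "\n"


def read_user_dict(text):
    """Return the words listed after the '---' separator line."""
    lines = text.splitlines()
    if not lines or lines[0] != "OOoUserDict1":
        raise ValueError("not an OOoUserDict1 file")
    start = lines.index("---") + 1  # ValueError if the separator is missing
    return [line for line in lines[start:] if line]
```

Save the result as UTF-8 without a byte-order mark, for example with `Path(...).write_bytes(text.encode("utf-8"))`.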
LibreOffice 4.0.4 · WinXP
franx
Volunteer

Re: Creating and mass populating Custom Dictionary

Postby lmselby » Fri Apr 19, 2013 4:53 am

Thank you very much, franx. I was trying to bring my entries in Haitian Creole into a custom dictionary that I had created in Writer, and when I saved the .dic file as UTF-8 and then tried to edit it in Writer, all my accented vowels turned into weird symbols.

I do not know how to program, but I followed all the instructions in your post of Mon Aug 16, 2010 11:17 am. Once I had created the archive (.zip) and renamed it with the suffix .oxt, I followed the instructions within LibreOffice (Tools > Extension Manager > Add), selected the .oxt file and was able to install the dictionary. I did not change the OpenOffice version names used in the files. Please advise me whether I have to make changes for the LibreOffice 4.0 version installed on my computer, or reconfigure the extension according to newer standards. Thanks in advance.
LibreOffice 4.0 on Windows 7 Home Premium SP1 64 bit
lmselby

Re: Creating and mass populating Custom Dictionary

Postby stereo » Thu Feb 27, 2014 4:44 pm

I have text files which contain many proper names and special words that are not in the dictionary. I would like to export them into a separate text file as a list, so that I can work on them without selecting them one by one. How can I do that?
OpenOffice 4.0.0 on Windows 8
stereo

Re: Creating and mass populating Custom Dictionary

Postby Hagar Delest » Fri Feb 28, 2014 9:17 am

It depends on what the file looks like.
Can you upload an excerpt of it?
Is it really the same problem as above?
AOO 4.1.6 on Xubuntu 19.04 and 4.1.5 on Windows 7 (with winPenPack port).
Hagar Delest
Moderator

Re: Creating and mass populating Custom Dictionary

Postby Hagar Delest » Fri Feb 28, 2014 10:42 pm

stereo wrote: Thank you very much for your reply and for looking at my problem and question. Please find attached an excerpt file. It's one page in German with dialect words and proper names. I have many pages (400 pages) and many more to come. I've found out that it is possible to create your own dictionary within OO, but that requires checking each word one by one. Furthermore, each local dialect differs from the others in spelling, so I am interested in glossaries and dictionaries of different dialects, too. A list can be compared to other word lists. I think that exporting the words in a text that are unknown to a standard dictionary, if OO offered it for any language, could be useful for people who work on language problems (translators, dialectologists, terminologists, historians and toponymists searching for proper names, people learning a foreign language or those who teach them, etc.). It looks to me like the problem above; maybe mine is more general. I saved my excerpt file in Word 97/2000/XP (.doc) format.

I quote your PM so that the discussion benefits other users (I don't reply to such questions by mail or PM anyway).

If your words are in italics (that's what I guess from your attachment), you can select them all at once and then paste them into a text file. But you'll need to add a delimiting character first. So search for " -" (space and dash), click Find All, and set the selection to italics.
Then search for the italic format (in the More Options panel), click Find All, paste the selection into a text file, and use Find & Replace again to replace the space and dash with a paragraph break (\n with regular expressions turned on). You should then have your word list.
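If the find-and-replace part is awkward, the same split can be done on the pasted text with a regular expression. This Python sketch is not part of the Writer method. It assumes the delimiter is one or more space-like characters followed by a dash. The character classes (space, tab, no-break space U+00A0; hyphen-minus, en dash U+2013, em dash U+2014) are my guess at the variants a typed text may contain and can be extended. Hyphens inside words, as in compound names, are not split because no space precedes them.

```python
import re

# One or more space-like characters followed by one dash-like character.
DELIM = re.compile(r"[ \t\u00a0]+[-\u2013\u2014]")


def split_marked_words(pasted):
    """Split pasted text on the delimiter; return unique words in first-seen order."""
    words = []
    seen = set()
    for part in DELIM.split(pasted):
        word = part.strip()  # str.strip() also removes no-break spaces
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words
```

Joining the result with `"\n".join(...)` gives one word per line, ready to be sorted and reused as a word list.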
AOO 4.1.6 on Xubuntu 19.04 and 4.1.5 on Windows 7 (with winPenPack port).
Hagar Delest
Moderator

Re: Creating and mass populating Custom Dictionary

Postby stereo » Sat Mar 01, 2014 12:00 am

Thanks very much, Hagar, for your help; I will try the proposed solution immediately. In this text file, italics are the exception (and maybe some of the words in question are not in italics), but given the number of pages and for the time being, it is just fine. I would appreciate it if you could remove the attachment, because the text has already been published in a book and I want to avoid copyright issues. I will try to upload another test file later, which I have to produce first. Thanks for your understanding and help.

I am just trying it. What a genius idea! I did not know about this option in Find & Replace. I just have problems setting the "space dash" delimiter in italics. OO crashed twice! One problem is due to the irregular forms of spaces and dashes, but there are others too; I am trying to find a solution.

As promised, I am uploading a demo text, which is copyright-free and contains words in italics standing in for words unknown to the dictionary. Extracting the words in italics works; I just struggle with the delimiter problem. The copied and pasted italic words run together in the new document. I tried different approaches, e.g. using § as a delimiter, but the fact that the author (typist) used bold, italics and different dashes makes it difficult. There are further obstacles that I have not identified yet. Anyhow, as I mentioned before, it's a solution for words in italics or similar formatting; it does not work if the whole text is in one format. I tried with attributes, but did not find a solution that way. Anyhow, thank you so far.
Attachments
The Tragedy of Hamlet, Prince of Denmark_p1.doc
demo text file (20.5 KiB)
OpenOffice 4.0.0 on Windows 8
stereo


Return to Writer

Who is online

Users browsing this forum: No registered users and 20 guests