Hyphenation file format

maurotrev
Posts: 5
Joined: Fri Aug 25, 2017 9:13 am

Post by maurotrev »

Hi all, I am writing a hyphenation file for my language. I found a description of the Liang algorithm and implemented the rules. Now I want to learn more about non-standard hyphenation, namely the Sojka extension (e.g. ab1cd/am=z2,2) and the NEXTLEVEL tag (along with the others).
Just to be sure: I understand that LEFTHYPHENMIN and RIGHTHYPHENMIN refer to word boundaries (here I use the letter B: BabcB, Bab-cB), and that COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN refer to compound boundaries (here I use the letter C: CabcC, CabC-CcC). Am I right?
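To make the word-boundary reading concrete, here is a minimal Python sketch. It assumes LEFTHYPHENMIN is the minimum number of characters that must stay before the first break and RIGHTHYPHENMIN the minimum after the last one. The function names and the default values are hypothetical, chosen only for illustration; this is not the hyphenation library's own code.

```python
def allowed_breaks(word, left_min=2, right_min=2):
    """Return indices i where word[:i] | word[i:] respects both minimums.

    Assumption: left_min / right_min stand in for LEFTHYPHENMIN /
    RIGHTHYPHENMIN and are counted from the word boundaries.
    """
    return [
        i
        for i in range(1, len(word))
        if i >= left_min and len(word) - i >= right_min
    ]


def show_breaks(word, breaks):
    """Render the word with a '-' at each permitted break index."""
    pieces, last = [], 0
    for i in breaks:
        pieces.append(word[last:i])
        last = i
    pieces.append(word[last:])
    return "-".join(pieces)


if __name__ == "__main__":
    # With left_min=2 and right_min=1, 'abc' may only break as 'ab-c'.
    print(show_breaks("abc", allowed_breaks("abc", 2, 1)))
```

Under this reading, the COMPOUND variants would apply the same check, but measured from a compound boundary instead of the word boundary.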
Does the NOHYPHEN tag describe the characters that should not be treated as hyphenation separators? For example, if I write NOHYPHEN -, is the '-' in ab-cd then not a separator? I don't understand the purpose of this tag. I thought I had to indicate hyphenation positions with odd numbers, not tell the hyphenator what is not a hyphen.
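For reference, the "odd number" convention mentioned above can be sketched as follows. This is a simplified Python illustration of Liang-style pattern matching, not the library implementation: each pattern contributes values to the gaps between letters, the maximum value per gap wins, and an odd result marks a permitted break. The patterns used here are hypothetical, and NOHYPHEN and the hyphenmin limits are deliberately left out.

```python
def parse_pattern(pat):
    """Split e.g. 'a1bc' into letters 'abc' and gap values [0, 1, 0, 0]."""
    letters, values = [], [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)   # value for the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns):
    """Apply Liang-style patterns; odd gap values become hyphens."""
    text = "." + word.lower() + "."        # '.' marks the word boundaries
    scores = [0] * (len(text) + 1)
    for pat in patterns:
        letters, values = parse_pattern(pat)
        start = text.find(letters)
        while start != -1:
            for k, v in enumerate(values):
                scores[start + k] = max(scores[start + k], v)
            start = text.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(1, len(word)):
        # scores[i + 1] is the gap before word[i] (shifted by the leading '.')
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


if __name__ == "__main__":
    print(hyphenate("abcd", ["b1c"]))          # odd value: break allowed
    print(hyphenate("abcd", ["b1c", "b2c"]))   # even value overrides it
```

In this sketch an even value from a competing pattern suppresses a break, which is the only mechanism the plain Liang rules offer for forbidding a position.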
Next, what I don't understand at all is the NEXTLEVEL tag.
From what I have learned about the Sojka extension, the indexes refer to the first position (1-based) where the substitution occurs and its length. That is, if I have the word "abcd" with the rule ab1cd/am=z2,2, then the hyphenation is aam-zd. Do I understand correctly?
If I don't specify the index and length, I understand that the whole pattern is substituted: for the word "mabcdm" with the rule ab1cd/am=z, the hyphenation will be mam-zm. Is that right?
Another question is about the paper at http://hunspell.sourceforge.net/tb87nemeth.pdf: an example on page 3 shows the rule eigh1teen/t=t,5,1. Shouldn't it be eigh1teen/ht=t,4,2?
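The two spellings of the rule can be compared with a small Python sketch. It assumes the rule has the form pattern/replacement,start,length, that start is 1-based and counts only the letters of the pattern (digits skipped), that length is the number of characters removed from the word, and that omitting both means the whole pattern is replaced. These are assumptions taken from the reading described in this post, not confirmed library behaviour, and the function names are hypothetical.

```python
def parse_rule(rule):
    """Parse 'pattern/replacement,start,length' into its four parts."""
    pattern, sep, change = rule.partition("/")
    if not sep:
        raise ValueError("not a non-standard rule: " + rule)
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    parts = change.split(",")
    replacement = parts[0]
    if len(parts) == 3:
        start, length = int(parts[1]), int(parts[2])
    else:
        # Assumption: without indexes the whole pattern is replaced.
        start, length = 1, len(letters)
    return letters, replacement, start, length


def apply_rule(word, rule):
    """Return the word hyphenated by one rule, or unchanged if it does not match."""
    letters, replacement, start, length = parse_rule(rule)
    at = word.find(letters)
    if at == -1:
        return word
    begin = at + start - 1                 # convert 1-based start to an index
    # '=' in the replacement marks the hyphenation point.
    return word[:begin] + replacement.replace("=", "-") + word[begin + length:]


if __name__ == "__main__":
    print(apply_rule("eighteen", "eigh1teen/t=t,5,1"))
    print(apply_rule("eighteen", "eigh1teen/ht=t,4,2"))
    print(apply_rule("mabcdm", "ab1cd/am=z"))
```

Under these assumptions both forms of the eighteen rule produce the same result, eight-teen, because each replaces a span that ends right after the "t"; whether the library accepts both is a separate question.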
Another one: can the first index of a Sojka rule point to the hyphenation code? (e.g. eigh1teen/t=t,4,1)
Shouldn't the rule 7-/=- be -7/-=-? OOWriter gives me an error if I give it the rule 7-/-=-
Is there a paper somewhere that describes Sojka non-standard hyphenation and the OO tags (NEXTLEVEL and the others)?

Thank you so much.
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Hyphenation file format

Post by acknak »

Greetings and welcome to the community forum!

This is mainly a forum for OO users; you won't find too many developers here.

I hope someone here can give you some suggestions, but you may want to get in touch with the developers who work on OO language support. I'm not sure just how to do that, but you might try asking on the general developers list: dev@openoffice.org, or looking through comments (or submitting an issue) on the OO bugzilla. You may even have to go to the upstream hunspell developers.
AOO4/LO5 • Linux • Fedora 23

Re: Hyphenation file format

Post by maurotrev »

Thank you acknak!
OpenOffice 4.2 on Windows 10