Need urgent help to understand how to parse suffix files

CeFurkan
Posts: 2
Joined: Thu Mar 02, 2017 11:15 pm

Post by CeFurkan »

Hello. I need to understand how the spell checker parses suffix files,

especially the Arabic one, as it seems to me to be the most complex.

It starts like this:

FLAG long
AF 333
AF TbTc # 1
AF TbTcff # 2
AF TaTbTcTdTeTfThTiTjTkTlTmTnToTpTqTrTsTtTuTvTxTycc # 3
AF TbTcTdTeTf # 4
AF TbTcTe # 5

So what does AF mean?
What does TbTcTe mean?
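
Going by the hunspell(5) manual page, `FLAG long` means every flag is written as two characters, and the `AF` lines define numbered aliases for whole flag sets: the first `AF` line only gives the count, and each following line is alias 1, 2, 3 and so on. Under that reading, `TbTcTe` would be the three flags `Tb`, `Tc` and `Te`, and a number after a slash refers to one of these aliases. A minimal Python sketch of that interpretation (`split_long_flags` and `parse_af_aliases` are hypothetical helper names, not part of Hunspell):

```python
def split_long_flags(flag_string):
    """Split a FLAG long flag string into its two-character flags."""
    return [flag_string[i:i + 2] for i in range(0, len(flag_string), 2)]


def parse_af_aliases(aff_lines):
    """Return {alias_number: [flags]} built from the AF lines of an .aff file."""
    aliases = {}
    seen_count_line = False
    for line in aff_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "AF":
            continue
        if not seen_count_line:
            # The first AF line only carries the number of aliases.
            seen_count_line = True
            continue
        # Aliases are numbered by position, starting at 1.
        aliases[len(aliases) + 1] = split_long_flags(fields[1])
    return aliases


sample = [
    "FLAG long",
    "AF 333",
    "AF TbTc # 1",
    "AF TbTcff # 2",
    "AF TaTbTcTdTeTfThTiTjTkTlTmTnToTpTqTrTsTtTuTvTxTycc # 3",
    "AF TbTcTdTeTf # 4",
    "AF TbTcTe # 5",
]
aliases = parse_af_aliases(sample)
print(aliases[5])  # ['Tb', 'Tc', 'Te']
```

The sketch assumes the alias lines appear in order and that the trailing `# n` comments are only for human readers.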

AM الإضافية ####

What does AM mean?

IGNORE ًٌٍَُِّْـٰ

KEY ضصثقفغعهخحجد¦شسيبلاتنمكط¦ئءؤرﻻىةوزظ¦ضشئ¦صسء¦ثيؤ¦قبر¦فلﻻ¦غاى¦عتة¦هنو¦خمز¦حكظ¦جط

What do IGNORE and KEY do?

ICONV ﻼ لا
MAP ضص
REP ^هى$ هي

What do ICONV, MAP, and REP mean?
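
As far as the hunspell(5) manual page describes them, `IGNORE` lists characters that are dropped from words before checking (here the Arabic diacritics), `ICONV` is an input conversion table applied to the text being checked, and `KEY`, `MAP` and `REP` only influence which suggestions are offered for a misspelled word. A rough Python sketch of the two that change the word itself follows; `normalize_input` is a hypothetical name, and applying ICONV before IGNORE is an assumption of this sketch:

```python
def normalize_input(word, iconv_pairs, ignore_chars):
    """Apply ICONV-style replacements, then drop IGNORE-style characters."""
    for source, target in iconv_pairs:
        word = word.replace(source, target)
    return "".join(ch for ch in word if ch not in ignore_chars)


# Values copied from the lines "ICONV ﻼ لا" and "IGNORE ..." quoted above.
iconv_pairs = [("ﻼ", "لا")]
ignore_chars = "ًٌٍَُِّْـٰ"

# A word written with one of the ignored marks after its first letter.
marked = "ك" + ignore_chars[0] + "تب"
print(normalize_input(marked, iconv_pairs, ignore_chars) == "كتب")
```

Plain string replacement is a simplification; a real implementation would need to choose between overlapping ICONV patterns.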

PFX and SFX are explained here, but still very poorly: http://www.openoffice.org/lingucomponent/affix.readme

OK, also, for example, how do I parse these?

SFX AD وء وءه/309 وء
SFX BA 0 0/299 .

I mean, there must be some generic rules for parsing all these suffixes, etc.
Where can I find them?
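
The hunspell(5) manual page gives the layout of a suffix rule line as `SFX flag stripping suffix[/flags] [condition]`, where `0` stands for "nothing" and `.` for "any ending". Read that way, `SFX BA 0 0/299 .` strips nothing, adds nothing, and attaches the flags of alias 299 to the result, for any stem. A minimal Python sketch of splitting such a line into its fields (`SuffixRule` and `parse_sfx_rule` are hypothetical names; the header line of each SFX group, which holds the cross-product switch and the rule count, has to be skipped by the caller):

```python
from typing import NamedTuple, Optional


class SuffixRule(NamedTuple):
    flag: str
    strip: str                   # characters removed from the end of the stem
    add: str                     # characters appended after stripping
    continuation: Optional[str]  # flags or AF alias number after "/"
    condition: str               # pattern the end of the stem must match


def parse_sfx_rule(line):
    """Parse one SFX rule line (not a group header) into a SuffixRule."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "SFX":
        raise ValueError(f"not an SFX rule line: {line!r}")
    _, flag, strip, add = fields[:4]
    # A missing condition is treated like "." (any stem) in this sketch.
    condition = fields[4] if len(fields) > 4 else "."
    add, _, continuation = add.partition("/")
    return SuffixRule(
        flag=flag,
        strip="" if strip == "0" else strip,
        add="" if add == "0" else add,
        continuation=continuation or None,
        condition=condition,
    )


print(parse_sfx_rule("SFX BA 0 0/299 ."))
print(parse_sfx_rule("SFX AD وء وءه/309 وء"))
```

PFX lines have the same field layout, so the same splitting should carry over with the stripping and adding done at the start of the word instead.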

Here is the Arabic aff file: http://pastebin.com/KkdwBsH1

Here are a few examples from the .dic file:

تتجلطين/231
تتجلط/233
تتجلطان/240
يتجلطون/239
يتجلطا/237
تتجلطا/237
تجلطان/256
تجلطنا/232
نتجلطن/232
تجلطتا/230
تجلطن/262
تجلطي/256
أتجلط/243
يتجلطنان/230
تجلطتما/242
تتجلطوا/236
تتمحوران/240
تمحورت/230
OpenOffice 3.1 on Windows 8
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Need urgent help to understand how to parse suffix files

Post by RoryOF »

You should consult the Hunspell documentation at
https://sourceforge.net/projects/hunspe ... mentation/
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
CeFurkan
Posts: 2
Joined: Thu Mar 02, 2017 11:15 pm

Re: Need urgent help to understand how to parse suffix files

Post by CeFurkan »

RoryOF wrote: You should consult the Hunspell documentation at
https://sourceforge.net/projects/hunspe ... mentation/
I have been working on this since yesterday.

Hunspell has an unmunch command, but it fails for UTF-8 dictionaries.

Is there any way to do this properly?

I need something that will read each line of the dictionary and generate all possible words that are determined by the aff file.
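
The core of such an expansion can be sketched in a few lines of Python, which handles UTF-8 text natively. This is only a simplified illustration, not a replacement for unmunch: it applies suffix rules once, and ignores prefixes, cross products and continuation flags. `expand_suffixes` is a hypothetical name, the rule tuples are invented sample data rather than rules from the Arabic aff file, and passing the condition to Python's `re` module is an assumption that only holds for simple conditions:

```python
import re


def expand_suffixes(stem, flags, rules):
    """Return the stem plus every form produced by one matching suffix rule.

    rules: iterable of (flag, strip, add, condition) tuples, where an empty
    string means "nothing" for strip and add.
    """
    forms = [stem]
    for flag, strip, add, condition in rules:
        if flag not in flags:
            continue
        if strip and not stem.endswith(strip):
            continue
        # The condition must match the end of the unmodified stem.
        if not re.search(f"(?:{condition})$", stem):
            continue
        base = stem[: len(stem) - len(strip)]
        forms.append(base + add)
    return forms


# Invented illustration data, not taken from the Arabic .aff file.
rules = [
    ("Sa", "", "s", "[^y]"),
    ("Sa", "y", "ies", "y"),
]
print(expand_suffixes("cat", ["Sa"], rules))  # ['cat', 'cats']
print(expand_suffixes("try", ["Sa"], rules))  # ['try', 'tries']
```

For a real dictionary, each .dic line would first be split at the slash, the number looked up in the AF alias table to get the flags, and the generated forms expanded again through any continuation flags.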
OpenOffice 3.1 on Windows 8