
Need urgent help to understand how to parse suffix files

Posted: Thu Mar 02, 2017 11:30 pm
by CeFurkan
Hello. I need to understand how the spell checker parses suffix files.

Especially the Arabic one, as it seems to me to be the most complex.

It starts like this:

FLAG long
AF 333
AF TbTc # 1
AF TbTcff # 2
AF TaTbTcTdTeTfThTiTjTkTlTmTnToTpTqTrTsTtTuTvTxTycc # 3
AF TbTcTdTeTf # 4
AF TbTcTe # 5

So what does AF mean?
What does TbTcTe mean?
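
For reference, the Hunspell affix-file manual (hunspell(5)) says that `FLAG long` makes every flag two characters wide, and that each `AF` line after the first (which only holds the count) defines a numbered alias for a set of flags. Read that way, `TbTcTe` is the three flags `Tb`, `Tc` and `Te`, and a number after a slash in the .dic file (for example `/231`) is an alias number rather than a flag. A minimal Python sketch of that reading follows; the function names are made up for illustration and it only handles the `FLAG long` case.

```python
def split_long_flags(flags):
    """Split a FLAG long string into its two-character flags."""
    if len(flags) % 2:
        raise ValueError("FLAG long strings must have even length: %r" % flags)
    return [flags[i:i + 2] for i in range(0, len(flags), 2)]


def parse_af_aliases(aff_lines):
    """Return {alias_number: [flags]} built from the AF lines of an .aff file."""
    aliases = {}
    count_seen = False
    for line in aff_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "AF":
            continue
        if not count_seen:
            count_seen = True  # the first AF line only gives the alias count
            continue
        # Aliases are numbered from 1 in order of appearance; anything after
        # the flag string (such as the "# 1" comment) is ignored.
        aliases[len(aliases) + 1] = split_long_flags(fields[1])
    return aliases


sample = """FLAG long
AF 333
AF TbTc # 1
AF TbTcff # 2
AF TaTbTcTdTeTfThTiTjTkTlTmTnToTpTqTrTsTtTuTvTxTycc # 3
AF TbTcTdTeTf # 4
AF TbTcTe # 5
""".splitlines()

aliases = parse_af_aliases(sample)
print(aliases[5])  # ['Tb', 'Tc', 'Te']
```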

AM الإضافية ####

What does AM mean?

IGNORE ًٌٍَُِّْـٰ

KEY ضصثقفغعهخحجد¦شسيبلاتنمكط¦ئءؤرﻻىةوزظ¦ضشئ¦صسء¦ثيؤ¦قبر¦فلﻻ¦غاى¦عتة¦هنو¦خمز¦حكظ¦جط

What do IGNORE and KEY do?

ICONV ﻼ لا
MAP ضص
REP ^هى$ هي

What do ICONV, MAP and REP mean?

PFX and SFX are explained here, but still very poorly: http://www.openoffice.org/lingucomponent/affix.readme

OK, also, for example, how do I parse these?

SFX AD وء وءه/309 وء
SFX BA 0 0/299 .

I mean, there must be some generic rules for parsing all these suffixes etc.
Where can I find them?
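
As far as the Hunspell manual (hunspell(5)) describes it, an affix rule line has the fields `SFX flag strip add[/continuation-flags] condition`, where `0` stands for "nothing" and `.` for "any word". Under that reading, `SFX BA 0 0/299 .` strips nothing, adds nothing, and attaches the flags of alias 299 to the result. The sketch below splits such a line into its fields. The `AffixRule` type and `parse_affix_rule` name are hypothetical, treating a missing condition as `.` is an assumption of this sketch, and the `SFX flag Y/N count` header line that precedes each group of rules has to be skipped by the caller.

```python
from collections import namedtuple

# Field names are illustrative, not Hunspell's own terminology.
AffixRule = namedtuple("AffixRule", "kind flag strip add cont_flags condition")


def parse_affix_rule(line):
    """Parse one PFX/SFX rule line (not the 'SFX flag Y/N count' header)."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("PFX", "SFX"):
        raise ValueError("not an affix rule line: %r" % line)
    kind, flag, strip, add = fields[:4]
    # Assumption of this sketch: a missing condition behaves like '.'.
    condition = fields[4] if len(fields) > 4 else "."
    # Anything after '/' in the affix field is a set of continuation flags
    # (an AF alias number when the file uses AF).
    add, _, cont_flags = add.partition("/")
    # '0' means the empty string in both the strip and the add field.
    strip = "" if strip == "0" else strip
    add = "" if add == "0" else add
    return AffixRule(kind, flag, strip, add, cont_flags, condition)


rule_ba = parse_affix_rule("SFX BA 0 0/299 .")
rule_ad = parse_affix_rule("SFX AD وء وءه/309 وء")
print(rule_ba)
print(rule_ad.cont_flags)
```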

Here is the Arabic aff file: http://pastebin.com/KkdwBsH1

Here are a few examples from the .dic file:

تتجلطين/231
تتجلط/233
تتجلطان/240
يتجلطون/239
يتجلطا/237
تتجلطا/237
تجلطان/256
تجلطنا/232
نتجلطن/232
تجلطتا/230
تجلطن/262
تجلطي/256
أتجلط/243
يتجلطنان/230
تجلطتما/242
تتجلطوا/236
تتمحوران/240
تمحورت/230

Re: Need urgent help to understand how to parse suffix files

Posted: Thu Mar 02, 2017 11:32 pm
by RoryOF
You should consult the Hunspell documentation at
https://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/

Re: Need urgent help to understand how to parse suffix files

Posted: Fri Mar 03, 2017 7:06 pm
by CeFurkan
RoryOF wrote: You should consult the Hunspell documentation at
https://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/


I have been working on this since yesterday.

Hunspell has an unmunch command, but it fails for UTF-8 dictionaries.

Is there any way to do this properly?

Something that will read each line of the dictionary and generate all possible words that are determined by the aff file.
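
One way to approach this is to do the expansion in a language whose strings are Unicode-aware, so that stripping and appending operate on characters rather than bytes. The sketch below is a deliberately reduced illustration in Python, not a replacement for unmunch. It handles suffix rules only and ignores prefixes, continuation flags, cross products and compounding. It assumes the rules have already been read into a dict of `(strip, add, condition)` tuples keyed by flag, and the aliases into a dict keyed by number. All names and the toy English data are hypothetical. Real files would be opened with the encoding named on the aff file's `SET` line (for example `encoding="utf-8"`).

```python
import re


def expand_entry(word, flags, suffix_rules):
    """Yield the stem plus every form produced by one level of suffix rules."""
    yield word
    for flag in flags:
        for strip, add, condition in suffix_rules.get(flag, []):
            # Hunspell conditions are regex-like; for a suffix rule the
            # condition has to match the end of the stem.
            if not re.search("(?:%s)$" % condition, word):
                continue
            if strip and not word.endswith(strip):
                continue
            stem = word[:len(word) - len(strip)] if strip else word
            yield stem + add


def expand_dictionary(dic_lines, aliases, suffix_rules):
    """Expand .dic lines of the form 'word/aliasnumber' into a word list."""
    words = []
    for index, line in enumerate(dic_lines):
        line = line.strip()
        if not line:
            continue
        if index == 0 and line.isdigit():
            continue  # the first line of a .dic file is the entry count
        entry = line.split()[0]  # drop any morphological fields
        word, _, alias = entry.partition("/")
        flags = aliases.get(int(alias), []) if alias else []
        words.extend(expand_entry(word, flags, suffix_rules))
    return words


# Toy data, invented for this example only.
toy_aliases = {1: ["Aa"]}
toy_rules = {"Aa": [("", "s", "[^y]"), ("y", "ies", "y")]}
toy_dic = ["2", "cat/1", "pony/1"]

expanded = expand_dictionary(toy_dic, toy_aliases, toy_rules)
print(expanded)  # ['cat', 'cats', 'pony', 'ponies']
```

Supporting the Arabic file fully would also mean following the continuation flags after the slash in each rule (a second round of expansion on the generated form) and combining prefixes with suffixes where the rule headers allow it.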