Need urgent help to understand how to parse suffix files

Post by CeFurkan » Thu Mar 02, 2017 11:30 pm

Hello. I need to understand how the spell checker parses affix (.aff) files, especially the Arabic one, which seems to me the most complex one.

It starts like this:

FLAG long
AF 333
AF TbTc # 1
AF TbTcff # 2
AF TaTbTcTdTeTfThTiTjTkTlTmTnToTpTqTrTsTtTuTvTxTycc # 3
AF TbTcTdTeTf # 4
AF TbTcTe # 5

So what does AF mean?
What does TbTcTe mean?
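
For reference, the hunspell(5) manual page describes `FLAG long` as switching to two-character affix flags, and `AF` as flag-set alias compression: the first line (`AF 333`) gives the number of aliases, and each following `AF` line defines one flag set, numbered from 1 in file order. Read that way, `TbTcTe` is the three flags `Tb`, `Tc` and `Te`, and the trailing `# 1`, `# 2`... look like a human-readable counter. A minimal Python sketch of that reading, assuming `FLAG long` and ignoring every other directive (the function names are made up for illustration):

```python
def split_long_flags(flag_field: str) -> list[str]:
    """Split a FLAG long flag string such as 'TbTcTe' into two-character flags."""
    if len(flag_field) % 2:
        raise ValueError(f"odd-length long flag string: {flag_field!r}")
    return [flag_field[i:i + 2] for i in range(0, len(flag_field), 2)]


def parse_af_aliases(aff_lines):
    """Map alias number (1-based, in file order) to its list of flags."""
    aliases = {}
    count = None
    for line in aff_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "AF":
            continue
        if count is None:
            count = int(fields[1])  # first AF line: number of aliases, e.g. "AF 333"
            continue
        # Later AF lines: one flag set each; anything after it ("# 1") is ignored.
        aliases[len(aliases) + 1] = split_long_flags(fields[1])
    if count is not None and len(aliases) != count:
        raise ValueError(f"header announced {count} aliases, found {len(aliases)}")
    return aliases


sample = """FLAG long
AF 3
AF TbTc # 1
AF TbTcff # 2
AF TbTcTe # 3""".splitlines()

aliases = parse_af_aliases(sample)
print(aliases[3])  # ['Tb', 'Tc', 'Te']
```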

AM الإضافية ####

What does AM mean?

IGNORE ًٌٍَُِّْـٰ

KEY ضصثقفغعهخحجد¦شسيبلاتنمكط¦ئءؤرﻻىةوزظ¦ضشئ¦صسء¦ثيؤ¦قبر¦فلﻻ¦غاى¦عتة¦هنو¦خمز¦حكظ¦جط

What do IGNORE and KEY do?
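
According to the hunspell(5) manual page, `IGNORE` lists characters that are removed from dictionary words, affixes and input words before matching; it is meant for optional marks such as Arabic harakat, which is what this line contains. `KEY` only affects suggestions: Hunspell tries words where one character is replaced by its neighbour in the KEY string, and a vertical bar separates groups whose ends are not neighbours. The line above shows `¦` where the manual uses `|`, so the sketch takes the separator as a parameter. This is a rough Python illustration of both ideas, not Hunspell's actual code; `strip_ignored` and `key_variants` are hypothetical helpers:

```python
def strip_ignored(word: str, ignore_chars: str) -> str:
    """Drop every IGNORE character (e.g. Arabic harakat) from a word."""
    return word.translate({ord(c): None for c in ignore_chars})


def key_variants(word: str, key: str, sep: str = "|") -> set[str]:
    """Words differing from `word` by one neighbouring-key substitution."""
    variants = set()
    for group in key.split(sep):
        for i, ch in enumerate(word):
            pos = group.find(ch)
            while pos != -1:
                # Neighbours are the characters right before and after in the same group.
                for j in (pos - 1, pos + 1):
                    if 0 <= j < len(group):
                        variants.add(word[:i] + group[j] + word[i + 1:])
                pos = group.find(ch, pos + 1)
    variants.discard(word)
    return variants


# U+064B..U+0652 harakat, U+0640 tatweel, U+0670 superscript alef
harakat = "\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652\u0640\u0670"
print(strip_ignored("\u0643\u064e\u062a\u064e\u0628\u064e", harakat))  # كتب
print(sorted(key_variants("cat", "qwertyuiop|asdfghjkl|zxcvbnm")))
# ['car', 'cay', 'cst', 'vat', 'xat']
```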

ICONV ﻼ لا
MAP ضص
REP ^هى$ هي

What do ICONV, MAP and REP mean?
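
Per the hunspell(5) manual page: `ICONV` is an input conversion table applied to words before checking (here it turns the lam-alef ligature into the two letters لا); `REP` is a table of replacements Hunspell tries first when suggesting corrections, where `^` and `$` anchor the pattern to the start and end of the word and `_` in the replacement stands for a space, so `REP ^هى$ هي` only applies when the whole word is هى; `MAP` lists related characters (here ض and ص), so Hunspell can suggest words where the wrong letter of a related set was chosen. A simplified Python sketch of ICONV and REP, assuming the tables are already parsed into `(pattern, replacement)` pairs (the count header lines are left out); plain `str.replace` is an approximation of Hunspell's own matching:

```python
def apply_iconv(word: str, table: list[tuple[str, str]]) -> str:
    """Apply ICONV (pattern, replacement) pairs to an input word, in order."""
    for pattern, replacement in table:
        word = word.replace(pattern, replacement)
    return word


def rep_candidates(word: str, table: list[tuple[str, str]]) -> list[str]:
    """Candidate suggestions from REP pairs ('^'/'$' anchor, '_' means space)."""
    out = []
    for pattern, replacement in table:
        replacement = replacement.replace("_", " ")
        anchored_start = pattern.startswith("^")
        anchored_end = pattern.endswith("$")
        core = pattern[1:] if anchored_start else pattern
        core = core[:-1] if anchored_end else core
        if not core:
            continue
        if anchored_start and anchored_end:
            if word == core:
                out.append(replacement)
        elif anchored_start:
            if word.startswith(core):
                out.append(replacement + word[len(core):])
        elif anchored_end:
            if word.endswith(core):
                out.append(word[:-len(core)] + replacement)
        else:
            # Unanchored: one candidate per occurrence of the pattern.
            pos = word.find(core)
            while pos != -1:
                out.append(word[:pos] + replacement + word[pos + len(core):])
                pos = word.find(core, pos + 1)
    return out


iconv = [("\ufefc", "\u0644\u0627")]        # lam-alef ligature -> two letters (U+FEFC assumed)
rep = [("^\u0647\u0649$", "\u0647\u064a")]  # REP ^هى$ هي
print(apply_iconv("\ufefc", iconv) == "\u0644\u0627")           # True
print(rep_candidates("\u0647\u0649", rep) == ["\u0647\u064a"])  # True
print(rep_candidates("alot", [("^alot$", "a_lot")]))            # ['a lot']
```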

PFX and SFX are explained here, but still very poorly: http://www.openoffice.org/lingucomponent/affix.readme

Also, for example, how do I parse these?

SFX AD وء وءه/309 وء
SFX BA 0 0/299 .

I mean, there must be some generic rules for parsing all these suffixes, etc.
Where can I find them?
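
The generic format, as given in the hunspell(5) manual page, is a header line `SFX flag cross_product number_of_rules` followed by data lines `SFX flag stripping suffix[/continuation_flags] condition [morphological_fields]` (PFX is the same, but works at the start of the word). `0` means an empty strip or affix string, and the condition is a simplified regular expression (`.`, `[...]`, `[^...]`) that the stem must end with. So `SFX AD وء وءه/309 وء` means: for a word with flag `AD` that ends in وء, remove وء, append وءه, and give the result the flags of AF alias 309 (so that, for example, a second suffix can attach). `SFX BA 0 0/299 .` removes and adds nothing for any word and only attaches the flags of alias 299. A minimal Python sketch for single SFX data lines, assuming the condition can be passed to Python's `re` module unchanged (true for the simple subset above, not checked for every file); `SfxRule`, `parse_sfx_line` and `apply_sfx` are hypothetical names:

```python
import re
from typing import NamedTuple, Optional


class SfxRule(NamedTuple):
    flag: str
    strip: str      # removed from the end of the stem ('' when the field is '0')
    add: str        # appended to the stem ('' when the field is '0')
    cont: str       # continuation flags after '/', here an AF alias number
    condition: str  # simplified regex the stem must end with


def parse_sfx_line(line: str) -> SfxRule:
    """Parse an SFX data line: SFX flag strip add[/flags] condition [morph...]."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "SFX":
        raise ValueError(f"not an SFX data line: {line!r}")
    _, flag, strip, add_field, condition = fields[:5]
    add, _, cont = add_field.partition("/")
    return SfxRule(flag, "" if strip == "0" else strip,
                   "" if add == "0" else add, cont, condition)


def apply_sfx(stem: str, rule: SfxRule) -> Optional[str]:
    """Return the suffixed form, or None if the stem does not satisfy the rule."""
    # Assumption: the condition syntax is also valid Python regex syntax.
    if not re.search("(?:" + rule.condition + ")$", stem):
        return None
    if not stem.endswith(rule.strip):
        return None
    return stem[:len(stem) - len(rule.strip)] + rule.add


ad = parse_sfx_line("SFX AD وء وءه/309 وء")
ba = parse_sfx_line("SFX BA 0 0/299 .")
print(apply_sfx("ضوء", ad))  # ضوءه
print(apply_sfx("ضوء", ba))  # ضوء (unchanged; only the flags of alias 299 are added)
```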

Here is the Arabic .aff file: http://pastebin.com/KkdwBsH1

Here are a few examples from the .dic file:

تتجلطين/231
تتجلط/233
تتجلطان/240
يتجلطون/239
يتجلطا/237
تتجلطا/237
تجلطان/256
تجلطنا/232
نتجلطن/232
تجلطتا/230
تجلطن/262
تجلطي/256
أتجلط/243
يتجلطنان/230
تجلطتما/242
تتجلطوا/236
تتمحوران/240
تمحورت/230
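
In a .dic file the first line is the (approximate) number of entries, and each entry is `word/flags`. With `AF` aliases in the .aff file, the part after the slash is an alias number, so `تتجلط/233` should mean "the flag set on the 233rd AF line". A small Python sketch of resolving entries that way; the two-entry `aliases` table below is invented for illustration and does not match the real file:

```python
def parse_dic_entry(line: str, aliases: dict[int, list[str]]) -> tuple[str, list[str]]:
    """Split 'word/N' and look N up in the AF alias table (FLAG long + AF assumed)."""
    token = line.split()[0]  # drop morphological fields, if any
    word, _, flag_part = token.partition("/")
    if not flag_part:
        return word, []  # entry without flags
    return word, aliases[int(flag_part)]


# Invented alias table for illustration; real values come from the AF lines.
aliases = {230: ["Tb", "Tc"], 233: ["Tb", "Tc", "Te"]}
dic_lines = ["2", "تجلطتا/230", "تتجلط/233"]  # first line: entry count
for line in dic_lines[1:]:
    print(parse_dic_entry(line, aliases))
```
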
OpenOffice 3.1 on Windows 8
CeFurkan
Posts: 2
Joined: Thu Mar 02, 2017 11:15 pm

Re: Need urgent help to understand how to parse suffix files

Post by RoryOF » Thu Mar 02, 2017 11:32 pm

You should consult the Hunspell documentation at
https://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/
Apache OpenOffice 4.1.7 on Xubuntu 18.04.3 (mostly 64 bit version) and very infrequently on Win2K/XP
RoryOF
Moderator
Posts: 29608
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Need urgent help to understand how to parse suffix files

Post by CeFurkan » Fri Mar 03, 2017 7:06 pm

RoryOF wrote: You should consult the Hunspell documentation at
https://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/


I have been working on this since yesterday.

Hunspell has an unmunch command, but it fails on UTF-8 dictionaries.

Is there any way to do this properly, i.e. something that reads each line of the dictionary and generates all the possible words determined by the .aff file?
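
One possible approach is a small expander of your own. Below is a rough Python sketch of a minimal "unmunch", not Hunspell's tool and not checked against its output. Assumptions: UTF-8 files, `FLAG long` with an `AF` alias table as in this Arabic file, conditions usable as Python regexes, and a single level of affixation (continuation flags after `/` in affix rules are ignored, so forms that need two suffixes are not produced). For each .dic entry it prints the stem, every matching suffixed and prefixed form, and prefix+suffix combinations when both rules are marked cross-product (`Y`).

```python
import re
import sys


def load_aff(path):
    """Read AF aliases and PFX/SFX rules from a UTF-8 .aff file (FLAG long)."""
    aliases = {}                    # alias number -> flag string, e.g. 3 -> "TbTcTe"
    rules = {"PFX": {}, "SFX": {}}  # kind -> flag -> [(strip, add, condition)]
    cross = {}                      # (kind, flag) -> True if cross_product is Y
    af_header_seen = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            kind = fields[0]
            if kind == "AF" and len(fields) > 1:
                if af_header_seen:  # data line: AF flags
                    aliases[len(aliases) + 1] = fields[1]
                af_header_seen = True  # the first AF line is the count
            elif kind in rules and len(fields) >= 4:
                flag = fields[1]
                if flag not in rules[kind] and fields[2] in ("Y", "N"):
                    # Header line: SFX flag cross_product count
                    rules[kind][flag] = []
                    cross[(kind, flag)] = fields[2] == "Y"
                    continue
                strip = "" if fields[2] == "0" else fields[2]
                add = fields[3].split("/")[0]  # drop continuation flags
                add = "" if add == "0" else add
                cond = fields[4] if len(fields) > 4 else "."
                rules[kind][flag].append((strip, add, cond))
    return aliases, rules, cross


def flags_of(flag_part, aliases):
    """Resolve an alias number (when AF is used) and split into 2-char flags."""
    s = aliases[int(flag_part)] if aliases else flag_part
    return [s[i:i + 2] for i in range(0, len(s), 2)]


def suffixed(stem, strip, add, cond):
    """Apply one suffix rule, or return None if its condition fails."""
    if stem.endswith(strip) and re.search("(?:" + cond + ")$", stem):
        return stem[:len(stem) - len(strip)] + add
    return None


def prefixed(stem, strip, add, cond):
    """Apply one prefix rule, or return None if its condition fails."""
    if stem.startswith(strip) and re.match("(?:" + cond + ")", stem):
        return add + stem[len(strip):]
    return None


def expand(word, flags, rules, cross):
    """All forms of `word` reachable with one prefix and/or one suffix."""
    forms = {word}
    cross_sfx_forms, cross_pfx_rules = [], []
    for flag in flags:
        for strip, add, cond in rules["SFX"].get(flag, []):
            form = suffixed(word, strip, add, cond)
            if form is not None:
                forms.add(form)
                if cross[("SFX", flag)]:
                    cross_sfx_forms.append(form)
        for strip, add, cond in rules["PFX"].get(flag, []):
            form = prefixed(word, strip, add, cond)
            if form is not None:
                forms.add(form)
                if cross[("PFX", flag)]:
                    cross_pfx_rules.append((strip, add, cond))
    # Cross product: combine cross-product prefixes with cross-product suffixes.
    for form in cross_sfx_forms:
        for strip, add, cond in cross_pfx_rules:
            combined = prefixed(form, strip, add, cond)
            if combined is not None:
                forms.add(combined)
    return forms


def main(aff_path, dic_path):
    aliases, rules, cross = load_aff(aff_path)
    with open(dic_path, encoding="utf-8") as f:
        next(f, None)  # first line: approximate number of entries
        for line in f:
            fields = line.split()
            if not fields:
                continue
            word, _, flag_part = fields[0].partition("/")
            flags = flags_of(flag_part, aliases) if flag_part else []
            for form in sorted(expand(word, flags, rules, cross)):
                print(form)


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.stdout.reconfigure(encoding="utf-8")  # Windows consoles may not default to UTF-8
    main(sys.argv[1], sys.argv[2])
```

Usage, with placeholder file names: `python expand.py ar.aff ar.dic > words.txt`. Handling continuation flags (twofold suffixes) would mean calling `expand` again on each suffixed form with the flags from its `/NNN` alias, which this sketch deliberately leaves out.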

