Creating banned word/filter lists from documents

williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Creating banned word/filter lists from documents

Post by williamfromlincs »

Could someone create an extension for OpenOffice, or build in a feature, that could "select" all words in a document that are NOT in the dictionary? The aim is to create a list of banned words from emails or comments on websites. Most insults and profanity are not found in dictionaries, so this would be an easy way to extract those words from emails and so on. It is really just selecting all the words that the spell checker underlines, so that you can copy them to the clipboard and paste them into a list. Those lists can then be added to filters on Facebook and YouTube, and to email filters, to block unwanted comments and emails.
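The request can be sketched outside the office suite in a few lines of Python. This is only an illustration under stated assumptions: the dictionary is taken to be a plain set of lowercase words, and `KNOWN_WORDS`, `unknown_words` and the sample text are invented names and data, not anything provided by OpenOffice.

```python
import re

# Hypothetical stand-in for a real spelling dictionary (assumption).
KNOWN_WORDS = {"you", "are", "a", "total", "and", "the", "is"}

# Runs of letters, optionally joined by apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def unknown_words(text, known=KNOWN_WORDS):
    """Return the sorted, de-duplicated words in text that are not in known."""
    found = {w.lower() for w in WORD_RE.findall(text)}
    return sorted(found - known)

sample = "You are a total frobnitz and the frobnitz is a zorp"
print(unknown_words(sample))  # ['frobnitz', 'zorp']
```

The result is a candidate list only; it would still need manual review before being used as a filter.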
Last edited by floris v on Fri Aug 23, 2019 11:57 am, edited 1 time in total.
Reason: Removed Solved icon for not yet solved topic, floris v, moderator
open office 4.1.6
Bill
Volunteer
Posts: 8934
Joined: Sat Nov 24, 2007 6:48 am

Re: Creating banned word/filter lists from documents

Post by Bill »

Deleted.
Last edited by Bill on Sat Aug 24, 2019 12:50 pm, edited 1 time in total.
AOO 4.1.14 on Ubuntu MATE 22.04
keme
Volunteer
Posts: 3705
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Creating banned word/filter lists from documents

Post by keme »

Most insults, obscenity and profanity consist of perfectly valid words which will be listed in any official dictionary for a language, including those distributed with office suites.

Such a strategy has been tried before, both as a positive test (catch listed words) and as a negative test (catch unlisted words). The former is vulnerable to intentional misspelling, and the latter to accidental typos. Either way it will inspire creative language usage, so it is not all bad...
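The two strategies and their weak points can be shown with a small Python sketch. The word sets and sample sentences are invented for illustration (assumptions), and "zorp" stands in for an insult that is also a valid dictionary word.

```python
# Hypothetical word sets, for illustration only (assumptions).
BANNED = {"zorp"}
DICTIONARY = {"you", "are", "a", "my", "friend", "zorp"}

def positive_test(words):
    """Catch words that appear on the ban list."""
    return [w for w in words if w in BANNED]

def negative_test(words):
    """Catch words that do not appear in the dictionary."""
    return [w for w in words if w not in DICTIONARY]

listed = "you are a zorp".split()
evasive = "you are a z0rp".split()     # intentional misspelling
typo = "you are my freind".split()     # accidental typo

print(positive_test(listed))   # ['zorp']
print(positive_test(evasive))  # []  (the misspelling slips past the ban list)
print(negative_test(listed))   # []  (a valid dictionary word is not flagged)
print(negative_test(typo))     # ['freind']  (an innocent typo is flagged)
```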
williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Re: Creating banned word/filter lists from documents

Post by williamfromlincs »

Check out the banned word lists from YouTube and Facebook before you comment on everyone else being an idiot.
open office 4.1.6
williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Re: Creating banned word/filter lists from documents

Post by williamfromlincs »

Load up those lists in OpenOffice with the dictionaries and you will see that is not the case at all, not by a long shot. Assuming you know everything before you talk makes you look stupid. If you download those lists, you can see that most of the words are not in the dictionary at all. The profanity is mostly slang that dictionaries do not cover, so this could make a good part of a tool chain: not 100%, but it could speed up the process. You maybe got called a lot of names as a child, keme, which makes you lash out in that way today. Put-downs and scorn, looking to do that all the time perhaps. I have 3 PhDs, keme. You are not so clever, and I am not the idiot that you want everyone to be. Computers make people's egos flop out all over the place, huh.
open office 4.1.6
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Creating banned word/filter lists from documents

Post by RoryOF »

@williamfromlincs: Be polite! You are the one who has misinterpreted what keme said, you are the one who introduced the term "idiot" into the discussion and departed from subject on an unwarranted personal attack. If you have three PhDs you presumably learned about "the decencies of debate"; apply those standards to your replies or you will be banned.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Re: Creating banned word/filter lists from documents

Post by williamfromlincs »

It was implied, though. A lack of research into the matter caused the off-the-cuff comment; ignorance was bliss until the burn he got back.
open office 4.1.6
williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Re: Creating banned word/filter lists from documents

Post by williamfromlincs »

Not being drunk on the smartphone or laptop and insulting people over the net would be a better way to stop the insults on all the websites. That could be why these are not found in the dictionary. Plus, many are just slang, which does not seem to be covered in dictionaries. I spent a few days looking through lists of banned words from major social media websites and saw the pattern of them not being real words found in the OpenOffice dictionary "before" I asked the question. I assumed nothing, whereas others are assuming.
open office 4.1.6
williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Re: Creating banned word/filter lists from documents

Post by williamfromlincs »

To take a step back and think more about the issue: a good starting point would be to create two dictionaries, one with curse words and profanity and another without them. Comparing the two would reveal the common swear words when constructing "banned words lists". The next step would be to extract from insulting emails and the like all of the other words that need to be on the list. It is only one part of a toolchain, to speed things up a little. You do have to go through the lists manually, as many words appear that would create false positives (this is also true of the official lists constructed by Facebook and YouTube that are available for download, btw). But constructing this within OpenOffice gives the user a greater variety of languages, as there are many dictionaries available, which is why I thought it would be a good idea to ask for a facility in OpenOffice to "select" words that are highlighted by the spell checker. It's a sound idea in principle.
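The two-dictionary idea amounts to a pair of set differences. A minimal Python sketch, assuming both dictionaries and the words taken from messages are already loaded as sets; `build_candidates` and all sample words are hypothetical.

```python
def build_candidates(full_dict, clean_dict, message_words):
    """Split ban-list candidates into two groups for manual review."""
    # Words present only in the dictionary that includes profanity.
    known_swears = full_dict - clean_dict
    # Words seen in messages that neither dictionary knows.
    unlisted = message_words - full_dict
    return sorted(known_swears), sorted(unlisted)

# Made-up sample data (assumption).
full_dict = {"hello", "friend", "zorp"}
clean_dict = {"hello", "friend"}
message_words = {"hello", "zorp", "blarg", "freind"}

known, unlisted = build_candidates(full_dict, clean_dict, message_words)
print(known)     # ['zorp']
print(unlisted)  # ['blarg', 'freind']
```

The typo "freind" lands in the second group, which is the kind of false positive that the manual pass would have to weed out.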
open office 4.1.6
robleyd
Moderator
Posts: 5087
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia

Re: Creating banned word/filter lists from documents

Post by robleyd »

Have you actually looked at the contents of the dictionaries used by OpenOffice? I did earlier today and found they do contain many of the expressions you might want to use in a ban list.

I find a number of variations on f__k, the c word, poop, sh_t, bast_rd, c_ck, d_ck, t_rd and many others.
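A check like this can be repeated with a short Python sketch. It assumes the dictionary is a Hunspell-style `.dic` file: a first line holding an entry count, then one entry per line with optional affix flags after a `/`. The sample entries and flags below are made up, and escaped slashes inside entries are not handled.

```python
import io

def read_dic_words(handle):
    """Collect lowercase base words from a Hunspell-style .dic stream."""
    words = set()
    for index, line in enumerate(handle):
        entry = line.strip()
        # Skip blank lines and the leading entry-count line.
        if not entry or (index == 0 and entry.isdigit()):
            continue
        # Keep the part before any affix flags or tab-separated fields.
        word = entry.split("/", 1)[0].split("\t", 1)[0]
        words.add(word.lower())
    return words

# In-memory stand-in for a real .dic file (assumption).
sample_dic = io.StringIO("2\npoop/SGD\nhello\n")
words = read_dic_words(sample_dic)

candidates = ["poop", "zorp"]
print({w: w in words for w in candidates})  # {'poop': True, 'zorp': False}
```

Only base words are listed in such a file; forms derived through affix flags would not show up in this simple lookup, so a miss here does not prove that the spell checker rejects the word.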

Your attitude does not meet the standards of behaviour we expect here. If you continue to behave in this manner, you will find yourself banned.
Cheers
David
OS - Slackware 15 64 bit
Apache OpenOffice 4.1.15
LibreOffice 24.2.2.2; SlackBuild for 24.2.2 by Eric Hameleers
keme
Volunteer
Posts: 3705
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Creating banned word/filter lists from documents

Post by keme »

For the record: No name calling was intended or implied from me.

My comments were merely intended to point out that there are challenges:
  • Warning that making a filter which works in a sensible way requires considerable resources (many have tried and failed)
  • Explaining why I believe that using the builtin dictionaries (which was the plan, as far as I could understand) is not likely to bear fruit.
The final "punchline" about creative language was only meant to lighten up the mood, but I realize now that it may be taken as a snide remark.

I apologize that I posted a response which could be construed as offensive towards the original poster. I cannot promise to do better next time, but I will try.
williamfromlincs wrote: You maybe got called a lot of names as a child keme ...
Yes I did.
williamfromlincs wrote: ... which makes you lash out in that way today. put downs and scorns.
I don't believe I do. If you find that such is the case, I would be grateful for help to do better, because I honestly and consciously try not to. Could you point out some examples (aside from the one I already apologized for)?
williamfromlincs wrote: I have 3 Phd's keme
Then you are trained in doing research, and might do well to take your own advice before making comments about someone's personality. My screen name is in use here, as well as in a few other forums out there.
williamfromlincs wrote: Computers make peoples egos flop out all over the place huh.
You are right about that, it seems. Perhaps more so than you realize...
williamfromlincs
Posts: 7
Joined: Fri Aug 23, 2019 10:18 am

Re: Creating banned word/filter lists from documents

Post by williamfromlincs »

Thanks keme, what a nice reply. Everyone has overcorrective parents, and that leads children to taunt one another at school (or to overcorrect, complain about shoes and shabby clothing, etc.). It's normal psychology. Unless you're homeschooled, which is a much nicer childhood experience. The taunting leads many academics into what I call the Sheldon effect, after Sheldon Cooper from the TV show The Big Bang Theory, where everyone else is stupid and he is the genius. Sorry for that outburst.
I do have a lot of researchers in abnormal psychology working on various tasks, which is part of the reason for the banned word lists. They have web servers full of insulting emails against many different ethnic groups (thousands per website, as you can imagine). These make a great resource for banned word lists, as they are in many languages and cover many ethnic groups globally. If processed, they would be a good asset for the likes of Google and Facebook and many other websites, but they need to be processed before any use can be made of them. I will have to search for a tool that lets me strip out the body of an email and save it as plain text (perhaps processing in bulk).
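For the last step, Python's standard `email` package can pull the plain-text body out of a raw message. A minimal sketch, assuming each message is available as RFC 5322 text; `plain_body` and the sample message are invented for illustration.

```python
from email import message_from_string, policy

def plain_body(raw_message):
    """Return the text/plain body of a raw email, or '' if there is none."""
    msg = message_from_string(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""

# Made-up sample message (assumption).
raw = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "First line\n"
    "Second line\n"
)

print(plain_body(raw))
```

For bulk processing, the same function could be applied in a loop to every saved message in a folder, with the returned text appended to one file for word extraction.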
take care
open office 4.1.6
keme
Volunteer
Posts: 3705
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Creating banned word/filter lists from documents

Post by keme »

Right! Then you have available those "considerable resources" I referred to above, and just need a tool for them to work with. Beginning with banned words and continuing to catch the unlisted, with the human mind "running the show". That should work. Are we barking up the same alley now?

I am not well versed in the inner workings of Writer, nor the internal data structure, so writing up some plugin for the task is beyond my capacity. However, with the background info now given, there should be someone around here who can build something useable.

I looked into applying concordance files for this purpose, but it requires too much fiddling and the result is not as useful as I imagined.
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Creating banned word/filter lists from documents

Post by RoryOF »

Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS