
Forum Search Ignores Common Words

Posted: Fri Nov 30, 2007 4:34 am
by Bill
Is it possible to ignore common words only when they are used singly but not ignore them when used in combination with at least one other word? Words such as "file", "page", "format", "change", "icons", "download", "openoffice", "open" and "save" are all ignored when searching, making it impossible to search for terms like "file format", "page format", "change icons", "download openoffice", "open file" and "save file".

Re: Forum Search Ignores Common Words

Posted: Fri Nov 30, 2007 1:03 pm
by DrewJensen
I need to look at that one - I'll get back to you

Re: Forum Search Ignores Common Words

Posted: Fri Nov 30, 2007 11:18 pm
by Hagar Delest
+1.
I first tried "spell check" and got that message too! Rather frustrating. Better to deactivate that feature completely.

Re: Forum Search Ignores Common Words

Posted: Fri Nov 30, 2007 11:28 pm
by DrewJensen
Ok - at this moment I can disable common word exclusion completely or enable it completely.

I can also change how the system determines which are common words:

This is how it works now.

I set a percentage that is used as a threshold value during word indexing: when a word is found in more than this percentage of posts it is considered common. Right now the threshold is set, IMO, much too low, at 5%. This was the default setting.
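To make the threshold idea concrete, here is a minimal sketch in Python. It is not the phpBB code, just an illustration of the rule described above; the function name `common_words` and the sample posts are made up for the example, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def common_words(posts, threshold_percent):
    """Return the words found in more than threshold_percent of posts.

    Hypothetical helper for illustration only; phpBB's own indexer
    is not reproduced here.
    """
    post_counts = Counter()
    for post in posts:
        # Count each word once per post, however often it repeats.
        post_counts.update(set(post.lower().split()))
    limit = len(posts) * threshold_percent / 100
    return {word for word, count in post_counts.items() if count > limit}


# Made-up sample data: "the" is in every post, every other word in one.
posts = [
    "how to open the file",
    "the spell check fails",
    "change the page format",
    "the icons are missing",
]

print(sorted(common_words(posts, 5)))   # low threshold: every word is "common"
print(sorted(common_words(posts, 33)))  # higher threshold: only "the" remains
```

With only four posts, a 5% threshold marks any word that appears even once, which is the same kind of over-filtering reported in this thread.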

So - what I am going to do is set this higher - say 33 percent. I believe that I need to also drop and re-generate the search index after I do this...not sure on that.

Any other ideas on that - higher percentage - disable altogether...

As for making it so that common words are significant if used as part of a word group - that would require a major update to the search function. Not out of the question, but if we can get the results we want without going that route I think it would be better. This function has been fairly widely tested by the developers, and if we write our own the testing is on us, as is any hope for support from the phpBB team if we have a problem down the road.

Caveat to that - there might be a situation where we want to touch this routine ( set of routines ) anyway, and that is if we set up multiple forums in multiple native languages and then want to have a search that goes across forum boundaries. If that ends up being something we want to do then we can always look at fitting in both features.
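For reference, the word-group behavior Bill asked for could look roughly like the sketch below. This is only an assumption about how such a rule might be written, not a patch for the phpBB search routines; `filter_query` and the sample word set are hypothetical names for the example.

```python
def filter_query(terms, common):
    """Drop common words only when the query is a single word.

    Hypothetical rule for illustration: a query with two or more
    terms keeps every term, common or not.
    """
    if len(terms) >= 2:
        return list(terms)
    return [term for term in terms if term.lower() not in common]


# Made-up set of words the index treats as common.
common = {"file", "page", "format", "open", "save"}

print(filter_query(["file"], common))            # single common word is ignored
print(filter_query(["file", "format"], common))  # word group is kept whole
print(filter_query(["macro"], common))           # uncommon single word is kept
```

A real change would also have to make sure the common words are still present in the index, since a word dropped at indexing time cannot be matched later.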

EDIT - the criterion is now set at 33%


EDIT 2 - well, just changing that did not seem to affect the search results. I still got the same non-responsive result as the others... will look at rebuilding the search index now.

Re: Forum Search Ignores Common Words

Posted: Fri Nov 30, 2007 11:33 pm
by Hagar Delest
Hmm, still can't search for 'spell check' :?

Re: Forum Search Ignores Common Words

Posted: Fri Nov 30, 2007 11:40 pm
by DrewJensen
alright just ran a test on the test installation

I will need to do two things for this change to take effect.

One - I will need to clear the ACP cache on the httpd server ( this means a soft refresh of the service ) and this will drop session information for anyone currently on the system - but for most this will be unnoticed as the next page load gets a new session id - someone posting at the moment I do it will have to repost - but should not lose any information

Two - I will need to drop the word index and rebuild it - took about 4 seconds on the test installation with the smaller number of posts and only myself logged on...so might be a few seconds longer here - during this brief time searching should be disabled.

EDIT - OK - final thought on this for the moment.

Looking at my list of things to do there are two other small changes to make in the configuration files - both require a soft restart of the apache2 server. Also, I noticed both on OOoforum and here so far that the least active time seems to be late on a Friday night and early on Sunday morning...hmmm...even geeks have a social life it seems...LOL..

So - unless anyone objects I will make the edits to the config file now, but wait until late ( around midnight my time ) tonight to:
Run a backup
Bounce the apache2 server
Drop the search index
Rebuild it
Run vacuumdb on the database ( reclaim unused pages that will be left over from the index rebuild )

Then we can run some tests on common words as above. If we still don't like the results we can either up the percentage again or disable common word exclusion altogether.

Re: Forum Search Ignores Common Words

Posted: Sat Dec 01, 2007 4:11 pm
by DrewJensen
I made the change last night and have been running some tests on word searches - it seems to be picking up words the way we would want now. This is something to keep an eye on, however... the problem is that I'm not sure how we do that.

Re: Forum Search Ignores Common Words

Posted: Sat Dec 01, 2007 4:38 pm
by Hagar Delest
Thanks, works like a charm now.

BTW (if I may), is it possible to configure the search feature to open a new tab (or new window, depending on the browser settings) so that it doesn't use the existing page? When I need to search the forum to find some information to link in a thread, I lose the current page.

Re: Forum Search Ignores Common Words

Posted: Sat Dec 01, 2007 4:43 pm
by DrewJensen
ah - well for 'advanced search' link..sure - right click - Open in new tab ( window )... :twisted:

OK - so much for the smart aleck reply...

For the search boxes - I suppose that could be done. The question is whether that would be the default behavior that most would want. The next step then would be to make it a configuration item in the user control panel... that could be done also, with a bit more work.

Re: Forum Search Ignores Common Words

Posted: Sat Dec 01, 2007 4:49 pm
by Hagar Delest
DrewJensen wrote:ah - well for 'advanced search' link..sure - right click - Open in new tab ( window )... :twisted:

:D
Of course, my middle click is used to do it now !! It's just a bell and whistle if it can be done, no big deal if it's too much work !

Re: Forum Search Ignores Common Words

Posted: Sat Dec 01, 2007 6:56 pm
by Bill
DrewJensen wrote: I made the change last night and have been running some tests on word searches - it seems to be picking up words the way we would want now. This is something to keep an eye on, however... the problem is that I'm not sure how we do that.

Thanks. I haven't had any keywords ignored since the change was made.