Searched word must not contain more than 14 characters

Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Searched word must not contain more than 14 characters

Post by Villeroy »

Quick search result wrote:The following words in your search query were ignored because they are too common words: gettransferable.
You must specify at least one word to search for. Each word must consist of at least 3 characters and must not contain more than 14 characters excluding wildcards.
The word has 15 characters, indeed. The same happens with "advanced search". So we cannot search for names of services, interfaces, and methods :?
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: Searched word must not contain more than 14 characters

Post by TerryE »

OK, a job for tomorrow. It's been a long day.
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: Searched word must not contain more than 14 characters

Post by TerryE »

OK, I've had a look at the code. phpBB lets you plug in one of several full-text search engines. The two "out of the box" ones are "Fulltext Native" and "Fulltext MySQL"; since we are using PostgreSQL, we are limited to the former. Administrators can use the ACP->General->Search settings tab to set three admin-tunable parameters:
  • Min characters indexed by search. (3) Words with at least this many characters will be indexed for searching.
  • Max characters indexed by search. (14) Words with no more than this many characters will be indexed for searching.
  • Common word threshold. (66%) Words which are contained in a greater percentage of all posts will be regarded as common. Common words are ignored in search queries. Set to zero to disable. Only takes effect if there are more than 100 posts.
So the 14 limit is one that we imposed. Moreover it only seems to be enforced on the search query because the following query shows that the full words are actually catalogued:

Code:

en=>  select char_length(word_text), count(*) from phpbb_en_search_wordlist group by 1;
 char_length | count
-------------+-------
           1 |     9
           3 |  2655
           4 |  3719
           5 |  4281
           6 |  4209
           7 |  4210
           8 |  3974
           9 |  3284
          10 |  2514
          11 |  1739
          12 |  1021
          13 |   636
          14 |   378
          15 |   238
          16 |   150
          17 |    91
          18 |    84
          19 |    35
          20 |    31
          21 |    14
          22 |    21
          23 |    14
          24 |    13
          25 |    11
          26 |     1
          27 |     4
          28 |     3
          29 |     3
          30 |     4
          31 |     3
          32 |    10
          33 |     1
          37 |     1
          40 |     3
          43 |     1
          47 |     1
          48 |     1
          51 |     1
(38 rows)
Need to talk to Drew re this one.
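
To illustrate how the three settings above could interact at query time, here is a minimal Python sketch. It is not the phpBB code: the names SearchSettings, is_common and classify_query_words are made up for illustration, and the per-word post counts are assumed to be available from the index. It only models the behaviour described above, where long words can sit in the word list and still be rejected when they appear in a query.

```python
from dataclasses import dataclass


@dataclass
class SearchSettings:
    """Hypothetical container for the three admin-tunable values."""
    min_chars: int = 3              # min characters indexed by search
    max_chars: int = 14             # max characters indexed by search
    common_threshold: float = 0.66  # common word threshold; 0 disables it


def is_common(word, settings, post_counts, total_posts):
    """True if the word appears in a greater share of posts than the threshold."""
    # The threshold is disabled at zero and ignored on boards with 100 posts or fewer.
    if settings.common_threshold <= 0 or total_posts <= 100:
        return False
    return post_counts.get(word, 0) / total_posts > settings.common_threshold


def classify_query_words(words, settings, post_counts, total_posts):
    """Sort query words into accepted, too_short, too_long and common."""
    result = {"accepted": [], "too_short": [], "too_long": [], "common": []}
    for word in words:
        w = word.lower()
        bare = w.replace("*", "")  # wildcards do not count towards the length
        if len(bare) < settings.min_chars:
            result["too_short"].append(w)
        elif len(bare) > settings.max_chars:
            result["too_long"].append(w)
        elif is_common(w, settings, post_counts, total_posts):
            result["common"].append(w)
        else:
            result["accepted"].append(w)
    return result


if __name__ == "__main__":
    counts = {"the": 900, "macro": 10}  # assumed: posts containing each word
    print(classify_query_words(["gettransferable", "macro", "the"],
                               SearchSettings(), counts, total_posts=1000))
```

In this sketch "gettransferable" (15 characters) lands in too_long with the default limit of 14, while "gettransferabl*" would pass because the wildcard is excluded from the count. The real board reported the rejected word as "too common", so the actual code evidently groups these cases differently.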
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: Searched word must not contain more than 14 characters

Post by TerryE »

Villeroy, our search currently uses a phpBB-based full text search algorithm rather than the D/B FULLTEXT search feature. The reason for this is simple: though this has been available for some time in MySQL, it was only introduced in PostgreSQL in version 8.3. We are currently running on 8.2.

Drew is planning to upgrade the forum D/B in the near future. When he does, I suggest we move over to D/B FULLTEXT search, as this is quite a bit more efficient. In the meantime I will add testing this on our dev configuration to my to-do list. I will deploy it into prod and address this issue following the upgrade.
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
DrewJensen
Volunteer
Posts: 1734
Joined: Sat Oct 06, 2007 9:01 pm
Location: Cumberland, MD - USA

Re: Searched word must not contain more than 14 characters

Post by DrewJensen »

Hi,

Of course we could increase the max size of indexed words now - there is really no reason to leave the limit at 14. It would require disabling the search feature for some period of time while the word index table was rebuilt. I have rebuilt the index table once since the board went live, about 3 weeks afterwards, and at that time it took about 5 minutes with the current settings. Now, I don't know, maybe 30 minutes, maybe less... I could try it locally and see.

As for full-text search engines - you are referring to the use of TSearch under PostgreSQL (actually it has been available as an add-on since 7.1 IIRC and is just included in the base package as of 8.3). Using this is not directly supported by phpBB. What is available is a search module, based on TSearch, written by the site administrator at the PostgreSQL web forum. (Not sure what version of phpBB he is up to currently; last I checked he had not even upgraded to the 3.0 Gold release.) So, we can count on having to do some PHP coding, I think - more than that, however, is the work we will need to do for different native languages. Part of what is required is a STOP WORD dictionary. I know that a base dictionary can be had for English and I think French - Hungarian, I don't know about. Actually not sure on French either - I'll make a point to go check. (I do know that Russian is available... :shock: )

Please also take a look at this post: http://user.services.openoffice.org/en/ ... rch#p22554 for another idea.
Former member of The Document Foundation
Former member of Apache OpenOffice PMC
LibreOffice on Ubuntu 18.04
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: Searched word must not contain more than 14 characters

Post by TerryE »

Drew, my point to Villeroy is that we should be sorting out what we do with FULLTEXT search within a month or so, so there's little point upping the max limit from 14 characters now as this will involve taking the forum offline for the 3 hours or so needed to rebuild the search indexes.

In terms of D/B fulltext search, I can't see a major issue here, as phpBB already supports "Fulltext MySQL", which is really "Fulltext Database". The syntax differences between MySQL and PostgreSQL FULLTEXT are so small that tweaking this latter option to also support PostgreSQL should be fairly straightforward. And I agree with you re the use of an external search API. What I am not sure about is using this as a substitute rather than an alternative.
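
As a rough illustration of the syntax difference, here is a small Python sketch that builds the matching clause for each database. The function name fulltext_query, the table phpbb_posts, the columns post_id and post_text, and the 'english' configuration are assumptions for the example, not taken from the phpBB search backend. MATCH ... AGAINST is MySQL's fulltext syntax, and to_tsvector / plainto_tsquery are the text search functions in PostgreSQL 8.3 and later.

```python
def fulltext_query(dialect, table="phpbb_posts", column="post_text"):
    """Return a parameterised fulltext SELECT for the given database dialect.

    The search terms are passed separately as the single %s parameter.
    Table and column names are illustrative and assumed to be trusted.
    """
    if dialect == "mysql":
        # MySQL: needs a FULLTEXT index on the column.
        where = f"MATCH ({column}) AGAINST (%s IN BOOLEAN MODE)"
    elif dialect == "postgresql":
        # PostgreSQL 8.3+: built-in text search; 'english' is an assumed configuration.
        where = f"to_tsvector('english', {column}) @@ plainto_tsquery('english', %s)"
    else:
        raise ValueError(f"unsupported dialect: {dialect}")
    return f"SELECT post_id FROM {table} WHERE {where}"


if __name__ == "__main__":
    for name in ("mysql", "postgresql"):
        print(name, "->", fulltext_query(name))
```

The sketch covers only the WHERE clause. Index creation, ranking and the per-language dictionaries Drew mentions differ more between the two databases and are not shown.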
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
DrewJensen
Volunteer
Posts: 1734
Joined: Sat Oct 06, 2007 9:01 pm
Location: Cumberland, MD - USA

Re: Searched word must not contain more than 14 characters

Post by DrewJensen »

Really, the bottom line on this then is to get that test bed VM moved over this weekend (which I know you are planning on) and then actually get our hands dirty with it.

I haven't touched base with Gerd on the DB upgrade yet - but I will make a point to get an email off to him this weekend and start that ball rolling...My goal is still to have it fully moved over by June 15th. That should give us enough time to settle on a search engine I think.
Former member of The Document Foundation
Former member of Apache OpenOffice PMC
LibreOffice on Ubuntu 18.04
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: Searched word must not contain more than 14 characters

Post by TerryE »

+1
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.