josh blog search

The Josh Blog search engine works by doing a fulltext search on two fields of the database that runs everything. One is the field containing the entries you usually read, and one is a reference field that you don't normally see. Ideally this means that you can find entries pertaining to, say, the Dismemberment Plan, which don't actually mention the Plan in the entry text - that information is hidden away in the references. Unfortunately, I haven't been keeping good references since Josh Blog moved to ellipsis. In the coming month I hope to go back and put in at least some of the obvious references so that the search is more useful.
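For the curious, a search over two fields like this could look roughly as follows. This is only a sketch: the table and column names (`entries`, `id`, `body`, `refs`) are invented for illustration, since the real schema isn't shown here. The `MATCH ... AGAINST (... IN BOOLEAN MODE)` construct is MySQL's fulltext syntax.

```python
def build_search_sql(table="entries", columns=("body", "refs")):
    """Return a parameterized MySQL fulltext query over the given columns.

    The table and column names are hypothetical placeholders, not the
    blog's actual schema.
    """
    cols = ", ".join(columns)
    return (
        f"SELECT id, MATCH({cols}) AGAINST (%s IN BOOLEAN MODE) AS score "
        f"FROM {table} "
        f"WHERE MATCH({cols}) AGAINST (%s IN BOOLEAN MODE) "
        f"ORDER BY score DESC"
    )

sql = build_search_sql()
# The same search string fills both placeholders.
params = ("+dismemberment +plan",) * 2
```

With a database driver that uses `%s` placeholders, the statement and its parameters would be handed over together, so the search string never gets pasted into the SQL by hand.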

Remember that currently this only searches on entries made since August 2001. Moving entries from my old archives into the newfangled database is a long-term project.

Because of the way the search works, it may return some entries which seem less than perfectly relevant. Consider this a feature: by searching you'll also grab other interesting entries which may be tangentially related (at best).

The minimum word length currently searched on is 4 characters. This may change to 3 to include words like "rap".
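To see which words in a query fall under that limit, a quick check like the one below may help. It is an approximation only: the word-splitting rule here is a simple guess and not the database's own tokenizer, and `MIN_WORD_LEN` and `ignored_words` are names made up for this sketch.

```python
import re

MIN_WORD_LEN = 4  # the current minimum described above; may drop to 3

def ignored_words(query, min_len=MIN_WORD_LEN):
    """Return the words in a query that are too short to be searched on."""
    # Rough word split; the real search engine may tokenize differently.
    words = re.findall(r"[A-Za-z0-9']+", query)
    return [w for w in words if len(w) < min_len]

print(ignored_words("rap music from 1994"))  # ['rap']
```

Passing `min_len=3` shows what would change if the limit were lowered: "rap" would no longer be dropped.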

The following information about search options and example searches was stolen from the MySQL documentation.

Boolean fulltext search supports the following operators:

+
A plus sign prepended to a word indicates that this word must be present in every hit.
-
A minus sign prepended to a word indicates that this word must not be present in the hits.
By default - without plus or minus - the word is optional, but the entries that contain it will be rated higher.
< >
These two operators are used to increase and decrease a word's contribution to the relevance value assigned to a hit. See the example below.
( )
Parentheses are used - as usual - to group words into subexpressions.
~
This is the negation operator. It makes a word's contribution to the entry's relevance negative. It's useful for marking noise words. An entry that has such a word will be rated lower than others, but will not be excluded altogether, as it would be with the - operator.
*
This is the truncation operator. Unlike the others, it should be appended to the word, not prepended.

And here are some examples:

apple banana
find rows that contain at least one of these words.
+apple +juice
... both words
+apple macintosh
... word "apple", but rank it higher if it also contains "macintosh"
+apple -macintosh
... word "apple" but not "macintosh"
+apple +(>pie <strudel)
... "apple" and "pie", or "apple" and "strudel" (in any order), but rank "apple pie" higher than "apple strudel".
apple*
... "apple", "apples", "applesauce", and "applet"
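
If you would rather assemble search strings like these from lists of words, a small helper along these lines could do it. This is a minimal sketch covering only the `+`, `-`, and `*` operators; `boolean_query` and its parameter names are invented here and are not part of MySQL or of this site.

```python
def boolean_query(required=(), excluded=(), optional=(), prefixes=()):
    """Assemble a boolean-mode search string.

    required -> +word, excluded -> -word, optional -> word, prefixes -> word*
    """
    parts = [f"+{w}" for w in required]
    parts += [f"-{w}" for w in excluded]
    parts += list(optional)
    parts += [f"{w}*" for w in prefixes]
    return " ".join(parts)

print(boolean_query(required=["apple"], excluded=["macintosh"]))  # +apple -macintosh
print(boolean_query(optional=["apple", "banana"]))                # apple banana
print(boolean_query(prefixes=["apple"]))                          # apple*
```

The three calls reproduce three of the example searches above.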