DISCLAIMER: I’m new to SQL Server Full Text Search, so if what I’m saying here is wrong then let me know.
Sticking with the Knowledge Base project, we had to implement search. We decided to use SQL Server Full Text Search for this. Once the fulltext catalogs were setup, the query my colleague wrote looked like this:
SELECT [Key], [Rank] FROM FREETEXTTABLE(dbo.Question, QuestionText, 'building consent form', 100)
The problem with this search is that it will return results which contain any one of the keywords. So if a user enters “building consent form”, a page only needs to contain building OR consent OR form and it will be included in the results. I call this an OR search. This reminds me of the early days of search engines, (AltaVista.com et al), which used to work the same way.
Back in the AltaVista days, if you wanted only pages with every search term you had to use AND between each search term e.g. [building AND consent AND form]. Nowadays you don’t need to type the ANDs; if you enter [building consent form] into Google, it will only return results which contain building, consent, and form. This is what I call an AND search.
SQL Server Full Text Search seems to be stuck in the late 1990s days of OR searches. As demonstrated above it will return a hit for any one of the search words. And there doesn’t seem to be an easy way to tell it to do an AND search.
To do an AND search with SQL Server Full Text Search, you need to query the CONTAINSTABLE.
SELECT [Key], [Rank], 1 FROM CONTAINSTABLE(dbo.Question, QuestionText, '"building" AND "consent" AND "form"', 100)
Because we’re querying the CONTAINSTABLE and not the FREETEXTTABLE we lose all Full Text Search’s nice fuzziness around the search terms, i.e. in the above example, Questions which contain “buildings”, “consent”, and “form” won’t be returned. To help we can do the following:
SELECT [Key], [Rank], 1 FROM CONTAINSTABLE(dbo.Question, QuestionText, '"building*" AND "consent*" AND "form*"', 100)
Stop right there
Another problem with this is stopwords, aka noise words (SQL Server 2005 calls them noise words, 2008 stopwords). SQL Server has a list of around 150 common words, which are ignored by the Full Text Search engine. e.g. about, after, all, also, an, and, another etc. You can view the English stopwords in SQL Server 2008 like so:
select * from sys.fulltext_system_stopwords where language_id = 1033
In my table there is a Question titled “What are building consents?”. If we search for “what are building consents” like so:
SELECT [Key], [Rank], 1 FROM CONTAINSTABLE(dbo.Question, QuestionText, '"what*" AND "are*" AND "building*" AND "consents*"', 100)
because “are” is a stopword (and so is “what”) we won’t get any results!
There is a workaround – to wipe the list of stop words! The problem with doing this is that the list of stop words is setup server-wide on the SQL Server itself. In our production environment other applications are using the same SQL Server instance (but of course we have our own KnowledgeBase database), so this wasn’t an option.
Then the customer added a new requirement – if a Question contains the exact search phrase then that should be returned first. Adding the code for that use case fixes my search.
CREATE PROCEDURE dbo.Search
@keywords NVARCHAR(4000), — e.g. ‘”building*” AND “consent*” AND “form*”‘
@searchPhrase NVARCHAR(4000) — e.g. ‘building consent form’
— Tmp table —
DECLARE @searchResults TABLE
ExactMatchToQuestionText bit default 0
— Question contains keywords in order —
INSERT @searchResults (QuestionID, ExactMatchToQuestionText)
SELECT QuestionID, 1
WHERE QuestionText LIKE ‘%’ + @searchPhrase + ‘%’
— Free Text Search matches —
INSERT @searchResults (QuestionID, Rank, ExactMatchToQuestionText)
SELECT [Key], [Rank], 0
FROM CONTAINSTABLE(dbo.Question, QuestionText, @keywords, 100)
AND [Key] not in (SELECT QuestionID from @searchResults)
— Return —
FROM @searchResults [Results]
JOIN Question ON Results.QuestionID = Question.QuestionID
This code still has a bug – a search for [what are building consents] returns 1 hit, but a search for [what building consents] returns 0, because “what” is a SQL Server stopword. D’oh!
At this point I am better off throwing away SQL Full Text Search and completely rolling my own query.
So why doesn’t SQL Server Full Text Search include a simple option for doing AND searches against the FREETEXTSEARCH table?
And, does anyone know a better way to do AND searches using SQL Server Full Text Search?