Supporting autocomplete with n-gram

Abstract

How to create autocomplete functionality for search input.

Sitecore provides an n-gram analyzer for Lucene.NET in the Sitecore.ContentSearch.LuceneProvider.Analyzers namespace. If you use Solr, you can configure n-gram analysis in the Solr Schema.xml file instead.
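For Solr, a field type along the following lines is one way to get n-gram analysis at index time. This is a minimal sketch, not the configuration that Sitecore ships: the field type name text_ngram and the gram sizes are assumptions chosen for illustration. The tokenizer and filter factories are standard Solr classes.

```xml
<!-- Hypothetical field type; the name "text_ngram" and the gram sizes are assumptions. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <!-- At index time, split each token into n-grams of 1 to 3 characters. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="3"/>
  </analyzer>
  <!-- At query time, only lowercase the input so the typed text is matched against the indexed n-grams. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A field that uses this type would then be declared in the same file and mapped to the Sitecore field that feeds the autocomplete.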

You use the n-gram analyzer to create autocomplete functionality for search input. The analyzer breaks tokens up into unigrams, bigrams, trigrams, and so on. When a user types a word, the n-gram analyzer looks the word up in different positions, using the tokens that it generated.
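To make the idea of unigrams, bigrams, and trigrams concrete, the following sketch generates character n-grams for a single token. It is an illustration only: the NGramSketch class and its Generate method are hypothetical names, not part of the Sitecore or Lucene.NET API, and the sketch assumes that n-grams are taken over the characters of each token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the Sitecore or Lucene.NET API.
public static class NGramSketch
{
    // Returns every character n-gram of the token whose length is between minSize and maxSize.
    public static List<string> Generate(string token, int minSize, int maxSize)
    {
        if (token == null) throw new ArgumentNullException(nameof(token));
        if (minSize < 1 || maxSize < minSize) throw new ArgumentOutOfRangeException(nameof(minSize));

        var grams = new List<string>();
        for (int size = minSize; size <= maxSize; size++)
        {
            // Slide a window of `size` characters across the token.
            for (int start = 0; start + size <= token.Length; start++)
            {
                grams.Add(token.Substring(start, size));
            }
        }
        return grams;
    }

    public static void Main()
    {
        // Unigrams, bigrams, and trigrams of "some".
        Console.WriteLine(string.Join(", ", Generate("some", 1, 3)));
        // prints: s, o, m, e, so, om, me, som, ome
    }
}
```

Because every one of these fragments is stored in the index, partial input such as "so" or "som" can match the indexed word before the user finishes typing it.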

You add support for autocomplete by adding a new field to the index and mapping this field to use the n-gram analyzer instead of the default analyzer. To query that field with LINQ, use code like the following:

using (IProviderSearchContext context = Index.CreateSearchContext())
{
    // Match items whose name starts with the typed text and return at most 20 suggestions.
    result = context.GetQueryable<SearchResultItem>()
        .Where(i => i.Name.StartsWith("some"))
        .Take(20)
        .ToList();
}

Sitecore provides an implementation that uses trigrams and a set of English stop words. If you have other requirements, you can build a new analyzer and change these settings.