Thursday, January 14, 2010

[.net] Using sql server full text searching with Mindscape's LightSpeed

Advance warning: This post is only of use to people using Lightspeed or suffering from insomnia.

The problem
For my current c# project, I am using Mindscape's Lightspeed O/R framework. Lightspeed is a mighty fine product that makes object persistence a breeze. I would kill to have this in delphi.

However the only full text searching that LightSpeed supports out of the box is Lucene.net. Sql Server full text searching is not provided and no-one else seems to have created it.

The good news is that Lightspeed has a pluggable search engine framework so it is possible to add a new search engine. The bad news is that it's not overly well documented and there aren't many examples. The good news is that their online support is pretty good.

The resolution
Mindscape have a blog post outlining the process of adding a new search engine. However there are a number of traps for young players that aren't covered there, or indeed anywhere. The following should shed some light on the process.

Creating a new search engine requires implimenting ISearchEngine
  public interface ISearchEngine
{
IList<SearchResult> Search(string query, params string[] scopes);
void Add(IndexKey indexKey, string data);
void Update(IndexKey indexKey, string data);
void Remove(IndexKey indexKey);
void Optimize();
void Clear();
LightSpeedContext Context { get; set; }
void BeginBulkAdd();
void EndBulkAdd();
}
For sql server, and for most databases with inbuilt FTS, most of these methods can just be left as empty method bodies. All that needs to be implemented is Context (stores lightspeed configuration info such as the database connection string) and Search.

All that is needed to implement Context is:
    public LightSpeedContext Context { get; set; }

The fun comes in implementing Search:).

Example class
LightSpeed takes a class and maps it to a database table. In my case, the class is CmsModels.Entities.AccClaim. It is stored in the table AccData and has an integer primary key called ClaimId.

To perform a search, say finding all of my claims, I need to run the following query:
    select *
from FreeTextTable(AccData, *, 'Sean Cross Prevshort')
order by RANK desc
returning something like
    KEY       RANK
175671 27
175673 23
175646 15
175657 7
Key is the pk field (ClaimId) and Rank is how well the record matched the query.

Implimenting search
The Search method takes a query ("Sean Cross Hastings") and an array of type names (["CmsModels.Entities.AccClaim", ...]). It returns a list of SearchResult objects. SearchResult has the following constructor:

    public SearchResult(
string key,
string scope,
string entityId,
float score
)

In short, we take the query and the class name, construct a query, run it and use the results to create a list of SearchResult objects. Easy, in theory.

The devilish details
A Getting the table name
Given an object name, Lightspeed has a number of ways of determining the table name. Which option is chosen depends on the model and class settings.
The possibilities are
  1. Table name = class name (eg AccClaim)
  2. Table name = pluralised class name (eg AccClaims)
  3. Table name is explicitly set in class attributes (eg [Table("AccData", IdColumnName="ClaimID")])
  4. Table name is set programmaticly using a user defined naming strategy object (eg TBL_ACCCLAIM)
AFAIAA It is not possible to cover all these cases without access to LightSpeed internals. I have suggested that they provide a GetMyBloodyTableName method but that hasn't happened yet.

I have opted to only support cases 1 and 3 as this covers most of my classes, and it's easy. Any table name that falls into a different case can still be searched by specifying the table name explicitly.

To get the table name, I look for a TableAttribute and use it. If I can't find one, I use the type name.
    string tableName = "";
Type t = Type.GetType(scope);
object[] tableAttrib = t.GetCustomAttributes(typeof(TableAttribute), false);
if (tableAttrib.Length > 0)
{
tableName = ((TableAttribute)tableAttrib[0]).Name;
}
else
{
tableName = t.Name.ToUpper();
}
B Running the query
LightSpeed doesn't support running random sql so it's back to ado.net. The Context object can be used to create a SqlCommand object so it's pretty straightforward:

    IDbCommand command = Context.DataProviderObjectFactory.CreateCommand();           
Context.CreateUnitOfWork().PrepareCommand(command)
command.CommandText = string.Format("select * from FreeTextTable([{0}], *, @query) order by RANK desc", tableName);
((SqlCommand)command).Parameters.AddWithValue("@query", query);
IDataReader reader = command.ExecuteReader();

while (reader.Read())
{
...
}
C Creating the SearchResult objects
The SearchResult constructor is as follows:
    public SearchResult(
string key,
string scope,
string entityId,
float score
)
scope is the full class name passed into the Search method (eg "CmsModels.Entities.AccClaim")
entityId is the pk returned from the query in the Key field (eg 175671)
score is the rank returned from the query in the Rank field (eg 27)

key is the problem child. It is not properly explained anywhere in the documentation. I only found out how it is created by creating a dummy Add method, putting a breakpoint in there and examining the values passed in. It is a string in the format "{type name}]{entityId}" (eg "[AccClaim]175671").

So creating the results is
    while (reader.Read())
{
string entityId = reader[0].ToString();
int rank = reader.GetInt32(1);
string srKey = string.Format("[{0}]{1}", t.Name, entityId);
var sr = new SearchResult(srKey, scope, entityId, rank);
result.Add(sr);
}
Use the source Luke
The full source code is here. This is "works on my machine" quality code, so use with care. In particular:
  1. It doesn't handle pluralisation or naming strategies
  2. I cast directly to SqlCommand without testing it. Works for me but..
  3. It's not solidly tested. I only use it for one table so it may do funny things with other table structures.
  4. While running the code for the first time, the office power failed and the entire building went dark. I'm not saying it's related, but if it happens to you, you are on your own.
Things for MindScape to do
Provide a GetMyBloodyTableName and GetMyBloodyKey helper methods. Pretty please.