Full text search in Microsoft's Entity Framework

Jun 21
Full text search in Microsoft's Entity Framework

Here it is, our first Blog post. Since I am the author (I guess my next post will be about who I am), it will have a fairly nerdy twist to it. The technology suite I am working on at the moment is heavily dependent on the full text indexing capabilities of Microsoft SQL server. Essentially content is stored with lots of unstructured text as metadata. Users are allowed to persist search queries simply by saving their search text. That would be all well and good, except that I have slowly become a fan of Microsoft's Entity Framework. I say slowly because it did not click with me until the code-first model was released. I have since become a fan and used it extensively in customer and product projects. In this particular case the Entity Framework  lets me down, there are several workarounds out there but none of them are truly integrated composeable query systems that fully embrace the FREETEXT / CONTAINS technology. You can query directly of course with a SqlQuery but that has some significant limitations, mostly because if the entity framework does not generate the SQL it can't process all the relationships that the data model knows about. Perhaps a more specific example will show you what I mean. Given a working Entity named User with a DbSet<User> named Users that has a complex type of named PhoneNumber (four string columns) , and given another Entity named Group with no complex types that is joined in a many-to-many relationship to Users (Members and MemberOf). In that scenario you have three conditions that require extra work to execute a FREETEXT query (or a CONTAINS query). In the first (the User) these queries (which look good) will both fail.

var dbSetTest = secCtx.Users.SqlQuery("SELECT * FROM USERS WHERE FREETEXT(*,'civil society')");

var dbSetTest =  var dbCtxTest =  ecCtx.Database.SqlQuery<User>("SELECT * FROM USERS WHERE FREETEXT(*,'civil society')").AsQueryable().Include("EmailAddress").Include("MemberOf");

the error is due to the ComplexType column for phone numbers the error is something like this "Message=Cannot create a value for property 'PhoneNumber' of type 'EntityTesting.PhoneNumber'. Only properties of primitive or enumeration types are supported." Of course the Group entity is flatter so you would think that this query would be ok

var dbSetTest = secCtx.Groups.SqlQuery("SELECT * FROM GROUPS WHERE FREETEXT(*,'group society')").AsQueryable().Include("Members").Include("EmailAddress");

That query does execute but because the SQL was provided and not built by the Entity framework, none of the Includes are processed.  So the next thing you say is hey, let's make two queries , pull back the identity value in the FREETEXT and use that in a clause to build the fully hydrated query. The first query looks like this

var dbsetGroupIds= secCtx.Groups.SqlQuery("SELECT GroupId FROM GROUPS WHERE FREETEXT(*,'group society')").Select(gp => gp.GroupId).ToList();

And it fails with a "The data reader is incompatible with the specified ..." error, again that's because the data reader is expecting a Group with a GroupId, not just a GroupId. So now you end up with these two steps.

var dbsetGroupIds= secCtx.Groups.SqlQuery("SELECT * FROM GROUPS WHERE FREETEXT(*,'group society')").Select(gp => gp.GroupId).ToList();

var dbSetTest = secCtx.Groups.Where(gp => dbsetGroupIds.Contains(gp.GroupId)).Include("EmailAddress").Include("Members");

and this works, but is horribly inefficient because you had to pull back the entire Group instead of just the GroupId, then throw that away to build the next data set.

What you want to do is something like this:

var freeTextUsers = secCtx.Users.FreeTextSearch(secCtx, "civil society").Include("EmailAddress").Include("MemberOf");

or this

var freeTextUsers = secCtx.Users.FreeTextContains(secCtx, "civil near society").Include("EmailAddress").Include("MemberOf");

That is exactly what this technology does. It is an IQueryable extension that executes the optimized query sequence for the provided free text (both in FREETEXT and in CONTAINS syntax), and then constructs and appends appropriate expression tree to the base object such that the results are restricted to the items that meet the free text query parameters. We do need to make a few assumptions, well only one assumption and that' is that the Table that stores the Entity has at least one full text catalog on it, and that all the columns you are interested in are in that full text catalog. Sounds silly that we need to state "for the technology to work the technology needs to be enabled" but we do,because that requires that the Entity have a single unique Key on it. This is a restriction of the full text catalog technology in Microsoft SQL server, any table in the full text catalog must have a single column unique primary key. If you try to create a full text catalog on or for a table that has a composite key, the server throws an error similar to this "XXXXX is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable,  single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed  column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key." For this extension to work, you need an Entity that has a key that meets the requirements, which means a number, an guid or a character string. With all that said, below is the content of the free text extension for Microsoft's Entity framework. I have tested it with the current EF version 5 release. You can download it from here

 IQueryableFreeTextExtensions.cs or read the text below

namespace BalsamicSolutions.Extensions

{

    /// <summary>

    /// IQueryableFreeTextExtensions, this extension provides composable

    /// access to FreeText (or Contains) searches of Entity Framework

    /// "Code First" objects . Assuming they are stored in in SQL Server tables

    /// that have FullText catalogs

    /// </summary>

    public static class IQueryableFreeTextExtensions

    {

        #region Entity Framework Utilities

        //we cache table names in a a dictionary because

        //they take a lot of work to figure out

        static readonly object _LockProxy = new object();

        static readonly Dictionary<Type, string> _TableNameCache = new Dictionary<Type, string>();

 

        /// <summary>

        /// Generate an Entity set for TEntity

        /// so that we can inspect the metadata

        /// </summary>

        /// <typeparam name="TEntity"></typeparam>

        /// <param name="dbCtx"></param>

        /// <returns></returns>

        static EntitySet GetMetaDataEntitySet<TEntity>(DbContext dbCtx)

        {

            Type entityType = typeof(TEntity);

            string entityName = entityType.Name;

            MetadataWorkspace metaDataWS = ((IObjectContextAdapter)dbCtx).ObjectContext.MetadataWorkspace;

 

            //IEnumerable<EntitySet> entitySets;

            var entitySets = metaDataWS.GetItemCollection(DataSpace.SSpace)

                             .GetItems<EntityContainer>()

                             .Single()

                             .BaseEntitySets

                             .OfType<EntitySet>()

                             .Where(entitySet => !entitySet.MetadataProperties.Contains("Type")

                                                || entitySet.MetadataProperties["Type"].ToString() == "Tables");

 

            List<EntitySet> provisionedTables = entitySets.ToList();

            EntitySet returnValue = provisionedTables.FirstOrDefault(t => t.Name == entityName);

            //When an Entity inherits a base class, the corresponding

            //table is sometimes named for the base class

            while (null == returnValue && null != entityType)

            {

                entityType = entityType.BaseType;

                entityName = entityType.Name;

                returnValue = provisionedTables.FirstOrDefault(t => t.Name == entityName);

            }

            return returnValue;

        }

 

        /// <summary>

        /// Extract a named property value

        /// from the metadata properties

        /// </summary>

        /// <param name="entitySet"></param>

        /// <param name="propertyName"></param>

        /// <returns></returns>

        static string GetStringPropertyFromEntityMetaData(MetadataItem entitySet, string propertyName)

        {

            string returnValue = string.Empty;

            MetadataProperty metaProp;

            if (entitySet == null) throw new ArgumentNullException("entitySet");

            if (entitySet.MetadataProperties.TryGetValue(propertyName, false, out metaProp))

            {

                if (metaProp != null && metaProp.Value != null)

                {

                    returnValue = metaProp.Value as string;

                }

            }

            return returnValue;

        }

 

        /// <summary>

        /// Get the provisioned table name for a Class (TEntity)

        /// </summary>

        /// <typeparam name="TEntity"></typeparam>

        /// <param name="dbCtx"></param>

        /// <returns></returns>

        static string GetProvisionedTableName<TEntity>(DbContext dbCtx, bool includeSchemaPrefix = false) where TEntity : class

        {

            Type tableAttribute = typeof(TableAttribute);

            TableAttribute tableNameAttribute = typeof(TEntity).GetCustomAttributes(tableAttribute, false).FirstOrDefault() as TableAttribute;

            if (null != tableNameAttribute)

            {

                //if its specified use that

                return tableNameAttribute.Name;

            }

            else

            {

                Type entityType = typeof(TEntity);

                string returnValue;

                lock (_LockProxy)

                {

                    //Its expensive to create sort out the object name so

                    //we will cache the name, it needs to be verified

                    //because for some reason the table names are not

                    //always aligned with the EntitySet name

                    if (!_TableNameCache.TryGetValue(entityType, out returnValue))

                    {

                        //Ok its not in the cache so try to figure it out

                        var entitySet = GetMetaDataEntitySet<TEntity>(dbCtx);

                        if (entitySet == null)

                        {

                            //This should be typed better, but for our example its a simple Exception

                            throw new ApplicationException("Unable to find entity set '{0}' in edm metadata" + entityType.Name);

                        }

                        returnValue = GetStringPropertyFromEntityMetaData(entitySet, "Schema") + "." + GetStringPropertyFromEntityMetaData(entitySet, "Table");

                        if (string.IsNullOrEmpty(returnValue)) throw new Exceptions.AttributeException("Could not find Entity Table name for " + entityType.Name);

                        _TableNameCache.Add(entityType, returnValue);

                    }

                }

                if (!includeSchemaPrefix)

                {

                    string[] nameParts = returnValue.Split('.');

                    returnValue = nameParts.Last();

                }

                return returnValue;

            }

        }

 

 

        /// <summary>

        /// Gets a column name suitable for naming constraints

        /// and indices, not actually validated with the Model

        /// since there dont seem to be any conflicts other

        /// than ComplexTypes which are simple to calculate

        /// </summary>

        /// <param name="columnProp"></param>

        /// <returns></returns>

        static string GetSimpleColumnName(System.Reflection.PropertyInfo columnProp)

        {

            Type columnAttribute = typeof(ColumnAttribute);

            ColumnAttribute columnNameAttribute = columnProp.GetCustomAttributes(columnAttribute, false).FirstOrDefault() as ColumnAttribute;

            if (null != columnNameAttribute && !string.IsNullOrEmpty(columnNameAttribute.Name))

            {

                return columnNameAttribute.Name;

            }

            else

            {

                return columnProp.Name;

            }

        }

 

        /// <summary>

        /// Gets the names of columns with the Key attribute

        /// and return their names and types

        /// </summary>

        /// <param name="tableType"></param>

        /// <returns></returns>

        static string[] GetKeyColumnNamesAndTypes(Type tableType, out Type[] columnTypes)

        {

            List<string> returnValue = new List<string>();

            List<Type> keyTypes = new List<Type>();

            var keyColumns = tableType.GetProperties().Where(prop => Attribute.IsDefined(prop, typeof(System.ComponentModel.DataAnnotations.KeyAttribute))).ToArray();

            foreach (PropertyInfo columnInfo in keyColumns)

            {

                returnValue.Add(GetSimpleColumnName(columnInfo));

                keyTypes.Add(columnInfo.PropertyType);

            }

            columnTypes = keyTypes.ToArray();

            return returnValue.ToArray();

        }

 

        /// <summary>

        /// convert generic object collection to a typed collection

        /// this only works with primitives or things with explicit

        /// conversion operators

        /// </summary>

        /// <typeparam name="T">type of objects in the untypedList</typeparam>

        /// <param name="untypedList">collection of untyped objects</param>

        /// <returns>Typed collection</returns>

        static HashSet<T> ConvertUntypedCollectionToTypedHashSet<T>(ArrayList untypedList)

        {

            HashSet<T> returnValue = new HashSet<T>();

            foreach (object untypedObject in untypedList)

            {

                returnValue.Add((T)untypedObject);

            }

            return returnValue;

        }

        #endregion

 

 

        #region FreeText operations

        /// <summary>IQueryableFreeTextExtensions.cs

        /// Build and execute a FreeText or Contains

        /// query 

        /// </summary>

        /// <param name="dbCtx">active DbContext</param>

        /// <param name="tableName">Table name</param>

        /// <param name="primaryKeyColumnName">Table primary key column name</param>

        /// <param name="searchText">FREETEXT or CONTAINS formated query text</param>

        /// <param name="columnNames">array of full text indexed column names, null or 0 length means use "*"</param>

        /// <param name="useContains">this is a Contains query</param>

        /// <returns></returns>

        static ArrayList ExecuteFreeTextSearch(DbContext dbCtx, string tableName, string primaryKeyColumnName, string searchText, string[] columnNames, bool useContains)

        {

            if (searchText.IsNullOrWhiteSpace()) throw new ArgumentNullException("searchText");

            string columnNameText = "*";

            if (null != columnNames && columnNames.Length > 0 && columnNames[0].Trim() != "*")

            {

                columnNameText = "(" + string.Join(",", columnNames) + ")";

            }

            ArrayList returnValue = new ArrayList();

            if (searchText.StartsWith("'")) searchText = searchText.Trim(new char[] { '\'' });

            string commandType = useContains ? "CONTAINS" : "FREETEXT";

            string sqlQuery = string.Format("SELECT {0} FROM {1} WHERE {2}({3},'{4}')", primaryKeyColumnName, tableName, commandType, columnNameText, searchText);

 

            dbCtx.Database.Connection.Open();

            try

            {

                //exeute the query

                using (DbCommand dbCmd = dbCtx.Database.Connection.CreateCommand())

                {

                    dbCmd.CommandText = sqlQuery;

                    dbCmd.CommandType = CommandType.Text;

                    using (DbDataReader dbRead = dbCmd.ExecuteReader())

                    {

                        while (dbRead.Read())

                        {

                            returnValue.Add(dbRead.GetValue(0));

                        }

                    }

                }

            }

            finally

            {

                //Dont dispose it, it will be needed again

                dbCtx.Database.Connection.Close();

 

            }

            return returnValue;

        }

 

        /// <summary>

        /// Internal processing of the free text query

        /// </summary>

        /// <typeparam name="TEntity">entity for our table</typeparam>

        /// <param name="thisQuery"></param>

        /// <param name="dbCtx">active DbContext</param>

        /// <param name="searchText">FREETEXT or CONTAINS formated query text</param>

        /// <param name="columnNames">array of full text indexed column names, null or 0 length means use "*"</param>

        /// <returns></returns>

        static IQueryable<TEntity> FreeTextExInternal<TEntity>(IQueryable<TEntity> thisQuery, DbContext dbCtx, string searchText, string[] columnNames, bool useContains) where TEntity : class

        {

            ///Assuming that the table has been preped for Full Text indexing we will execute

            ///a query for the primary key of matchign results, then we will convert our results

            ///to a typed collection and build an expression that will use the "Contains" method

            ///of the typed results to build this component of the expression tree. The

            ///remainder of the expression remains fully composable so things like Include and other

            ///join related queries will still work.

            ///

            /// Essentially mimicking something like this (assuming a guid based Key named UserId): 

            ///     HashSet<Guid> userMatchs = "SELECT UserId from Users WHERE ....... "

            ///     (from secUser in secCtx.Users.Where(usr => userMatchs.Contains(usr.UserId)) select secUser);

 

 

            Type[] keyColumnTypes = null;

            //Query the data model for the actual table name to query against

            string tableName = GetProvisionedTableName<TEntity>(dbCtx);

            //Query the entity object for the column name

            string[] entityKeyNames = GetKeyColumnNamesAndTypes(typeof(TEntity), out keyColumnTypes);

            if (entityKeyNames.Length == 0 || entityKeyNames.Length > 1)

            {

                //We need a single Key property , both for our Query and because its

                //a requirement of the FullText Catalog

 

                //This should be typed better, but for our example its a simple ApplicationException

                throw new ApplicationException(typeof(TEntity).Name + " as (dbo." + tableName + ") does not have a a valid index to support a full-text search key.");

            }

            //Ok get the matching TEntity parameter

            ParameterExpression parameterExpression = Expression.Parameter(thisQuery.ElementType, "p");

            //And now project the propery name on TEntity (we know its only one level deep so no need to evaluate dotted naming)

            MemberExpression memberPropertyAccess = MemberExpression.Property(parameterExpression, entityKeyNames[0]);

 

            //Now get the untyped collection of results from the query

            ArrayList untypedResults = ExecuteFreeTextSearch(dbCtx, tableName, entityKeyNames[0], searchText, columnNames, useContains);

 

            // Convert the results to a typed collection so that the comparisonCondition can be correctly constructed , if we dont

            // we will get errors like this (assuming a key of type guid and an untyped array)

            //  " generic type 'System.Guid' cannot be used for parameter of type 'System.Object' of method 'Boolean Contains(System.Object)'"

            Expression comparisonCondition = null;

            LambdaExpression lambdaExpression = null;

            MethodInfo containsMethod = null;

            switch (keyColumnTypes[0].Name)

            {

                case "Guid":

                    {

                        HashSet<Guid> containedIn = ConvertUntypedCollectionToTypedHashSet<Guid>(untypedResults);

                        containsMethod = typeof(HashSet<Guid>).GetMethod("Contains", new Type[] { typeof(Guid) });

                        comparisonCondition = Expression.Call(Expression.Constant(containedIn), containsMethod, memberPropertyAccess);

                        lambdaExpression = Expression.Lambda(comparisonCondition, parameterExpression);

                    }

                    break;

                case "String":

                    {

                        HashSet<string> containedIn = ConvertUntypedCollectionToTypedHashSet<string>(untypedResults);

                        containsMethod = typeof(HashSet<string>).GetMethod("Contains", new Type[] { typeof(string) });

                        comparisonCondition = Expression.Call(Expression.Constant(containedIn), containsMethod, memberPropertyAccess);

                        lambdaExpression = Expression.Lambda(comparisonCondition, parameterExpression);

                    }

                    break;

                case "Int32":

                    {

                        HashSet<int> containedIn = ConvertUntypedCollectionToTypedHashSet<int>(untypedResults);

                        containsMethod = typeof(HashSet<int>).GetMethod("Contains", new Type[] { typeof(int) });

                        comparisonCondition = Expression.Call(Expression.Constant(containedIn), containsMethod, memberPropertyAccess);

                        lambdaExpression = Expression.Lambda(comparisonCondition, parameterExpression);

                    }

                    break;

                case "Int64":

                    {

                        HashSet<Int64> containedIn = ConvertUntypedCollectionToTypedHashSet<Int64>(untypedResults);

                        containsMethod = typeof(HashSet<Int64>).GetMethod("Contains", new Type[] { typeof(Int64) });

                        comparisonCondition = Expression.Call(Expression.Constant(containedIn), containsMethod, memberPropertyAccess);

                        lambdaExpression = Expression.Lambda(comparisonCondition, parameterExpression);

                    }

                    break;

                case "Int16":

                    {

                        HashSet<Int16> containedIn = ConvertUntypedCollectionToTypedHashSet<Int16>(untypedResults);

                        containsMethod = typeof(HashSet<Int16>).GetMethod("Contains", new Type[] { typeof(Int16) });

                        comparisonCondition = Expression.Call(Expression.Constant(containedIn), containsMethod, memberPropertyAccess);

                        lambdaExpression = Expression.Lambda(comparisonCondition, parameterExpression);

                    }

                    break;

                default:

                    throw new Exceptions.QueryException("unsupported primary key type " + keyColumnTypes[0].Name + " for " + tableName);

                    break;

            }

 

            //and now complete our composition by adding the Where "our free text result keys contains(...)" clause

            MethodCallExpression conditionResult = Expression.Call(typeof(Queryable), "Where", new[] { thisQuery.ElementType }, thisQuery.Expression, lambdaExpression);

            return thisQuery.Provider.CreateQuery<TEntity>(conditionResult);

 

        }

        #endregion

 

        #region public FreeTextSearch extension

        /// <summary>

        /// FreeTextSearch Executes a free text search (using FREETEXT) for Entities that were processed with the

        /// SqlFullTextIndexAttribute (or are just in a free text catalog). It queries the FreeText catalog for the primary keys

        /// of the TEntity objects that are in scope for the query, then creates an expression

        /// tree to include those entities.The DbContext is used to execute these initial

        /// queries

        /// </summary>

        /// <typeparam name="TEntity"></typeparam>

        /// <param name="thisQuery"></param>

        /// <param name="dbCtx">DbContext attached to TEntity</param>

        /// <param name="containsSearchText">FREETEXT formated search expresssion</param>

        /// <returns>IQueryable<TEntity></returns>

        public static IQueryable<TEntity> FreeTextSearch<TEntity>(this IQueryable<TEntity> thisQuery, DbContext dbCtx, string freetextSearchText) where TEntity : class

        {

            return FreeTextExInternal<TEntity>(thisQuery, dbCtx, freetextSearchText, null, false);

        }

 

        /// <summary>

        /// FreeTextSearch Executes a free text search (using FREETEXT) for Entities that were processed with the

        /// SqlFullTextIndexAttribute (or are just in a free text catalog). It queries the FreeText catalog for the primary keys

        /// of the TEntity objects that are in scope for the query, then creates an expression

        /// tree to include those entities.The DbContext is used to execute these initial

        /// queries

        /// </summary>

        /// <typeparam name="TEntity"></typeparam>

        /// <param name="thisQuery"></param>

        /// <param name="dbCtx">DbContext attached to TEntity</param>

        /// <param name="containsSearchText">FREETEXT formated search expresssion</param>

        /// <param name="columnNames">array of full text indexed column names, null or 0 length means use "*"</param>

        /// <returns>IQueryable<TEntity></returns>

        public static IQueryable<TEntity> FreeTextSearch<TEntity>(this IQueryable<TEntity> thisQuery, DbContext dbCtx, string freetextSearchText, string[] columnNames) where TEntity : class

        {

            return FreeTextExInternal<TEntity>(thisQuery, dbCtx, freetextSearchText, columnNames, false);

        }

        #endregion

 

        #region public FreeTextContains extension

        /// <summary>

        /// FreeTextContains Executes a free text search (using CONTAINS) for Entities that were processed with the

        /// SqlFullTextIndexAttribute (or are just in a free text catalog). It queries the FreeText catalog for the primary keys

        /// of the TEntity objects that are in scope for the query, then creates an expression

        /// tree to include those entities. TThe DbContext is used to execute these initial

        /// queries

        /// </summary>

        /// <typeparam name="TEntity"></typeparam>

        /// <param name="thisQuery"></param>

        /// <param name="dbCtx">DbContext attached to TEntity</param>

        /// <param name="containsSearchText">CONTAINS formated search expresssion</param>

        /// <returns>IQueryable<TEntity></returns>

        public static IQueryable<TEntity> FreeTextContains<TEntity>(this IQueryable<TEntity> thisQuery, DbContext dbCtx, string containsSearchText) where TEntity : class

        {

            return FreeTextExInternal<TEntity>(thisQuery, dbCtx, containsSearchText, null, true);

        }

 

 

        /// <summary>

        /// FreeTextContains Executes a free text search (using CONTAINS) for Entities that were processed with the

        /// SqlFullTextIndexAttribute (or are just in a free text catalog). It queries the FreeText catalog for the primary keys

        /// of the TEntity objects that are in scope for the query, then creates an expression

        /// tree to include those entities. The DbContext is used to execute these initial

        /// queries

        /// </summary>

        /// <typeparam name="TEntity"></typeparam>

        /// <param name="thisQuery"></param>

        /// <param name="dbCtx">DbContext attached to TEntity</param>

        /// <param name="containsSearchText">CONTAINS formated search expresssion</param>

        /// <param name="columnNames">array of full text indexed column names, null or 0 length means use "*"</param>

        /// <returns>IQueryable<TEntity></returns>

        public static IQueryable<TEntity> FreeTextContains<TEntity>(this IQueryable<TEntity> thisQuery, DbContext dbCtx, string containsSearchText, string[] columnNames) where TEntity : class

        {

            return FreeTextExInternal<TEntity>(thisQuery, dbCtx, containsSearchText, columnNames, true);

        }

        #endregion

    }

}