
Solr and PowerShell: Harvesting Query Suggestions from Facets

This blog post combines several themes:

  • For search implementers working on Windows, PowerShell is a wonderful tool for automation
  • Query suggestions are powerful for improving the search experience
  • Search facets are a great source of query suggestions


PowerShell PoseurShell 

PowerShell may seem like a poseur to *nix users long accustomed to the power of a shell for automation, and it may seem bizarre to Windows users long accustomed to automating with programming languages, like VB, that come with heavy IDEs. But to me, PowerShell is the natural evolution of the shell into the modern object-oriented, dynamically typed world. It brings together the best of two worlds: a lightweight scripting environment and dynamically typed objects.

Anyone unconvinced of the goodness of PowerShell may want to try one of the books that introduce PowerShell with this kind of holistic thinking; my favorite is Pro Windows PowerShell by Hristo Deshev.

The Power of Suggestion 

Philosophy aside for now, Search Technologies recently had an engagement with a customer to explore the power and flexibility of query suggestions for improving the search experience. Query suggestion functionality is also known as query completion or type-ahead; it typically presents a menu of choices that pops up from the search box as the user types search terms. The power of query suggestions lies in the way they address some fundamental challenges for the search implementer:

  • Longer queries are better for deducing what the user is searching for, but users want to type less
  • Spelling counts, and users are not good spellers
  • Many times, the user knows what he's searching for, but does not know the appropriate search terms to find it


Query suggestions draw the user into spending more time on the initial query, and they make it easier to enter long queries. If a user selects a particularly long query suggestion and finds that the results do not include what he's looking for, he can broaden the results by removing terms from the query, which is slightly easier for the user than the inverse, adding query terms to narrow the result set.

Query suggestions make it easy for the user to try several different but similar queries, something users do when they're fishing for better search terms. 

Query suggestions naturally give the user confidence in the search process, and a feeling of satisfaction when the user's search is suggested. I remember a teacher who gave a gold star whenever a student asked a question that was about to be answered in the prepared lecture. A suggested search gives a similar reward, as if to say, "well done, you are smart to be searching on this."

Finally, query suggestions can solve relevancy problems that are difficult to solve in the organic search system. For example, imagine an international company with different holiday calendars for different locations, where you want a user searching for "holiday calendar" to receive the calendar for his location as the first result. Adding the necessary boosting to the organic search system for this one query may be unwieldy, but it is a simple matter to have client-side code (JavaScript) add the user's location to the query.

Query suggestions are flexible in more ways than one. First, they are powered by simple data structures on the server, so they are easier to tune and manage than the main search index. Second, the query suggestion machinery is largely decoupled from the search engine: even if a particular search vendor provides a query suggestion feature (which is typical), you can easily replace the vendor's suggestion technology with another, and the machinery is simple enough that you can build it at home with standard household materials. Others on the internet have published recipes, and at Search Technologies we have a ready-made query suggestion system that uses a Solr index on the server (for performance), JavaScript on the client, and a little AJAX.
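To make the "simple data structures" point concrete, here is a minimal sketch of an in-memory suggestion index in C#. It is an illustrative stand-in, not the Solr-backed system described above, and SuggestionIndex and Suggest are hypothetical names. Suggestions are kept sorted, so every entry that starts with the typed prefix sits in one contiguous run that a binary search can find; the matches are then ordered by weight.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory suggestion index (not the Solr-backed system).
// Entries are lower-cased for case-insensitive matching and sorted ordinally,
// so all entries sharing a prefix form one contiguous run.
public class SuggestionIndex
{
    private readonly (string Text, int Weight)[] _entries;

    public SuggestionIndex(IEnumerable<(string Text, int Weight)> entries)
    {
        _entries = entries
            .Select(e => (Text: e.Text.ToLowerInvariant(), Weight: e.Weight))
            .OrderBy(e => e.Text, StringComparer.Ordinal)
            .ToArray();
    }

    // Returns up to 'max' suggestions starting with 'prefix', highest weight first.
    public List<string> Suggest(string prefix, int max)
    {
        string p = prefix.ToLowerInvariant();

        // Binary search for the first entry that sorts at or after the prefix.
        int lo = 0, hi = _entries.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(_entries[mid].Text, p) < 0) lo = mid + 1;
            else hi = mid;
        }

        // Collect the contiguous run of entries that start with the prefix.
        var matches = new List<(string Text, int Weight)>();
        for (int i = lo; i < _entries.Length &&
                         _entries[i].Text.StartsWith(p, StringComparison.Ordinal); i++)
        {
            matches.Add(_entries[i]);
        }

        // Heaviest suggestions first; ties broken alphabetically.
        return matches
            .OrderByDescending(m => m.Weight)
            .ThenBy(m => m.Text, StringComparer.Ordinal)
            .Take(max)
            .Select(m => m.Text)
            .ToList();
    }
}
```

For example, with the entries ("Holiday Calendar", 40), ("holiday party", 90) and ("hr policies", 10), typing "Hol" yields "holiday party" followed by "holiday calendar". A sorted array is about the simplest structure that supports prefix lookup; a trie or a search index becomes attractive as the suggestion list grows.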

Let me now elaborate on the idea of adding user-specific terms to the query via JavaScript. If the server returns suggestions as JSON objects, you can add members to those objects, such as a boolean member indicating whether the suggestion is locale sensitive. When the "holiday calendar" suggestion comes back from the server with the locale-sensitive flag set to true, the client-side JavaScript checks the flag and appends the user's locale to the suggestion.
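Here is a sketch of that flag in action, written in C# for consistency with the other examples; in the system described above, this check would live in client-side JavaScript. The JSON member names text and localeSensitive, and the Suggestion and SuggestionLocalizer types, are assumptions for illustration. The sketch uses System.Text.Json, so it assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical JSON shape for one suggestion, e.g.
// {"text": "holiday calendar", "localeSensitive": true}
public class Suggestion
{
    [JsonPropertyName("text")]
    public string Text { get; set; } = "";

    // Extra member added by the server: should the user's locale be appended?
    [JsonPropertyName("localeSensitive")]
    public bool LocaleSensitive { get; set; }
}

public static class SuggestionLocalizer
{
    // Deserializes the server's JSON array of suggestions.
    public static Suggestion[] Parse(string json) =>
        JsonSerializer.Deserialize<Suggestion[]>(json) ?? Array.Empty<Suggestion>();

    // Builds the query submitted when the user picks a suggestion:
    // locale-sensitive suggestions get the user's locale appended as an extra term.
    public static string ToQuery(Suggestion s, string userLocale) =>
        s.LocaleSensitive ? s.Text + " " + userLocale : s.Text;
}
```

Given the suggestion {"text": "holiday calendar", "localeSensitive": true} and a user located in Berlin, ToQuery returns "holiday calendar Berlin"; a suggestion whose flag is false passes through unchanged.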

Harvesting Suggestions 

Now I come to the practical portion of this post. For the customer demonstration I mentioned, we did not have the time or means to hand-build a useful set of suggestions. So how could we quickly create a set of suggestions that were meaningful in this customer's domain and also sure to produce search results? After all, a query suggestion that returns zero hits is a very poor suggestion indeed. Here is a riddle: I am specific and descriptive, if you search for me you will always get results, and most documents in the index can be found through me. What am I? In the world of Solr, I am a facet. Facets are built from metadata in the index, so if you answered "metadata," you were also correct, but the riddle has one extra twist: with a little effort, you can harvest me automatically.

The algorithm for harvesting facet values is simple and time-tested:

  1. With an automated tool, run a wildcard search that returns as many results as possible (ideally, all documents in the index)
  2. The page size of the wildcard query doesn't matter, since we are not interested in the results themselves; what does matter is the maximum number of facet values returned, so set it to a huge number (in Solr, facet.limit=-1 removes the limit entirely)
  3. Pull out the facet values from the returned data
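The first two steps map onto standard Solr request parameters: q=*:* matches every document, rows=0 skips the result documents, facet.limit=-1 asks for every facet value, and facet.field is repeated once per field to harvest. Here is a minimal C# sketch that only builds the request URL. The select URL you pass in (for example http://localhost:8983/solr/select) is an assumption that depends on your Solr version and core layout, and FacetHarvestQuery is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetHarvestQuery
{
    // Builds a Solr select URL that matches all documents, returns no rows,
    // and requests every value of each facet field.
    public static string Build(string selectUrl, IEnumerable<string> facetFields)
    {
        var parameters = new List<string>
        {
            "q=" + Uri.EscapeDataString("*:*"), // wildcard query: all documents
            "rows=0",                           // page size irrelevant; we only want facets
            "facet=true",
            "facet.limit=-1",                   // negative limit = no limit on facet values
            "facet.mincount=1",                 // skip values with zero hits
            "wt=json"                           // JSON response format
        };
        // facet.field may be repeated, once per field to harvest.
        parameters.AddRange(facetFields.Select(f => "facet.field=" + Uri.EscapeDataString(f)));
        return selectUrl + "?" + string.Join("&", parameters);
    }
}
```

Calling Build("http://localhost:8983/solr/select", new[] { "cat", "manu_exact" }) produces a URL ending in &facet.field=cat&facet.field=manu_exact; fetching it with any HTTP client returns the facet counts needed for step 3.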


Conveniently, Solr has a wildcard query that returns all documents in the index: *:*. We could have built an automated harvesting tool in Java or C#, but after using PowerShell to automate queries against FAST Search Server 2010 for SharePoint, I had a taste for doing this kind of quick-and-dirty automation in PowerShell. Surprisingly, my internet search for ways to call Solr from PowerShell turned up nada, so I rolled up my sleeves and wrote a PowerShell cmdlet for calling Solr. Here is how.

About SolrNet 

SolrNet is the de facto standard library for calling Solr from .NET code, and in keeping with the odd mashup that is Solr + .NET, the SolrNet project is an odd mixture of worlds: its home site is on code.google.com, and its inventor and lead developer, Mauricio Scheffer, maintains an active presence in the associated Google Group discussion forum, but the code is maintained on GitHub. Recent overtures to bring the project under the Apache umbrella were rebuffed by the pragmatic SolrNet community.

Cliffhanger 

Here's an example of how we want our PowerShell cmdlet to work:

Add-PSSnapin SolrNetPSSnapIn -erroraction SilentlyContinue 
Get-Facets -FacetFields "cat,manu_exact" | Sort -Property Value -Unique


The SolrNetPSSnapIn snap-in must be loaded to make our cmdlet available, and Get-Facets invokes the cmdlet, passing a parameter called FacetFields: a comma-separated list of the facet fields to harvest. "cat" and "manu_exact" happen to be interesting fields for faceting in the out-of-the-box Solr schema. You can imagine the kinds of query suggestions this harvesting would produce: product categories and manufacturer names.

If you call Get-Facets with more than one facet field name, as in this example, all the values are combined into one list of FacetOutput objects, and the "Category" property tells you which facet field each value came from. The third property, "Weight," holds the hit count for that facet value; when building the query suggestion service, it is useful as a relative weight for sorting the suggestions.

public class FacetOutput {
    public string Category { get; set; }
    public string Value { get; set; }
    public int Weight { get; set; }
}
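Step 3, pulling the facet values out of the returned data, might look like the following C# sketch. It assumes the request used wt=json with Solr's default json.nl=flat, under which each field in facet_counts.facet_fields is a flat array alternating between a facet value and its hit count. The block repeats the FacetOutput class so it stands alone; FacetResponseParser is a hypothetical name and is not part of SolrNet or of the snap-in, whose code is covered in the next post.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Same shape as the FacetOutput class above.
public class FacetOutput
{
    public string Category { get; set; } = "";
    public string Value { get; set; } = "";
    public int Weight { get; set; }
}

public static class FacetResponseParser
{
    // Reads facet_counts.facet_fields from a Solr JSON response.
    // Assumes json.nl=flat, so each field is ["value1", count1, "value2", count2, ...].
    public static List<FacetOutput> Parse(string solrJson)
    {
        var results = new List<FacetOutput>();
        using JsonDocument doc = JsonDocument.Parse(solrJson);
        JsonElement fields = doc.RootElement
            .GetProperty("facet_counts")
            .GetProperty("facet_fields");

        // One JSON property per facet field, e.g. "cat" or "manu_exact".
        foreach (JsonProperty field in fields.EnumerateObject())
        {
            JsonElement pairs = field.Value;
            // Walk the array two items at a time: a facet value, then its hit count.
            for (int i = 0; i + 1 < pairs.GetArrayLength(); i += 2)
            {
                results.Add(new FacetOutput
                {
                    Category = field.Name,
                    Value = pairs[i].GetString() ?? "",
                    Weight = pairs[i + 1].GetInt32()
                });
            }
        }
        return results;
    }
}
```

Values from every requested field land in one list, each tagged with its source field in Category, which mirrors how Get-Facets combines multiple facet fields.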


In the next post I will walk through the code for the PowerShell cmdlet and snapin. 

Written by Matt Snyder
