FAST ESP to Solr Migration Case Study
Reed Business Information (RBI), a leading provider of business information, data and marketing solutions, produces industry-critical data services and lead generation tools, as well as online community and job websites. RBI asked Search Technologies to help with a migration project, moving its search service from FAST ESP to a Solr-based infrastructure running on Amazon Web Services (AWS) EC2 instances.
RBI has been using FAST to power search on a wide range of websites since 2005. FAST ESP has proved to be a highly reliable platform for this purpose. In addition to very low downtime, it has provided large query capacities, and has been used as a platform to develop and deploy a range of sophisticated, multilingual capabilities for entity extraction and categorization, powering both search and the contextual serving of widgetized links across hundreds of websites.
The future of the FAST technology, owned by Microsoft since 2008, is SharePoint-centric. As an agile publisher of both business and consumer titles for niche audiences, RBI decided that Solr was its preferred replacement for FAST ESP. RBI contracted with Search Technologies to deliver consulting and implementation services, and to manage the FAST ESP to Solr transition project.
The overall project objective was simply to emulate the FAST-based service, without loss of functionality or performance.
The websites served by this application use a range of languages, including Chinese (simplified and traditional), English, Dutch, Spanish, French, Italian and German.
The content sets to be indexed include the participating websites, to enable sophisticated site search functionality, plus numerous other content sources, to provide supplementary information and news, usually focused around specific industry verticals. The new system also powers RBI’s business search portal, Zibb.com.
SAME FUNCTIONALITY, MUCH LOWER COSTS
The key challenge set by RBI was to maintain existing functionality, providing a highly functional and reliable service to publishers within RBI, while substantially lowering the overall cost of ownership of the search infrastructure.
Some key aspects of the existing infrastructure used FAST-specific methods. In addition, FAST was running on a substantial, Microsoft-hosted facility involving more than 90 servers.
A key aspect of the requirements was that publications using the search service should not be required to change anything in their configuration. This necessitated emulating a number of FAST ESP methods, including:
- Transforming FAST FQL search requests into Solr’s query syntax
- Returning results in standard FAST ESP format (by manipulating Solr’s XML-based results)
- Making use of existing content processing capabilities, such as entity extraction, and categorization
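The first item above, translating FQL requests into Solr's query syntax, can be illustrated with a small sketch. The case study does not describe how the actual translator was built, so everything here is an assumption: the function names are hypothetical, and the input grammar is a deliberately tiny FQL-like subset (nested and(...) / or(...) operators over quoted terms, optionally scoped to a field), not the full language.

```python
import re

# Tokens: quoted strings, identifiers (operators or field names), punctuation.
_TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[A-Za-z_]\w*|[(),:])')


def tokenize(fql):
    """Split a query string in the assumed FQL-like subset into tokens."""
    fql = fql.strip()
    pos, tokens = 0, []
    while pos < len(fql):
        match = _TOKEN.match(fql, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _expect(tokens, wanted):
    """Consume the next token and check that it is the one required."""
    if not tokens or tokens.pop(0) != wanted:
        raise ValueError(f"expected {wanted!r}")


def _parse(tokens):
    """expr := (and|or) '(' expr (',' expr)* ')' | [field ':'] quoted"""
    if not tokens:
        raise ValueError("unexpected end of query")
    tok = tokens.pop(0)
    if tok in ("and", "or"):
        _expect(tokens, "(")
        args = [_parse(tokens)]
        while tokens and tokens[0] == ",":
            tokens.pop(0)
            args.append(_parse(tokens))
        _expect(tokens, ")")
        # Solr's standard query parser uses infix, upper-case boolean operators.
        return "(" + f" {tok.upper()} ".join(args) + ")"
    if tok.startswith('"'):
        return tok  # bare quoted term or phrase, passed through unchanged
    # Otherwise treat the token as a field name scoping a quoted value.
    _expect(tokens, ":")
    if not tokens or not tokens[0].startswith('"'):
        raise ValueError("expected a quoted value after the field name")
    return f"{tok}:{tokens.pop(0)}"


def fql_to_solr(fql):
    """Translate the assumed FQL-like subset into a Solr query string."""
    tokens = tokenize(fql)
    solr_query = _parse(tokens)
    if tokens:
        raise ValueError("unexpected trailing input")
    return solr_query


print(fql_to_solr('and(title:"solr migration", or(lang:"en", lang:"nl"))'))
# prints: (title:"solr migration" AND (lang:"en" OR lang:"nl"))
```

A production translator would also need to cover the rest of the language (negation, ranges, proximity operators, and so on) and to escape characters that are special in Solr, none of which this sketch attempts.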
Over a period of a few months, with daily calls between the RBI and Search Technologies teams, the project was specified in detail and carried forward. Key decisions included:
- The use of Amazon AWS to host the new search service
- The development of a query parser to translate FAST FQL into Solr search syntax, and to translate Solr search results into a FAST ESP format, so that the receiving Content Management Systems would not notice any changes and would function as normal
- The use of software-based load balancing to send queries to servers with spare capacity
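The load-balancing decision can be pictured with a short sketch. The case study does not say how "spare capacity" was measured or which software was used, so this example assumes the simplest proxy, the number of queries currently in flight on each server. The class name and the host names are hypothetical.

```python
class LeastBusyBalancer:
    """Route each query to the server with the fewest in-flight requests.

    Assumption: in-flight request count stands in for "spare capacity".
    """

    def __init__(self, servers):
        # Outstanding request count per server, all starting idle.
        self.in_flight = {server: 0 for server in servers}

    def acquire(self):
        """Pick the least-loaded server and record one more request on it."""
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def release(self, server):
        """Record that a request sent to `server` has completed."""
        self.in_flight[server] -= 1


# Hypothetical host names, for illustration only.
balancer = LeastBusyBalancer(["solr-a", "solr-b"])
first = balancer.acquire()    # both idle, so the first listed server is chosen
second = balancer.acquire()   # "solr-a" is busy, so "solr-b" is chosen
balancer.release(first)       # the query on "solr-a" finishes
third = balancer.acquire()    # "solr-a" is the least busy again
```

A real deployment would typically add health checks and thread safety, and might weigh servers by measured latency or CPU load instead of a simple request count.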
Solr’s native language processing capabilities coped adequately with the multi-language demands of this project.
Graeme McCracken, CIO at Reed Business Information, commented, “We progressively transitioned our sites and services from FAST to Solr over a two-week period, and nobody noticed. At the same time, we reduced our on-going cost-of-ownership by more than half.”
Numerous Reed Business properties are now served by this Solr-based service. Design specifications called for an average search time of less than 200 milliseconds. The live system consistently delivers an average of 70 milliseconds.
The new Solr search system has more than 30 million documents under index, and it meets sustained capacity demands of more than 300 queries per second without compromising search speed.