Federated Search: The Options
KEY CHALLENGES WITH FEDERATED SEARCH
- There are two distinct definitions of "federated search" within the enterprise search industry. They imply very different approaches, both conceptually and technically.
- It is important to understand which of these approaches is best suited to your needs.
We will define and explain the two alternative federated search models. We have applied both approaches in many enterprise client projects, on open source and commercial search engines including SharePoint, Elasticsearch, and Solr.
"Federated Search" is generally accepted to mean:
Deploying a search over distributed and possibly heterogeneous data sets, and receiving in return a unified search results list.
There are two distinct approaches to federated search, which can be labeled as index-time merging and query-time merging. The pros and cons of these are outlined below.
QUERY-TIME MERGING
In most circumstances, query-time merging is the faster and easier solution to implement.
- A query federator intercepts the query and passes it to multiple search engines.
- The federator then waits for the search engines to reply and, once the replies arrive, merges or concatenates the results into a single results list.
This model relies on data repositories to provide a search function.
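The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `Hit` record, the `federated_search` function, and the two stand-in engines are hypothetical names, and each engine is assumed to be a plain callable that takes a query string and returns its own ranked list. In a real deployment, those callables would wrap the search functions that the repositories already provide.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str   # name of the engine that returned the hit
    title: str
    score: float  # engine-specific relevancy score, not comparable across engines


# Hypothetical engine interface: a callable taking a query and returning hits.
Engine = Callable[[str], List[Hit]]


def federated_search(query: str, engines: Dict[str, Engine]) -> List[Hit]:
    """Send the query to every engine in parallel and concatenate the replies."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = [pool.submit(engine, query) for engine in engines.values()]
        results: List[Hit] = []
        for future in futures:
            # result() blocks until that engine has replied.
            results.extend(future.result())
    return results


# Stand-in engines for illustration only.
def wiki_engine(query: str) -> List[Hit]:
    return [Hit("wiki", f"Wiki page about {query}", 0.92)]


def crm_engine(query: str) -> List[Hit]:
    return [Hit("crm", f"Account notes mentioning {query}", 14.3)]


if __name__ == "__main__":
    engines = {"wiki": wiki_engine, "crm": crm_engine}
    for hit in federated_search("contracts", engines):
        print(hit.source, hit.title)
```

Note that the two stand-in engines return scores on different scales (0.92 and 14.3), which is why this sketch concatenates the lists rather than sorting on score.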
Pros: The primary advantage of this approach is ease of implementation because no additional indexing of content is necessary. The query federation system simply taps into existing systems and extracts results, which are then merged.
In some cases, query-based federation is the only viable option. For example:
- Federating to large-scale Web content via a major search engine such as Google
- Federating to a private data set, held behind a pay-wall and therefore not available to be indexed locally
Cons:
- Performance issues can occur if the federator waits for the slowest remote search engine to respond
- Merging search results into a sensible hit list is difficult if it is based on relevancy, as each search engine called will score relevancy in a different way. Often, it is better not to attempt to merge on relevancy but instead to present separate results lists (behind tabs, for example), to merge on a more deterministic data item such as date, location, or price, or to present results from different sources in blocks. Presenting results in blocks is, for example, how SharePoint 2013 Search Federation works
- Search engines provide varying levels of query sophistication. Federation at query time usually implies a "dumbing down" to suit the least capable search engine. However, this need not always be the case. For example, sophisticated query parsers can be used to ensure that search clauses are optimized for each search engine involved
- Document-level security is a potential cause of performance issues, but this depends on the complexity of the security environment
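Two of the issues listed above can be eased with fairly little code. The sketch below is an illustration under stated assumptions, not a description of how any particular product works: it gives every engine a fixed deadline and drops late replies, then merges on a date field instead of on relevancy scores, or groups results into per-source blocks. All names (`Hit`, `search_with_deadline`, `merge_by_date`, `group_by_source`) and the 0.2-second deadline are hypothetical, and the `cancel_futures` argument assumes Python 3.9 or later.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str
    title: str
    modified: date  # deterministic field that every source can supply


Engine = Callable[[str], List[Hit]]


def search_with_deadline(query: str, engines: Dict[str, Engine],
                         deadline_s: float = 0.2) -> List[Hit]:
    """Query all engines in parallel; ignore any that miss the deadline."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = [pool.submit(engine, query) for engine in engines.values()]
    done, _late = wait(futures, timeout=deadline_s)
    # Return without waiting for engines that are still running.
    pool.shutdown(wait=False, cancel_futures=True)
    hits: List[Hit] = []
    for future in futures:  # iterate in engine order for stable output
        if future in done and future.exception() is None:
            hits.extend(future.result())
    return hits


def merge_by_date(hits: List[Hit]) -> List[Hit]:
    """Merge on a deterministic field (newest first) instead of relevancy."""
    return sorted(hits, key=lambda hit: hit.modified, reverse=True)


def group_by_source(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Alternative presentation: one block (or tab) of results per source."""
    blocks: Dict[str, List[Hit]] = {}
    for hit in hits:
        blocks.setdefault(hit.source, []).append(hit)
    return blocks


# Stand-in engines for illustration only.
def news_engine(query: str) -> List[Hit]:
    return [Hit("news", f"{query}: first report", date(2024, 3, 1)),
            Hit("news", f"{query}: follow-up", date(2024, 5, 10))]


def slow_archive_engine(query: str) -> List[Hit]:
    time.sleep(1.0)  # simulates a remote engine that responds too late
    return [Hit("archive", f"{query}: archived memo", date(2019, 1, 15))]


if __name__ == "__main__":
    engines = {"news": news_engine, "archive": slow_archive_engine}
    for hit in merge_by_date(search_with_deadline("budget", engines)):
        print(hit.modified, hit.source, hit.title)
```

Dropping late replies trades completeness for response time. Whether that trade is acceptable, and whether users should be told that a source was skipped, depends on the user requirements.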
INDEX-TIME MERGING
Index-time merging requires content to be acquired into a central index. It is typical of traditional enterprise search systems.
Pros:
- Most search engines default to ranking by relevancy, which is what most users expect. By acquiring all data into a central index, sophisticated query enhancement and relevancy algorithms can be applied consistently across all content, giving the user a single, coherently ranked results list.
Cons:
- The effort needed to acquire the content from the various repositories can be substantial. This is done via read-only processes. The content of remote repositories is not moved or changed, but the indexing process must read each item, and re-read it every time a change occurs. In some cases, for example where private content behind a paywall is involved, central indexing is not possible
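The index-time model can be sketched as a small in-memory index. This is a toy illustration under stated assumptions: `Document`, `CentralIndex`, the `version` field used to detect changes, and the term-count scoring are all hypothetical simplifications. A production search engine would use its own connectors, change detection, and relevancy algorithms.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Document:
    doc_id: str   # unique across all repositories, e.g. "wiki:1"
    text: str
    version: int  # assumed to change whenever the source item changes


class CentralIndex:
    def __init__(self) -> None:
        # term -> {doc_id: number of occurrences of the term in that document}
        self._postings: Dict[str, Dict[str, int]] = defaultdict(dict)
        self._versions: Dict[str, int] = {}

    @staticmethod
    def _terms(text: str) -> List[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def crawl(self, documents: Iterable[Document]) -> int:
        """Read a repository (read-only) and index new or changed items.

        Returns the number of documents that were (re)indexed.
        """
        indexed = 0
        for doc in documents:
            if self._versions.get(doc.doc_id) == doc.version:
                continue  # unchanged since the last crawl
            self._remove(doc.doc_id)
            for term in self._terms(doc.text):
                counts = self._postings[term]
                counts[doc.doc_id] = counts.get(doc.doc_id, 0) + 1
            self._versions[doc.doc_id] = doc.version
            indexed += 1
        return indexed

    def _remove(self, doc_id: str) -> None:
        for counts in self._postings.values():
            counts.pop(doc_id, None)

    def search(self, query: str) -> List[Tuple[str, int]]:
        """Score every source with the same function, so ranks are comparable."""
        scores: Dict[str, int] = defaultdict(int)
        for term in self._terms(query):
            for doc_id, count in self._postings.get(term, {}).items():
                scores[doc_id] += count
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Stand-in repository contents for illustration only.
WIKI = [Document("wiki:1", "Travel expenses policy", 1),
        Document("wiki:2", "Holiday calendar", 1)]
FILES = [Document("files:7", "Expenses claim form for travel and other expenses", 1)]

if __name__ == "__main__":
    index = CentralIndex()
    index.crawl(WIKI)
    index.crawl(FILES)
    print(index.search("travel expenses"))
```

Because documents from both repositories are scored by the same function, their results can be interleaved in one list. The cost is the crawl itself: every repository must be read, and changed items must be read again.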
HYBRID FEDERATED SEARCH
Sometimes, the optimum solution is a hybrid approach. Where practical, content is indexed centrally. Repositories for which central indexing is not cost-effective (or simply not possible) are searched through federation at query time. If this approach is used, careful thought is needed about results presentation, to make sure that users understand how the system is set up, and how to navigate and interpret results efficiently.
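One possible way to structure a hybrid results page is sketched below. It is only an illustration: `ResultBlock`, `hybrid_search`, and the stand-in search functions are hypothetical names, and the federated sources are called one after another for brevity. The point is that each block records where its results came from, so the presentation layer can label them clearly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ResultBlock:
    label: str         # shown to the user, e.g. as a tab or block heading
    origin: str        # "central index" or "federated"
    titles: List[str]


# Hypothetical search interface: a callable taking a query and returning titles.
SearchFn = Callable[[str], List[str]]


def hybrid_search(query: str, central: SearchFn,
                  federated: Dict[str, SearchFn]) -> List[ResultBlock]:
    """Return centrally indexed results first, then one block per remote source."""
    blocks = [ResultBlock("All indexed content", "central index", central(query))]
    for label, search in federated.items():
        try:
            titles = search(query)
        except Exception:
            titles = []  # a failing remote source should not break the page
        blocks.append(ResultBlock(label, "federated", titles))
    return blocks


# Stand-in search functions for illustration only.
def demo_central(query: str) -> List[str]:
    return [f"Indexed document about {query}"]


def demo_web(query: str) -> List[str]:
    return [f"Web result for {query}"]


def demo_unavailable(query: str) -> List[str]:
    raise TimeoutError("remote source unavailable")


if __name__ == "__main__":
    sources = {"Web": demo_web, "Partner portal": demo_unavailable}
    for block in hybrid_search("pricing", demo_central, sources):
        print(f"[{block.label} / {block.origin}] {block.titles}")
```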
WHICH APPROACH WORKS BEST?
The approach that works best all depends on your data environment and your user needs.
Start by looking at the data environment, user requirements, and business drivers; informed decisions can then be taken. In our engagements, this process usually begins with a Search Assessment.
To discuss federated search in more detail, please feel free to contact us.