What is your Search Maturity Level?
At Search Technologies we talk a lot about the concept of “Search Maturity Level”. We find this to be a convenient way to classify the sophistication of a search implementation, and that in turn helps clarify what steps it would take to move up from one level to the next.
It is clear that different search implementations are at different maturity levels. For some organizations a simple “check the box” search engine that reliably returns results is more than enough. Others need a more sophisticated search application, especially where search is an integral part of an important business process.
Search maturity levels range from Level 0 (search engine barely works) to Level 10 (sophisticated and audited procedures for continuous improvement implemented). Few search applications require the highest level of maturity; many will only need to reach Level 3.
So what are the different search maturity levels? They are defined as follows.
Level 0: Unstable black box – results are known to be poor
If you have merely purchased or downloaded a search engine and have installed it, chances are you will be at Level 0. The world of text search is too wide and varied for any engine to work correctly out-of-the-box.
How to move up to Level 1: Study and understand your search engine from top to bottom. Configure it properly for your application.
Level 1: Stable, returns reasonable results – index completeness is unknown
At Level 1 your engine is working and returning reasonable results. This level is sufficient for many applications where search is a “check the box” feature (i.e. not a primary component of the application).
However, the engine is indexing documents on its own and there is no way to determine if all documents are properly indexed. This is a common problem with search engine installations. Other than a count of all documents, few implementations have thorough methods for auditing the contents of indexes to determine completeness.
How to move up to Level 2: Implement an index auditing procedure. This can be done using a statistical procedure (random sampling) or with a thorough “check every last document” test.
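As a rough illustration of the statistical option, the sketch below samples document IDs from the source system and checks each one against the index. It assumes you can list the IDs that should be indexed and can ask the engine whether a given ID is present; `audit_index_sample` and the `is_indexed` callable are hypothetical names, not part of any particular engine.

```python
import math
import random


def audit_index_sample(source_ids, is_indexed, sample_size, seed=0):
    """Estimate index completeness from a random sample of source document IDs.

    source_ids: list of IDs that should be present in the index.
    is_indexed: callable(doc_id) -> bool, a hypothetical lookup against the engine.
    Returns (estimated completeness, 95% margin of error, missing sampled IDs).
    """
    n = min(sample_size, len(source_ids))
    if n == 0:
        raise ValueError("nothing to sample")
    rng = random.Random(seed)
    sample = rng.sample(source_ids, n)
    missing = [doc_id for doc_id in sample if not is_indexed(doc_id)]
    p = 1 - len(missing) / n
    # Normal-approximation margin of error for a proportion (95% level).
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, margin, missing


# Usage with an in-memory stand-in for the index: every tenth document is missing.
source = [f"doc-{i}" for i in range(1000)]
index = set(source) - {f"doc-{i}" for i in range(0, 1000, 10)}
completeness, margin, missing = audit_index_sample(source, index.__contains__, 200)
print(f"estimated completeness: {completeness:.1%} +/- {margin:.1%}")
```

Setting the sample size to the full document count turns the same function into the “check every last document” test.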
Level 2: Index auditing in place
Your search engine is working acceptably and you are auditing the indexes. You are confident that all of your documents are actually being indexed and can be retrieved. However, you wish your search engine had some of those fancy features that everyone is talking about – like navigators.
How to move up to Level 3: Implement some application-appropriate features and make sure they’re working properly.
Level 3: Sophisticated features working and fully audited
Congratulations! Level 3 is a good level for many search implementations.
At this level you’ve likely implemented some of the more fun user-interface features, such as navigators based on easily acquired or pre-existing metadata, highlighted teasers, highlighted documents, a customized advanced search page, search results grouping, “did you mean” spell checking, and application-specific search results presentation.
Level 3 is where you have started to customize your search engine results for your specific application. Users recognize that your search engine is now clearly better for your data than a standard web site search (in other words, your search is now better for your data than simply using www.google.com).
How to move up to Level 4: Provide power users quick access to targeted subsets with better document metadata and a custom query parser.
Level 4: Metadata and query parsing for power users
Now it’s time to get serious. Perhaps you have a demanding customer base. Perhaps you wish to make absolutely clear how your engine and your data can solve the most difficult queries that your users can dream up.
When this is the case, you will likely need to upgrade your document processing and metadata extraction, perhaps merging in metadata from multiple sources. In addition, you may need to provide a query interface which makes it easy for power users to enter more sophisticated searches that will yield precise results.
Getting to Level 4 and staying there can be tough. The additional metadata you need will require you to try harder. You’ll benefit hugely from implementing a document processing framework to control the process of document handling and data stream merging. You will most likely use external resources such as authority files to enrich metadata. Random collections of shell scripts and Perl programs will no longer be sufficient as they’ll pull you slowly into chaos.
And then, with the additional metadata available, you will complement it with a query parsing framework which enables all of your users to automatically build and execute sophisticated queries. Your power users will now revel in the rich metadata environment and quickly learn to target subsets of documents that exactly meet their needs.
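To make the idea of a query parser concrete, here is a minimal sketch that splits a power-user query into metadata filters and free-text terms. The field names in `KNOWN_FIELDS` and the `field:value` syntax are assumptions chosen for illustration; a real framework would map the parsed structure onto the query language of your engine.

```python
import shlex

# Hypothetical metadata fields that users may target with field:value syntax.
KNOWN_FIELDS = {"author", "title", "year", "source"}


def parse_query(text):
    """Split a query into metadata filters and free-text terms.

    Quoted values are kept together, e.g. title:"search maturity".
    Note: shlex.split raises ValueError on unbalanced quotes.
    """
    filters, terms = {}, []
    for token in shlex.split(text):
        field, sep, value = token.partition(":")
        if sep and value and field.lower() in KNOWN_FIELDS:
            filters.setdefault(field.lower(), []).append(value)
        else:
            # Unknown prefixes (or plain words) are treated as ordinary terms.
            terms.append(token)
    return {"filters": filters, "terms": terms}


parsed = parse_query('author:smith title:"search maturity" relevancy tuning')
print(parsed)
```

Treating unknown prefixes as ordinary terms keeps queries that happen to contain colons, such as URLs, from failing.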
Okay, that was a difficult step. Now what’s next?
How to move up to Level 5: Implement a process for manual tuning of search results to handle exception cases.
Level 5: Manual tuning of search results to handle exception cases
Search engines are not perfect, and no relevancy algorithm will provide the best possible results in all cases. This is especially the case for generic or “starter” queries which new users will typically send to your search engine. Often the terms in these simple queries will be so frequent that they are found in most documents, and so despite all of your efforts until now, your search engine doesn’t always produce useful results.
Most search engines have methods for fine-tuning search results to handle these exception cases. This can be done for example with query expansion via synonyms, or if necessary, by hard-coding search results.
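The two techniques mentioned above can be sketched in a few lines. The synonym table, the pinned-results table, and the function names are all hypothetical; most engines offer their own configuration for both, which you would normally use instead.

```python
# Hypothetical synonym table: each term maps to its accepted alternatives.
SYNONYMS = {"laptop": ["notebook"], "tv": ["television"]}

# Hypothetical hard-coded results for known problem queries.
PINNED = {"returns": ["doc-returns-policy"]}


def expand_query(terms, synonyms=SYNONYMS):
    """Return one group per term; terms in a group are OR-ed, groups are AND-ed."""
    return [[term] + synonyms.get(term, []) for term in terms]


def apply_pins(query, results, pinned=PINNED):
    """Place hard-coded results first, followed by the engine's own ranking."""
    pins = pinned.get(query.strip().lower(), [])
    return pins + [doc_id for doc_id in results if doc_id not in pins]


print(expand_query(["cheap", "laptop"]))
print(apply_pins("Returns", ["a", "doc-returns-policy", "b"]))
```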
But it is not enough to simply start using these methods. To be at Level 5, you need a repeatable process for applying them. This usually means periodically reviewing your query logs to identify the queries users enter most often, analyzing the results those queries produce, and then making the appropriate adjustments.
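The first step of such a review, finding the most frequent queries, might look like the sketch below. It assumes the log has already been reduced to one raw query string per entry; `normalize` and `top_queries` are illustrative names.

```python
from collections import Counter


def normalize(query):
    """Lower-case and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def top_queries(log_lines, n=10):
    """Return the n most frequent normalized queries with their counts."""
    counts = Counter(normalize(q) for q in log_lines if q.strip())
    return counts.most_common(n)


log = ["Search", "search ", "printer  drivers", "Printer drivers", "search", ""]
print(top_queries(log, 2))
```

The resulting list is the input to the manual part of the process: someone still has to run each frequent query and judge whether the results are useful.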
How to move up to Level 6: Start your journey into the world of search engine relevancy ranking with document-based tuning.
Level 6: Document-based relevancy tuning and analysis
Not all documents are created equal. Some are clearly more useful than others.
The clearest demonstration of Level 6 is the problem of spam. All large-scale web search engines must deal with spam, which means that these engines must determine which documents are more useful than others. Google’s PageRank and similar algorithms provide a good foundation for this out on the public Web.
To achieve Level 6, your system will need to incorporate some degree of document-based relevancy tuning. In the absence of a massively hyperlinked document structure (a luxury very few search applications can rely on) this is usually done by computing a document’s query-independent relevancy based on its source or some other combination of metadata. This relevancy adjustment is then provided to the search engine.
Document-based relevancy adjustments often solve problems where certain types of documents frequently dominate search results. For example, you may find that your results are always cluttered by news items, causing important help or product pages to be missed. In such a case you will want to implement document-based relevancy tuning.
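A minimal sketch of the news-versus-help example follows. The weights in `SOURCE_WEIGHTS` are made-up values, and the re-ranking is done in application code purely for illustration; in practice the query-independent score is usually computed during document processing and handed to the engine.

```python
# Hypothetical query-independent weights per document source.
SOURCE_WEIGHTS = {"help": 1.5, "product": 1.3, "news": 0.6}


def static_boost(doc, source_weights=SOURCE_WEIGHTS, default=1.0):
    """Query-independent relevancy factor derived from document metadata."""
    return source_weights.get(doc.get("source"), default)


def rerank(hits):
    """hits: list of (doc, text_score). Returns hits sorted by boosted score."""
    scored = [(doc, score * static_boost(doc)) for doc, score in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


hits = [
    ({"id": "n1", "source": "news"}, 2.0),
    ({"id": "h1", "source": "help"}, 1.0),
    ({"id": "p1", "source": "product"}, 1.0),
]
for doc, score in rerank(hits):
    print(doc["id"], round(score, 2))
```

Here the news item has the highest text score, yet the help and product pages move ahead of it once the source weight is applied.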
How to move up to Level 7: Implement a statistically rigorous approach to search results evaluation.
Level 7: Statistical analysis of search results
Relevancy ranking of search results is, by its very nature, a fuzzy and inexact process. Documents are compared to the user’s query terms and are rated based on how well they meet the user’s search criteria.
The only way to properly tune such a system is with a statistically rigorous method for search results evaluation. This requires a process for evaluating the success rate of searches, either by analyzing usage logs or through some method of generating and capturing relevancy judgments.
Once such raw materials are available, a thorough statistical analysis can be performed to show just how well the search engine is performing in solving users’ queries.
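As one possible starting point, the sketch below computes two common measures, precision at k and mean reciprocal rank, from a set of relevancy judgments, along with a rough 95% margin of error. The choice of metrics, the data layout, and the tiny sample are assumptions for illustration; a real evaluation needs far more queries than three.

```python
import math
import statistics


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top k results that were judged relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def summarize(values):
    """Mean and normal-approximation 95% margin of error across queries."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * stderr


# Hypothetical judgments (query -> relevant IDs) and engine output (query -> ranking).
judgments = {"q1": {"a", "b"}, "q2": {"x"}, "q3": {"m"}}
runs = {"q1": ["a", "c", "b"], "q2": ["y", "x", "z"], "q3": ["n", "o", "p"]}

mrr, mrr_margin = summarize([reciprocal_rank(runs[q], judgments[q]) for q in judgments])
p3, p3_margin = summarize([precision_at_k(runs[q], judgments[q], k=3) for q in judgments])
print(f"MRR = {mrr:.2f} +/- {mrr_margin:.2f}, P@3 = {p3:.2f} +/- {p3_margin:.2f}")
```

The wide margins on a sample this small are the point: without enough judged queries, a change in the average cannot be distinguished from noise.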
How to move up to Level 8: Take the results of your analysis and do something with them.
Level 8: Continuous improvement processes implemented
If a process is in place that uses your statistical results analysis to drive ongoing, iterative improvements to your search system, then congratulations! You have achieved Level 8.
Level 8 is typically reached only by search applications that are a critical component of a business process. It requires deep, ongoing expertise in search and statistical analysis, plus the corporate motivation to implement the appropriate processes, drive them through to completion, and maintain them over time. Usually this happens only where the business benefits are obvious and measurable. Often, this applies to companies whose business is their content.
How to move up to Level 9: Implement some high-level features beyond just search and results.
Level 9: Recommendations, popularity, clustering, and automatic categorization
We are now in rarefied air. Search by itself is no longer good enough for you. Your search engine is already as good as it can be and is producing excellent results. It’s time to move to Level 9.
Closely related to search are extensions for recommendations, popularity, clustering and automatic categorization. Such extensions are heavily based on statistical analysis of documents, words, term distributions, and usage. All of these techniques need the same data and language processing that is required of your high-quality search engine. Indeed, for most of these advanced techniques to work well, clean, well-structured input data is critical.
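To give a flavor of the usage-based side, the sketch below builds a very simple “viewed together” recommender from session data. It assumes usage logs can be grouped into per-session lists of viewed document IDs; the function names are hypothetical, and production systems use considerably more refined statistics than raw co-occurrence counts.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often each pair of documents was viewed in the same session."""
    co = defaultdict(Counter)
    for viewed in sessions:
        for a, b in combinations(sorted(set(viewed)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(doc_id, co, n=3):
    """Return up to n documents most often viewed alongside doc_id."""
    return [other for other, _ in co.get(doc_id, Counter()).most_common(n)]


sessions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]]
co = build_cooccurrence(sessions)
print(recommend("a", co, n=1))
```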
In some search applications, these Level 9 extensions have been mastered and are driving businesses forward. E-commerce search applications lead the way, driven by their crystal-clear ROI. Yet despite most of these techniques having been around for a decade or more, the proportion of organizations that use them well is surprisingly small.
How to complete your journey by reaching Level 10: Ask for your processes to be audited by a third party.
Level 10: All of the above audited and certified by a third party
Reaching Level 10 means that your search engine implementation and continuous improvement processes have been audited by a third party who is well versed in the worlds of search engines and statistical analysis. Your processes are sound, your statistics are valid, and your procedures are carefully written and scrupulously followed.
Call it a quality standard for search implementations.