What Does "Relevant" Mean?
Relevancy ranking is the process of sorting the document results so that those documents which are most likely to be relevant to your query are shown at the top.
But what does it mean to be "relevant"?
There are two schools of thought as to what makes a document relevant:
Hard Core Definition: The document actually answers the question or solves the problem which caused you to perform the search to begin with.
This is more of a magical black-box approach to relevancy, as in "it worked, and I don't care how".
User Understandable Definition: It is readily apparent to the end-user why the search engine retrieved this document.
This is more of a practical clear-box approach, where the user has visibility into how the search engine does its job, as in "it makes sense why this document was retrieved by the engine".
Generally speaking, the Hard Core definition typically requires more processing power and complexity and takes longer to produce results, though those results are usually much more accurate when properly tuned. The User Understandable definition is faster with weaker results, but because the user can understand it, they can modify and re-execute their query many times over to quickly converge on the results they want.
Early on, search engines (such as Dialog, Lexis-Nexis, Westlaw, and Fulcrum) were more thought of as query tools – executing queries over text documents just like SQL relational database queries. This is the "User Understandable Definition" of relevancy, as in: "Did the engine correctly execute my query?"
Someone once told me that "If I search for 'A AND B' and the document actually contains both of the terms A and B, then it is a relevant document." While this is a very limited view of relevancy, it does have the advantage of being completely unambiguous.
However, these early query tools required user training and considerable manual intervention to solve problems and achieve complete results. Universities still teach classes on using the Dialog search engine, which remains heavily used in academic and legal environments.
Second Generation Engines
Sometime in the early 1990s, thanks to commercial engines like Personal Librarian and RetrievalWare and academic engines from Cornell and Carnegie Mellon, the notion of what is relevant shifted to a more hard core definition, as in: "Does the document actually answer my question or solve my problem?"
This definition took into account the fuzzy and ambiguous nature of human language. It also asserted that the engine could do a better job of crafting a query than a naïve end-user. These engines began to compute a "relevancy score" which attempted to determine just how relevant the document was to the query – in relation to all other documents in the database. Documents were then sorted by this score.
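A minimal sketch of such a score is a TF-IDF sum per document: terms that occur often in a document but rarely across the collection contribute the most. This is an illustrative toy, not the actual weighting scheme any of those engines used; the corpus and smoothing are made up for the example.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, documents):
    """Score each document against the query with a basic TF-IDF sum.

    A toy sketch of a 'relevancy score'; real engines use far more
    elaborate weighting, normalization, and tuning.
    """
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for doc in tokenized if t in doc) for t in query_terms}
    scores = []
    for doc in tokenized:
        counts = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term appears nowhere; contributes nothing
            tf = counts[t]                                   # term frequency
            idf = math.log((1 + n_docs) / (1 + df[t])) + 1   # smoothed IDF
            score += tf * idf
        scores.append(score)
    return scores

docs = [
    "the cargo ship carried wood across the sea",
    "wood is a renewable resource",
    "a history of sailing ships",
]
scores = tf_idf_scores(["wood", "cargo"], docs)
print(scores)
```

Sorting documents by this score descending yields the ranked results list; here the document containing both query terms scores highest.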
Using the Hard Core Definition actually opened up a wonderful freedom for search engines to experiment with all kinds of different algorithms for determining relevancy. This was encouraged by the Text REtrieval Conference (TREC, see http://trec.nist.gov/) which adhered strictly to a "Hard Core" definition when judging whether or not a document was relevant to a query.
But often the statistical/experimental engines went too far. As an example, some engines would find frequent terms in the top 100 documents of a search, and then automatically submit a second search using these supposedly more useful terms. Or, they would look up the user's terms in a semantic network and then automatically expand the user's query to include dozens of additional related terms.
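The first technique described above, mining the top results for frequent terms and re-searching with them, is known as pseudo-relevance feedback. A minimal sketch, with a made-up stopword list and corpus:

```python
from collections import Counter

# A tiny illustrative stopword list; real engines use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to", "on"}

def expand_query(query_terms, top_documents, n_new_terms=2):
    """Pseudo-relevance feedback: add the most frequent non-stopword
    terms from the top-ranked documents to the original query."""
    counts = Counter()
    for doc in top_documents:
        for term in doc.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    new_terms = [t for t, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

top = [
    "breast cancer research on the brca2 gene",
    "brca2 gene mutations and cancer risk",
]
expanded = expand_query(["cancer"], top)
print(expanded)
```

The failure mode the text describes follows directly: if the top documents happen to be off-topic, the added terms drag the second search even further from the user's intent, with no visible explanation.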
These early statistical engines often returned better results for many generic queries, at least when judged strictly by a hard-core relevancy definition. But they had two primary flaws:
- When they failed, they failed spectacularly – providing very strange results that were impossible for the lay user to understand (for example, sometimes they would retrieve documents which contained none of the user's original search terms)
- They tended to be more appropriate for research or exploration types of queries than for "lookup" or "navigation" queries
Web Search Engines
The invention of the World Wide Web dramatically changed the paradigm in several ways. First, Web search engines were concerned with vastly larger databases than had been previously envisioned. For the technology of the 1990s, this meant putting an extremely high premium on performance, which in turn meant dramatically simplifying queries and providing quick estimated results where possible.
This emphasis on performance was further heightened by the new Internet user community, namely the entire world – a community vastly larger than intranet or subscription-based user populations had ever been. Results had to be returned within a fraction of a second, rather than the 5-10 seconds which had previously been acceptable. The overriding concern suddenly became "Will my server crash?", and issues such as relevancy became much less important.
But more important than the performance question was the shift in the primary purpose of search. Instead of general research queries, most users were interested in simple navigation. Queries such as "take me to the web site for the FIFA World Cup" became the norm, rather than "find all research related to breast cancer and BRCA2."
This shift in purpose drove a dramatic narrowing of search relevancy. No longer was the engine concerned with finding the 1000 documents which together provide a complete representation of the user's interest. Instead, the task became navigating the user to the one document they seek.
All of this drove web search engines back towards the "User Understandable Definition" of relevancy. As long as the results "made sense," then the engine was performing properly. More exotic statistical techniques involving aggressive query expansions pretty much disappeared and were looked upon as curious and misguided anachronisms.
Expanding the User Understandable Definition
The User Understandable Definition of relevancy can be expanded if one can think of clever ways to communicate to the user why a document was retrieved.
The simplest method for this is to show highlighted teasers or dynamic page summaries. This technique displays snippets of the source document in the search results with query terms (or expansion terms) highlighted. If the user can quickly see exactly why the document was retrieved (because, for example, both of the user's terms occur close together within a sentence of the document), it becomes more apparent how the engine is working, and the user gains a greater level of comfort that the engine is behaving rationally.
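The teaser technique can be sketched in a few lines: find where the query terms occur, clip a window of surrounding words, and mark each matching term (here with asterisks standing in for HTML highlighting; the window size and tokenization are simplifying assumptions).

```python
def make_teaser(document, query_terms, window=8):
    """Build a highlighted snippet around the first query-term hit.

    A sketch of a 'dynamic page summary': real engines pick the best
    of several candidate windows and emit markup, not asterisks."""
    words = document.split()
    lowered = [w.lower().strip(".,;:") for w in words]
    hits = [i for i, w in enumerate(lowered) if w in query_terms]
    if not hits:
        return document[:60]  # no hit: fall back to the opening text
    start = max(0, hits[0] - window // 2)
    snippet = words[start:start + window]
    highlighted = [
        f"*{w}*" if w.lower().strip(".,;:") in query_terms else w
        for w in snippet
    ]
    return " ".join(highlighted)

doc = "The old cargo ship carried Baltic wood to ports across the north."
teaser = make_teaser(doc, {"wood", "cargo"})
print(teaser)
```

Because both highlighted terms land in one short window, the user can see at a glance why this document matched.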
Second, the user query terms can be highlighted within the document itself. Naturally this is a tricky proposition in the wild and woolly world of the World Wide Web, but it can be done and can help the user to determine exactly why a document was retrieved.
Finally, additional properties or explanations can be provided to the expert user who cares. For example, when you click on the "cached" link in Google, you may get a message such as:
These search terms are highlighted: wood cargo
These terms only appear in links pointing to this page: ships
Notice how it clearly identifies why a document was retrieved, even though it was missing one of the required search terms in the text of the document itself.
Other search engines will provide explanations such as:
- How and why synonyms were added to the query
- How and why spelling variations are added to the query (the "Did you mean?" feature), and
- Full detailed explanations of how the relevancy score was computed
Spam and Source Ranking
Of course, about 30 minutes after the first web search engine went live, web sites began spamming the engine.
Spam is a big deal because now you can no longer depend entirely on the content of the document to determine if it is relevant or not. Spam documents look exactly like good documents – and may even be entire copies of them. There is nothing to prevent a spammer from copying a Wikipedia page into their web site – but clearly the user would much rather go to Wikipedia directly, rather than some commercial copy of it.
It was spam that caused the third big shift in search engines, from the early web search engines such as Excite, Lycos, and Alta Vista to search engines built from the ground up with the presumption of spam, such as HotBot, GoTo, Google, Yahoo! Search (the 2004 version) and Bing.
All of these engines now began to determine relevancy not just based on content match, but also based on external criteria. For example, HotBot was the first to judge documents based on user popularity. Documents most often selected from the results list became more relevant, even if other documents may have been a better match to the user's query. Google bases a document's relevancy on the number and authority level of external sites which link to it, what words are used to describe the document by others, and editorial decisions based on the usefulness of the content source. GoTo.com (now part of Yahoo Marketing) ranked documents based on how much someone was willing to pay using a bidding system.
Essentially, spammers forced web search engines to evaluate the trustworthiness and popularity of each document and content source, and to incorporate that into the notion of relevancy. Google's PageRank is the best-known embodiment of this idea.
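The link-analysis idea can be sketched as a few lines of power iteration in the PageRank style. The graph, damping factor, and page names below are hypothetical, and this is an illustration of the technique, not Google's production algorithm:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration sketch of PageRank over a link graph.

    `links` maps each page to the pages it links to. Pages linked to
    by many (or by highly ranked) pages end up with higher scores."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly everywhere.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "wikipedia": ["blog"],   # hypothetical sites and links
    "blog": ["wikipedia"],
    "news": ["wikipedia"],
    "spam-copy": [],         # a copycat page nobody links to
}
ranks = pagerank(graph)
print(ranks)
```

The widely linked original ends up outranking the copy, even though their text content could be byte-for-byte identical – which is exactly the point of ranking by external criteria.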
Where are we Now?
And so the pendulum which started with user understandable, then swung to hard core, then back, is now hovering somewhere in the middle.
It's clear that there's new recognition of the user's role in determining relevancy. After all, if the user doesn't feel that the document is relevant, then it doesn't matter how well the engine is performing. Results must make sense or be explainable in some way.
But now we're also starting to see users who are more sophisticated about their search results. Product managers are asking: "how do I get better relevancy for my customers?" These sorts of questions can have substantial dollar impacts on a company's bottom line. Users are starting to understand metrics like term frequency, proximity, fielded subsets, etc., and are starting to build more sophisticated mental models for how search engines perform relevancy scoring and what it means to be "relevant."
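As a sketch of one such metric, proximity scoring rewards documents in which the query terms occur near each other. This is a toy formula for illustration, not any particular engine's scoring function:

```python
def proximity_score(document, term_a, term_b):
    """Score a document by how close two query terms appear:
    1 / (smallest word distance), or 0 if either term is missing."""
    words = [w.lower().strip(".,;:") for w in document.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    if not positions_a or not positions_b:
        return 0.0
    closest = min(abs(i - j) for i in positions_a for j in positions_b)
    return 1.0 / closest

# Adjacent terms score higher than distant ones.
print(proximity_score("wood cargo arrived today", "wood", "cargo"))   # 1.0
print(proximity_score("wood was shipped as cargo", "wood", "cargo"))  # 0.25
```

A user who has built this mental model can then predict why a document with both terms in one phrase outranks one where the terms are paragraphs apart.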
And more and more hard-core techniques are starting to filter in. More often search engines are expanding the user's terms to include acronym expansions, various spelling differences, even synonyms and other strongly related terms. Hard-core evaluations such as were performed during the TREC conferences are now starting to be implemented by companies who want to create the "optimal relevancy algorithm" for their customers.
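These expansions can be sketched as a table-driven pass over the query. The lookup tables below are hypothetical placeholders; real engines derive them from curated dictionaries, query logs, and spelling-correction models:

```python
# Hypothetical lookup tables -- illustrative entries only.
ACRONYMS = {"www": ["world wide web"]}
SPELLINGS = {"colour": ["color"]}
SYNONYMS = {"ship": ["vessel", "boat"]}

def expand_terms(terms):
    """Add acronym, spelling, and synonym variants while keeping the
    original terms, so the user can still see why results matched."""
    expanded = []
    for term in terms:
        expanded.append(term)
        for table in (ACRONYMS, SPELLINGS, SYNONYMS):
            expanded.extend(table.get(term, []))
    return expanded

print(expand_terms(["colour", "ship"]))
```

Keeping the original terms first is what preserves user understandability: every added term can be traced back to a term the user actually typed.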
And the Future...
It's interesting to speculate about where relevancy is headed. From my perspective, I see two major shifts in how the definition of "relevant" may evolve.
First is a move towards more transparency.
Remember that the more a user understands how a search engine is behaving, the better they will be at manipulating the results and the more comfortable they will become that the engine "knows what it is doing."
And so there's a general move towards transparency. The more developers there are engaged in creating and implementing search engines, the more that relevancy factors will become common knowledge and common conversation. Further, the more that relevancy factors are made apparent to end-users, the more everyone will understand, perhaps subconsciously, how engines are operating.
Second is a general recognition that relevancy, like beauty, is "in the eye of the beholder."
Basically, this means recognizing that relevancy needs to be malleable based on the user community, available document data, and the goals of the individual. Relevancy will mean something completely different to a librarian, a casual shopper, or a patent attorney. Sites will become increasingly targeted in how they determine what types of documents are relevant to what types of user requests.
This means that someday I expect the question "is that relevant?" to be recognized as roughly the same type of question as "is that picture beautiful?" or "does this wine taste good?"
Ultimately, the answer to the question "what does relevant mean" is: "it depends on the user."