
Chegg Increased Sales with Enhanced E-Commerce Site Search

Search Engine Query and Relevancy Analysis in E-Commerce

Designing, implementing and maintaining enterprise search is a journey, not a one-time event. Whether you are doing an initial design or seeking to improve an existing application, implementing a continuous evaluation and improvement process is essential to maintaining user satisfaction. As you develop and maintain an enterprise search application, you learn more about the data, and about user behavior. This insight can be used to drive continuous improvement. 

MEASURING RELEVANCY

If search relevancy can be measured, then improvement can be tracked. 

First, a statistically valid set of data is captured to form a “relevancy database”, either by mining user click logs or through a manual, computer-assisted process. This database of queries and matching results is then analyzed to find, for example:

  • The average position of the first relevant document
  • The number of relevant documents per query
  • Precision metrics: the percentage of the top n documents that are relevant
  • Queries with no relevant documents in the top n results

From these statistics, an overall “search engine score” can be computed. 
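As a minimal sketch of this analysis step, the Python function below computes the statistics listed above from a relevancy database. It assumes, hypothetically, that the database maps each query to a ranked list of relevant/not-relevant flags. The article does not say how the overall score is combined, so the sketch substitutes mean reciprocal rank (MRR), a standard metric built on the position of the first relevant document, as a stand-in for the "search engine score".

```python
from statistics import mean


def relevancy_report(judgments: dict[str, list[bool]], n: int = 10) -> dict:
    """Summarize a relevancy database (hypothetical format).

    judgments maps each query to relevance flags for its ranked results,
    e.g. {"calculus": [False, True, True]} means results 2 and 3 are relevant.
    """
    first_positions = []   # 1-based rank of the first relevant result
    relevant_counts = []   # number of relevant results per query
    precisions = []        # precision@n per query
    reciprocal_ranks = []  # 1 / first relevant rank, or 0 if none
    none_in_top_n = []     # queries with no relevant result in the top n
    for query, flags in judgments.items():
        top = flags[:n]
        relevant_counts.append(sum(flags))
        precisions.append(sum(top) / n)
        if not any(top):
            none_in_top_n.append(query)
        if any(flags):
            first = flags.index(True) + 1
            first_positions.append(first)
            reciprocal_ranks.append(1 / first)
        else:
            reciprocal_ranks.append(0.0)
    return {
        "avg_first_relevant_position": mean(first_positions) if first_positions else None,
        "avg_relevant_per_query": mean(relevant_counts),
        "precision_at_n": mean(precisions),
        "queries_without_relevant_top_n": none_in_top_n,
        # Stand-in for the overall score: mean reciprocal rank (MRR).
        "score_mrr": mean(reciprocal_ranks),
    }
```

Re-running the same report after each change to the engine gives a like-for-like comparison, which is what makes improvement trackable.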

A CASE STUDY

Search Technologies applied this process to improve the relevancy of Chegg’s website search results. The overall result of this initiative was a 3% improvement in conversion rate, representing a $4M increase in revenue during the first year. 

Chegg is a leading online provider of scholastic materials for students, including millions of textbooks, access to online homework help, course organization and scheduling, as well as college and university matching tools. The company is also a pioneer in textbook rentals. When was the last time you read that volume called Advanced Differential Calculus, which you bought at significant expense during your first year at college? Rental makes a lot of sense for students. 

Chegg recognized the critical sales role that search plays on their website. People cannot buy (or rent) what they cannot find. If a search does not return listings for available and relevant products at or near the top of the results list, the opportunity to sell them is lost. 

There are many specific reasons why search relevancy may fall short of the ideal. In this case, there were two primary classes of reasons:

  • The user accidentally enters a query that excludes or depresses relevant items
  • The indexing process creates terms in the search index that do not correspond to the terms typically deployed by users

Search Technologies' collaboration with Chegg began by gathering and analyzing the query, results, and click logs from both the search engine and the web server. This analysis provided basic insight into existing usage patterns over a defined period of time, including:

  • How many queries were executed
  • How many unique queries had been used
  • Distribution across the three query options offered on the website
  • What types of queries were deployed, and in what percentages: for example, keyword queries vs. ISBN numbers
  • What results were returned to the user for each query
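The volume and query-type statistics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis actually run for Chegg: the log is assumed to be a plain list of query strings, and the ISBN test (10 or 13 characters after removing spaces and hyphens, with a possible trailing "X" on ISBN-10) is a hypothetical classification rule.

```python
import re
from collections import Counter

# Hypothetical rule: an ISBN-10 (last character may be X) or an ISBN-13,
# after spaces and hyphens are removed.
ISBN_RE = re.compile(r"^(?:\d{9}[\dX]|\d{13})$")


def query_type(query: str) -> str:
    compact = re.sub(r"[\s-]", "", query.upper())
    return "isbn" if ISBN_RE.match(compact) else "keyword"


def summarize(queries: list[str]) -> dict:
    # Light normalization so "McGraw-Hill " and "mcgraw-hill" count as one query.
    normalized = [q.strip().lower() for q in queries]
    types = Counter(query_type(q) for q in normalized)
    total = len(normalized)
    return {
        "total_queries": total,
        "unique_queries": len(set(normalized)),
        "type_share": {t: count / total for t, count in types.items()},
    }
```

How aggressively to normalize before counting unique queries is a judgment call; heavier normalization merges more variants but can hide meaningful differences.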

A further analysis of this information generated three important measurements:

  • The percentage of queries that returned useful results, defined by the user clicking on one or more of the results to get more information
  • The percentage of queries that returned no useful results (meaning no subsequent clicks)
  • The percentage of queries that returned no results at all
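A minimal sketch of the three measurements, assuming (hypothetically) that query logs and click logs have already been joined into one record per search with a result count and a click count. The sketch also assumes the three categories are disjoint: "no useful results" counts only searches that returned results but drew no clicks, so zero-result searches are counted once.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    query: str
    result_count: int  # results returned by the engine
    click_count: int   # clicks on those results (from the web server logs)


def outcome_rates(events: list[SearchEvent]) -> dict[str, float]:
    total = len(events)
    useful = sum(1 for e in events if e.click_count > 0)
    not_useful = sum(1 for e in events if e.result_count > 0 and e.click_count == 0)
    no_results = sum(1 for e in events if e.result_count == 0)
    return {
        "useful_pct": 100 * useful / total,
        "no_useful_pct": 100 * not_useful / total,
        "no_results_pct": 100 * no_results / total,
    }
```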

Strategies were then investigated to increase the percentage of useful result sets being served, and to decrease the incidence of no results, or no useful results. 

A number of examples are given below, illustrating issues that were uncovered, and the solutions applied: 

Dashes and the "-" (not) operator in queries 
Many search engines, including the one in use here, use the dash character "-" to denote the NOT operator. There was no evidence that users deployed the NOT operator in this way, yet "-" appeared frequently in queries. For example:

  • ready-to-wear apparel analysis
  • myers-mcdevitt
  • chen, wai-yee
  • the texas criminal and traffic law manual: 2009 - 2010 edition
  • mcgraw-hill

Interpreted as a NOT operator, the "-" character was inadvertently excluding words from the search query. This was solved by a simple modification to the query parser, replacing "-" with a blank space where appropriate. 
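A minimal sketch of such a query-parser step in Python. The article says only that "-" was replaced "where appropriate"; this sketch simplifies by assuming users never intend a dash as NOT, so every ASCII hyphen, en dash, and em dash becomes a space before the query reaches the engine. A production rule might, for example, treat ISBNs differently.

```python
import re


def normalize_dashes(query: str) -> str:
    # Assumption: no dash is a deliberate NOT operator, so all of them
    # (hyphen, en dash U+2013, em dash U+2014) become word separators.
    query = re.sub(r"[-\u2013\u2014]", " ", query)
    # Collapse the extra whitespace left behind, e.g. in "2009 - 2010".
    return " ".join(query.split())
```

With this step, "myers-mcdevitt" is sent to the engine as the two terms "myers mcdevitt" instead of "myers" NOT "mcdevitt".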

Garbage characters in queries 
An analysis of queries received by the search engine showed a significant number of queries containing non-ASCII characters. On closer inspection, it became apparent that this was primarily caused by users cutting and pasting queries, and inadvertently including bullet points and other such symbolic items. 

Some examples:

  • womenâ?Ts lives multicultural
  • â?ohealth science fundamentals
  • exploring career pathwaysâ?? 
  • the isbn# is 13:978-0-13-605992-9
  •  essentials organizational behavior +óGé¼GÇ£ robbins edition

Again, the solution was a query parser modification to remove these characters. An analysis of search log files provided a full list of patterns to be removed, such as "â?o" and "â??". Using this list, the offending patterns could be removed even when they also contained valid ASCII characters. 
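A minimal sketch of the pattern-list approach. "â?o" and "â??" come from the article; "â?T" and "+óGé¼GÇ£" are hypothetical entries lifted from the example queries above, and all patterns are written exactly as they appear in the article's rendering of the logs. Matching whole patterns first is what removes ASCII debris such as "?" or "G" that a plain non-ASCII filter would leave behind; a non-ASCII filter then serves as a fallback.

```python
# Patterns as rendered in the article; the last two entries are hypothetical,
# taken from the example queries rather than from the real pattern list.
GARBAGE_PATTERNS = ["â?o", "â??", "â?T", "+óGé¼GÇ£"]


def clean_query(query: str) -> str:
    # Remove longer patterns first so a short pattern never splits a longer one.
    for pattern in sorted(GARBAGE_PATTERNS, key=len, reverse=True):
        query = query.replace(pattern, "")
    # Fallback: drop any remaining non-ASCII characters.
    query = query.encode("ascii", "ignore").decode("ascii")
    return " ".join(query.split())
```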

Use of "vol", "volume", and volume numbers in queries 
Query log analysis revealed many cases where users searched for "vol" as shorthand for "volume" and specified volume numbers using Roman numerals. 

For example:

  • intermediate acc pkg w vol 1+2
  • intermediate accounting vol 1
  • maus:survivor's tale-vol.i+ii
  • art through the ages: the western perspective, vol. ii, 12th edition, 2006

Normalizing both the indexed data and the queries, for example mapping "vol" to "volume" and "ii" to "2", provided an easy solution. 
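A minimal sketch of this normalization in Python. The key design point is that the same function runs at indexing time and at query time, so both sides end up with identical tokens. The small Roman-numeral table and the rule of converting numerals only directly after "volume" are assumptions made for illustration; converting every "i" or "v" in a query would damage ordinary words.

```python
import re

# Hypothetical table; numerals beyond X are left unchanged.
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5",
         "vi": "6", "vii": "7", "viii": "8", "ix": "9", "x": "10"}


def normalize_volume(text: str) -> str:
    """Apply the same mapping to indexed titles and to user queries."""
    text = text.lower()
    # "vol" or "vol." -> "volume"; the word boundary leaves "volume" itself alone.
    text = re.sub(r"\bvol\b\.?", "volume ", text)

    def to_arabic(match: re.Match) -> str:
        numerals = re.split(r"\s*\+\s*", match.group(2))
        return match.group(1) + "+".join(ROMAN.get(n, n) for n in numerals)

    # Roman numerals directly after "volume", including lists such as "i+ii".
    text = re.sub(r"\b(volume\s*)([ivx]+(?:\s*\+\s*[ivx]+)*)\b", to_arabic, text)
    return " ".join(text.split())
```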

Custom Relevance Ranking 
Not all of the issues uncovered were associated with the query string. In some cases, changing the way relevance is calculated for a document can significantly improve results quality. 

The various terms in a typical query are not always of equal importance. Also, where multiple editions of a publication were available, the relevance of later editions was boosted. 

Other relevancy adjustments included making title text slightly more important, and text in the author field slightly less important. 
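The two adjustments described above, per-field weights and a boost for later editions, can be sketched with a deliberately simple term-count scorer. This is a toy illustration, not the search engine's actual ranking function (which would typically be a TF-IDF or BM25 variant). The weight values and the per-edition boost are hypothetical, since the article says only that title text became "slightly more" important and author text "slightly less".

```python
from dataclasses import dataclass

# Hypothetical weights: only the direction of each change comes from the article.
FIELD_WEIGHTS = {"title": 1.2, "body": 1.0, "author": 0.8}
EDITION_BOOST = 0.05  # hypothetical bonus per edition beyond the first


@dataclass
class Book:
    title: str
    author: str
    body: str
    edition: int = 1


def score(book: Book, query: str) -> float:
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = getattr(book, field).lower().split()
        # Weighted count of query-term occurrences in this field.
        total += weight * sum(words.count(term) for term in terms)
    # Multiplicative boost: later editions move up, non-matches stay at zero.
    return total * (1 + EDITION_BOOST * (book.edition - 1))


def rank(books: list[Book], query: str) -> list[Book]:
    return sorted(books, key=lambda b: score(b, query), reverse=True)
```

Applying the edition boost multiplicatively, rather than adding it, means it only reorders documents that already match the query.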

Taken together, these and other adjustments to the relevancy setup provided measurable improvements. 

BOTTOM LINE

A total of ten site search improvements were implemented, with the following outcomes:

  • The incidence of no results decreased by 7.8%
  • The incidence of useful results increased by 3.3%

Overall, the number of searches that resulted in a sale increased, and in the first year of operation of the new system, this translated into an additional $4 million of sales, representing an extremely high ROI for Chegg.
