Oracle Endeca E-Commerce Search: Tuning and Maintenance Best Practices
Auto engines require regular maintenance. We don’t normally wait until a car’s engine light comes on before taking it to the shop for a tune-up or oil change; we do that every few thousand miles. Proper maintenance not only maximizes performance but also extends the engine’s life.
Similarly, e-commerce site search is expected to perform at its best during peak holidays and shopping seasons, and like an auto engine, it requires “regular maintenance” during non-peak times to keep it accurate and efficient. Especially when products on your e-commerce site are continuously added and updated, a thorough site search maintenance and improvement strategy is essential for boosting user engagement, conversion rates, and ultimately, your bottom line.
In this blog post, I will share some practical techniques for maintaining and tuning Oracle Endeca, a common e-commerce site search platform. I’ll focus on three key areas that affect search performance on any platform: accuracy, efficiency, and relevancy.
Start by asking, “Does Endeca return accurate results for a given query?” We might think that as long as Endeca indexes the records containing the user’s search terms, it will return those records. But that’s not always the case.
For example, if 10 fields were indexed but only 6 of them were searchable (by being included in Endeca’s search interface), a query whose keywords appear only in the other 4 fields would return no results. But including all fields in Endeca’s search interface is generally not good practice either, because it creates a lot of unnecessary search noise.
Considering the long list of daily search terms on any given e-commerce website, we can’t afford to manually investigate zero-result search terms one by one. To address this problem, we developed the following systematic approach:
Step 1. Generate a daily list of search terms for which Endeca reported zero results. The list can be extracted from the Endeca engine request log, which contains the following attributes of interest:
Step 2. Make all text fields searchable without adding all of them to the existing Endeca search interface. Regular searches query only the fields/dimensions in the search interface; fields outside it never participate in those searches even though they are indexed, but making them searchable lets us query each one individually for this analysis.
Step 3. Run each zero-result search term identified in Step 1 against every searchable text field.
Step 4. Generate a report that shows the result count discrepancy between each text field and the search interface. If the search interface returns no results but an individual text field returns one or more, we have identified a case in which Endeca mistakenly gave users zero results.
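To make Steps 1 through 4 concrete, here is a minimal Python sketch of the whole check. It assumes a hypothetical, simplified export of the request log (one request per line: timestamp, search terms, and result count, tab-separated), and it uses a toy in-memory "match all words" search in place of the MDEX engine. In a real setup, each term would be sent to the engine once against the search interface and once per field (for example through the Ntk search key and Ntt search terms parameters); the function names, field names, and data below are illustrative only.

```python
from collections import Counter

# Hypothetical, simplified log export: "timestamp<TAB>search terms<TAB>number of results".
# The real Endeca request log has its own format; adapt the parsing to your logs.
def zero_result_terms(log_lines):
    """Step 1: count how often each search term returned zero results."""
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not parts[2].strip().isdigit():
            continue  # skip malformed lines
        _, terms, num_results = parts
        if terms.strip() and int(num_results) == 0:
            counts[terms.strip().lower()] += 1
    return counts

# Toy stand-in for the engine: a record matches when every query word
# appears somewhere in the given fields (a simple "match all" search).
def count_matches(records, terms, fields):
    words = terms.lower().split()
    hits = 0
    for record in records:
        tokens = " ".join(record.get(f, "") for f in fields).lower().split()
        if all(w in tokens for w in words):
            hits += 1
    return hits

def discrepancy_report(records, zero_terms, search_interface, text_fields):
    """Steps 3-4: rerun each zero-result term against every text field."""
    rows = []
    for terms in zero_terms:
        interface_hits = count_matches(records, terms, search_interface)
        field_hits = {f: count_matches(records, terms, [f]) for f in text_fields}
        truly_zero = interface_hits == 0 and not any(field_hits.values())
        rows.append({"terms": terms, "interface": interface_hits,
                     "fields": field_hits, "truly_zero": truly_zero})
    return rows

# Illustrative data only.
records = [
    {"title": "Quiet 24-inch dishwasher", "description": "Stainless steel tub",
     "features": "energy star certified"},
    {"title": "French door refrigerator", "description": "Large capacity",
     "features": "ice maker"},
]
log = ["t1\tenergy star\t0", "t2\tice maker\t0", "t3\tgas range\t0",
       "t4\tice maker\t0", "t5\tdishwasher\t12"]

zero = zero_result_terms(log)
report = discrepancy_report(records, list(zero), ["title", "description"],
                            ["title", "description", "features"])
for row in report:
    if not row["truly_zero"]:
        print(row["terms"], row["fields"])  # terms that should not have been zero-hit
```

Here “energy star” and “ice maker” are flagged because the features field, which is not part of the toy search interface, contains them, while “gas range” is a genuine zero-result term.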
The following diagram illustrates this process:
The following table records the output of the above step. Every search term (column 1) that did NOT truly produce zero results (column 5) should never have returned zero results to users, so it needs further analysis and action.
Step 5. Based on the above findings, we can use one of the following approaches to solve the problem:
- Add the text fields that returned results to the existing search interface, or
- Copy the value of the text field that returned results to one of the existing fields in the search interface.
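The second option can be done as a pre-indexing transformation. The sketch below is a hypothetical illustration, not an Endeca API: records are plain dicts that map a field name to a list of values (Endeca properties can hold multiple values), and the field names are made up for the example.

```python
# Hypothetical pre-indexing step: copy the values of a field that is outside
# the search interface into a field that is already part of it.
def copy_field_values(records, source_field, target_field):
    """Append each value of source_field to target_field, skipping duplicates."""
    for record in records:
        for value in record.get(source_field, []):
            targets = record.setdefault(target_field, [])
            if value not in targets:
                targets.append(value)
    return records

records = [
    {"title": ["Quiet 24-inch dishwasher"], "features": ["energy star certified"]},
    {"title": ["French door refrigerator"]},
]
copy_field_values(records, "features", "description")
```

Appending the copied text as an extra value, rather than concatenating it onto an existing value, keeps the original field contents intact; skipping duplicates means the step can safely run on every index update.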
Partial match configuration can also contribute to zero-hit scenarios. Consider users searching for “powerful dishwasher” and “quiet refrigerator” on a home appliance website. The default Endeca partial match configuration requires results to match at least 2 words (see screenshot below), which effectively turns every two-keyword search term into “match all keywords.” As a result, if the retailer’s product titles and descriptions don’t include “powerful” or “quiet,” no dishwasher or refrigerator will show up on the user’s search results page. Retailers can consider tuning partial match to “match at least 1 word” to reduce zero-hit rates.
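The effect of the minimum-word setting can be seen with a simplified model. This sketch is not Endeca’s exact matching logic: it assumes a record matches when it contains at least `min_words` of the query words, capped at the number of words in the query, and the product titles are made up for illustration.

```python
def matches_partial(query, text, min_words):
    """Simplified 'match at least N words' check (not Endeca's exact logic).

    A record matches when it contains at least min_words of the query words,
    capped at the number of words in the query.
    """
    query_words = query.lower().split()
    text_words = set(text.lower().split())
    required = min(min_words, len(query_words))
    found = sum(1 for word in query_words if word in text_words)
    return found >= required

products = [
    "Stainless steel dishwasher with third rack",
    "French door refrigerator with ice maker",
]

for query in ["powerful dishwasher", "quiet refrigerator"]:
    for min_words in (2, 1):
        hits = [p for p in products if matches_partial(query, p, min_words)]
        print(f"{query!r}, at least {min_words} word(s): {len(hits)} result(s)")
```

With a two-word minimum, both queries return no results because neither product mentions “powerful” or “quiet”; with a one-word minimum, each query returns its matching appliance.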
Furthermore, by default Endeca provides only a noun stemming dictionary, which covers the singular and plural forms of nouns. This is very useful for product searches that contain only nouns: Endeca treats “red shirt” and “red shirts” equally, returning identical sets of results for both. However, without a proper verb stemming dictionary, Endeca treats different forms of a verb as different words, and the difference between “broke my phone screen” and “my broken phone screen” can be as dramatic as 100 results vs. no results. A verb stemming dictionary is therefore strongly recommended, especially for websites that provide consumer product support.
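A toy model shows why the verb dictionary matters. The dictionaries below are hypothetical and tiny; a real stemming dictionary covers the whole vocabulary. Each word is mapped to a shared root before comparison, so two queries are treated equally only when they reduce to the same set of terms.

```python
# Hypothetical, tiny stemming dictionaries mapping word forms to a shared root.
NOUN_STEMS = {"shirts": "shirt", "screens": "screen", "phones": "phone"}
VERB_STEMS = {"broke": "break", "broken": "break", "breaks": "break", "breaking": "break"}

def normalize(query, stems):
    """Reduce a query to its set of stemmed terms."""
    return {stems.get(word, word) for word in query.lower().split()}

def same_terms(query_a, query_b, stems):
    """True when both queries reduce to the same terms under the dictionary."""
    return normalize(query_a, stems) == normalize(query_b, stems)

nouns_only = NOUN_STEMS
nouns_and_verbs = {**NOUN_STEMS, **VERB_STEMS}

print(same_terms("red shirt", "red shirts", nouns_only))
print(same_terms("broke my phone screen", "my broken phone screen", nouns_only))
print(same_terms("broke my phone screen", "my broken phone screen", nouns_and_verbs))
```

With only the noun dictionary, “red shirt” and “red shirts” match but “broke” and “broken” stay different words; adding the verb entries maps both to “break.”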
Endeca uses an engine cache to store the results of previously processed requests, which improves search performance because the engine does not have to process the same request repeatedly. While it is advantageous to leverage the engine cache, there are several things to consider:
- Use the engine request log to identify queries whose results are worth caching; these queries will be used to warm up the engine. For example, Endeca-powered top navigation menu items are generally the same across all pages, which makes them good candidates for caching rather than hitting the engine on every request. Popular search queries are another good candidate: for an electronics retailer or a department store, popular holiday queries could include “Xbox,” “Amazon Echo,” or “black friday deals.”
- The engine cache memory must be large enough to hold the cached results.
- The engine cache is invalidated after every baseline update (a full refresh of the index), at which point it needs to be repopulated (warmed) using the queries identified above.
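The three considerations above can be tied together in a toy model. This sketch is not how the MDEX engine implements its cache: `ToyEngineCache`, `pick_warmup_queries`, and the `search_fn` callback are hypothetical, and the toy uses least-recently-used eviction, which may differ from the engine’s own policy. It picks warm-up queries (navigation queries plus the most frequent logged searches), clears the cache on a baseline update, and re-warms it.

```python
from collections import Counter, OrderedDict

def pick_warmup_queries(logged_queries, nav_queries, top_n):
    """Navigation queries plus the top_n most frequent logged searches."""
    popular = [q for q, _ in Counter(logged_queries).most_common(top_n)]
    return list(dict.fromkeys(nav_queries + popular))  # de-duplicate, keep order

class ToyEngineCache:
    """Toy result cache with a size limit and least-recently-used eviction."""

    def __init__(self, search_fn, max_entries):
        self.search_fn = search_fn  # hypothetical callback that queries the engine
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.engine_calls = 0

    def query(self, q):
        if q in self.entries:
            self.entries.move_to_end(q)  # mark as recently used
            return self.entries[q]
        self.engine_calls += 1
        result = self.search_fn(q)
        self.entries[q] = result
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result

    def baseline_update(self, warm_queries):
        self.entries.clear()  # a full index refresh invalidates cached results
        for q in warm_queries:
            self.query(q)  # re-warm with the chosen queries

# Illustrative data only.
log = ["xbox", "amazon echo", "xbox", "black friday deals", "xbox", "amazon echo", "hdmi cable"]
warm = pick_warmup_queries(log, ["nav:top-menu"], top_n=3)

cache = ToyEngineCache(lambda q: f"results for {q}", max_entries=10)
cache.baseline_update(warm)
cache.query("xbox")  # served from the warmed cache

small = ToyEngineCache(lambda q: f"results for {q}", max_entries=2)
small.baseline_update(warm)
small.query("nav:top-menu")  # already evicted: the cache is too small
```

The undersized cache shows the second consideration: once it holds fewer entries than the warm-up list, early warm-up queries are evicted and hit the engine again.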
Endeca search relevancy is largely determined by two major components:
a. Endeca's search interface: a list of searchable fields from each record in the index. The more searchable fields a search interface includes, the wider the search; fewer fields produce a narrower search.
b. Relevance ranking modules: out-of-the-box ranking algorithms that, applied one after another, produce the desired ranking order. The most frequently used modules are:
- Number of terms: ranks results by the number of search terms they match.
Search term: “leaking kitchen sink”
Record 1: “my kitchen sink does not leak anymore after I fixed it”
Record 2: “I have not installed a sink in my kitchen yet”
Ranking: Record 1 is ranked higher than record 2 because it matched all three keywords (“leak” matches “leaking” through stemming), while record 2 matched only two.
- Single-field match vs. cross-field match: a record that matches all search terms within a single field scores higher than one that matches them only across several fields.
Search term: “popular spring break destinations”
Record 1 title: “Popular destinations for spring break!”
Record 1 description: “Discounted airfare, hotel for spring break…”
Record 2 title: “What’s popular for spring break?”
Record 2 description: “These are everyone’s dream destinations!”
Ranking: Record 1 is ranked higher than record 2 because its title alone matched all keywords in the search term, whereas record 2 matched them only across its title and description.
- Static field sort: sorts results by a field’s value in ascending or descending order. Popularity is a good example of a field to apply this module to.
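To show what “one after another” means, here is a simplified Python model of a ranking strategy built from the three modules above, applied to the two examples. It is a sketch, not Endeca’s scoring: each module is reduced to a single number, results are sorted by the tuple of module scores so that a later module only breaks ties left by earlier ones, and the stemming map and popularity values are hypothetical.

```python
import re

# Hypothetical toy stemming map so that "leaking" matches "leak".
STEMS = {"leaking": "leak", "leaks": "leak"}

def terms(text):
    """Lowercase, tokenize, and stem a piece of text into a set of terms."""
    return {STEMS.get(w, w) for w in re.findall(r"[a-z']+", text.lower())}

def nterms(query, record, fields):
    """Number of terms: distinct query terms matched in any searchable field."""
    record_terms = set().union(*(terms(record.get(f, "")) for f in fields))
    return len(terms(query) & record_terms)

def single_field(query, record, fields):
    """1 if one field alone contains every query term, else 0."""
    q = terms(query)
    return int(any(q <= terms(record.get(f, "")) for f in fields))

def rank(query, records, fields, static_field):
    """Apply modules in order; later modules only break earlier ties."""
    return sorted(
        records,
        key=lambda r: (nterms(query, r, fields),
                       single_field(query, r, fields),
                       r.get(static_field, 0)),
        reverse=True,
    )

sinks = [
    {"id": 2, "text": "I have not installed a sink in my kitchen yet"},
    {"id": 1, "text": "my kitchen sink does not leak anymore after I fixed it"},
]
trips = [
    {"id": 2, "title": "What's popular for spring break?",
     "description": "These are everyone's dream destinations!", "popularity": 90},
    {"id": 1, "title": "Popular destinations for spring break!",
     "description": "Discounted airfare, hotel for spring break...", "popularity": 10},
]

sink_order = [r["id"] for r in rank("leaking kitchen sink", sinks, ["text"], "popularity")]
trip_order = [r["id"] for r in rank("popular spring break destinations",
                                    trips, ["title", "description"], "popularity")]
print(sink_order, trip_order)
```

In the second example both records match all four terms, so the single-field module decides the order; the higher made-up popularity of record 2 never comes into play, because the static sort only breaks ties that remain after the earlier modules.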
The following diagram demonstrates how the relevancy components work together to produce the desired ranking order.
Because online users have become so accustomed to Google search, your shoppers will expect similar quality from your e-commerce site search. SEO brings shoppers to your website through Internet search engines, but once they arrive, accurate and relevant site search is crucial to their experience. Proper assessment, modification, and maintenance of your search engine, whether it is Endeca or another e-commerce search platform, will ultimately contribute to your conversion rates and bottom line.
Watch our on-demand webinar to see how you can use a scorecard to evaluate and improve your site search performance. And contact us to find out how our search assessment and quality analysis can help increase your e-commerce revenue.