Solr Lucene Relevancy Tuning



OVERVIEW
This is a fixed-price services engagement for improving the relevancy of search results in an existing Solr Lucene implementation.
  • Typical duration:  15 days

ABSTRACT
This engagement will install two powerful relevancy ranking improvements into an existing Solr/Lucene installation. Also included is a basic system relevancy evaluation and relevancy tuning exercise based on a small set of sample queries.

Additions to the default relevancy formula in Solr Lucene can dramatically improve search results and address some of the thorniest relevancy problems, including:
  • Reducing the impact of peripheral content (sidebars, ads, tangential discussions, etc.)
  • Automatically handling word phrases in a flexible manner, reducing the need to use complex query constructions to obtain good search results

BUSINESS BENEFITS

This fixed-price service helps open-source search applications return highly relevant results to users. Improvements in relevancy can transform the contribution that a search application makes to a business process, and they often lead to markedly higher search system usage and user productivity.

FEATURE DESCRIPTION
Search Technologies has developed two key improvements to the Solr Lucene relevancy ranking algorithms:

Parameterized Document Similarity Function
Default Solr Lucene systems are based on a fixed document similarity function that depends heavily on term frequency / inverse document frequency (tf-idf) statistics. In relevancy calculations, these default implementations put too much weight on document size (boosting small documents) and on rare terms. Search Technologies provides parameterized versions of tf-idf that give substantially more control over the relevancy formulas. This new operator has configurable parameters that set the exact amount of boost for each tf-idf ranking factor. It also provides upper and lower thresholds that reduce the effect of unreliable statistics at the extremes, for example when a term occurs in only a few documents.
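To make the idea concrete, here is a minimal Python sketch of what a parameterized tf-idf could look like. It is not the Search Technologies implementation: the `SimilarityParams` class, its field names, and the clamping scheme are all hypothetical, chosen only for illustration. The defaults reproduce the familiar shape of Lucene's classic similarity (square-root term frequency, `1 + ln(numDocs / (docFreq + 1))` for idf, and `1 / sqrt(length)` for length normalization), and the score is simplified to a single-term product.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimilarityParams:
    """Hypothetical tuning knobs; the names are illustrative, not a real Solr API."""
    tf_exponent: float = 0.5            # 0.5 gives the classic sqrt(tf)
    idf_weight: float = 1.0             # scales the logarithmic part of idf
    length_exponent: float = 0.5        # 0.5 gives the classic 1/sqrt(length)
    min_doc_freq: int = 1               # lower threshold for document frequency
    max_doc_freq: Optional[int] = None  # upper threshold (None = no limit)


def tf(freq: int, p: SimilarityParams) -> float:
    """Term-frequency factor with a configurable exponent."""
    return float(freq) ** p.tf_exponent if freq > 0 else 0.0


def idf(doc_freq: int, num_docs: int, p: SimilarityParams) -> float:
    """Inverse document frequency with the document frequency clamped
    to [min_doc_freq, max_doc_freq] to damp unreliable statistics."""
    df = max(doc_freq, p.min_doc_freq)
    if p.max_doc_freq is not None:
        df = min(df, p.max_doc_freq)
    return 1.0 + p.idf_weight * math.log(num_docs / (df + 1))


def length_norm(num_terms: int, p: SimilarityParams) -> float:
    """Length normalization; a smaller exponent favors short documents less."""
    return 1.0 / (max(num_terms, 1) ** p.length_exponent)


def term_score(freq: int, doc_freq: int, num_docs: int,
               num_terms: int, p: SimilarityParams) -> float:
    """Simplified single-term score: tf * idf * length normalization."""
    return tf(freq, p) * idf(doc_freq, num_docs, p) * length_norm(num_terms, p)
```

Under these assumptions, lowering `length_exponent` from 0.5 to 0.25 shrinks the advantage of a 100-term document over a 10,000-term document from 10x to roughly 3.2x, and setting `min_doc_freq=5` makes a term found in one document score the same idf as a term found in five.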

Note: Solr versions 1.4.0 and 1.4.1 require a source code patch to implement the Parameterized Document Similarity Function. For releases currently in development (expected to be numbered 3.1 or later), the function can be installed via a configuration change and a drop-in library.


Gradient Proximity Boost
Default Solr Lucene systems have a very limited “hard window” proximity boost. If all terms are “within window” the document will receive a fixed boost multiplier. If any term is “out of window” no boost is applied.

The Search Technologies Gradient Proximity Boost operator instead measures the density and completeness of terms across the document. Documents in which terms are clustered close together will be boosted more than documents in which terms are widely distributed, but in a gradual way. This operator eliminates the need to tweak fixed window sizes.
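The difference from a hard window can be illustrated with a short Python sketch. This is one plausible way to combine "density" and "completeness", not the actual operator: the function names, the `max_boost` parameter, and the formula `1 + max_boost * completeness * density` are assumptions made for this example. Density is taken here as the number of matched terms divided by the length of the smallest token window that contains all of them.

```python
from collections import Counter
from typing import Dict, Iterable, List


def smallest_span(positions: Dict[str, List[int]]) -> int:
    """Length in tokens of the smallest window that contains at least one
    occurrence of every term. Every term must have a non-empty position list."""
    events = sorted((pos, term) for term, plist in positions.items() for pos in plist)
    needed = len(positions)
    counts: Counter = Counter()
    covered = 0
    left = 0
    best = None
    for pos, term in events:
        counts[term] += 1
        if counts[term] == 1:
            covered += 1
        # Shrink the window from the left while it still covers every term.
        while covered == needed:
            left_pos, left_term = events[left]
            span = pos - left_pos + 1
            if best is None or span < best:
                best = span
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best


def gradient_proximity_boost(query_terms: Iterable[str],
                             doc_positions: Dict[str, List[int]],
                             max_boost: float = 2.0) -> float:
    """Hypothetical gradual boost: 1.0 means no boost, 1.0 + max_boost is the cap."""
    terms = set(query_terms)
    matched = {t: doc_positions[t] for t in terms if doc_positions.get(t)}
    if len(matched) < 2:
        return 1.0  # proximity is undefined for fewer than two matched terms
    completeness = len(matched) / len(terms)  # share of query terms present
    density = min(1.0, len(matched) / smallest_span(matched))  # 1.0 when adjacent
    return 1.0 + max_boost * completeness * density
```

With this formula, a document containing all query terms side by side receives the full boost, and the boost decays smoothly as the terms spread apart or as some terms go missing, so there is no window size to tune.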

PRE-REQUISITES
A working Solr / Lucene system with documents already indexed.

EXPECTED ENGAGEMENT TASKS
  • Current system evaluation
  • Gather basic statistics on the document base (number of documents, average size, number of fields, tokens per document, tokens per field, etc.)
  • Gather basic statistics on the query set (number of tokens per query, types of operators used, etc.)
  • Gather sample queries for relevancy tuning - typically a set of 20-30 queries gathered from query logs or via interviews with subject matter experts
  • Operator installation, configuration, system integration, testing and deployment
  • System tuning based on the sample queries
  • Demonstration / report on the relevancy improvements achieved
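As an illustration of how the sample queries might be used to measure improvement, the sketch below computes mean precision at k over a set of judged queries. The choice of metric, the function names, and the data layout (a ranked list of document IDs per query, plus a set of IDs judged relevant) are assumptions for this example; the engagement itself may use different measures.

```python
from typing import Dict, List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(results: Dict[str, List[str]],
                        judgments: Dict[str, Set[str]],
                        k: int = 10) -> float:
    """Average precision@k over every query that has relevance judgments."""
    scores = [
        precision_at_k(results.get(query, []), relevant, k)
        for query, relevant in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Running `mean_precision_at_k` on the result lists captured before and after tuning, with the same judgments, gives a single number per configuration that can be compared in the evaluation report.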

DELIVERABLES
  • A working Solr Lucene system with new operators included
  • New operator source code (if desired)
  • Documentation on operator settings
  • A relevancy evaluation report

ONGOING SUPPORT
Search Technologies can provide software maintenance and support services, including 24 / 7 options, both for the newly installed operators and for Solr Lucene as a whole.

