
Automated Categorization for Elasticsearch

Great metadata drives search and analysis applications

HEADLINES

  • Search Technologies provides add-on capabilities for Elasticsearch to auto-generate metadata prior to indexing
  • We support both taxonomy-based and statistical categorization techniques
  • Our solutions are delivered under a services engagement, to an agreed specification, ensuring that the customer’s needs are fully addressed

 

DETAILS

Search Technologies is an active Elasticsearch implementation partner.

Document categorization is an important technique for the automated creation of new metadata, which in turn can be used to drive user interface functionality such as graphical display of search results and search navigation options. Two distinct approaches to categorization have developed over the past three decades:

  • Taxonomy-based: A controlled vocabulary is maintained as the basis for categorization. Using this vocabulary together with a set of rules, documents are “latched” to one or more categories within the hierarchy
  • Example-based (statistical): Categories are defined using example documents, which are processed to produce “document vectors”. Documents from the corpus are then vectorized in the same way and compared, so that each document can be associated with one or more categories that have similar vectors
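
As a rough illustration of the two approaches, the following Python sketch latches a document to taxonomy categories using simple term rules, and separately compares term-count vectors using cosine similarity. It is a minimal sketch under stated assumptions, not a description of any particular product: the taxonomy, the rule format, the thresholds, and every function name here are hypothetical, and a production system would use richer rules and weighting.

```python
import math
import re
from collections import Counter

# Hypothetical taxonomy: category path -> trigger terms (illustrative only).
TAXONOMY_RULES = {
    "Finance/Banking": {"loan", "mortgage", "deposit"},
    "Health/Cardiology": {"cardiac", "heart", "artery"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize_by_taxonomy(text, rules=TAXONOMY_RULES, min_hits=2):
    """Latch a document to each category with at least min_hits distinct trigger terms."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, terms in rules.items() if len(tokens & terms) >= min_hits)


def vectorize(text):
    """Build a simple term-count vector (assumed weighting; TF-IDF is a common alternative)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize_by_example(text, examples, threshold=0.3):
    """Associate a document with each category whose example vector is similar enough."""
    doc_vec = vectorize(text)
    matches = []
    for category, example_docs in examples.items():
        # Category vector: summed term counts over that category's example documents.
        category_vec = Counter()
        for example in example_docs:
            category_vec.update(vectorize(example))
        if cosine(doc_vec, category_vec) >= threshold:
            matches.append(category)
    return sorted(matches)
```

For example, `categorize_by_taxonomy("The bank approved the mortgage and the loan.")` matches two trigger terms for the hypothetical "Finance/Banking" category and none for "Health/Cardiology".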

Often, the most effective categorization solutions combine elements of both approaches, each of which has advantages and disadvantages. The taxonomic approach can often achieve higher levels of accuracy but, depending on the subject matter covered, maintaining a high-quality taxonomy can be expensive. Statistical approaches generally require less maintenance once set up, but may need some vocabulary support to attain the right level of accuracy.

A great way to start is by using our fixed-price Search Assessment service to scope the project, and ensure that the right approach is taken.

The results of categorization, in the form of additional document-level metadata, are passed (as JSON) to Elasticsearch for indexing.
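
To make that hand-off concrete, the sketch below merges generated categories into a document and serializes it as newline-delimited JSON in the action/source layout used by the Elasticsearch bulk API. The index name `documents`, the field names (`id`, `title`, `body`, `categories`), and both helper functions are assumptions chosen for illustration. Sending the payload over HTTP is deliberately left out.

```python
import json


def build_index_document(doc_id, title, body, categories):
    """Merge the original fields with the generated category metadata (hypothetical schema)."""
    return {
        "id": doc_id,
        "title": title,
        "body": body,
        "categories": sorted(set(categories)),  # de-duplicate and keep a stable order
    }


def to_bulk_lines(index_name, documents):
    """Serialize documents as bulk-style NDJSON: one action line, then one source line."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # A bulk request body must end with a newline.
    return "\n".join(lines) + "\n"


doc = build_index_document(
    "42", "Mortgage update", "New loan terms announced.", ["Finance/Banking"]
)
payload = to_bulk_lines("documents", [doc])
```

Keeping the categories in their own field means they can later be used for filtering or navigation without re-parsing the document text.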

 

Whether your Elasticsearch categorization needs are simple or sophisticated, CONTACT US to discuss how we can help you implement a solution that precisely meets your needs.