Content Enrichment Solutions

Flexible, cost-effective and agile content enrichment solutions, built with expert services and fully supported software tools



HEADLINES

  • Search and business insight applications benefit from clean, normalized content that has plentiful and appropriately accurate metadata
  • In search applications, metadata drives numerous important functions such as search navigators and results sorting by property
  • The unstructured content in most organizations is extremely diverse
  • So, rather than looking for a single "silver bullet" technology, a better strategy is to build an open, flexible and agile content enrichment capability using multiple technologies and approaches

We contend that this can be done brilliantly, and at a highly competitive total cost of ownership, by combining expert services and support with software tools based on proven open source projects.


SUPPORTED PLATFORMS

Search Technologies' approach to content enrichment can work with any search engine, for example:

  • SharePoint 2013, via BCS or the Web Services extensibility call-out component
  • The Google Search Appliance, based on the Google enterprise connector manager (formerly the Google Connector Framework)
  • Solr/Lucene, via XML or JSON
  • Elasticsearch, via JSON
  • Any other search engine or application via XML
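
As an illustration of the JSON routes listed above, the following minimal Python sketch serializes already-enriched documents into two shapes: a plain JSON array of documents, which is the form Solr's JSON update handler accepts, and the newline-delimited action/source pairs used by the Elasticsearch bulk API. The function names, field names and the index name are assumptions made for this example; they are not part of Aspire or of any Search Technologies product, and no network call is made.

```python
import json


def to_solr_json(docs):
    """Serialize documents as a JSON array of objects (Solr JSON update shape)."""
    return json.dumps(docs)


def to_elasticsearch_bulk(docs, index_name):
    """Serialize documents as newline-delimited JSON for the Elasticsearch bulk API.

    Each document becomes two lines: an "index" action line, then the source.
    The hypothetical "id" field is moved into the action line as "_id".
    """
    lines = []
    for doc in docs:
        source = {key: value for key, value in doc.items() if key != "id"}
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical enriched document; the field names are illustrative only.
    enriched = [{"id": "doc-1", "title": "Annual report", "category": ["finance"]}]
    print(to_solr_json(enriched))
    print(to_elasticsearch_bulk(enriched, "enriched-content"), end="")
```

Because the enrichment output is an ordinary dictionary until the final serialization step, the same enriched document can be sent to either engine without changing the enrichment rules themselves.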

Our customers believe that this approach helps them to build strategic IP by remaining independent of proprietary technologies and in full control of the business rules that govern content enrichment. A central content enrichment capability, which can quickly and easily be applied to any data source, also promotes organizational agility.

CAPABILITIES

Based on the Aspire Content Processing framework and supported by expert services, our content enrichment capabilities include:

  • Taxonomy-based categorization of documents, including the use of complex rules and quality thresholds
  • Sample-based categorization (as used by Autonomy IDOL and FAST ESP) using example content sets to seed categories
  • Entity extraction based on regular expression matching (regex)
  • Entity extraction controlled by vocabularies
  • Rule-based joining and splitting of documents for indexing purposes
  • Rule-based cleaning and filtering of extraneous data (headers, footers, menus, etc.)
  • Enriching document metadata using outside sources
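
To make three of the capabilities above more concrete (regex-based entity extraction, vocabulary-controlled entity extraction, and taxonomy-based categorization with a quality threshold), here is a minimal sketch in plain Python. It does not use the Aspire API; the pattern, the vocabulary, the taxonomy, the threshold value and all function names are hypothetical and chosen only for illustration.

```python
import re

# Assumption: a simple e-mail pattern stands in for any regex-defined entity.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical controlled vocabulary: surface form -> canonical entity name.
COMPANY_VOCABULARY = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "microsoft": "Microsoft",
}

# Hypothetical taxonomy: category -> terms that count as evidence for it.
TAXONOMY = {
    "finance": {"revenue", "invoice", "budget"},
    "legal": {"contract", "liability", "clause"},
}


def extract_emails(text):
    """Regex-based entity extraction: return the distinct matches, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


def extract_vocabulary_entities(text, vocabulary):
    """Vocabulary-controlled extraction: map known surface forms to canonical names."""
    lowered = text.lower()
    found = set()
    for surface, canonical in vocabulary.items():
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(canonical)
    return sorted(found)


def categorize(text, taxonomy, threshold=2):
    """Assign a category only when at least `threshold` term hits are found."""
    tokens = re.findall(r"[a-z]+", text.lower())
    categories = []
    for category, terms in taxonomy.items():
        hits = sum(1 for token in tokens if token in terms)
        if hits >= threshold:
            categories.append(category)
    return sorted(categories)


def enrich(doc):
    """Return a copy of the document with extracted metadata fields added."""
    text = doc["content"]
    enriched = dict(doc)
    enriched["emails"] = extract_emails(text)
    enriched["companies"] = extract_vocabulary_entities(text, COMPANY_VOCABULARY)
    enriched["categories"] = categorize(text, TAXONOMY)
    return enriched


if __name__ == "__main__":
    sample = {
        "id": "doc-1",
        "content": "Contact jane.doe@example.com at IBM about the invoice and the budget.",
    }
    print(enrich(sample))
```

Keeping the pattern, the vocabulary and the taxonomy as plain data, separate from the functions that apply them, reflects the point made earlier about staying in control of the business rules: the rules can be changed without touching the processing code.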


For further information or an informal conversation, please contact us.