Entity Extraction for Solr/Lucene
Search Technologies provides entity extraction solutions for Solr, based on the Aspire content processing framework.
Aspire is an indexing pipeline for Solr, based on Apache Felix and the OSGi standard for pluggable Java modules.
Entity Extraction Techniques
Aspire supports both of the basic techniques for entity extraction:
- Regular-expression matching, using Groovy scripting, to detect and extract entities such as names, telephone numbers, email addresses, and zip codes.
- Dictionary-based extraction, using any number of vocabularies to guide the process.
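To illustrate the two techniques, here is a minimal, self-contained sketch in plain Java. It is not Aspire's API and does not use Groovy: the class name `EntityExtractionSketch`, its methods, and the three patterns are hypothetical and chosen for illustration only. Real-world phone and email formats vary far more than these patterns allow, and names are omitted because they are hard to capture reliably with a regex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based and dictionary-based entity extraction (not Aspire's API). */
class EntityExtractionSketch {

    // Illustrative patterns only, keyed by entity type.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("email", Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"));
        PATTERNS.put("usPhone", Pattern.compile("\\(?\\d{3}\\)?[-. ]?\\d{3}[-. ]\\d{4}"));
        PATTERNS.put("zipCode", Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b"));
    }

    /** Regex matching: returns every match of every pattern, keyed by entity type. */
    static Map<String, List<String>> extractByRegex(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            List<String> found = new ArrayList<>();
            Matcher m = e.getValue().matcher(text);
            while (m.find()) {
                found.add(m.group());
            }
            result.put(e.getKey(), found);
        }
        return result;
    }

    /**
     * Dictionary lookup: scans tokens left to right and keeps the longest vocabulary
     * phrase (up to maxWords tokens) starting at each position. Vocabulary entries are
     * expected in lower case with single spaces between words.
     */
    static List<String> extractByDictionary(String text, Set<String> vocabulary, int maxWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedLen = 0;
            // Try the longest candidate phrase first so "Apache Solr" beats "Solr".
            for (int n = Math.min(maxWords, tokens.size() - i); n >= 1; n--) {
                String phrase = String.join(" ", tokens.subList(i, i + n));
                if (vocabulary.contains(phrase.toLowerCase(Locale.ROOT))) {
                    matches.add(phrase);
                    matchedLen = n;
                    break;
                }
            }
            // Skip past a matched phrase, or advance one token if nothing matched.
            i += (matchedLen > 0) ? matchedLen : 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Contact us at info@example.com or (703) 555-1234, zip 20190. "
                + "We index with Apache Solr and Apache Lucene.";
        System.out.println(extractByRegex(text));
        Set<String> vocabulary = Set.of("apache solr", "apache lucene", "solr");
        System.out.println(extractByDictionary(text, vocabulary, 3));
    }
}
```

The longest-match rule is one common design choice for dictionary extraction: it keeps a short vocabulary entry such as "solr" from being reported separately when it occurs inside a longer entry such as "apache solr".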
In addition, sophisticated third-party components and Web Services can easily be plugged into Aspire.
Entity extraction is an important technique for creating metadata that drives user interface functions such as faceted search and sorting results by property.
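As a rough illustration of why extracted entities are useful for faceted search, the sketch below treats each document's extracted entities as a multi-valued metadata field and counts how many documents carry each value, which is what a facet count reports. Solr computes facet counts itself at query time from indexed fields; this plain-Java sketch only shows what those counts mean. The class `FacetCountSketch` and the field name `product` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turning per-document entity metadata into facet counts. */
class FacetCountSketch {

    /** Counts how many documents carry each value of the given metadata field. */
    static List<Map.Entry<String, Integer>> facetCounts(List<Map<String, List<String>>> docs, String field) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, List<String>> doc : docs) {
            // Count each value at most once per document, as a facet count does.
            for (String value : new LinkedHashSet<>(doc.getOrDefault(field, List.of()))) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically for stable output.
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = List.of(
                Map.of("product", List.of("Apache Solr", "Apache Lucene")),
                Map.of("product", List.of("Apache Solr")),
                Map.of("product", List.of("Apache Solr", "Apache Solr")));
        System.out.println(facetCounts(docs, "product"));
    }
}
```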
A detailed functional description can be found on the Aspire Wiki.
Contact us for an informal discussion of your entity extraction requirements.