
Semantic Extraction from Unstructured Text


Semantic extraction refers to a range of processing techniques that identify and extract entities, facts, attributes, concepts and events, and use them to populate metadata fields. The purpose is to make unstructured content available for analysis.

The bottom line: semantic analysis of unstructured data is an important technique for "structuring the unstructured." Without it, big data applications cannot deliver actionable intelligence.

Further, the accuracy of semantic extraction is critical. Without adequate accuracy and provenance, you risk giving decision makers insights that are non-actionable or even misleading.


Semantic extraction is usually based on one of two approaches (or a combination of the two):

  • Rule-based matching: similar to entity extraction, this approach matches content against one or more vocabularies, which must be available to support it
  • Machine learning: a statistical analysis of the content that derives relationships from statistical co-occurrence within the document corpus. It can be compute-intensive and, if the data set is substantial, can benefit from running on Hadoop
  • Hybrid solutions: statistically driven, but enhanced by a vocabulary. This is typically the best approach if the content set is focused on a specific subject area
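
To make the approaches above more concrete, here is a minimal Python sketch. It is illustrative only and is not how Aspire or any particular product implements extraction. The vocabulary, the field names, the sample documents and the function names extract_metadata and cooccurrence_counts are all assumptions invented for this example. The first function shows rule-based matching against a vocabulary; the second counts document-level co-occurrence, which is only the simplest form of the statistic a machine-learning approach would build on.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary: surface form -> (metadata field, canonical value).
VOCABULARY = {
    "acme corp": ("organization", "Acme Corp"),
    "acme": ("organization", "Acme Corp"),
    "london": ("location", "London"),
    "paris": ("location", "Paris"),
    "merger": ("event", "merger"),
}

# Longest terms first, so "acme corp" is preferred over "acme".
_TERMS = sorted(VOCABULARY, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def extract_metadata(text):
    """Rule-based matching: map one document to {field: sorted canonical values}."""
    fields = {}
    for match in _PATTERN.finditer(text):
        field, value = VOCABULARY[match.group(1).lower()]
        fields.setdefault(field, set()).add(value)
    return {field: sorted(values) for field, values in fields.items()}


def cooccurrence_counts(documents):
    """Count, for each pair of extracted values, how many documents contain both."""
    counts = Counter()
    for text in documents:
        extracted = extract_metadata(text)
        values = sorted({v for vs in extracted.values() for v in vs})
        counts.update(combinations(values, 2))
    return counts


# Made-up sample corpus.
docs = [
    "Acme Corp announced a merger in London.",
    "The merger talks moved from London to Paris.",
    "ACME opened an office in Paris.",
]

if __name__ == "__main__":
    for doc in docs:
        print(extract_metadata(doc))
    print(cooccurrence_counts(docs).most_common(3))
```

Combining the two steps, as here, is a rough analogue of the hybrid approach: the vocabulary decides what counts as a value, and the counts suggest which values are related. A real system would also need disambiguation, scoring and provenance tracking, none of which this sketch attempts.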


Aspire, Search Technologies' award-winning content processing platform, supports all of these approaches. Its role is to fully prepare unstructured data, from parsing, cleansing, and normalization, to filtering and semantic analysis. The processed data can then be used in search and analytics projects at any scale, including big data applications.

For further information or an informal discussion of your requirements with one of our experts, contact us.