Semantic Extraction from Unstructured Text

SEMANTIC EXTRACTION - GAINING INSIGHTS FROM UNSTRUCTURED DATA

Semantic extraction refers to a range of processing techniques that identify and extract entities (for example, people, locations, and companies), facts, attributes, concepts, and events to populate metadata fields. Its purpose is to enable the analysis of unstructured enterprise content, such as text documents, emails, images, reports, and other business-critical content.

The bottom line is that semantic analysis of unstructured data is an important technique for "structuring the unstructured." Without it, search and analytics applications cannot deliver actionable intelligence.

Further, the accuracy of semantic extraction is critical. Without appropriate accuracy and provenance, you risk feeding decision makers non-actionable or even misleading insights.

OUR SEMANTIC EXTRACTION APPROACHES

Semantic extraction is usually based on one of two approaches, or on a hybrid that combines them:

  • Rule-based matching: similar to entity extraction, this approach requires the support of one or more vocabularies
  • Machine learning: a statistical analysis of the content that derives relationships from statistical co-occurrence within the document corpus. This can be compute-intensive and, if the data set is substantial, can benefit from running on Hadoop
  • Hybrid solutions: statistically driven, but enhanced by a vocabulary. This is typically the best approach if the content set is focused on a specific subject area
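
As a rough illustration of the approaches listed above, the following Python sketch matches a small vocabulary against text (rule-based matching) and then counts how often vocabulary terms appear together in the same document (a toy form of statistical co-occurrence that is guided by a vocabulary). The vocabulary, the sample documents, and the function names `extract_entities` and `cooccurrence_counts` are all assumptions made for this example. It is not the Aspire or Saga API, and a production system would use far richer models.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary: surface form -> entity type (illustrative only).
VOCABULARY = {
    "acme corp": "company",
    "globex": "company",
    "london": "location",
    "jane smith": "person",
}


def extract_entities(text, vocabulary=VOCABULARY):
    """Rule-based matching: return sorted (term, type) pairs found in text."""
    lowered = text.lower()
    found = set()
    for term, entity_type in vocabulary.items():
        # Word boundaries stop "london" from matching inside "londoner".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add((term, entity_type))
    return sorted(found)


def cooccurrence_counts(documents, vocabulary=VOCABULARY):
    """Count the documents in which each pair of vocabulary terms co-occurs."""
    counts = Counter()
    for doc in documents:
        terms = [term for term, _ in extract_entities(doc, vocabulary)]
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Made-up sample documents, used only to exercise the sketch.
docs = [
    "Jane Smith joined Acme Corp in London.",
    "Acme Corp opened a second London office.",
    "Globex has no comment.",
]

# One metadata record (list of entities) per document.
metadata = [extract_entities(d) for d in docs]

# Pairs of terms that appear together, with document counts.
pairs = cooccurrence_counts(docs)
```

Here `metadata` holds the extracted entities that could populate metadata fields for each document, while `pairs` gives a simple signal of which entities may be related. Restricting the co-occurrence count to vocabulary terms is one simple way to combine the two approaches. A purely statistical method would count co-occurrence over all terms in the corpus instead.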

TECHNOLOGY ASSETS TO SUPPORT UNSTRUCTURED DATA PROJECTS

  • Aspire, Search Technologies' award-winning content processing platform, supports all of these approaches. Its role is to fully prepare unstructured data, from parsing, cleansing, and normalization, to filtering and semantic analysis. The processed data can then be used in search and analytics projects at any scale, including big data applications.
  • Saga Natural Language Understanding (NLU): enables non-data scientists to create and maintain powerful, flexible, tested, and scalable enterprise language models for user interaction and document understanding. It incorporates many language modeling techniques and machine learning into a single, user-friendly semantic framework to handle a wide variety of natural language use cases.


For further information or an informal discussion of your requirements with one of our experts, contact us.
