Semantic Extraction from Unstructured Text
Semantic extraction refers to a range of processing techniques that identify and extract entities, facts, attributes, concepts and events in order to populate metadata fields. Its purpose is to make unstructured content available for analysis.
The bottom line is that semantic analysis is an important technique for "structuring the unstructured." Without it, big data applications cannot deliver actionable intelligence from unstructured data.
The accuracy of semantic extraction is also critical. Without appropriate accuracy and provenance, you risk giving decision makers insight that is non-actionable or even misleading.
Semantic extraction is usually based on one of two approaches, or on a combination of the two:
- Rule-based matching. Similar to entity extraction, this approach requires the support of one or more vocabularies.
- Machine learning. This approach applies statistical analysis to the content and derives relationships from statistical co-occurrence within the document corpus. It can be compute-intensive and can benefit from using Hadoop if the data set is substantial.
- Hybrid solutions. These are statistically driven, but enhanced by a vocabulary. This is typically the best approach if the content set is focused on a specific subject area.
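As a rough illustration of how these approaches differ, here is a minimal Python sketch. It is not Aspire code and does not use any Aspire API. The vocabulary, the field names, and the functions `extract_by_rules` and `cooccurrence_counts` are all hypothetical, and simple document-level pair counting stands in for a real statistical or machine-learning model.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary: surface form -> (canonical name, metadata field).
VOCABULARY = {
    "acme corp": ("Acme Corp", "organization"),
    "acme": ("Acme Corp", "organization"),
    "london": ("London", "location"),
    "paris": ("Paris", "location"),
}


def extract_by_rules(text, vocabulary=VOCABULARY):
    """Rule-based matching: find vocabulary terms as whole words or phrases.

    Returns a dict mapping each metadata field to a set of canonical names.
    """
    fields = {}
    lowered = text.lower()
    for term, (canonical, field) in vocabulary.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            fields.setdefault(field, set()).add(canonical)
    return fields


def cooccurrence_counts(documents, vocabulary=VOCABULARY):
    """Hybrid sketch: count how often two vocabulary entities share a document.

    The statistics (pair counts across the corpus) are restricted to
    entities that the vocabulary recognizes.
    """
    pairs = Counter()
    for doc in documents:
        entities = set()
        for names in extract_by_rules(doc, vocabulary).values():
            entities |= names
        # Sort so that each pair is always counted under the same key.
        for a, b in combinations(sorted(entities), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    corpus = [
        "Acme Corp opened an office in London.",
        "Acme is hiring in London and Paris.",
        "Paris is lovely.",
    ]
    print(extract_by_rules(corpus[0]))
    print(cooccurrence_counts(corpus).most_common())
```

In this sketch the vocabulary determines which entities are recognized, and the corpus-wide counts suggest which of them may be related. A production system would use a much larger vocabulary and a proper statistical model, and would distribute the counting step when the corpus is large.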
Aspire, Search Technologies' award-winning content processing platform, supports all of these approaches. Its role is to fully prepare unstructured data, through parsing, cleansing, normalization, filtering and semantic analysis, for use in search or analysis projects at any scale, including big data applications.
For further information, or for a no-commitments discussion of your ideas with one of our experts, contact us.