Textual ETL: An Important Component in the IT Ecosystem
Simply put, unstructured content becomes much more useful to applications such as Business Intelligence, Enterprise Search, or "big data" analysis, through the addition of structure.
In other words, structuring the unstructured.
Textual ETL: This is one of several labels that can be attached to this process. It involves the processing of text. For example:
- To identify idioms and important entities, and record these as metadata (additional structure)
- To identify parts of speech. Nouns are generally more important than other word types. Combinations of different parts of speech can also indicate meaning or intent within a sentence
These capabilities are language dependent.
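As a rough illustration of recording idioms and entities as metadata, the following Python sketch uses simple lookup tables. The tables, the `extract_metadata` function and the record layout are hypothetical and chosen only for this example; a production system would rely on much larger, language-specific resources and would typically add part-of-speech tagging as well.

```python
import re

# Hypothetical lookup tables, for illustration only. Real resources are
# far larger and specific to each language.
IDIOMS = {
    "piece of cake": "easy",
    "under the weather": "unwell",
}
ENTITIES = {
    "london": "LOCATION",
    "acme corp": "ORGANIZATION",
}


def _find(phrase, lowered):
    """Yield start offsets of whole-word matches of phrase in lowered text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    for match in re.finditer(pattern, lowered):
        yield match.start()


def extract_metadata(text):
    """Return metadata records for known idioms and entities, in text order."""
    lowered = text.lower()
    records = []
    for phrase, meaning in IDIOMS.items():
        for start in _find(phrase, lowered):
            records.append(
                {"type": "idiom", "text": phrase, "meaning": meaning, "start": start}
            )
    for name, label in ENTITIES.items():
        for start in _find(name, lowered):
            records.append(
                {"type": "entity", "text": name, "label": label, "start": start}
            )
    return sorted(records, key=lambda record: record["start"])


if __name__ == "__main__":
    sample = "The Acme Corp audit in London was a piece of cake."
    for record in extract_metadata(sample):
        print(record)
```

The output records are the "additional structure": each one can be stored alongside the original document and queried by a search or BI application.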
CLEANING AND NORMALIZATION
Most applications require more than just core text processing capabilities.
- Cleaning: The removal of extraneous or misleading content, such as HTML menus, or headers and footers, is important to quality. Cleaning should be done prior to text processing
- Normalization: This applies to dates, times, phone numbers and other entities, and also to the format and arrangement of the metadata output from the textual ETL process
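The two steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a complete solution: the set of tags to skip, the function names, the day-first date format and the digits-only phone format are all choices made for this example.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Assumption: content inside these tags is navigation or page furniture.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class _TextExtractor(HTMLParser):
    """Collect text that is not nested inside any of SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Cleaning: drop menus, headers and footers, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def normalize_date(value, fmt="%d/%m/%Y"):
    """Normalization: convert a date string to ISO 8601 (assumes day-first input)."""
    return datetime.strptime(value, fmt).date().isoformat()


def normalize_phone(value):
    """Normalization: keep only the digits of a phone number."""
    return re.sub(r"\D", "", value)


if __name__ == "__main__":
    page = (
        "<nav><a href='/'>Home</a></nav>"
        "<p>Invoice dated 03/11/2014.</p>"
        "<footer>Copyright</footer>"
    )
    print(clean_html(page))
    print(normalize_date("03/11/2014"))
    print(normalize_phone("+44 (0)20 7946 0000"))
```

Cleaning runs first, so that menu and footer text never reaches the text processing stage; the normalizers are then applied to the entities that stage extracts.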
As the world's most experienced enterprise search implementation company, we've worked with a wide range of leading technologies and more than 400 customers during the past 8 years. Structuring the unstructured is a vital part of many of the projects we've delivered.
We provide a mix of experience, pragmatism and practical software tools to deliver textual ETL solutions efficiently and sustainably.
Contact us to discuss your textual ETL requirements.