
Textual ETL: An Important Component in the IT Ecosystem

Understanding and maximizing the use of unstructured content has become key to successful business intelligence, enterprise search, and various analytics applications. In other words, enterprises are looking for ways to structure the unstructured.


Textual ETL is one of a number of labels that can be attached to this process. It involves the processing of text, for example:

  • To identify idioms and important entities, and record these as metadata (additional structure)
  • To identify parts of speech. Generally, nouns are more important than other word types. Also, combinations of different parts of speech can indicate meaning or intent within a sentence

These capabilities are language-dependent.
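
As a rough illustration of recording entities and idioms as metadata, the sketch below matches text against a small lookup list. The `GAZETTEER` contents, the `tag_entities` function, and the record fields are all hypothetical names chosen for this example. A production system would more likely rely on a language-specific NLP toolkit than on a hand-built list.

```python
import re

# Hypothetical lookup list: surface form -> entity type (illustrative only).
GAZETTEER = {
    "acme corp": "ORGANIZATION",
    "london": "LOCATION",
    "kick the bucket": "IDIOM",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return metadata records for every gazetteer term found in text."""
    records = []
    for term, label in gazetteer.items():
        # Whole-word, case-insensitive match on the literal term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            records.append({
                "text": match.group(0),
                "type": label,
                "start": match.start(),
                "end": match.end(),
            })
    # Report matches in document order.
    records.sort(key=lambda record: record["start"])
    return records


if __name__ == "__main__":
    for record in tag_entities("Acme Corp opened an office in London."):
        print(record)
```

The character offsets let downstream systems link each metadata record back to its position in the source text. Because the lookup list is tied to one language, this also shows in miniature why such capabilities are language-dependent.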


Most applications require more than just core text processing capabilities:

  • Cleaning: The removal of extraneous or misleading content, such as HTML menus, or headers and footers, is important to quality. Cleaning should be done prior to text processing.
  • Normalization: This applies to dates, times, phone numbers and other entities, and also to the format and arrangement of the metadata output from the textual ETL process.
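
The cleaning and normalization steps above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not a definitive implementation: the set of tags to skip, the accepted date formats (including the day-first reading of `dd/mm/yyyy`), and the function names `clean_html`, `normalize_date`, and `normalize_phone` are all illustrative choices.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Assumption: text inside these tags is navigation or page furniture, not content.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class _TextExtractor(HTMLParser):
    """Collect text that sits outside any of the SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Strip markup and skipped elements, then collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Assumption: slashed dates are day-first; adjust for the source's locale.
# Note that %B (month name) parsing depends on the process locale.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_phone(value):
    """Reduce a phone number to its digits, or None if it has none."""
    digits = re.sub(r"\D", "", value)
    return digits or None
```

Running `clean_html` before any text processing keeps menu and footer text from being tagged as entities, which is why cleaning comes first. The normalizers can then be applied to entity values so that the metadata output uses one consistent format.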


With deep expertise in enterprise search implementations, we've worked with a wide range of leading search technologies and hundreds of enterprise customers over the past decade. Structuring the unstructured is a vital part of many of the projects we've delivered.

We provide a mix of experience, pragmatism, and practical technology assets to deliver textual ETL solutions efficiently and sustainably.

Contact us to discuss your textual ETL requirements and how we can help you maximize the value of unstructured data.