
Textual ETL: A Key Component for Big Data Applications


  • Traditional ETL operates on structured data, originally created by computers
  • Structured data is highly consistent and predictable in format. Think log files, transaction records, etc.
  • 80% of the world's data is unstructured (textual) or semi-structured in nature
  • More to the point, much of this 80% was created by humans
  • The variability of human-created content stands in huge contrast to the predictable, uniform nature of structured data
  • Humans are inconsistent, emotional, complex, and quite simply unique in their content creation behaviour

Bottom line: Traditional ETL methods don't work with textual content. 
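To make the contrast concrete, here is a minimal Python sketch. The record layout, the sample notes, and the function names (`parse_structured`, `try_structured`, `extract_amount`) are all hypothetical, invented for illustration only; they are not part of any Search Technologies tool. It shows a fixed-format parser handling a machine-generated record, then failing on human-written notes that carry the same information.

```python
import re
from datetime import datetime


def parse_structured(line):
    """Traditional ETL step: split a fixed-format record into typed fields."""
    date_str, account, amount = line.split(",")
    return {
        "date": datetime.strptime(date_str, "%Y-%m-%d").date(),
        "account": account,
        "amount": float(amount),
    }


def try_structured(line):
    """Return the parsed record, or None when the line does not fit the format."""
    try:
        return parse_structured(line)
    except ValueError:
        return None


# Hypothetical pattern: an amount written as "$42.50" or "42.5 dollars/bucks".
AMOUNT = re.compile(
    r"\$\s*(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)\s*(?:dollars|bucks)",
    re.IGNORECASE,
)


def extract_amount(text):
    """Naive text-oriented step: look for an amount anywhere in the text."""
    match = AMOUNT.search(text)
    if match is None:
        return None
    return float(match.group(1) or match.group(2))


machine_record = "2014-03-02,ACC-17,42.50"
human_notes = [
    "Paid $42.50 to acct 17 on March 2nd",
    "acct 17 - forty-two fifty, paid last Sunday",
    "paid 42.5 bucks yesterday",
]

print(try_structured(machine_record))
for note in human_notes:
    # The fixed-format parser rejects every note; the pattern finds some amounts.
    print(try_structured(note), extract_amount(note))
```

Even the text-oriented pattern misses the note that spells the amount out in words, which hints at why human-created content calls for more than a handful of rules.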


  • If big data systems are to derive actionable insight from the unstructured world, textual ETL lies on the critical path
  • It requires a different approach, and different technologies

Search Technologies provides consulting, services, and proven software tools, many of them open source, as the basis of efficient textual ETL solutions.

Contact us for a no-commitment discussion of your textual ETL requirements and ideas.

Unstructured Big Data

Unstructured content is fundamentally different from structured data and must be treated appropriately. This involves specialist skills and technology.

Hadoop Consulting Services

At Search Technologies, we've been implementing big data systems for more than five years. We provide Hadoop expertise at competitive daily rates.

Enterprise Search and Big Data

Staff blog: "Structuring the Unstructured" describes the crossover from enterprise search technology to the big data world.

IDAL | The Independent Data Access Layer

A free-to-download white paper providing a foundational strategy for big data and unstructured content processing.

Why is Content Processing Needed?

Processing unstructured content prior to indexing requires a different approach. Techniques typically used with structured content can't cope with the variability and unpredictability of unstructured content.