Search Technologies, Corp.

Extraction, Transformation and Loading (ETL) for Search


Search Technologies maintains a service practice dedicated to the parsing, transformation, enrichment and cleansing of both structured and unstructured information. This practice grew out of our core business of implementing sophisticated search solutions. Where search must span a range of disparate data sets (a common scenario for both private intranet and public-facing applications), a key factor in user satisfaction with search relevancy is getting those data sets to “play well together”.

The service practice is based on a combination of:
  • Staff expertise in data transformation, parsing and enrichment
  • Proven, well-practiced methodologies and project management techniques specific to data transformation
  • A range of software tools and utilities (built on open source components) which aid implementation efficiency. These include a rapid development framework, based on OSGi (www.osgi.org), which is licensed for perpetual use, free of charge, to Search Technologies customers. This framework provides a productive and open environment in which customized data transformation systems can be developed, upgraded and maintained.

Quality control and exception handling are an important and integral part of this service practice, enabling errors and exceptions to be identified, quarantined and dealt with.
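
As an illustration of the identify-and-quarantine idea, here is a minimal Python sketch. It is not the Search Technologies framework: the names `run_pipeline` and `require_title`, and the dictionary record shape, are hypothetical and chosen only to show failing records being set aside with the reason for failure while the remaining records continue through the pipeline.

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Record], Record]


def require_title(record: Record) -> Record:
    """Hypothetical cleansing step: trim the title and reject empty ones."""
    title = str(record.get("title", "")).strip()
    if not title:
        raise ValueError("missing title")
    return {**record, "title": title}


def run_pipeline(records: Iterable[Record], steps: list[Step]):
    """Apply each step in order; quarantine any record whose step raises."""
    processed, quarantined = [], []
    for record in records:
        try:
            current = record
            for step in steps:
                current = step(current)
            processed.append(current)
        except Exception as exc:
            # Keep the original record and the failure reason for later review.
            quarantined.append({"record": record, "error": str(exc)})
    return processed, quarantined


# Illustrative input: one clean record and one that fails the check.
docs = [{"id": 1, "title": "  Annual report "}, {"id": 2, "title": ""}]
processed, quarantined = run_pipeline(docs, [require_title])
```

Keeping the untouched original record alongside the error message means a quarantined item can be corrected and resubmitted without re-reading the source repository.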

All internal data processing within this service practice and its supporting utilities and frameworks is handled in XML. Input can come from any repository and, in principle, be in any format (a range of common repositories and popular document formats are supported as standard). The original purpose of this service practice was data preparation and enrichment prior to indexing into a search engine. The service practice and its supporting tools are entirely search engine agnostic and are in use with a range of search products including Solr/Lucene (open source), Microsoft FAST ESP, Microsoft SharePoint and the Google Search Appliance.
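
To make the "everything is XML internally" idea concrete, the following Python sketch wraps records from two different sources in one common XML shape. The `<doc>`/`<field>` envelope, the `to_xml` helper and the source names are assumptions made for illustration; the actual internal schema is not described here.

```python
import xml.etree.ElementTree as ET


def to_xml(source: str, fields: dict) -> ET.Element:
    """Wrap one record in a hypothetical common <doc> envelope."""
    doc = ET.Element("doc", {"source": source})
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return doc


# Two records from different (hypothetical) repositories, same XML shape.
crm_row = to_xml("crm", {"title": "Acme Ltd", "region": "EMEA"})
wiki_page = to_xml("wiki", {"title": "Onboarding guide"})

# Serialize to text, e.g. for handing to a search engine's own indexer.
xml_text = ET.tostring(crm_row, encoding="unicode")
```

Because every downstream step sees the same envelope regardless of origin, the final hand-off to a particular search engine or content management system can be a thin, replaceable adapter.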

The service practice is also applicable where no search engine is directly involved and the content being processed is destined for a content management system, database or other application.

For more information, contact us for a no-obligation discussion.