
HDFS Connector for Search Engines

HDFS (Hadoop Distributed File System) is the primary distributed storage system used by Hadoop applications.

Search Technologies is a leading provider of search and big data analytics solutions and services. The HDFS connector is one of our pre-built and custom connectors that support secure, efficient data connectivity between various content sources and search engines.

Our HDFS connector crawls content from any given HDFS cluster using the WebHDFS HTTP interface. All of our connectors are search-engine independent and work with a wide range of commercial and open-source search engines, including Elasticsearch, Solr, HP Autonomy, SharePoint Search, Google Search Appliance, and others.
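To illustrate what crawling over WebHDFS involves, here is a minimal Python sketch of a directory listing using the public WebHDFS LISTSTATUS operation. It is not the connector's actual implementation. The host name, port 9870, the user name, and the function names (liststatus_url, parse_liststatus, list_directory) are assumptions made for the example.

```python
import json
import posixpath
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(namenode: str, path: str, user: Optional[str] = None) -> str:
    """Build the WebHDFS URL that lists the contents of an HDFS directory."""
    params = {"op": "LISTSTATUS"}
    if user:
        # Pseudo-authentication for clusters without Kerberos (assumed here).
        params["user.name"] = user
    return f"{namenode.rstrip('/')}/webhdfs/v1{quote(path)}?{urlencode(params)}"


def parse_liststatus(body: str, parent: str) -> List[Dict]:
    """Turn a LISTSTATUS JSON response into a list of simple entry dicts."""
    entries = json.loads(body)["FileStatuses"]["FileStatus"]
    return [
        {
            "path": posixpath.join(parent, e["pathSuffix"]),
            "type": e["type"],  # "FILE" or "DIRECTORY"
            "length": e["length"],
            "modified_ms": e["modificationTime"],
        }
        for e in entries
    ]


def list_directory(namenode: str, path: str, user: Optional[str] = None) -> List[Dict]:
    """Fetch and parse one directory listing (requires HTTP access to the NameNode)."""
    with urlopen(liststatus_url(namenode, path, user)) as resp:
        return parse_liststatus(resp.read().decode("utf-8"), path)


# Hypothetical usage:
# entries = list_directory("http://namenode.example.com:9870", "/data/logs", "alice")
```

A crawler built this way would call list_directory again for each entry of type DIRECTORY to walk the tree. The sketch does not handle Kerberized clusters, which require SPNEGO authentication on the HTTP request.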


Key features of the HDFS connector:

  • Metadata extraction
  • Incremental crawling (only new or updated documents are indexed)
  • Runs from any machine with HTTP access to the given HDFS NameNode
  • Filters crawled documents by path (including file name) using regex patterns
  • Support for Kerberized clusters
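
As a rough illustration of how regex path filtering and incremental crawling can combine, the Python sketch below selects only files whose path matches an include pattern and whose modification time differs from the one recorded on the previous crawl. The entry layout ("path", "modified_ms"), the last_seen mapping, and the function name are assumptions made for the example, not the connector's real data model.

```python
import re
from typing import Dict, Iterable, List


def select_for_indexing(
    entries: Iterable[Dict],
    include_patterns: List[str],
    last_seen: Dict[str, int],
) -> List[str]:
    """Return paths that match a pattern and are new or changed since the last crawl."""
    compiled = [re.compile(p) for p in include_patterns]
    selected = []
    for entry in entries:
        path = entry["path"]
        # Path filter: keep the file only if at least one regex matches.
        if not any(rx.search(path) for rx in compiled):
            continue
        # Incremental check: skip files whose modification time is unchanged.
        if last_seen.get(path) == entry["modified_ms"]:
            continue
        selected.append(path)
    return selected


entries = [
    {"path": "/data/a.log", "modified_ms": 100},  # unchanged since last crawl
    {"path": "/data/b.log", "modified_ms": 200},  # new file
    {"path": "/data/c.tmp", "modified_ms": 300},  # excluded by the pattern
]
print(select_for_indexing(entries, [r"\.log$"], {"/data/a.log": 100}))
# prints ['/data/b.log']
```

After a successful crawl, the caller would store each indexed path with its modification time so the next run can skip unchanged files.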


If custom features are required, our engineering team will work with you to configure and tailor the HDFS connector to your search needs.

Our team can also help with end-to-end system assessment, implementation, and ongoing support for your HDFS connector.

Contact us for details and pricing.
