Indexing Amazon S3 Buckets Using CloudSearch

Stream S3 Bucket content directly into Amazon CloudSearch


  • Search Technologies provides a cost-effective Java toolkit enabling S3 Bucket content to be quickly and easily streamed to the new Amazon CloudSearch service
  • A wide range of formats is supported, including Microsoft Office documents, PDF, HTML, ODF and RTF
  • ACLs are captured where appropriate to maintain document-level security
  • Metadata manipulation and enhancement capabilities are provided to drive features such as faceted search

Search Technologies also provides expert implementation services for Amazon CloudSearch

The Aspire content processing framework is a 100% Java toolkit based on OSGi. It comes with ready-made connectors that acquire content from S3 buckets and stream it through to the CloudSearch indexer. During this process, values can be added to each document's data structure to enhance the search experience. For example:

  • Extracting metadata from the original source content and mapping it to CloudSearch via an XSLT transform
  • Creating new metadata through categorization or enrichment
  • Creating document summaries for use in search results
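The enrichment steps listed above can be pictured with a minimal Java sketch. It does not use Aspire's actual API: the class `EnrichmentSketch`, its methods, the field names (`title`, `author`, `category`, `summary`) and the keyword rules are all hypothetical, and a plain Java map stands in for the XSLT-based mapping that Aspire provides. It only illustrates the idea of mapping source metadata to index fields, adding a category, and creating a summary.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of Aspire or the AWS SDK.
class EnrichmentSketch {

    // Creates a short summary by collapsing whitespace and cutting the text
    // at the last word boundary at or before maxChars.
    static String summarize(String body, int maxChars) {
        String text = body.trim().replaceAll("\\s+", " ");
        if (text.length() <= maxChars) {
            return text;
        }
        int cut = text.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // no space found: fall back to a hard cut
        }
        return text.substring(0, cut) + "...";
    }

    // Assigns a category with simple keyword rules (assumed rules, for
    // illustration; a real categorizer would be far richer).
    static String categorize(String body) {
        String lower = body.toLowerCase(Locale.ROOT);
        if (lower.contains("invoice")) {
            return "finance";
        }
        if (lower.contains("contract")) {
            return "legal";
        }
        return "general";
    }

    // Maps source metadata names to index field names and adds the
    // newly created category and summary values.
    static Map<String, String> enrich(Map<String, String> sourceMeta, String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", sourceMeta.getOrDefault("Title", ""));
        fields.put("author", sourceMeta.getOrDefault("Author", ""));
        fields.put("category", categorize(body));
        fields.put("summary", summarize(body, 80));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("Title", "Q3 Invoice");
        meta.put("Author", "Accounts");
        System.out.println(enrich(meta, "Invoice for services rendered in Q3."));
    }
}
```

In this sketch the `category` field is the kind of value that could back a facet, while `summary` is the kind of value that could be shown in a results list.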

Aspire is available with commercial-grade support, including 24/7 options.


Contact us for further details.