
AWS Cloudsearch: New Features Summary

On 25th March 2014, AWS Cloudsearch launched a significant upgrade in both performance and features. The new version is based on Apache Solr, the leading open source search platform.

The summary below gives a brief run-down, in no particular order, of some of the new features we’ve found in Cloudsearch, compared with the original version, which launched in April 2012.

Data Types 
Cloudsearch now supports a wider range of data types, so it can be applied to a broader variety of search applications. The supported data types are: date, date-array, double, double-array, int (integer), int-array, latlon (short for latitude-longitude), text, text-array, literal, and literal-array.
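
To make the data types concrete, here is a minimal sketch of one document as it might appear in a JSON upload batch. The field names and values (title, category, tags, and so on) are invented for illustration. The batch layout and the date and latlon string formats reflect our reading of the 2013-01-01 document format, so check them against your own domain configuration.

```python
import json
from datetime import datetime, timezone


def to_cloudsearch_date(dt: datetime) -> str:
    """Format a datetime as a UTC timestamp string for a date field."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_add_operation(doc_id: str, fields: dict) -> dict:
    """Wrap a set of fields in an 'add' operation for a document batch."""
    return {"type": "add", "id": doc_id, "fields": fields}


# Hypothetical fields, one per data type family.
fields = {
    "title": "Harbour View Cafe",              # text
    "category": "cafe",                        # literal
    "tags": ["coffee", "breakfast"],           # literal-array
    "rating": 4.5,                             # double
    "seats": 40,                               # int
    "opened": to_cloudsearch_date(
        datetime(2014, 3, 25, tzinfo=timezone.utc)
    ),                                         # date
    "location": "47.6062,-122.3321",           # latlon as "lat,lon"
}

batch = [build_add_operation("cafe-001", fields)]
batch_json = json.dumps(batch, indent=2)
print(batch_json)
```

The array types take ordinary JSON arrays, while dates and latlon values are sent as strings.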

Hit-Centric Summaries 
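A hit-centric summary shows the part of a document that actually matched the query, rather than a fixed description. This is a key feature when the source data lacks reliable structure. Cloudsearch now provides it, with the matched search keywords highlighted.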
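
As a rough sketch of how highlighting might be requested, the snippet below builds the query string for a search request. The field name "description" is hypothetical. The highlight.FIELD parameter and its JSON options (format, max_phrases, pre_tag, post_tag) are as we understand them from the 2013-01-01 search API, and the field would need highlighting enabled in the index.

```python
import json
from urllib.parse import urlencode


def build_highlight_params(query: str, field: str, max_phrases: int = 2) -> dict:
    """Build search parameters that ask for highlighted excerpts from one field."""
    options = {
        "format": "text",            # treat the field content as plain text
        "max_phrases": max_phrases,  # how many matched phrases to highlight
        "pre_tag": "<em>",           # inserted before each matched term
        "post_tag": "</em>",         # inserted after each matched term
    }
    return {
        "q": query,
        f"highlight.{field}": json.dumps(options),
    }


# "description" is an assumed field name, not one from a real domain.
params = build_highlight_params("espresso machine", "description")
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the domain's search endpoint path. The endpoint itself is omitted here because it is specific to each domain.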

New / Enhanced Search Features 
Here are our favorites: 

Proximity searching: A NEAR operator has been added, which matches terms that occur within a given number of words of each other.

Term boosting: This is useful for fine-grained relevancy tuning.

Better range searching: This can be used with all field types (numbers, dates…) 

Native Geo support: This can be used, for example, to order search results by distance from a specific location, which is an important e-commerce function.

Multiple / Optional Query Parsers: Cloudsearch now supports the simple, structured, lucene, and dismax parsers, providing flexibility in how queries are processed and how relevancy is calculated. These relatively low-level functions don’t get business folks excited, but they are important tools with which technical folks create and tune great search systems.

Size / Scale Options: Cloudsearch now provides better control over how your search application scales as load is applied.
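
Several of the features above can be combined in a single request. The sketch below assembles a structured-parser query that uses proximity, term boosting, and a range, then adds a distance expression so results can be sorted by distance from a point. All field names (description, title, price, location) and the coordinates are assumptions for illustration. The operator syntax and the haversin expression follow our understanding of the 2013-01-01 search API, so treat this as a starting point rather than a reference.

```python
from urllib.parse import urlencode


def quote(value: str) -> str:
    """Single-quote a string, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def near(field: str, phrase: str, distance: int) -> str:
    """Proximity: the words of the phrase within `distance` words of each other."""
    return f"(near field={field} distance={distance} {quote(phrase)})"


def boosted_term(field: str, term: str, boost: int) -> str:
    """Term boosting: matches on this term count for more in relevance."""
    return f"(term field={field} boost={boost} {quote(term)})"


def int_range(field: str, low: int, high: int) -> str:
    """Inclusive numeric range on a field."""
    return f"(range field={field} [{low},{high}])"


def all_of(*clauses: str) -> str:
    """Require every clause to match."""
    return "(and " + " ".join(clauses) + ")"


def geo_sorted_params(query: str, latlon_field: str, lat: float, lon: float) -> dict:
    """Select the structured parser and sort by distance from (lat, lon)."""
    expr = f"haversin({lat},{lon},{latlon_field}.latitude,{latlon_field}.longitude)"
    return {
        "q": query,
        "q.parser": "structured",   # alternatives: simple, lucene, dismax
        "expr.distance": expr,      # named expression computed per hit
        "sort": "distance asc",     # nearest results first
    }


query = all_of(
    near("description", "wireless headphones", 3),
    boosted_term("title", "headphones", 2),
    int_range("price", 50, 200),
)
params = geo_sorted_params(query, "location", 47.6062, -122.3321)
print(query)
print(urlencode(params))
```

Switching q.parser to one of the other parsers changes how the q value is interpreted, so the structured syntax shown here would not carry over unchanged.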

Language Support
Cloudsearch now supports the 32 languages listed below. Support includes language-specific text analysis, and both algorithmic and dictionary-based stemming. The language can be defined at field level, which is useful in multilingual environments.

Arabic (ar), Armenian (hy), Basque (eu), Bulgarian (bg), Catalan (ca), Chinese simplified (zh-Hans), Czech (cs), Danish (da), Dutch (nl), English (en), Finnish (fi), French (fr), Galician (gl), German (de), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Irish (ga), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Norwegian (no), Persian (fa), Portuguese (pt), Romanian (ro), Russian (ru), Spanish (es), Swedish (sv), Thai (th), Turkish (tr)
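
To illustrate field-level language settings, the sketch below builds index field definitions that point each text field at a language-specific analysis scheme. The field names are hypothetical, and the "_xx_default_" scheme naming is our understanding of how the built-in default schemes are named. The dictionaries are shaped for the IndexField argument of the configuration API's DefineIndexField action. No AWS call is made here.

```python
def text_field_definition(field_name: str, language_code: str) -> dict:
    """Describe a text field analysed with the default scheme for one language."""
    return {
        "IndexFieldName": field_name,
        "IndexFieldType": "text",
        "TextOptions": {
            # Assumed naming pattern for the built-in per-language schemes.
            "AnalysisScheme": f"_{language_code}_default_",
            "ReturnEnabled": True,      # field can be returned in results
            "HighlightEnabled": True,   # field can be used for highlights
        },
    }


# Hypothetical multilingual catalogue: one title field per language.
definitions = [
    text_field_definition("title_en", "en"),
    text_field_definition("title_fr", "fr"),
    text_field_definition("title_de", "de"),
]

for definition in definitions:
    print(definition["IndexFieldName"], definition["TextOptions"]["AnalysisScheme"])

# With boto3, each definition could then be applied with something like:
#   boto3.client("cloudsearch").define_index_field(
#       DomainName="my-domain", IndexField=definition)
# where "my-domain" is a placeholder domain name.
```

Keeping one field per language lets each be stemmed and tokenised with its own rules, at the cost of querying several fields.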