AWS CloudSearch Upgrade: A First Look
The decision to base CloudSearch on Solr could be game-changing
As I write, a technical colleague a few yards to my right is creating a new document parser on behalf of a customer, to address a thorny data quality issue. Whichever search engine the customer is using, the notion of rubbish-in, rubbish-out applies, so poor data quality usually means search relevancy problems. My colleague knows how to fix these.
He is dealing with a substantial data set, and he’s tuning things up through a series of “change - re-index - test” cycles. Anyone who has implemented a business-critical search system will be familiar with this iterative process.
PRODUCTIVITY & AGILITY
With the customer’s permission, the data was sent up to AWS overnight, and this morning, my colleague fired up an extra-large EC2 instance, installed some software, and while he’s been thinking and coding at his desk, a big server, somewhere on the planet, has been processing text for him. I’m told that the work will be finished in a few days, at which point the EC2 instance will be let go.
Becoming familiar with Amazon Web Services has improved my colleague’s personal productivity and the agility with which we are able to serve our customers. Using AWS in this way is unique neither to this colleague nor to our company. But it is painful to think back to how we used to do this.
EC2 HOSTED SOLR
In the world of enterprise search, we are seeing an increasing number of customers choose to run production systems on AWS. The combination of Solr (the leading open source search engine) running on EC2 instances is popular, especially for Web-facing applications such as site search or e-Commerce search. We recently helped a leading business publisher to move a large and complex search infrastructure onto Solr/EC2 - see the case study here.
Solr is not for everyone, but it has clear attractions. Because it is open source, there is no up-front license fee to pay, and customers like that idea. It is also mature and robust, and it scales well; my technical colleagues like those aspects.
At Search Technologies, we value our independence, and generally avoid thinking in terms of one search engine being better than another. Instead, we first learn the customer requirements in detail, for example through our Assessment Service, and based on what we find, an appropriate search engine can be chosen. In most situations, a variety of search engines, often including Solr, have the necessary capabilities, and so the customer has a choice.
Key reasons for not choosing Solr include customer doubts about the maintenance and support of open source software.
SO, WHAT DO WE THINK OF THE NEW CLOUDSEARCH?
Its functional capabilities are much improved, and it seems to be twice as fast as the original.
A full feature description can be found on the AWS Blog.
However, what is most intriguing is that CloudSearch is now built on Solr.
The guys at AWS are not making a fuss about this for the moment, probably because it isn’t exactly Solr. For example, a few Solr functions that don’t "elasticate" well are missing. The API is a little different too, so that it fits alongside other AWS services, and there will be a few Solr purists who are not entirely comfortable with that.
What are the implications?
Here are some that come immediately to mind.
1. THIS IS A HUGE VOTE FOR SOLR
Solr is a substantial and active Apache project. One of the IT jungle’s largest gorillas just threw its weight behind Solr. Purists should be celebrating. This can only strengthen Solr’s long-term prognosis. That, in turn, should reassure Solr users and implementers alike.
2. IT CAN ADDRESS SUPPORT & MAINTENANCE DOUBTS
Like raw Solr, CloudSearch involves no up-front license fee. Customers like that. As with all AWS services, the model is pay-as-you-use. Organizations that have been put off Solr because of doubts about the support and maintenance of open source may now be able to relax a little. CloudSearch is a fully maintained service, and AWS already enjoys a good reputation for support within many organizations, including ours. So, here is a way of deploying Solr to which even the most cautious IT manager has no reason to object.
3. CLOUDSEARCH NOW MEETS A WIDE RANGE OF REAL-WORLD SEARCH REQUIREMENTS
New functionality added to CloudSearch fills some important gaps, which means that it now fully meets the needs of a much wider range of search applications. Although we expect the majority of CloudSearch applications to continue to be Web-facing, there is nothing to prevent it from being used for intranet / enterprise search applications as well. We do this using add-on technologies such as data connectors, content processing, and query processing. Add a nice, flexible UI such as Twigkit, and you have a functional, highly cost-effective enterprise search platform.
4. CLOUDSEARCH DRIVES DOWN THE COST OF OWNERSHIP
Pricing for CloudSearch remains notionally the same. For smaller applications, you can get started for less than $100 a month; and remember, that includes the hardware, software, maintenance for both, and support. Initial tests (we've had access to the Beta program) indicate that an instance of the new version is approximately twice as efficient. This means, for example, that to meet a particular query load, you’ll need as little as half the number of instances, compared to the original CloudSearch service.
The original CloudSearch was already considered to be a very cost-effective option.
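To make the "as little as half" arithmetic concrete, here is a rough sizing sketch. The query rates and per-instance capacities are purely hypothetical numbers chosen for illustration; they are not measurements from our tests or figures published by AWS. The only assumption carried over from the text is that a new instance handles roughly twice the load of an old one.

```python
import math


def instances_needed(peak_qps: float, qps_per_instance: float) -> int:
    """Smallest whole number of instances able to absorb a given query load."""
    if peak_qps < 0 or qps_per_instance <= 0:
        raise ValueError("load must be >= 0 and per-instance capacity must be > 0")
    # Instances are indivisible, so round up; always keep at least one running.
    return max(1, math.ceil(peak_qps / qps_per_instance))


# Hypothetical figures, for illustration only.
peak_qps = 900.0                   # assumed peak query load
old_capacity = 100.0               # assumed capacity of an original instance
new_capacity = 2 * old_capacity    # "approximately twice as efficient"

old_count = instances_needed(peak_qps, old_capacity)
new_count = instances_needed(peak_qps, new_capacity)

print(f"original service: {old_count} instances")
print(f"new service:      {new_count} instances")
```

Because instances come in whole units, the saving is exactly half only when the load divides evenly; with these made-up numbers, nine original instances become five new ones rather than four and a half.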
WHAT DOES IT ALL MEAN?
Viewed separately, each of the above points is noteworthy.
Taken together, they may make a significant impact on the enterprise search landscape.
Read on… New Functionality in CloudSearch