Solr vs. Elasticsearch - Choosing Your Open Source Search Engine
Why are we here? What is the purpose of my existence? Should I exercise or rest and save my energy? Wake up early for work or start late and work through the night? Should I eat my french fries with ketchup or mayonnaise?
These are all age-old questions that may or may not have answers. Some of them are very hard or terribly subjective. But let me put a bit of effort into trying to answer one of them: Should I use Solr or Elasticsearch?
Here is the scenario. Your organization is looking to implement your first search engine, switch to another search engine - calling out to all the Google Search Appliance (GSA) users looking for a replacement! - or try to save money by moving to open source. You, as a proficient and capable developer, have been called to solve a difficult problem. Your problem has many business requirements, but at the core, it is a “big data and search” problem.
You need to extract a lot of content from multiple data sources and get insights from that data to help your company grow and achieve its objectives for this year.
There is a lot at stake here. You can’t miss and you have only one shot. You need the right search engine for the job, you are thinking open source, and you have two popular choices: Solr or Elasticsearch, both of which are consistently ranked in the top two spots among open source and commercial search engines, according to DB-Engines.
Which Open Source Search Engine Would You Pick?
This is not a coin toss or an easy pick. Both search engines are great and there is no one “right” choice. It all depends on your requirements.
So the first step is to understand what application you have to build. Then, the next step is to see what each search engine has to offer. And by the way, if you’re still at the intersection of open source vs. commercial solutions, get our free e-book for a deep-dive into the 10 key criteria to consider when selecting a search engine.
A couple of years back, we wrote a high-level overview blog post on Solr vs. Elasticsearch, which discussed overall trends and non-technical insights. Now that both Solr and Elasticsearch have evolved and become dominant players in the open source search engine market, let’s take a fresh look at each and see where it takes us.
Age & maturity
Solr has the longer history: it was created in 2004 by Yonik Seeley at CNET Networks, which then contributed it to Apache in 2006. It graduated to a top-level project in 2007. Elasticsearch, on the other hand, was officially created in 2010, although its roots go back to 2004, when its founder Shay Banon started its precursor under the name Compass. Since then, the creators of Kibana, Logstash, and Beats have joined Elasticsearch to create the Elastic Stack product family, which has emerged as a powerful player in the search and log analytics space. With that said, Solr has the advantage of having been visible in the market earlier.
Community & open source
Both have very active communities. If you check GitHub, you can see that they are very popular open source projects with plenty of releases.
A very important detail is that while both are released under the Apache license, and both are open source, they work a little differently. Solr truly is open source - anyone can help and contribute. With Elasticsearch, while people can still offer their contributions, only Elastic’s employees (the company behind Elasticsearch and the Elastic Stack) can accept those contributions.
Is this good or bad? It depends on how you look at it. With Solr, if there is a feature you need and you contribute it to the community with adequate quality, it can be accepted. With Elasticsearch, it’s up to Elastic to decide whether a contribution is accepted. So there may be more feature options in Solr. On the other hand, contributions to Elasticsearch, which go through more levels of quality checks, may offer higher consistency and quality.
Both Solr and Elasticsearch have very well-documented reference guides. Elasticsearch's documentation is hosted on GitHub, while Solr's uses Atlassian Confluence. You can find them via the links below.
Let’s get a little bit more technical. Elasticsearch and Solr are two different search engines. But underneath, they both use Lucene, which means both are built on "the shoulders of giants."
For those of you who wonder why I consider Lucene a "giant," it is the actual information retrieval software library under the hood of many search engines. It is extremely fast and stable, and it is hard to imagine a better foundation. Lucene was created in 1999 by Doug Cutting - one of the creators of Hadoop. So there you go, Lucene is the perfect choice for the heart of a search engine.
Java APIs and REST
Elasticsearch has a more “Web 2.0” REST API, while Solr has a much better Java API in SolrJ - or SolrNet if you use Microsoft technologies. For .NET, Elasticsearch has NEST and Elasticsearch.Net. Solr’s REST API may feel less flexible, but it works wonderfully for what you need: indexing and querying. Elasticsearch speaks JSON natively, so if you use JSON throughout your stack, it is a good choice. Solr supports JSON as well, but that support was added later, as Solr was originally designed around XML.
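To make the difference in API style concrete, here is a minimal Python sketch that expresses the same phrase search for both engines without actually sending it. The host names, the "articles" index/collection, and the "title" field are assumptions made up for illustration; the endpoints themselves (_search for Elasticsearch, select for Solr) are the standard ones.

```python
import json
from urllib.parse import urlencode

# Hypothetical hosts, chosen only for illustration.
ES_URL = "http://localhost:9200"
SOLR_URL = "http://localhost:8983/solr"


def elasticsearch_search(index, field, text, size=10):
    """Return (url, json_body) for an Elasticsearch phrase query.

    Elasticsearch expects the query as a JSON document in the request body.
    """
    body = {"query": {"match_phrase": {field: text}}, "size": size}
    return f"{ES_URL}/{index}/_search", json.dumps(body)


def solr_search(collection, field, text, rows=10):
    """Return the URL for a comparable Solr phrase query.

    Solr takes the query as URL parameters; wt=json asks for a JSON response.
    Assumes `text` contains no double quotes (no escaping is done here).
    """
    params = {"q": f'{field}:"{text}"', "rows": rows, "wt": "json"}
    return f"{SOLR_URL}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(elasticsearch_search("articles", "title", "open source"))
    print(solr_search("articles", "title", "open source"))
```

The Elasticsearch request is a JSON document, while the Solr request is a set of URL parameters. Which one feels more natural will largely depend on the rest of your stack.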
Because they both expose an API, it is simple to index content from your custom application or already existing and configurable applications. For example, our Aspire content processing framework is able to connect to multiple data sources and post to either Solr or Elasticsearch.
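As a rough illustration of what "posting to either engine" looks like from a custom application, the sketch below builds the request bodies for indexing the same documents into each one. The "articles" name and the sample documents are assumptions; the body formats follow Elasticsearch's _bulk endpoint and Solr's JSON update handler. Older Elasticsearch releases also expect a _type in the action line, so treat this as a starting point rather than a version-exact recipe.

```python
import json


def elasticsearch_bulk_body(index, docs):
    """Build the newline-delimited JSON body for POST /_bulk.

    Each document is preceded by an action line naming the target index and id.
    Send with Content-Type: application/x-ndjson.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def solr_update_body(docs):
    """Build the JSON array body for POST /solr/<collection>/update.

    Solr takes a plain array of documents, each carrying its own id field.
    Send with Content-Type: application/json.
    """
    return json.dumps([{"id": doc_id, **doc} for doc_id, doc in docs.items()])


if __name__ == "__main__":
    # Hypothetical sample documents keyed by id.
    sample = {"1": {"title": "Solr"}, "2": {"title": "Elasticsearch"}}
    print(elasticsearch_bulk_body("articles", sample))
    print(solr_update_body(sample))
```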
Solr also has a feature for extracting text from binary files using Apache Tika. So you can upload a PDF via the ExtractingRequestHandler and Solr will know what to do with it.
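For example, a PDF upload to Solr's extraction handler could be prepared as in the sketch below. It assumes the handler is enabled and mapped to its usual /update/extract path, and the host, the "articles" collection, and the document id are hypothetical. The function only builds the request; sending it is left to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr"  # hypothetical host


def build_extract_request(collection, doc_id, pdf_bytes):
    """Build (but do not send) a POST to Solr's /update/extract handler.

    literal.id sets the id of the resulting document, and commit=true makes
    it searchable right away. The raw PDF bytes travel as the request body.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{SOLR_URL}/{collection}/update/extract?{params}",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_extract_request("articles", "doc1", b"%PDF-1.4 ...")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the upload against a running Solr instance.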
On the other hand, Elasticsearch works nicely with Logstash, which can process data from any source and index it.
Scaling is a key consideration. Here, Elasticsearch was winning the game while Solr was still constrained to Master-Slave replication. However, SolrCloud has since come into the game. With the help of ZooKeeper, it is now possible to scale a Solr cluster much more easily and quickly than in older, Master-Slave versions of Solr. It still needs a lot of improvements, but the future looks bright in terms of the size of datasets that can be ingested and searched in Solr.
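In both engines, scaling out mostly comes down to how many shards and replicas an index is split into. The sketch below shows how that might be declared in each: a SolrCloud Collections API call and an Elasticsearch index-settings body. The host and the "articles" name are assumptions, and depending on your setup SolrCloud may need additional parameters, such as a named configset.

```python
import json
from urllib.parse import urlencode


def solrcloud_create_collection_url(base_url, name, shards, replication_factor):
    """URL for the SolrCloud Collections API CREATE action.

    replicationFactor is the total number of copies of each shard.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def elasticsearch_index_settings(shards, replicas):
    """JSON body for creating an index with PUT /<index>.

    number_of_replicas counts copies in addition to each primary shard.
    """
    settings = {"number_of_shards": shards, "number_of_replicas": replicas}
    return json.dumps({"settings": settings})


if __name__ == "__main__":
    # Hypothetical host and collection name.
    print(solrcloud_create_collection_url("http://localhost:8983/solr", "articles", 2, 2))
    print(elasticsearch_index_settings(2, 1))
```

Note the difference in counting: a Solr replicationFactor of 2 corresponds roughly to an Elasticsearch number_of_replicas of 1, since Elasticsearch counts replicas on top of the primary.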
There are several companies that got to a point where they had to decide which product worked best for them. For example, Cloudera selected Solr as their search engine to integrate into the open source CDH (Cloudera Distribution Including Hadoop). On the other hand, there are other vendors who have selected Elasticsearch as the search engine for their solutions. We at Search Technologies help with the consulting, deployment, and support of both search engines.
Vision & ecosystem
Solr has been more oriented towards text search. Elasticsearch quickly carved out its niche, aiming for log analytics by creating the Elastic Stack: Elasticsearch, Logstash, Kibana, and Beats. It was formerly known as the ELK Stack, after the first three of those products. Both have a clear vision and they are making great strides in their directions.
One thing worth reiterating is how both search engines are being used as the foundation of many leading search and big data platforms. For example, Elasticsearch is part of Microsoft’s Azure Search while Solr has been integrated into Cloudera Search.
When it comes to performance, the experiences I have heard from many developers suggest that both engines are solid performers. For the majority of use cases, whether an internal or external search application, performance won’t be much of an issue as long as the engine is designed and configured properly.
Solr comes with a web administration interface bundled in, while Elasticsearch relies on separate premium plugins for security, alerting, and monitoring. This list showcases Elastic's entire product family.
There are many ways to visualize the data in Solr and Elasticsearch - you can build your custom visualization dashboard or use the search engine's standard visualization features, perhaps with some tweaks. But there is one difference worth mentioning.
Solr has focused primarily on text search. It does a great job at this, becoming what seems to be the standard for search applications. But Elasticsearch has moved in a different direction where it goes beyond search to tackle log analytics and visualization with the Elastic Stack. Below are some visualizations you can do with Kibana 5.
This does not mean one is better than another. It just indicates that each search engine has its own strengths in different use cases and needs, and your selection will greatly depend on what your organization wants to accomplish.
So long story short, both Elasticsearch and Solr are excellent open source choices that will help you get more out of your data. It all depends on your requirements, your budget, your timing, and the complexity of your project. This e-book is a relevant resource to guide you through your decision-making process.