Search is the New NoSQL?
When it comes to accessing data, nothing beats a search index for flexibility.
- Most data analysis has traditionally been delivered through a series of fixed reports.
- The interactive analysis of data - a human having a conversation with a large data set - will in the future complement set reports, and provide a platform for data-driven innovation.
- Search will play a key role in delivering highly agile, interactive analysis applications.
BIG DATA ANALYSIS NEEDS FLEXIBILITY
For 30 years, SQL databases have been the de facto standard for storing information in a secure and controlled manner. A number of organizations, led by Oracle, have grown to become power-players in IT by providing robust SQL database technology. SQL databases continue to store the vast majority of the world’s structured data.
More recently, NoSQL databases have been adopted for some applications. These offer a number of potential advantages, including scalability and finer-grained control over data availability. NoSQL databases are popular for analytical (“Big Data”) and real-time applications. The name can be interpreted as “Not Only SQL”, indicating that they can also provide functionality similar to SQL for query purposes.
In short, many NoSQL databases store data as documents in formats such as XML or JSON, rather than in tabular rows and columns. This “schemaless” approach adds a lot of flexibility, which is useful when it comes to analytical applications. At the time when data is committed to the store, it is impossible to know how or why an individual or business application might wish to analyse or cross-reference the data at some point in the future.
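To make "schemaless" concrete, here is a minimal sketch in Python. It is not the API of any particular NoSQL product: the records, field names, and helper functions are all hypothetical, and the "store" is just a list of JSON strings. It only illustrates that each document can carry its own fields, and that questions about those fields can be asked after the data has been stored.

```python
import json

# Hypothetical records: each document carries its own fields,
# so no table schema has to be agreed up front.
documents = [
    {"id": 1, "type": "claim", "amount": 1200.0, "claimant": "A. Smith"},
    {"id": 2, "type": "email", "sender": "a.smith@example.com", "subject": "Policy renewal"},
]

# Serialize each document independently, as a document store might.
stored = [json.dumps(doc) for doc in documents]


def fields_in_use(serialized_docs):
    """Return the set of all field names seen across the stored documents."""
    names = set()
    for raw in serialized_docs:
        names.update(json.loads(raw).keys())
    return names


def select(serialized_docs, field):
    """Return the ids of documents that contain the given field."""
    matches = []
    for raw in serialized_docs:
        doc = json.loads(raw)
        if field in doc:
            matches.append(doc["id"])
    return matches
```

Calling select(stored, "amount") picks out only the claim record, even though nobody declared an "amount" column in advance.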
IS SEARCH THE NEW NoSQL?
But if you really want flexibility, consider using a search index as the access layer.
OK. Maybe not for the master data store. Depending on the type of data you are dealing with, the master copy could be anywhere, from a data warehouse, to Documentum, or Dropbox. But in terms of how data is served to analytical applications, a search index makes a lot of sense. Gartner wrote about this recently; see "Enterprise Search can Bring Big Data within Reach".
Cloudera has also recognized the utility of a search index for analytical applications, and introduced Cloudera Search, based on Apache SolrCloud (and hired a top-gun Solr guru to back that up).
Elasticsearch approaches analytical applications with a pure-play search index, complemented by Logstash, a logfile ingestion tool, and Kibana, a UI technology for analytics applications. Together, these comprise the open source "ELK" stack. Interest in, and uptake of, Elasticsearch has been impressive; see the Google Trends graph below.
So why is a search index a great option for analytical applications? Here are a few reasons:
- A search index is the next step in schemaless data structures, giving even more agility, flexibility, and performance.
- Search indexes are a mature technology. They have been around since the 1980s, and during the past 15 years many large companies have invested huge R&D dollars into improving search.
- Search indexes are extremely scalable. If you don’t believe this, think about what lies behind that simple search box next time you use Google.com and receive highly processed search results in a fraction of a second. After all, Google originated the MapReduce concept and its distributed file system design (ideas now available in open source, through Hadoop).
SEARCH FLEXIBILITY IN ACTION
Here's a simple example. In a fraud detection application, data and content have been pulled together from a range of sources, and made available for analysis. Investigators use an "Analyst's Workbench" to look at the data in different ways, as they try to keep up with the fraudsters, who never stand still. Once a successful method of committing fraud has been discovered, the smart fraudsters will move on, and come up with new approaches. In an evolving environment such as this, any limitation on the ways in which data can be examined, viewed, or cross-referenced will potentially hinder the investigator. For sure, having the data simply lined up in fixed rows and columns won't cut it.
Search indexes provide the ultimate in flexibility to meet scenarios such as this.
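The fraud scenario above can be sketched with a toy inverted index in Python. This is an illustration only, assuming made-up records and hypothetical helper names (build_index, search, facet); it is not how Solr or Elasticsearch are implemented, and real engines add text analysis, relevance ranking, and distribution on top. The point is that every token of every field becomes searchable, so an investigator can combine free-text terms and field filters that nobody planned for when the data was loaded.

```python
from collections import defaultdict

# Hypothetical records pulled from different sources; fields vary per record.
records = [
    {"id": "c1", "source": "claims", "text": "water damage claim filed from Leeds", "amount": 4200},
    {"id": "c2", "source": "claims", "text": "vehicle theft claim filed from Leeds", "amount": 9800},
    {"id": "e1", "source": "email", "text": "please expedite the water damage payment"},
]


def tokenize(value):
    """Lowercase a value and split it on whitespace."""
    return str(value).lower().split()


def build_index(docs):
    """Map each bare token, and each (field, token) pair, to the ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if field == "id":
                continue
            for token in tokenize(value):
                index[token].add(doc["id"])           # search across all fields
                index[(field, token)].add(doc["id"])  # search within one field
    return index


def search(index, *terms):
    """Return sorted ids matching ALL terms; a term is a token or a (field, token) pair."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def facet(docs, ids, field):
    """Count the values of one field across the matching documents."""
    wanted = set(ids)
    counts = defaultdict(int)
    for doc in docs:
        if doc["id"] in wanted and field in doc:
            counts[doc[field]] += 1
    return dict(counts)


index = build_index(records)
```

For example, search(index, "water", "damage") finds both the claim and the email that mention water damage, and facet(records, ["c1", "e1"], "source") then breaks those hits down by where they came from. Neither question required a schema change or a new report.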
So if you are thinking of creating a new analysis application, you may find that search is a more appropriate access platform than NoSQL to exploit your analytics ideas.
Disclosure note: Search Technologies is an official implementation partner of both Cloudera and Elasticsearch.