
An Open Source Approach to Log Analytics with Big Data

In the Trenches with Big Data & Search – A Blog and Video Series

Paul Nelson
Innovation Lead

Businesses were using logs for insights long before big data became the next cool thing. But with the exponential growth of log files, log management and analysis have become daunting, if not almost impossible. How did we leverage open source big data to process 600+ GB daily for faster, more accurate, and more cost-effective log analytics?

Watch the full story below.



The Perils of Data Lakes

Want to run an agile IT network? You need system logs to identify potential security issues and network failures. Work in a highly regulated environment like finance, legal services, or government? Log data may be required for regular audits and compliance reports. In the e-commerce business? User logs open up a wealth of insights for better user experience and conversion.

There are two common log types:

  • Event logs – provide a comprehensive view of how your systems and their components are performing at any point in time: whether your servers are running fine, or whether there are failures or abnormalities in your network.
  • User logs – bring an intimate understanding of your online users' behavior, such as what they did on your website and what they clicked on during the buyer's journey. Analyzing raw user logs gives you more control, accuracy, and transparency into user activities than the statistics provided by standard web analytics services like Google Analytics or Omniture.

With data amounting to terabytes, even petabytes, it’s virtually impossible for traditional log analysis software to quickly and accurately discern patterns and pinpoint trends. Without an efficient, automated process to make sense of this data, organizations risk dumping valuable data into an unrefined “data lake” and eventually losing the ability to discover data-driven competitive advantages.

How about a search and big data analytics approach for making the best use of log data?


Navigate and Analyze Logs with Big Data and Search 

Big data architecture for log analytics

Many robust big data applications for log analytics have helped organizations avoid the perils of the “data lake.” These applications are aided by big data’s processing power, machine learning, predictive analytics, and advanced search capabilities. 

A big-data-enabled log analytics platform:

  • Gathers and stores raw log files from multiple business systems (often hundreds of GB daily)
  • Runs the data through buffers
  • Loads it into a log analytics stack for query parsing, search indexing, and trend visualization
  • Allows businesses to perform large-scale analysis of user trends, clusters, cluster trends, market trends, etc.
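
To make the steps above concrete, here is a minimal sketch of the gather, buffer, parse, and aggregate flow in plain Python. It is only an illustration under stated assumptions: the log format (Common Log Format), the sample lines, and the names `LOG_PATTERN`, `parse_line`, and `run_pipeline` are all hypothetical, and a production platform would replace the in-memory buffer and counters with a real message buffer and a search index.

```python
import re
from collections import Counter, deque

# Assumed input format: Common Log Format. Real logs will likely differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def run_pipeline(lines, batch_size=2):
    """Buffer raw lines, then parse and aggregate them one batch at a time."""
    buffer = deque()
    hits_per_path = Counter()
    server_errors = 0

    def flush():
        nonlocal server_errors
        while buffer:
            record = parse_line(buffer.popleft())
            if record is None:
                continue  # skip lines that are not in the expected format
            hits_per_path[record["path"]] += 1
            if record["status"].startswith("5"):
                server_errors += 1

    for line in lines:
        buffer.append(line)
        if len(buffer) >= batch_size:
            flush()
    flush()  # drain whatever is left in the buffer
    return hits_per_path, server_errors


# Made-up sample data for illustration only.
sample = [
    '10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2015:13:55:37 +0000] "GET /search?q=shoes HTTP/1.1" 200 1024',
    '10.0.0.1 - - [10/Oct/2015:13:55:38 +0000] "POST /cart HTTP/1.1" 500 0',
    'not a log line',
]
hits, errors = run_pipeline(sample)
print(hits.most_common(), errors)
```

The buffer decouples how fast logs arrive from how fast they are parsed, which is the same role the buffering step plays in the platform described above.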


Open Source Log Analytics: Big Data within Every Business' Reach 

While there is a wide range of log management and analysis tools, open source log analytics stacks can provide full enterprise-class features and reliability more affordably as log data grows exponentially. Elastic’s Elasticsearch, Logstash, and Kibana (ELK) stack is a rising example, with three components working together to provide a seamless log analytics process:

  • Elasticsearch: import of log files into a search engine for indexing and access through search
  • Logstash: log gathering, storage, and parsing
  • Kibana: intuitive browser interface for trend visualization and analysis
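
As a rough illustration of the hand-off between parsing and indexing, the sketch below serializes already-parsed log events into the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint accepts (an action line followed by the document). The index name, the event fields, and the helper `to_bulk_payload` are assumptions made for this example, the exact action-line fields vary by Elasticsearch version, and in an ELK deployment Logstash would normally do this work for you.

```python
import json


def to_bulk_payload(events, index_name):
    """Serialize events as newline-delimited JSON for a bulk index request."""
    lines = []
    for event in events:
        # Action line: tells Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the parsed log event itself.
        lines.append(json.dumps(event, sort_keys=True))
    # A bulk request body must end with a newline.
    return "\n".join(lines) + "\n"


# Made-up events; field names are illustrative, not a required schema.
events = [
    {"ip": "10.0.0.1", "path": "/cart", "status": 200},
    {"ip": "10.0.0.2", "path": "/search?q=shoes", "status": 200},
]
payload = to_bulk_payload(events, "weblogs-2015.10.10")  # hypothetical index name
print(payload)
```

The resulting string would be sent as the body of a POST to the cluster's `_bulk` endpoint; the HTTP call is left out so the sketch runs without a cluster.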

In addition to out-of-the-box log analytics solutions, you can bring together open source and commercial solutions to build your own custom log analytics stack. Some options we've seen include: 

  • Log data aggregating and processing: Apache Flume, Search Technologies' Aspire
  • Search engine indexing: Solr, Lucidworks
  • Trend visualization and analysis: Apache Hue, Pentaho Analytics and Data Integration, HighCharts, D3 Charts


In Action: Better E-Commerce Conversion with Real-Time Personalization through Logs

Apache Spark logistic regression

If you are an online business, understanding what your website users clicked, what they searched for, and what their entire shopping process looked like is undoubtedly key to a greater bottom line. Now think about capturing and analyzing that data in real time! You already know it – the right data at the right moment produces optimized results.

We have been working on a sophisticated big data architecture using Apache Spark to provide real-time user profiling based on clicks, products added to cart, search queries, etc. With processing speeds that are near real time, Apache Spark can feed user log data into a real-time personalization engine that customizes search results, catalogues, recommendations, etc., paving the way to a better user experience and, eventually, better conversion.
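
The idea of building a profile incrementally from log events can be sketched in a few lines of plain Python. This is not the Spark architecture described above, only a single-process stand-in for the per-event update such a job might perform. The event fields, the `UserProfile` class, and the cart weight of 3 are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


class UserProfile:
    """Running summary of one user's clicks, cart additions, and searches."""

    def __init__(self):
        self.clicks = Counter()
        self.cart = Counter()
        self.queries = []

    def update(self, event):
        """Fold a single log event into the profile."""
        kind = event["type"]
        if kind == "click":
            self.clicks[event["category"]] += 1
        elif kind == "add_to_cart":
            self.cart[event["category"]] += 1
        elif kind == "search":
            self.queries.append(event["query"])

    def top_categories(self, n=2):
        """Rank categories; cart additions count 3x (an arbitrary assumption)."""
        scores = Counter()
        for category, count in self.clicks.items():
            scores[category] += count
        for category, count in self.cart.items():
            scores[category] += 3 * count
        return [category for category, _ in scores.most_common(n)]


# Made-up event stream for illustration only.
events = [
    {"user": "u1", "type": "click", "category": "shoes"},
    {"user": "u1", "type": "click", "category": "shoes"},
    {"user": "u1", "type": "click", "category": "hats"},
    {"user": "u1", "type": "add_to_cart", "category": "hats"},
    {"user": "u1", "type": "search", "query": "running shoes"},
    {"user": "u2", "type": "click", "category": "bags"},
]

profiles = defaultdict(UserProfile)
for event in events:
    profiles[event["user"]].update(event)

print(profiles["u1"].top_categories())
```

A personalization engine could then use the ranked categories to reorder search results or recommendations for that user as each new event arrives.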

The technologies have started to fit together: massive log data, Hadoop for big data analysis and reporting, an intelligent search engine, and Apache Spark together promise a cost-effective path to real-time intelligence. Log analytics with search and big data lets modern businesses go beyond the “data lake” and stay in the race to seize the best opportunities, starting with an intimate understanding of ever-changing user behavior.


Log analytics is a use case in our “In the Trenches with Search and Big Data” video-blog series – a deep dive into six prevalent applications of big data for modern business. Check out our complete list of six successful big data use cases and stay tuned for more video stories of organizations that found success from these use cases.
