
A Hadoop Tutorial

A free guide to getting started with your first MapReduce job.

INTRODUCTION

Big data is everywhere. Recent analyst estimates suggest that by 2020 the world will hold 44 zettabytes of data, which in more familiar terms is 44 billion terabytes. This flood of data is coming from many sources, and it is time for anyone interested in the storage and analysis of data to gear up for big data.

 

DISTRIBUTED PROCESSING WITH HADOOP MAPREDUCE

Hadoop MapReduce is a software framework for writing applications to process large datasets in parallel, on clusters of servers. Working with MapReduce and very large datasets will soon become a core skill for most programmers.

A MapReduce job usually splits the input dataset into independent chunks, which the map tasks process in parallel. The framework then sorts the map outputs and passes them to the reduce tasks, which combine them into the final result.
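To make the map, sort and reduce steps concrete, here is a minimal word-count sketch in plain Java. It does not use the Hadoop API and runs in a single process; the class name `WordCountSketch` and its methods are illustrative assumptions, not part of Hadoop or of the tutorial's sample code. On a real cluster the framework would distribute the chunks and the grouped keys across many servers.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-process illustration of the MapReduce data flow.
public class WordCountSketch {

    // Map step: turn one input chunk into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Sort/group step: collect all values for each key, ordered by key.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: combine all values for one key into a single count.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in order; chunks are mapped in parallel.
    public static Map<String, Integer> wordCount(List<String> chunks) {
        List<Map.Entry<String, Integer>> mapped = chunks.parallelStream()
                .flatMap(chunk -> map(chunk).stream())
                .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList(
                "big data is everywhere",
                "big data needs big clusters");
        // prints {big=3, clusters=1, data=2, everywhere=1, is=1, needs=1}
        System.out.println(wordCount(chunks));
    }
}
```

Because each chunk is mapped independently and each key is reduced independently, both steps can be spread over as many machines as the data requires.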

The Hadoop framework is implemented in Java, and you can develop MapReduce applications in Java or any JVM-based language.

 

DOWNLOAD OUR FREE HADOOP TUTORIAL 

This three-part tutorial on developing big data applications with Hadoop was developed for our staff, but we are happy to share it (sample code included). It will show you how to set up a Hadoop developer environment, and guide you through developing your first MapReduce job.

Please enter the email address to which we should send a PDF copy of the Hadoop tutorial. A download link will be emailed to you immediately; please check your junk mail folder if you have a strong email filter.
