Precision Agriculture: Boosting Productivity with Big Data

How the UK Farming Industry Leverages Data Analytics to Modernise Agriculture

Ricardo Leon
Functional & Industry Analytics Consultant

A self-driving tractor is about to finish spraying 13.52 litres of plant growth regulators over 3.57 hectares of wheat crops on a farm in northern England. The figures from this operation will be recorded by a farm management software system. On average, a single smart tractor collects around 30 MB of data per day. Now consider all the tracked field operations applied to numerous crops across the UK every year, producing large amounts of valuable data stored in databases, just waiting to be processed and analysed.

How do we process and obtain information from all this data, which keeps growing in real time? With big data.

A properly designed big data solution would provide answers to a vast range of interesting enquiries using historical data and, more importantly, would lay the groundwork for predicting future trends.

Such a solution is being implemented by a UK-based company specialising in the global agriculture industry, in association with Search Technologies.

The Data Model

Raw data from farm management systems is extracted and fitted into a data domain model that has been meticulously designed to facilitate its manipulation and analysis. This process is called ETL: Extract, Transform, and Load.
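The ETL flow described above can be sketched in a few lines. This is a minimal, illustrative sketch only: the entity and field names (`operation_type`, `area_ha`, and so on) are assumptions for the example, not the project's actual schema.

```python
def extract(raw_rows):
    """Extract: read raw field-operation rows from a farm management export."""
    return list(raw_rows)

def transform(row):
    """Transform: map a raw row onto a field-operation entity in the data
    model, normalising text and units along the way."""
    return {
        "operation_type": row["op"].strip().lower(),  # e.g. "drilling"
        "area_ha": float(row["area"]),                # hectares as a number
        "volume_l": float(row.get("volume", 0.0)),    # litres applied
        "farm_id": row["farm"],
    }

def load(records, store):
    """Load: append the modelled records into the target store (here a plain
    list stands in for the data lake)."""
    store.extend(records)
    return store

raw = [{"op": " Crop Protection ", "area": "3.57",
        "volume": "13.52", "farm": "F-001"}]
lake = load([transform(r) for r in extract(raw)], [])
print(lake[0]["operation_type"])  # crop protection
```

In the real pipeline each stage runs at scale inside the big data platform; the point here is only the shape of the Extract, Transform, and Load steps.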

The data model contains different entities, which individually encapsulate a farming concept (products, fields, farms, planting zones, crop types, field operations, etc.) and collectively integrate to describe agricultural activities.

The model revolves around field operations, such as drilling (planting), crop protection (pesticides application), nutrition (fertiliser application), and crop yield (harvest operations).

This data model is populated with millions of records to produce a data lake holding valuable information, ready to be queried and explored.


Inbound data is not only loaded; it is also enriched in various ways. For example:

  • Data canonicalization: Farm management systems often allow the farmer to type free text, which generally creates multiple representations of the same entity. For example, ‘Abc’, ‘ABC +’, and ‘ABC, 10’ refer to the same pesticide. All these entries are rolled up into a single canonical form using pattern matching with regular expressions.
  • Component breakdown: Fertilisers’ nutrients can be inferred from the name provided by the farmer. For example, ‘0-0-26-6’ can be broken down into Nitrogen (N) 0%, P2O5 (Phosphorus pentoxide) 0%, K2O (Potassium oxide) 26%, SO3 (Sulphur trioxide) 6%.
  • Organic soil composition: Based on a particular field's geographical location, soil properties can be obtained from external sources.
  • Weather conditions: Similarly, past weather conditions as well as forecasts for a particular location are obtained from an external Application Programming Interface (API).

Typically, soil types and weather conditions are not stored within farming systems; however, they can be deduced from each farm's geo-location (latitude and longitude). This information adds exciting new dimensions to the data lake and opens the door to answering questions like:

  • "Which fungicide performs best under given conditions across the UK?"
  • "Should farmers consider other options for next year’s cropping based on expected weather?"

Technology Stack

Processing and Storage


HPCC (High-Performance Computing Cluster), a data-intensive computing platform, was chosen as the big data platform for this solution. It is used to obtain, process, analyse, and massage the data in order to build the desired data model.

HPCC has its own programming language, ECL (Enterprise Control Language), which was designed specifically for big data work. ECL runs on THOR (The Data Refinery Cluster), HPCC's processing engine and distributed file system, and a very powerful one: THOR does the heavy lifting of big data very efficiently.

Querying and Searching

Technical requirements for searching and querying are driven by emerging use cases, as new customers become interested in consuming the data stored in HPCC.

Amongst current and prospective customers are product suppliers, growers, cooperatives, regulatory authorities, distributors, media outlets, and more, each with their own search requirements, which are analysed on a case-by-case basis.

However, there are two general strategies currently implemented:


Elasticsearch

In order to search, browse, and slice and dice the data, Elasticsearch comes into play.

Subsets of the data lake are incrementally pushed from HPCC into Elasticsearch indices which can vary in content and structure, depending on the search needs of each case.

Scalability, speed, stability, and the Elastic Stack (Logstash, Kibana, X-Pack, etc.) make Elasticsearch a perfect fit for the solution architecture, providing search power and facilitating data analytics exercises.
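To give a flavour of what querying such an index looks like, here is a sketch of an Elasticsearch Query DSL body a consumer might issue against an index of field operations. The index structure and field names (`operation_type`, `crop`, `applied_at`) are hypothetical, not the project's actual mappings.

```python
import json

def build_query(operation_type, crop, date_from, date_to):
    """Compose an Elasticsearch bool query that filters field operations
    by type, crop, and application date range."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"operation_type": operation_type}},
                    {"term": {"crop": crop}},
                    {"range": {"applied_at": {"gte": date_from,
                                              "lte": date_to}}},
                ]
            }
        }
    }

body = build_query("crop protection", "wheat", "2016-01-01", "2016-12-31")
print(json.dumps(body, indent=2))
```

Because the clauses sit in a `filter` context, Elasticsearch can cache them and skip relevance scoring, which suits this kind of slice-and-dice analytics.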

ROXIE (Rapid Online XML Inquiry Engine)

ROXIE is HPCC's own data delivery engine. It is used to expose structured, pinpointed results. Although designing and implementing its queries can be time-consuming, it serves results very quickly.

Architecture Overview


Use Cases


The precision agriculture data can be offered to customers for external consumption in a raw-ish fashion. This means the data lake is partially or fully indexed (depending on the customers' interests) into Elasticsearch and then exposed through APIs controlled by query parameters.
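A hypothetical sketch of the "controlled by query parameters" part: translating an API request's query string into Elasticsearch term filters, so customers can slice the indexed data via URLs. The parameter names are invented for the example.

```python
from urllib.parse import parse_qs

def params_to_filters(query_string):
    """Turn a query string such as 'crop=wheat&region=north' into a list
    of Elasticsearch term filters."""
    params = parse_qs(query_string)
    return [{"term": {field: values[0]}} for field, values in params.items()]

print(params_to_filters("crop=wheat&region=north"))
# [{'term': {'crop': 'wheat'}}, {'term': {'region': 'north'}}]
```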

Data Analytics and Data Science

The project has a dedicated Data Science team which constantly examines the data lake to research more complex business enquiries, such as "Why was 2016 an awful year for oilseed rape yield? What correlates with higher yield?"

For this particular scenario, by analysing the data using HPCC in conjunction with statistical software such as R, the Data Science team can build a statistical model to narrow down which variables amongst a discrete set (for example, radiation, soil content, geographical location, fertiliser treatment, temperatures) correlate with higher yield.
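At its simplest, "what correlates with higher yield" starts from pairwise correlations. The sketch below computes a Pearson correlation between one candidate variable and yield; the numbers are made up for illustration, and the real analysis runs over the data lake with HPCC and R rather than a toy series.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

radiation = [410, 455, 470, 430, 500]    # hypothetical MJ/m^2 per season
yield_t_ha = [3.1, 3.6, 3.8, 3.3, 4.1]   # hypothetical tonnes per hectare
r = pearson(radiation, yield_t_ha)
print(round(r, 3))  # close to 1 for this toy data
```

A high coefficient only flags a candidate variable; the statistical model then has to separate genuine drivers from confounders such as geography.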

This particular piece of information was requested by editors from a British magazine aimed at the farming industry as they were looking to publish the outcome of that investigation in an upcoming report.

The answers to questions like this one, and many others, can be of paramount importance for growers, manufacturers, providers, and even regulatory authorities, amongst others. You could rightly argue that they already have access to some of this information, but consolidating it, completing it, and making sense of it can be tricky.

That’s the beauty of big data; it lets you see the big picture as you have never seen it before.

What’s Next?

The roadmap is quite promising. This project has all the ingredients to evolve rapidly as new data sources are added and more prospective customers take notice.

Including other farming systems and incorporating livestock are amongst the short-term goals, which would increase data coverage significantly and open a vast array of new possibilities.

As it stands, the project is in a favourable position to serve data to different actors in the industry, always keeping in mind a single goal: boosting farming productivity. Farmers, manufacturers, buyers, retailers, consumers, and pretty much anyone involved in the agricultural chain will ultimately benefit from applying big data to farming.

From Traditional Agriculture to Farming with Big Data

Agriculture has moulded human history for millennia, ever since early humans first domesticated plants and animals and came across a new sustainable activity that encouraged them to settle, develop tools, and master techniques. A lifestyle was born. A journey towards progress began.

As H.G. Wells wrote in his novel, "The World Set Free" (1914):

"Unpremeditated, undesired, out of the accumulations of his tilling came civilisation. Civilisation was the agricultural surplus."

Arguably, information is civilisation's surplus nowadays.

Now we have the opportunity to tie them together in order to ignite a constructive cycle from which we all can benefit.

And that's exactly what we are doing.

Have questions or want to get an in-depth look at how this precision agriculture platform works? Please email

- Ricardo