Vespa: Open Source Big Data Analytics Welcomes a New Entrant

With rapid changes and constant innovation in our search and big data analytics space, we keep a close eye on where the market is heading and what’s new. Earlier this year, we considered the rise of open source and the convergence of search and big data two of the main driving forces in our space. This was reinforced later in the year when Oath, a part of Verizon, announced in September that the company would open source Vespa – its powerful but largely unknown big data processing and serving engine (coincidentally named the same as the iconic Italian scooter). You can download and test it out on GitHub.

There’s an interesting bit of history behind Oath, in case you don’t already know it. Oath, a Verizon company, was formed via the acquisitions of AOL and Yahoo, with veteran technologists behind it. It was under the Yahoo brand that Vespa was developed, and it has long been leveraged across “Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr, and others – to process and serve billions of daily requests over billions of documents while responding to search queries, making recommendations, and providing personalized content and advertisements” (Oath). Another interesting fact: did you know that Hadoop’s co-founder, Doug Cutting, was leading the Hadoop project while working at Yahoo?

So, with a myriad of big data analytics platforms and tools in today’s market, what value can Vespa add? Here are our initial observations:

1.  One of your questions may be “How is Vespa different from the tools within the Apache ecosystem, namely Hadoop?” As Oath stated, “while developers can use the Hadoop stack to store and batch process big data, and Storm to stream-process data, these technologies do not help with serving results to end users.” But Vespa does. Thus, it may be considered a complementary technology to Hadoop: Hadoop processes your massive data while Vespa enhances the end-user experience by solving the scalability and speed challenge of serving real-time results. 

2.  “Now Vespa sounds like a search engine” – you may think. That’s true because after all, every big data application should have a good search engine supporting it. But by positioning Vespa as a big data analytics platform rather than a pure search engine, Oath indicated its potential to extend beyond search. And this leads to our next point…

3.  In the evolving world of “insight engines” or “cognitive search,” Vespa, as a big data analytics platform, can find its place in the modern enterprise where there’s a need to go beyond traditional search across massive content to achieve intelligent responses like recommendations and personalization. From this perspective, Vespa could be a powerful addition to open source and commercial big data analytics technology stacks. Its ability to quickly scale, process, and serve the right content to end users can apply to many use cases that we’ve worked on, such as recommendation engines, recruiting search and match, precision medicine, precision agriculture, and even more complex NLP-powered applications like chatbots or digital assistants. 

4.  “OK, but how does Vespa compare to other technologies that do comparable tasks?” Commercial alternatives include similar products like IBM Streams and Pachyderm. In the open source space, candidates like Elasticsearch and Solr come to mind. If you want a quick glance, Oath provides comparison charts between Vespa and Elasticsearch or relational databases. But remember, these charts and features are simply guidelines; we always tell our clients that there’s no “best” platform – it all depends on your use case, industry, systems, and user requirements. Many times, a custom stack of tools is the most practical solution. And as an open source project, Vespa has the potential to be explored and developed to support more novel use cases.
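To make the batch-versus-serving split from point 1 more concrete, here is a minimal Python sketch. It is a toy illustration only, not Vespa's or Hadoop's API: the sample documents, the click log, and the function names `batch_compute_popularity`, `build_index`, and `serve` are all hypothetical. An offline step aggregates a log (the role the quote assigns to the Hadoop stack), and an online step answers each query at request time using those precomputed results (the role assigned to a serving engine).

```python
from collections import defaultdict

# Hypothetical toy corpus and click log, invented for this illustration.
DOCUMENTS = {
    "d1": "yahoo finance stock market news",
    "d2": "yahoo sports football news",
    "d3": "flickr photo sharing",
}
CLICKS = ["d1", "d2", "d2", "d3", "d2"]


def batch_compute_popularity(clicks):
    """Offline step: aggregate the whole click log into per-document counts."""
    counts = defaultdict(int)
    for doc_id in clicks:
        counts[doc_id] += 1
    return dict(counts)


def build_index(documents):
    """Build an in-memory inverted index mapping each term to document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def serve(query, index, popularity, limit=10):
    """Online step: answer one query at request time.

    Matches documents containing every query term, then ranks them by the
    popularity counts that the batch step precomputed.
    """
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    ranked = sorted(matches, key=lambda d: (-popularity.get(d, 0), d))
    return ranked[:limit]


popularity = batch_compute_popularity(CLICKS)
index = build_index(DOCUMENTS)
print(serve("yahoo news", index, popularity))
```

The point of the sketch is the division of labor: the expensive aggregation happens once, ahead of time, while the per-request path only does a lookup and a sort. A real serving engine has to do the second part over billions of documents with low latency, which is the gap Oath says Vespa fills.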

Vespa is still in its early days in big data analytics, so whether it will be widely leveraged in the enterprise remains uncertain. But its focus on end users plus its experience with the dynamic Yahoo ecosystem can make Vespa a capable new entrant among the existing open source big data players, transforming search and content analytics for both enterprise users and end consumers.