
Prepping for the Future: Automation of Knowledge Work

The Convergence of Search, Big Data, and Machine Learning

Ahmad Jassat
Functional & Industry Analytics Manager

Most people don't consider a search engine to be artificial intelligence, but it is interesting to note how our definition of intelligence shifts as technology advances. A few years ago, no one would have doubted that producing driving directions required a great deal of intelligence: determining a route between two points on a map, never mind the best route given traffic and roadworks. And how about language translation? Now all of these things are so routine that we simply defer to Google without even thinking of it as intelligence.

We have all become used to leveraging the output of search engines in our knowledge-intensive work. We search for something and then decide what to do based on the results we get back, often exclusively on those results. The next development in search technologies may literally be that next step: the search technology itself will perform the action. In the same way that information systems have improved the quality of knowledge work, the next generation of information systems will start to do the low-level knowledge work, allowing us to focus on higher-level tasks.

Pie in the sky? Well, in a 2013 report from the McKinsey Global Institute, automation of knowledge work was identified as one of the top disruptive technologies, with a potential annual economic impact of over $5 trillion by 2025. That is much greater than the economic impact they expect from renewable energy!

[Figure: McKinsey gallery of disruptive technologies]

Of course, this will not happen overnight, and the technology to achieve this is far from perfect. However, given the rising cost of higher-level knowledge workers, it would certainly pay to start leveraging automation. Actually, there has been relatively little hype surrounding automation of knowledge work compared to the other technologies in the McKinsey list, as illustrated in this Google Trends analysis.

[Figure: Google Trends analysis of knowledge work automation]

This trend suggests that there is a great opportunity for companies that invest in this area to get ahead of the curve.


What is Currently Possible?

Recent studies have suggested that computers are able to grade essays almost as well as humans. While I don't think machines will be fully responsible for grading exam papers anytime soon, it's not difficult to see how this kind of technology can support labor-intensive knowledge work, such as grading thousands of exam papers or sifting through CVs, by categorizing and filtering large volumes of data in useful ways, while reducing risk and improving efficiency and productivity.

In fact, Search Technologies already employs machine learning and big data to help customers match jobs to CVs, with a significant, measurable impact on the rate of successful recruitment. We've also used the same underlying technologies to improve fraud detection systems for insurance companies. Considering the wide diversity of possible use cases, it's clear why the McKinsey Global Institute would predict a $5 trillion impact by 2025.


An Architecture for Automation of Knowledge Work

[Figure: Automation of knowledge work architecture]

How It Works

The maturity and convergence of a number of areas in IT are allowing for the construction of automated knowledge systems. At their heart, these technologies enable the extraction of structured metadata (meaning) from vast volumes of unstructured content. Together with a large enough set of expert judgments on that content, this metadata is used to generate and evolve predictive models by applying clever mathematics called "machine learning."

So in the case of the essays being graded, the machine does not have any understanding of the words. But once it identifies the right real-world entities within those words, it's able to generate a formula to predict the score of any particular input stream based on the patterns found in previously graded content involving those entities. Of course, all of this requires some pretty innovative technology. You might recognize some of the players in this space from the search engine world, due to the large overlap between these fields.
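To make the essay example concrete, here is a minimal sketch in Python. It is not how any production grading system works; it simply assumes a tiny hand-picked entity list (`ENTITIES`), a made-up set of graded essays, and a nearest-neighbor prediction that averages the human scores of the most similar graded essays. All names and data here are hypothetical.

```python
from collections import Counter
import math

# Hypothetical entity vocabulary; a real system would use a proper entity extractor.
ENTITIES = {"photosynthesis", "chlorophyll", "sunlight", "glucose", "oxygen"}


def extract_entities(text):
    """Count known entities in free text (a stand-in for real entity extraction)."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return Counter(t for t in tokens if t in ENTITIES)


def cosine(a, b):
    """Cosine similarity between two entity-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_score(essay, graded, k=2):
    """Average the human scores of the k graded essays most similar to `essay`."""
    features = extract_entities(essay)
    ranked = sorted(
        graded,
        key=lambda pair: cosine(features, extract_entities(pair[0])),
        reverse=True,
    )
    top = ranked[:k]
    return sum(score for _, score in top) / len(top)


# Made-up expert judgments: (essay text, human score out of 10).
graded_essays = [
    ("Photosynthesis uses sunlight and chlorophyll to make glucose and oxygen.", 9),
    ("Chlorophyll absorbs sunlight; glucose is produced.", 8),
    ("Plants are green and grow in soil.", 3),
    ("Sunlight is warm.", 4),
]

new_essay = "Sunlight hits chlorophyll, and the plant produces glucose."
predicted = predict_score(new_essay, graded_essays)
print(predicted)
```

The machine never "understands" the essay here. It only measures how closely the entities in a new essay match those in essays a human has already scored, which is the pattern-matching idea described above in its simplest form.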


Technology Stack

Data connectors 

Data connectors are required to collect data and metadata from the plethora of source repositories that are out there, from file shares to databases, ERP, CMS, scientific logbooks, and even social media. These are often bundled with search engine products but are also available independently of any search engine (e.g. our Aspire connectors range). Open source alternatives also exist. For the purposes of knowledge work automation, we also need to index vast amounts of log information with specialist log shippers like Elastic's Logstash and Beats.


Content processing pipeline

A content processing pipeline is needed to perform cleaning, normalizing, entity extraction, and metadata enrichment. At Search Technologies, we have developed our own Aspire content processing framework, but we also work with all the major open source (e.g. Apache ManifoldCF) and proprietary (e.g. SharePoint Content Processing, Autonomy IDOL, Convera) offerings.
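As a rough illustration of what such a pipeline does, the sketch below chains four small stages in Python. It does not use Aspire or any of the products named above; the stage functions, the `KNOWN_ORGS` lookup list, and the document fields are all assumptions made up for this example.

```python
import re
import unicodedata

# Hypothetical lookup list standing in for a real entity extractor.
KNOWN_ORGS = {"McKinsey", "Cloudera", "Hortonworks"}


def clean(doc):
    """Cleaning: strip HTML-like tags from the raw text."""
    doc["text"] = re.sub(r"<[^>]+>", " ", doc["text"])
    return doc


def normalize(doc):
    """Normalizing: unify Unicode forms and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", doc["text"])
    doc["text"] = re.sub(r"\s+", " ", text).strip()
    return doc


def extract_entities(doc):
    """Entity extraction: record which known organizations are mentioned."""
    doc["organizations"] = sorted(
        org for org in KNOWN_ORGS
        if re.search(rf"\b{re.escape(org)}\b", doc["text"])
    )
    return doc


def enrich(doc):
    """Metadata enrichment: add a simple derived field."""
    doc["word_count"] = len(doc["text"].split())
    return doc


# Stages run in order; each takes a document dict and returns it.
PIPELINE = [clean, normalize, extract_entities, enrich]


def process(doc):
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


raw = {"id": "doc-1", "text": "<p>Cloudera   and\u00a0Hortonworks run on Hadoop.</p>"}
result = process(dict(raw))
print(result)
```

Keeping each stage as a separate function means stages can be reordered, swapped, or added without touching the others, which is the main reason pipelines are usually built this way.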


Big data platforms

Big data platforms enable large scale operations on extremely high volumes of data. We've built applications on major Hadoop-based big data platforms, including Cloudera and Hortonworks, as well as using cloud big data offerings like Amazon Web Services (AWS) and Microsoft Azure.


Search engines 

Search engines allow disparate data sources to be connected by natural language constructs. And having worked with them for over a decade, we have very deep and wide expertise with all the major open source and proprietary search engines, including Lucene/Solr, Elasticsearch, FAST for SharePoint, the Google Search Appliance, and many others.


Machine learning platforms

Machine learning platforms construct the crucial predictive models and are available as both software libraries (e.g. Apache Mahout, R) and SaaS services (e.g. Amazon Machine Learning Platform, Microsoft Azure Machine Learning, Google Prediction API). 


5 Steps to Automation of Knowledge Work 

[Figure: Knowledge work automation process]

  • Step 1 - Identify costly, labor-intensive, and highly repetitive tasks performed by knowledge workers. 
  • Step 2 - Feed a machine learning system with historical data inputs and human activity outputs to produce predictive models (we have worked to develop various predictive algorithms for many enterprise use cases).
  • Step 3 - Apply predictive models to new data to automate knowledge worker tasks, in parallel to normal operations.
  • Step 4 - Evaluate results and fine tune predictive models to gain confidence by measuring the quality of machine output compared to that of manual effort.
  • Step 5 - Build applications that augment human knowledge resources with automated machine knowledge work. 
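Steps 3 and 4 can be sketched as a simple "shadow mode" check: the machine makes its decisions alongside the humans, and we measure how often the two agree before trusting it. The Python below is only an illustration; the decision labels, the sample data, and the 90% threshold are hypothetical, and a real evaluation would likely use richer quality measures than plain agreement.

```python
def agreement_rate(machine_outputs, human_outputs):
    """Fraction of cases where the machine's decision matches the human's."""
    if len(machine_outputs) != len(human_outputs):
        raise ValueError("machine and human outputs must cover the same cases")
    if not human_outputs:
        raise ValueError("at least one case is required")
    matches = sum(m == h for m, h in zip(machine_outputs, human_outputs))
    return matches / len(human_outputs)


def ready_to_automate(rate, threshold=0.9):
    """Hypothetical confidence gate: automate only above an agreed threshold."""
    return rate >= threshold


# Made-up CV screening decisions for the same five candidates.
human = ["shortlist", "reject", "reject", "shortlist", "reject"]
machine = ["shortlist", "reject", "shortlist", "shortlist", "reject"]

rate = agreement_rate(machine, human)
print(f"agreement: {rate:.0%}, automate: {ready_to_automate(rate)}")
```

In this made-up run the machine agrees with the humans on four of five cases, which falls short of the threshold, so the model would go back for further tuning rather than on to step 5.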

If this five-step model works well, it would allow companies to divert resources towards high-level work and benefit from cost savings, reduced stress, and a more skilled workforce.

-- Ahmad