Unstructured Big Data Case Examples

Content ETL is an important part of many search and analysis applications. A few examples follow.

Patent Normalization and Cross Linking
Search Technologies used Aspire to analyse and interconnect more than 80 million patent records. The new linking infrastructure enables additional search functionality, such as limiting a search to specific patent families and navigating citation links between patents.

This application also normalizes company names (for example, treating "IBM" and "International Business Machines" as the same entity) so that both search and analysis return comprehensive results.
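
The kind of name normalization described above can be sketched as follows. This is a minimal illustration, not the Aspire implementation: the alias table, the suffix list, and the function name `normalize_company` are all assumptions made for the example.

```python
import re

# Hypothetical alias table: maps known name variants to one canonical form.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
}

# Common legal suffixes removed before lookup (illustrative, not exhaustive).
SUFFIXES = re.compile(r"\b(inc|corp|corporation|co|ltd|llc)\b\.?", re.IGNORECASE)


def normalize_company(name: str) -> str:
    """Return the canonical company name, or the trimmed input if unknown."""
    key = SUFFIXES.sub("", name)          # "IBM Corp." -> "IBM "
    key = key.replace(".", "")            # "I.B.M." -> "IBM"
    key = re.sub(r"[^\w\s]", " ", key)    # other punctuation becomes a space
    key = re.sub(r"\s+", " ", key).strip().lower()
    return ALIASES.get(key, name.strip())


for raw in ["IBM", "I.B.M.", "International Business Machines Corp."]:
    print(raw, "->", normalize_company(raw))
```

A lookup table like this only covers variants that are already known; at the scale of millions of records it would typically be combined with fuzzy matching and manual review.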


Whole Document Comparison
The accuracy of document comparison algorithms is highly dependent on metadata and vocabulary normalization. In a recent example, Search Technologies worked with a customer to compare new job descriptions against a database containing millions of CVs, and automatically choose only the most relevant candidates for each vacancy.

Content ETL for this application uses a combination of semantic and statistical methods. The end result is a highly accurate service that saves recruitment professionals time by returning only the most relevant of the available candidates.
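
As a rough illustration of the statistical side only, the sketch below ranks CVs against a job description using TF-IDF weighting and cosine similarity. This is a standard technique chosen for the example; the source does not say which methods the actual application uses, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so terms that occur in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_cvs(job, cvs, top_n=3):
    """Return up to top_n (score, cv_index) pairs, best match first."""
    vecs = tfidf_vectors([job] + cvs)
    scores = [(cosine(vecs[0], v), i) for i, v in enumerate(vecs[1:])]
    return sorted(scores, reverse=True)[:top_n]


job = "Senior Python developer with search engine experience"
cvs = [
    "Python developer, built search engine relevance tooling",
    "Pastry chef with ten years of bakery experience",
    "Java developer",
]
for score, idx in rank_cvs(job, cvs):
    print(f"{score:.3f}  {cvs[idx]}")
```

Without vocabulary normalization, "CV" and "resume" or "developer" and "engineer" would count as unrelated terms here, which is why the quality of the upstream content ETL matters so much to the result.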


Popularity Applications
This class of big data application analyses log files to create insight into the relative popularity of products, documents or other items. The insight generated can be used, for example, to promote documents within search results based on their "stickiness" in response to previous searches. Generally, it is not enough simply to count click-through rates. Doing that in isolation tends to create self-fulfilling prophecies, because documents that already rank highly attract more clicks simply by being more visible. Instead, a more holistic analysis of visitor behaviour can provide a measure of the match between the original search query and a document.
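
One simple way to look beyond raw clicks is to count a click only when the visitor stayed on the document for a while. The sketch below assumes a hypothetical log record of `(query, doc_id, clicked, dwell_seconds)` and an arbitrary 30-second threshold; real log formats and a sensible threshold will differ by application.

```python
from collections import defaultdict


def stickiness(events, min_dwell=30.0):
    """Score each (query, doc_id) pair by its share of 'satisfied' impressions.

    events: iterable of (query, doc_id, clicked, dwell_seconds) tuples,
    a hypothetical log format assumed for this example.
    """
    shown = defaultdict(int)
    satisfied = defaultdict(int)
    for query, doc_id, clicked, dwell in events:
        key = (query, doc_id)
        shown[key] += 1
        # A click followed by a quick bounce does not count as a match.
        if clicked and dwell >= min_dwell:
            satisfied[key] += 1
    return {key: satisfied[key] / count for key, count in shown.items()}


log = [
    ("etl", "doc-a", True, 5.0),     # clicked, bounced after 5 seconds
    ("etl", "doc-a", True, 3.0),     # clicked, bounced again
    ("etl", "doc-b", True, 120.0),   # clicked and stayed
    ("etl", "doc-b", False, 0.0),    # shown but not clicked
]
print(stickiness(log))
```

By click-through rate alone, doc-a (two clicks from two impressions) would outrank doc-b (one click from two); the dwell-time filter reverses that ordering.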

Recommendation Applications
People who bought this, also bought that. Thanks to Amazon and other online retailers, this is a familiar concept to most people.

Recommendation systems are built on analysis of the behaviour of previous users. This can be done at three levels:

  • Individual: What did this particular person previously buy or browse?
  • Peer Group: What did like-minded individuals do? Groups can be formed based on behaviour patterns, or they can be more explicit, such as members of the same department within an organization.
  • General: What did the community as a whole do?
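
The three levels above can be sketched with a simple co-occurrence count. This is an illustrative toy, assuming a hypothetical in-memory purchase history that maps each user to a set of item ids; the function names are made up for the example.

```python
from collections import Counter


def also_bought(history, item, users=None):
    """Count items bought alongside `item`, most frequent first.

    users=None considers the whole community (general level);
    passing a set of user ids restricts the count to a peer group.
    """
    counts = Counter()
    for user, items in history.items():
        if users is not None and user not in users:
            continue
        if item in items:
            counts.update(i for i in items if i != item)
    return counts.most_common()


def recommend(history, user, item, peers=None):
    """Individual level: drop anything this user has already bought."""
    owned = history.get(user, set())
    return [i for i, _ in also_bought(history, item, peers) if i not in owned]


history = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "cat": {"book", "lamp"},
    "dan": {"lamp"},
}
print(also_bought(history, "book"))                 # general
print(also_bought(history, "book", users={"bob"}))  # peer group
print(recommend(history, "dan", "book"))            # individual
```

The same counting logic serves all three levels; only the set of users whose behaviour is considered changes.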

Recommendations are not limited to e-commerce. We've worked with customers to take a similar approach to recommending TV listings (for a large cable TV company) and to recommending experts for projects within a services organization.