Building Search, Analytics, and BI Applications with Data from the Internet
“Cruising the Data Ocean” Blog Series - Part 5 of 6
This blog is a part of our Chief Architect's "Cruising the Data Ocean" series. It offers a deep-dive into some essential data mining tools and techniques for harvesting content from the Internet and turning it into significant business insights.
In my previous posts, I provided the tools and techniques for selecting, extracting, cleansing, and understanding content from the Internet in order to support your business use case. In this blog, I'll discuss how to use the processed data for your own custom search, analytics, and business intelligence (BI) applications.
The output of processing Internet-downloaded content usually takes the form of fairly structured data:
- Metadata fields and values - from tagged content, entity extraction, or Natural Language Processing (NLP) extraction
- Categories and tags - from statistical tagging and/or clustering systems
- Table rows - from NLP extraction of facts
- Object-relationship triples - from NLP relationship extraction
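To make this concrete, here is a minimal sketch of what such structured output might look like for a single harvested document. All field names, values, and the triple-to-row flattening are illustrative, not a fixed schema:

```python
# Hypothetical structured record emitted by a processing pipeline
# for one harvested document (field names are illustrative).
document = {
    "url": "https://example.com/article",
    # Metadata fields and values (tagged content / NLP extraction)
    "metadata": {"author": "Jane Doe", "published": "2017-03-01"},
    # Categories and tags (statistical tagging or clustering)
    "tags": ["cloud computing", "mergers"],
    # Object-relationship triples (NLP relationship extraction)
    "triples": [("Acme Corp", "acquired", "Widget Inc")],
}

# Triples flatten naturally into table rows for a relational store:
rows = [
    {"subject": s, "predicate": p, "object": o}
    for (s, p, o) in document["triples"]
]
```

The same record can feed several destinations at once: the metadata and tags enrich a search index, while the flattened rows go to a database.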
Your Internet Data Has Been Cleansed and Processed. What's Next?
There are several places this information can go:
- A search engine - to enhance the full document (additional metadata fields for additional facets or filters) and to support search-based visualization dashboards (Kibana, Banana, Hue, or ZoomData, for example)
- A relational database - to be combined with other business data for visualization and business analytics (Tableau, Pentaho, or others)
- A graph database - for complex relationship analysis
- A monitoring and alerting tool - for situations that need immediate attention (e.g. compliance violations, trending negative sentiment, bad customer service situations, etc.)
- Apache Spark - for further real-time analytics and machine learning
- A business rules engine / ESB / workflow - to send the output through further manual and automated business processing, for example reviewing the output for quality or checking for compliance violations
- Custom applications - for quality review and analysis, crowdsourcing review, etc.
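In practice, a single processed record often fans out to several of these destinations at once. The sketch below shows one way to route records, using in-memory stubs in place of real clients (search-engine indexer, database writer, alerting tool); the function and handler names are hypothetical:

```python
def route_record(record, handlers):
    """Send one processed record to every registered destination."""
    delivered = []
    for name, handler in handlers.items():
        handler(record)  # a stub standing in for a real client call
        delivered.append(name)
    return delivered

# Stub destinations collecting records in memory for illustration:
search_index, db_rows, alerts = [], [], []

handlers = {
    "search_engine": search_index.append,
    "relational_db": db_rows.append,
    # Only raise an alert when the record looks urgent, e.g. strongly
    # negative sentiment (threshold is an assumption):
    "alerting": lambda r: alerts.append(r) if r.get("sentiment", 0) < -0.5 else None,
}

delivered = route_record({"id": 1, "sentiment": -0.8}, handlers)
```

A real deployment would replace the stubs with, for example, a search-engine bulk indexer and a database insert, but the fan-out pattern stays the same.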
Examples of Search and Analytics Applications Using Content from the Internet
Actual end-user applications are built on top of these destinations. Some examples include:
- Trend reports on activity, market share, or sentiment over time
- A standard search engine interface - extracted metadata or tags can be represented as new facets and filters, or inside of advanced search pages
- An exploratory relationship viewer - find people, places, or things, and then look at how they are related to other people, places, or things
- A report / summary of information about each of your customers and their needs / top-of-mind initiatives with references
- E-mail alerts on customer problems and complaints detected on social media, or on strongly negative trends
- Searchable databases of entities, rules, regulations, or similar
- Augmented text documents with embedded links to external / enriched information
- Word clouds about trending topics
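As a small example of the last item, the data behind a word cloud is just a frequency count of extracted tags across documents. The documents and tags below are illustrative:

```python
from collections import Counter

# Illustrative processed documents, each carrying extracted tags:
documents = [
    {"tags": ["cloud computing", "security"]},
    {"tags": ["cloud computing", "mergers"]},
    {"tags": ["security", "cloud computing"]},
]

# Count how often each tag appears across the corpus:
tag_counts = Counter(tag for doc in documents for tag in doc["tags"])

# The most frequent tags drive the largest words in the cloud:
top_tags = tag_counts.most_common(2)
```

The resulting counts can be handed directly to a visualization layer, where each tag's frequency sets its font size.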
Now that you've built your search, analytics, and BI application(s) using a combination of web data mining tools and techniques, continuous quality analysis and improvement is key to sustained performance. I'll discuss quality analysis in my next post.