Azure Blob Publisher for Better “Ground to Cloud” Content Processing and Enrichment
Leveraging Accenture Aspire Content Processing, Azure Cognitive Services, and Blob Storage
With increasing business demands for faster, better insights, cloud-based search platforms have emerged as scalable, flexible solutions that can increase speed-to-value. Microsoft Azure Search is among the leading cloud search solutions we’ve helped clients implement. But for any search application, cloud or on-premises, good insight discovery always starts with good data preparation. How can we help our clients acquire, process, and prepare structured and unstructured content to deliver an enhanced Azure Search experience?
In this blog, we’ll discuss how combining Accenture technology assets and Microsoft’s robust range of Azure Cloud services can accelerate and enrich content processing for a better search and analytics experience.
Azure On-Demand Cloud Services
Azure Blob Storage is an Azure Cloud service for storing large amounts of unstructured object data, such as text or binary data. Blob Storage can expose data publicly or store application data privately. Common uses of Blob Storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio content
- Storing data for backup, disaster recovery, and archiving
- Storing data for analysis by an on-premises or Azure-hosted service
Also powered by the Azure Cloud platform, Azure Cognitive Services offer a range of scalable cloud-based content processing services. These intelligent, AI-powered on-demand cloud services can be connected to websites and applications via APIs and customized for business requirements.
Accenture Technology Assets
Accenture Aspire Content Processing is a search-engine-independent framework designed specifically for unstructured and semi-structured data. The framework:
- supports sophisticated unstructured content enrichment,
- contains a staging repository for efficient indexing,
- enables document-level security,
- and provides over 30 search-engine-independent connectors and publishers to popular business repositories, enabling flexible and secure content acquisition.
Azure Blob Publisher to Enhance “Ground to Cloud” Content Processing
Combining Aspire for content acquisition and processing with Azure Blob for storage and Azure Cognitive Services for enhanced processing can provide efficient “ground to cloud” data preparation. A typical workflow is illustrated in the diagram below.
1. Aspire Content Processing can be installed on-premises to efficiently crawl and process unstructured content, such as images, documents, and anything available via HTTP links (e.g. web pages).
2. The content is published to Azure Blob cloud storage via the Accenture Azure Blob publisher. Our Azure Blob publisher, built on the Aspire framework, took about two to three weeks to complete and features:
- Incremental updates to keep the content in Azure Blob in sync with the crawled content within Aspire
- A secure connection between Aspire and Azure Blob via a secret key
3. Once the content is in Azure Blob, it can be run through Azure Cognitive Services for additional enrichment with machine learning and natural language processing. For instance, common cleansing and enrichment tasks that can be done via Azure Cognitive Services include:
- Remove common content (sidebars, headers, footers, boilerplate, etc.) across documents
- Automate content classification
- Extract references from the crawled content as metadata and then push to the search engine
- Merge multiple versions of the content
- Rank content based on popularity
- Extract people, locations, organizations, etc., as metadata to enable filtering on multiple facets
4. The enriched content can then be pushed to Azure Search for an enhanced search and analytics experience.
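To make the incremental update idea in step 2 more concrete, here is a minimal Python sketch of one way a publisher could keep a blob container in sync with crawled content. It is not the Aspire Azure Blob publisher's actual implementation: the content-hash comparison, the `sync` function, and the `InMemoryBlobStore` class are all hypothetical names and design choices introduced for illustration. The in-memory store stands in for a real container so the example runs without Azure credentials; in a real deployment its two methods would likely wrap the `upload_blob(..., overwrite=True)` and `delete_blob(...)` calls of the `azure-storage-blob` client library, authenticated with the storage account key.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryBlobStore:
    """Hypothetical stand-in for a blob container (assumption, not an Azure class)."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> None:
        # Overwrites any existing blob with the same name.
        self.blobs[name] = data

    def delete(self, name: str) -> None:
        self.blobs.pop(name, None)


def content_hash(data: bytes) -> str:
    """Fingerprint used to detect whether a document changed between crawls."""
    return hashlib.sha256(data).hexdigest()


def sync(crawled: Dict[str, bytes],
         published: Dict[str, str],
         store: InMemoryBlobStore) -> Dict[str, List[str]]:
    """Publish only the differences between the current crawl and the last one.

    crawled:   document name -> content from the current crawl
    published: document name -> hash of the content last published
               (updated in place so it can be reused on the next run)
    """
    actions: Dict[str, List[str]] = {"uploaded": [], "deleted": [], "unchanged": []}

    # New or modified documents are uploaded; identical ones are skipped.
    for name, data in crawled.items():
        digest = content_hash(data)
        if published.get(name) == digest:
            actions["unchanged"].append(name)
            continue
        store.upload(name, data)
        published[name] = digest
        actions["uploaded"].append(name)

    # Documents that disappeared from the source are removed from storage.
    for name in list(published):
        if name not in crawled:
            store.delete(name)
            del published[name]
            actions["deleted"].append(name)

    return actions


# Example: two crawls of a small, made-up repository.
store = InMemoryBlobStore()
state: Dict[str, str] = {}
first = sync({"a.txt": b"v1", "b.txt": b"v1"}, state, store)
second = sync({"a.txt": b"v2"}, state, store)
print(first)
print(second)
```

In this sketch the second crawl re-uploads only `a.txt`, whose content changed, and deletes `b.txt`, which no longer exists at the source. Tracking hashes rather than timestamps is just one option; a connector that already knows which documents were added, updated, or deleted could pass those events straight through instead.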
Partner Success Story
With Azure Blob, both on-premises and cloud applications can take advantage of all Azure Cloud services thanks to the ease of moving content between these services. This “ground to cloud” content processing solution allows for more efficient storage, processing, and enrichment, which ultimately enhances indexing, search, and analytics.
Watch Liam Cavanagh, Azure Search Principal Program Manager, discuss how Accenture Aspire works with Azure Cognitive Services to improve and add intelligence to unstructured content processing.
Contact us to learn more about how we can help enhance your content preparation for a better search experience.
Solution development was led by:
- Paul Nelson, Innovation Lead
- Bill Fowler, Functional & Industry Analytics Manager
- Cristina Arias, Functional & Industry Analytics Consultant
- David Solano Araya, Functional & Industry Analytics Consultant