Aspire + Azure Cognitive Services: Transforming Unstructured Data Preparation for Azure Search
Methodology, Reference Architecture, and Demo
Amid the rise of cloud-based search-as-a-service platforms in recent years, Azure Search has emerged as a major player. Built to meet businesses’ demand for faster, better insights from ever-growing data, these platforms provide scalable, flexible infrastructure that shortens time-to-value. But for any search solution, cloud or on-premises, good insight discovery starts with good data preparation. Organizations seeking to leverage Azure Search first need answers to data questions such as:
- How do I get my content out of SharePoint (and other disparate sources) and process it for search?
- How do I enrich content, especially unstructured files, before indexing it into Azure Search?
- How do I write content to Azure Search?
- How do I extract entities, such as companies and people, from my unstructured full-text files?
- How can I ensure my users have access to only the documents they have permission to view?
Is there an innovative approach to these data preparation questions? There is. By integrating Accenture’s Aspire with Microsoft Azure Cognitive Services, we can accelerate unstructured content acquisition and significantly enhance content processing and enrichment for Azure Search.
Wondering how we got there? Read on to learn more about Aspire, Azure Cognitive Services, and how we integrated these technologies to improve search. A demo is also included.
Aspire Content Processing Framework and Azure Cognitive Services
What capabilities do Aspire and Cognitive Services bring to ensure disparate content is efficiently acquired, processed, and enriched prior to indexing into Azure Search?
Aspire is a search-engine-independent content processing framework designed specifically for unstructured and semi-structured data. The framework supports complex content enrichment, contains a staging repository for efficient indexing, enables document-level security, and provides connectors for acquiring data from multiple content sources.
Azure Cognitive Services are on-demand, AI-powered cloud services that websites and applications call through APIs and that can be customized for business requirements. They help applications better understand and serve users’ needs, so users can apply their data to business problems faster.
Let’s look at the benefits of integrating Aspire and Azure Cognitive Services in the following example scenario. We’ll then dive into the methodology of the entire integration process.
Increasing Business Value by Improving Search: An Example Scenario
Contoso Financial (a fictional organization) provides investment recommendations to investors. To do this, its advisors need to sift through large volumes of financial content about many companies. However, much of this content is stored in unstructured files, such as annual reports, financial disclosures, press releases, and internal research, spread across on-premises and cloud systems, which makes it hard to work with.
Finding relevant information to support research can be tedious, and finding insights in that content takes even more effort. Using Aspire connectors, we can acquire companies' financial information from disparate sources. Then, by combining Aspire’s content processing capabilities with Azure Cognitive Services, we can enrich the content with important entities, such as the people and organizations mentioned in the financial content.
The processed content is then indexed into Azure Search via an Aspire publisher, allowing Contoso's advisors to effectively search, explore, and derive insights from the content. When financial advisors have relevant information, they can deliver better value to their customers by providing more informed, well-researched recommendations.
Reference Architecture for Integrating Aspire Content Processing and Azure Cognitive Search
Below is our architectural diagram outlining the methodology and components involved, from content acquisition, processing, enrichment, and indexing, to results presentation.
1. Ingesting unstructured content
Unstructured documents, such as PDFs, Office documents, and images, can be ingested and processed in Aspire. Regardless of the original content source (SharePoint, file systems, or other systems), content can be acquired efficiently and securely using Aspire connectors, which support more than 27 unstructured content sources. In the Contoso scenario above, for example, companies' annual EDGAR 10-K reports can be ingested with an Aspire connector that fetches content directly from an Azure Blob Storage container.
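Whatever connector acquires a file, each document eventually needs a stable key in Azure Search. Azure Search document keys may contain only letters, digits, underscores, dashes, and equal signs, so raw paths or URLs cannot be used directly. The following is a minimal sketch, not Aspire's implementation: it assumes the key is derived from the source URL (the container URL shown is hypothetical) and uses URL-safe base64, whose alphabet fits those rules and can be decoded back to the original location.

```python
import base64


def make_document_key(source_url: str) -> str:
    """Derive a stable Azure Search document key from a source URL or path.

    URL-safe base64 only produces letters, digits, '-', '_', and '=' padding,
    all of which are allowed in Azure Search keys.
    """
    return base64.urlsafe_b64encode(source_url.encode("utf-8")).decode("ascii")


def decode_document_key(key: str) -> str:
    """Recover the original source URL from a key produced by make_document_key."""
    return base64.urlsafe_b64decode(key.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical blob URL, for illustration only.
    url = "https://contoso.blob.core.windows.net/edgar/biomarin/10-K_2017.pdf"
    key = make_document_key(url)
    print(key)
    assert decode_document_key(key) == url
```

Because the same source always yields the same key, re-ingesting an updated report overwrites its existing index entry instead of creating a duplicate.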
2. Configuring Aspire to process content and extract text
The acquired content is first processed in Aspire, which extracts full text from the annual EDGAR 10-K reports.
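Before extracted text goes to Text Analytics, it usually has to be split: the service limits the size of each input document (5,120 characters per document in the v3 API versions we are aware of; check the current limits), and a 10-K report is far longer. A minimal sketch, assuming whitespace-delimited chunking is acceptable for entity and key phrase extraction, could look like this:

```python
from typing import List

# Assumed per-document limit for Text Analytics v3.x; verify against current docs.
MAX_CHARS = 5120


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split extracted full text into chunks of at most max_chars characters.

    Splits on whitespace so words stay intact; a single token longer than
    max_chars is hard-split as a fallback.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split pathological tokens (e.g. a flattened table with no spaces).
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this word would overflow; close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Because `split()` collapses whitespace, any character offsets the service reports refer to the chunk rather than the original file. A sentence-aware splitter would also reduce the chance of cutting an entity name across two chunks.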
3. Integrating Aspire with Azure Cognitive Services for Text Analytics
The processed content in Aspire is then sent to Azure Cognitive Services via REST APIs for enrichment. Azure Text Analytics, part of Cognitive Services, is used for:
- Extracting entities, such as people, locations, and organizations, from the EDGAR 10-K documents and linking them to Wikipedia entries and Bing IDs to disambiguate them
- Extracting key phrases
In addition to the features above, Azure Text Analytics provides language detection and sentiment analysis, which can complement entity and key phrase extraction to support richer search and visualization.
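The request and response handling for this step can be sketched as follows. It assumes the Text Analytics v3.x REST shapes: a body of `{"documents": [{"id", "language", "text"}]}` posted (with an `Ocp-Apim-Subscription-Key` header) to endpoints such as `/text/analytics/v3.1/entities/recognition/general` and `/text/analytics/v3.1/keyPhrases`, with entities returned under `entities` (each carrying `text` and `category`) and phrases under `keyPhrases`. The facet field names are hypothetical, entity linking (the Wikipedia and Bing references) is a separate v3 endpoint omitted here, and the HTTP call itself is left out so the sketch runs offline.

```python
from typing import Dict, List

# Map Text Analytics entity categories to index facet fields.
# The field names ("people", "organizations", "locations") are hypothetical.
CATEGORY_TO_FIELD = {
    "Person": "people",
    "Organization": "organizations",
    "Location": "locations",
}


def build_request(doc_key: str, chunks: List[str], language: str = "en") -> Dict:
    """Build a Text Analytics request body with one input document per text chunk."""
    return {
        "documents": [
            {"id": f"{doc_key}-{i}", "language": language, "text": chunk}
            for i, chunk in enumerate(chunks)
        ]
    }


def merge_enrichments(entities_response: Dict, keyphrases_response: Dict) -> Dict[str, List[str]]:
    """Fold entity and key phrase results for all chunks into facet fields.

    Values are de-duplicated while keeping first-seen order; entity categories
    not in CATEGORY_TO_FIELD (e.g. DateTime) are ignored.
    """
    fields: Dict[str, List[str]] = {f: [] for f in CATEGORY_TO_FIELD.values()}
    fields["keyphrases"] = []
    for doc in entities_response.get("documents", []):
        for entity in doc.get("entities", []):
            field = CATEGORY_TO_FIELD.get(entity.get("category"))
            if field and entity["text"] not in fields[field]:
                fields[field].append(entity["text"])
    for doc in keyphrases_response.get("documents", []):
        for phrase in doc.get("keyPhrases", []):
            if phrase not in fields["keyphrases"]:
                fields["keyphrases"].append(phrase)
    return fields
```

Merging per-chunk results back into one record per report keeps the index at document granularity, which is what the facets in the search UI operate on.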
4. Indexing the enriched content into Azure Search
Once the content is fully processed and enriched, Aspire’s Azure Search publisher indexes it into Azure Search, where end users can search, analyze, and visualize the documents through a search UI.
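Aspire's publisher handles this step, but the payload Azure Search's REST API expects is simple enough to sketch. The following is a minimal illustration, assuming documents are plain dicts whose key field is named `id` (hypothetical): each yielded batch is the body of a `POST /indexes/{index}/docs/index` request, and each document carries an `@search.action`. `mergeOrUpload` is used so that re-processing a file updates its existing entry. Azure Search caps a batch at 1,000 documents; there is also a payload-size limit that this sketch does not check.

```python
from typing import Dict, Iterator, List

MAX_BATCH = 1000  # maximum documents per Azure Search indexing batch


def index_batches(docs: List[Dict], key_field: str = "id",
                  batch_size: int = MAX_BATCH) -> Iterator[Dict]:
    """Yield request bodies for POST /indexes/{index}/docs/index.

    Every document is tagged with "@search.action": "mergeOrUpload", which
    inserts new documents and updates existing ones with the same key.
    """
    for start in range(0, len(docs), batch_size):
        batch = []
        for doc in docs[start:start + batch_size]:
            if key_field not in doc:
                raise ValueError(f"document is missing key field {key_field!r}")
            batch.append({"@search.action": "mergeOrUpload", **doc})
        yield {"value": batch}
```

Each body would be sent with an admin API key and an `api-version` query parameter; check the indexing response for per-document status, since a batch can partially succeed.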
5. Displaying and visualizing search results
The search UI can be tailored to the organization’s requirements. In this example scenario, search results can be filtered by the facets identified during entity extraction (for example, people, organizations, and locations). Based on the user’s search criteria, the application can also produce corresponding visualizations, such as graphs.
In our example UI below, the graph of organizations associated with the company "BioMarin" includes "Daiichi Sankyo" and "Asubio Pharma Co., Ltd.", two organizations that are highly correlated with BioMarin Pharmaceutical. Relationships like these can give Contoso's financial advisors valuable insights to support their investment recommendations.
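A faceted query of this kind maps onto the search REST API's `facets` and `filter` parameters. The sketch below assumes the enriched fields are filterable, facetable string collections named `people`, `organizations`, `locations`, and `keyphrases` (hypothetical names). It builds the body of a `POST /indexes/{index}/docs/search` request and includes a helper that reads facet counts from the `@search.facets` section of a response.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical facet field names produced by the enrichment step.
FACET_FIELDS = ["people", "organizations", "locations", "keyphrases"]


def build_search_request(query: str, filters: Optional[Dict[str, str]] = None,
                         facet_count: int = 10) -> Dict:
    """Build a search request body with facets and optional facet filters.

    `filters` maps a collection field to a selected facet value, e.g.
    {"organizations": "Daiichi Sankyo"}; each becomes an OData any() clause.
    """
    body: Dict = {
        "search": query,
        "facets": [f"{field},count:{facet_count}" for field in FACET_FIELDS],
        "count": True,
    }
    if filters:
        clauses = []
        for field, value in filters.items():
            escaped = value.replace("'", "''")  # OData escapes ' by doubling it
            clauses.append(f"{field}/any(v: v eq '{escaped}')")
        body["filter"] = " and ".join(clauses)
    return body


def facet_values(response: Dict, field: str) -> List[Tuple[str, int]]:
    """Return (value, count) pairs for one facet from a search response."""
    facets = response.get("@search.facets", {}).get(field, [])
    return [(f["value"], f["count"]) for f in facets]
```

When a user clicks a facet value in the UI, it is added to `filters` and the query is re-run, so the remaining facet counts narrow to the selected subset.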
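This post does not spell out how the graph relationships are computed. One simple approach, shown here purely as an assumption, is document co-occurrence: count how many indexed documents mention each organization alongside the selected company, and draw edges to the most frequent ones. The sketch works on documents already retrieved from the index, each with a hypothetical `organizations` list.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def related_entities(docs: Iterable[Dict], company: str,
                     field: str = "organizations", top: int = 10) -> List[Tuple[str, int]]:
    """Rank entities by how many documents mention them together with `company`.

    Each document counts at most once per entity; the result lists the `top`
    (entity, document count) pairs, which can become edges from the company node.
    """
    counts: Counter = Counter()
    for doc in docs:
        values = set(doc.get(field, []))
        if company in values:
            values.discard(company)
            # Sort so ties are broken deterministically by first appearance.
            counts.update(sorted(values))
    return counts.most_common(top)
```

Raw co-occurrence favors entities that appear in almost every filing; a production graph might normalize counts, for example by each entity's overall document frequency.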
For organizations requiring specific access restrictions, document-level security can be applied during content processing so that users can find and access only the content intended for them. A security filter integrated with the .NET web application can then match each user’s group memberships against the permissions recorded for each piece of content.
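A common Azure Search pattern for this kind of security trimming is to store each document's permitted groups in a filterable string collection and restrict every query with `search.in`. The sketch below assumes a hypothetical `group_ids` field populated during content processing and that the application already knows the signed-in user's groups. It is written in Python for consistency with the other examples, although the application described here is .NET; the filter string itself is the same either way.

```python
from typing import Iterable


def security_filter(user_groups: Iterable[str], field: str = "group_ids") -> str:
    """Build an OData filter that keeps only documents sharing a group with the user.

    Uses search.in with an explicit '|' delimiter so group names containing
    spaces or commas are matched whole.
    """
    groups = sorted(set(user_groups))
    if not groups:
        # No memberships: fail closed and match nothing.
        return "false"
    for g in groups:
        if "|" in g:
            raise ValueError(f"group id contains the delimiter: {g!r}")
    joined = "|".join(g.replace("'", "''") for g in groups)  # escape OData quotes
    return f"{field}/any(g: search.in(g, '{joined}', '|'))"
```

The returned string is added to every search request's `filter`, combined with `and` when facet filters are also present. Building it on the server, never in the browser, keeps users from simply removing it.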
Powering More Intelligent Search
Combining Azure Cognitive Services with Aspire’s sophisticated content processing capabilities provides a high-performing data preparation pipeline for Azure Cognitive Search. The result is better data acquisition and enrichment, faster information discovery, and ultimately greater business value.
To see how the data enriched by Aspire and Azure Cognitive Services is presented in Azure Search, follow the instructions below to try our demo.
Please note that this demo is not monitored and may not be up 100% of the time.
- Visit our demo site (Google Chrome currently provides the best viewing experience)
- In the search box, enter a company name, such as "BioMarin," "Alcoa," "Accenture," or "Walmart."
- For each company query:
  - The search results display any EDGAR 10-K documents related to the company. Each result record contains the text and entities extracted during content processing.
  - The facet categories display the People, Organizations, Locations, and Key Phrases found in the company's indexed EDGAR 10-K documents.
- To visually explore the relationships between the company and any related people, organizations, locations, and key phrases, click on the "Explore Graph Relationships" link right above the search box. Then, use the drop-down list next to the search box to specify organizations, people, locations, or key phrases.
- To go back to the search results view from the graph view, click on the "Back to Document Search" link right below the search box.
To learn more about how Aspire and Azure Cognitive Services can be combined to make search smarter, connect with us.
This blog and demo were developed by:
- Paul Nelson, Innovation Lead, Accenture Applied Intelligence
- Eduardo Quirós-Campos, Functional & Industry Analytics Sr. Manager, Accenture Applied Intelligence
- Arturo Vargas, Functional & Industry Analytics Analyst, Accenture Applied Intelligence
- Maynor Alvarado, Functional & Industry Analytics Sr. Manager, Accenture Applied Intelligence
- Liam Cavanagh, Principal Program Manager, Azure Search, Microsoft
- Anupam Sharma, Sr. Technical Program Manager, Microsoft