Aspire 3.2 Big Data Release: Big Data Integration as a Cloudera Parcel for Unlimited Scalability
Through hundreds of search and analytics projects over the last decade, we have seen a number of technology challenges, specifically with unstructured content, that available commercial or open source platforms cannot solve. To address these issues, we have developed Aspire, a search-engine-independent content processing framework designed for handling unstructured data. It provides a powerful solution for connectivity, cleansing, normalization, analysis, and publishing of human-generated content to search engines and big data applications. It is part of our collection of technology assets that help clients optimize search and big data analytics performance.
We are pleased to announce the Aspire 3.2 Big Data Release (March 2018), the first Aspire release since Search Technologies became part of Accenture! Aspire 3.2 Big Data delivers some innovative advancements that are worth noting.
A key highlight is Aspire’s successful integration with the Hadoop ecosystem as a Cloudera parcel. While Aspire is still available as a standalone component, Aspire as a Cloudera parcel delivers unlimited scalability for big data projects. This big data integration has enabled our clients to perform some pretty amazing search and analytics tasks over massive amounts of unstructured data across their organizations!
In our recent work for a large pharmaceutical client, we have demonstrated daily ingestion rates of up to 50 terabytes of content into HDFS. If you haven’t yet, read about how we used Aspire as a Cloudera parcel to help a large biopharmaceutical client ingest over 1 petabyte of content into their data lake.
Another important new feature is support for HBase, in addition to MongoDB, as the store for the “crawl state” (Aspire’s core functionality). This version requires either MongoDB or HBase to be installed along with Aspire, depending on the client’s environment. Leveraging HBase in Aspire 3.2 is important because:
- HBase provides unlimited scalability.
- HBase is typically available in big data implementations. Our Cloudera parcel / big data release uses a big data database, making installation and maintenance simpler in enterprise big data environments. In other words, using HBase makes Aspire more compatible with the big data ecosystem.
Other new features available with the Aspire 3.2 release include:
- User roles – enabling separate roles for configuring and developing new content sources vs. maintaining and monitoring existing content sources. This helps improve security and reliability for production installations.
- Licensing – our model makes it easy to scale in the world of big data. Connect with us for more details.
- Administrator audit logs
- Automatic retries and concurrency improvements
- Resume after failed or stopped crawls
- SMB2 support for CIFS
- Synchronization updates
- Error handling improvements in the connector framework
- Automatic backup after configuration changes are made
Also very exciting is our plan for the next Aspire release: the integration of Aspire for Big Data into the Accenture Insights Platform (AIP). AIP is our comprehensive and scalable solution that allows organizations to get actionable insights and business outcomes quickly, with a competitive, flexible commercial model. Together, AIP and Aspire bring a powerful technology stack that modernizes the acquisition, enrichment, analysis, and visualization of unstructured and structured data. Stay tuned for our upcoming Aspire updates!