A Search and Big Data Solution for Genomics Studies
A Scalable Analytics Platform for Improving Healthcare Plans and Personalizing Treatments
In today’s genomics studies, healthcare organizations and research institutes struggle to process and extract value from a growing deluge of genomics data, driven by the steadily falling cost of genome sequencing.
As more genomes are sequenced, the volume of genomics data being collected is growing enormously. Healthcare organizations and research institutions are turning to big data platforms to:
- Ingest and store all of this information (read about how we helped a pharmaceutical customer ingest over 1 petabyte of unstructured content into its data lake)
- Process and make the data available and easily accessible to end-users
- Enable researchers to derive personalized healthcare plans and treatments that improve current health and alleviate future health risks.
This process is referred to as Precision Medicine.
Recent efforts in Precision Medicine have focused on porting and refactoring existing tools to run in Hadoop-based environments, and on loading genomics data into tabular storage and query systems, such as Hive, Impala, and Parquet files on HDFS, for analysis and visualization.
Another approach to leveraging big data for genomics studies uses search engines as the primary storage mechanism. This is where we bring our specialized expertise in search and analytics to help customers build custom solutions for their research needs.
FINDING BETTER ANSWERS FROM THE DATA LAKE
Several research institutions at large hospitals that we’ve recently worked with have created data lakes that ingest clinical and research data from multiple repositories:
- EMR systems
- Flow cytometry applications
- DNA sequencing data for their patients
- DNA mutations/variations aggregated from multiple public databases
- Medical literature content
Often, the data in these data lakes is not in a format suited to easy aggregation or to fast access from UI applications. Much of the raw data is stored in specialized file formats (for example, FASTQ for sequencing reads, BAM for aligned reads, and VCF for variant calls) that only dedicated bioinformatics tools can read.
Search and analytics tools can address this challenge by making the data easily accessible to its intended end users (researchers, physicians, etc.), helping them analyze the data and collaborate more effectively.
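As an illustration of turning such raw files into search-ready documents, here is a minimal sketch that parses the fixed columns of a VCF (Variant Call Format) file, a common plain-text format for DNA variants, into flat dictionaries. It assumes uncompressed input and ignores the per-sample genotype columns; the function names and output field names are hypothetical, not part of any specific product.

```python
from typing import Dict, Iterable, Iterator, Union

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split an INFO value such as 'DP=14;DB' into a dict; bare flags map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    if info in (".", ""):
        return fields  # '.' marks a missing value in VCF
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_to_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one flat, JSON-ready document per VCF variant record."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip '##' meta-information lines and the '#CHROM' header
        record = dict(zip(VCF_COLUMNS, line.split("\t")[:8]))
        yield {
            "chrom": record["CHROM"],
            "pos": int(record["POS"]),  # 1-based position
            "id": None if record["ID"] == "." else record["ID"],
            "ref": record["REF"],
            "alt": record["ALT"].split(","),  # multi-allelic sites list several ALTs
            "qual": None if record["QUAL"] == "." else float(record["QUAL"]),
            "filter": record["FILTER"],
            "info": parse_info(record["INFO"]),
        }
```

Because `vcf_to_documents` accepts any iterable of lines, it can consume an open file handle directly and stream records without loading a whole genome's worth of variants into memory.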
Search engines are particularly well suited to these challenges: they can deliver substantial query performance improvements and offer query capabilities that SQL-based engines support poorly or not at all, including:
- Faceted search
- Full-text search
- Multi-dimensional genomics queries executed across many data sets
Open source search engines, such as Elasticsearch and Solr, can be used to build a cost-effective custom solution for genomics studies. Commercial search engines can also serve as the foundation for this platform.
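To make faceted and full-text search concrete, here is a minimal sketch that builds an Elasticsearch Query DSL request body combining a relevance-scored `match` query, exact `term`/`range` filters, and `terms` aggregations (the facets). The field names `annotation`, `chrom`, `pos`, `gene`, and `clinical_significance`, and the function `build_variant_search`, are hypothetical and would have to match the actual index mapping.

```python
from typing import List, Optional


def build_variant_search(text: str, chrom: Optional[str] = None,
                         start: Optional[int] = None,
                         end: Optional[int] = None) -> dict:
    """Build an Elasticsearch Query DSL body combining full-text search and facets.

    Field names (annotation, chrom, pos, gene, clinical_significance) are
    hypothetical; they must match the mapping of the actual index.
    """
    filters: List[dict] = []
    if chrom is not None:
        filters.append({"term": {"chrom": chrom}})
    bounds = {}
    if start is not None:
        bounds["gte"] = start
    if end is not None:
        bounds["lte"] = end
    if bounds:
        filters.append({"range": {"pos": bounds}})
    return {
        "query": {
            "bool": {
                # Full-text: relevance-scored match on free-text annotations.
                "must": [{"match": {"annotation": text}}],
                # Exact filters: no scoring, cacheable by the engine.
                "filter": filters,
            }
        },
        # Facets: per-value counts a UI can render as clickable filters.
        "aggs": {
            "by_gene": {"terms": {"field": "gene", "size": 20}},
            "by_significance": {"terms": {"field": "clinical_significance"}},
        },
        "size": 10,  # number of matching documents to return
    }
```

Serialized as JSON, this dict would be the body of a request to the index's `_search` endpoint. For the `terms` aggregations to work, `gene` and `clinical_significance` would need to be mapped as `keyword` fields rather than analyzed `text` fields.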
Advanced Content Processing
Data can be parsed and ingested in a format that web applications can access easily. The search engine indexes are designed around chromosome-based data sharding: partitioning the genomics data by chromosome means that queries scoped to one chromosome or genomic region touch only the relevant shards, which yields substantial performance improvements for user queries.
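The exact sharding design is not spelled out here; one plausible version, sketched under assumptions, is a separate Elasticsearch index per chromosome, with documents routed at ingest time through the `_bulk` API's NDJSON format (an action line naming the target index, followed by the document). The `variants` prefix, the `chrom:pos:ref>alt` document ID, the helper names, and the expected document shape are all hypothetical.

```python
import json
from typing import Iterable

# Hypothetical naming scheme: one index per chromosome, e.g. "variants-chr17".
INDEX_PREFIX = "variants"


def index_for(chrom: str) -> str:
    """Map '17', 'chr17', or 'CHR17' to the same per-chromosome index name.

    Elasticsearch index names must be lowercase, so 'chrX' becomes 'chrx'.
    """
    name = chrom.strip().lower()
    if name.startswith("chr"):
        name = name[3:]
    return f"{INDEX_PREFIX}-chr{name}"


def bulk_body(docs: Iterable[dict]) -> str:
    """Render variant documents as an Elasticsearch _bulk request body (NDJSON).

    Each document contributes an action line that routes it to its
    chromosome's index, followed by the document itself. Expects dicts with
    'chrom', 'pos', 'ref', and 'alt' (a list of alleles).
    """
    lines = []
    for doc in docs:
        # Hypothetical deterministic ID so re-ingesting a file overwrites, not duplicates.
        doc_id = f"{doc['chrom']}:{doc['pos']}:{doc['ref']}>{','.join(doc['alt'])}"
        lines.append(json.dumps({"index": {"_index": index_for(doc["chrom"]), "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

With this layout, a query scoped to chromosome 17 can target only `variants-chr17`, while genome-wide queries can still span every chromosome through a wildcard index pattern such as `variants-*`.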
End-User / Researchers’ Dashboards
On top of these search engine indexes, a research dashboard can be built that integrates clinical data, genomics data, and medical literature into a unified web-based interface. Research institute users can use it to perform cross-domain research studies, corroborate phenotypic data with genomics data, and visualize and analyze the results. Below is an example of such a dashboard.
For our research institute customers, this robust, customizable research dashboard has allowed principal investigators to:
- Analyze and visualize the structured data
- Run full-text searches over genome annotation data
- Focus on discovering cures for diseases
- Help their research institutes obtain funding more easily to pursue these cures
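The dashboard's cross-domain corroboration of phenotypic and genomics data often comes down to joining clinical records and variant records on a patient identifier. Here is a minimal in-memory sketch of that join: it cross-tabulates whether patients have a given phenotype against whether they carry any variant in a given gene. The `Patient` and `Variant` records and the `carrier_table` function are hypothetical stand-ins for what would, in a real deployment, be queries against the clinical and genomics indexes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Patient:
    patient_id: str
    phenotypes: FrozenSet[str]  # e.g. diagnosis codes pulled from the EMR


@dataclass(frozen=True)
class Variant:
    patient_id: str
    gene: str


def carrier_table(patients: Iterable[Patient], variants: Iterable[Variant],
                  phenotype: str, gene: str) -> Counter:
    """Cross-tabulate phenotype status against carrying any variant in `gene`.

    Returns a Counter keyed by (has_phenotype, is_carrier); together the four
    keys form a 2x2 contingency table. Missing keys count as 0.
    """
    carriers = {v.patient_id for v in variants if v.gene == gene}
    return Counter(
        (phenotype in p.phenotypes, p.patient_id in carriers) for p in patients
    )
```

A 2x2 table like this is the usual input to standard association tests such as Fisher's exact test; the statistics themselves are outside the scope of this sketch.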
Contact us to learn more about how a custom solution built on search and big data can accelerate your genomics studies.