
Pushing the Limits of Scalability

Let's talk about pushing the limits of search engine scalability — especially for very large databases.

There is really only one dimension of size that matters: the total number of documents in the system. Other dimensions of size (very large queries, very large documents, etc.) typically do not significantly affect performance or scalability.

I would say that 95% of systems are small, meaning that they have fewer than 2 million documents. Such systems can easily be handled by a single machine.

Next up are medium systems, those in the 10 million to 100 million document range. If these systems have any kind of query or indexing performance requirements (for example, a public web site handling 10 to 30 queries per second, or new documents arriving at a rate of 10 per second), then you will likely need an array of machines to handle your needs. Search Technologies has installed and integrated dozens of such systems, and they typically require anywhere from 5 to 20 machines in a search cluster.

Finally, we have large and very large systems. Large would be 500 million to 1 billion documents, and very large would be 2 billion to 10 billion documents or more.

Search Technologies has experience with extremely large database sizes. We've architected systems approaching 1 billion documents, and we expect to be involved with systems of 10 billion to 20 billion documents. Such systems require large data centers with many servers, but they are doable with today's technology, even for organizations of modest means.

Architecting a search system for such enormous amounts of data requires creating replicable, modular search "units" that can be scaled hierarchically. It also requires paying attention to both ends of the spectrum: the performance of a single machine, and how that machine interacts with others at every level.

To architect a system for 10 billion documents, you must approach the problem from bottom to top, as follows.

First: Pack as many documents onto a single node as possible

Most search engines recommend a maximum of 10 million documents on a single computer. For a 10 billion document database, that would require 1,000 computers! Clearly, that approach will not be feasible for most organizations.

However, if configured and tested properly, this number can be dramatically increased. Doing so requires 1) creating multiple "virtual search nodes" on a single physical computer, and 2) creating "read-only" search nodes, which only serve queries and do not perform real-time indexing.

One of the secrets of text search is that indexing is enormously expensive. If you can carefully control the indexing process such that only a single indexer is running on a node at a time, then you can pack a much larger number of documents for search onto a single search node.

Even better, if you are able to perform indexing off-line in large batches, not only will indexing be much more efficient, but your entire search node can then be devoted to searching.
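To illustrate the shape of this workflow, here is a minimal Python sketch of the batch-then-publish pattern. The classes and methods (IndexSegment, SearchNode, swap_in) are hypothetical stand-ins for whatever your particular engine provides; the point is simply that one dedicated process does the expensive indexing off-line, and the read-only search tier only ever receives finished segments.

```python
# A minimal sketch of the batch-then-publish pattern, assuming a hypothetical
# engine API. One dedicated process does the expensive indexing off-line, in
# large batches; read-only search nodes only receive finished segments.

from dataclasses import dataclass, field

@dataclass
class IndexSegment:
    doc_ids: list                 # placeholder for a real inverted-index segment

@dataclass
class SearchNode:
    segments: list = field(default_factory=list)

    def swap_in(self, new_segments):
        # On a read-only node, the only "write" is an atomic segment swap.
        self.segments = list(new_segments)

def build_segments(docs, batch_size=1_000_000):
    """Run the expensive indexing work off-line, in large batches."""
    for i in range(0, len(docs), batch_size):
        yield IndexSegment(doc_ids=docs[i:i + batch_size])

def publish(segments, nodes):
    """Distribute finished segments to the read-only search tier."""
    for node in nodes:
        node.swap_in(segments)
```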

Using these techniques, one can increase the number of documents per node from 10 million to at least 50 million, and possibly 100 million (depending on the amount of RAM, the number of cores, and the query features required). Such increases dramatically reduce the number of computers required, from 1,000 down to 100 to 200, a much more manageable number.
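To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in Python. The per-node figures are just the round numbers from this discussion, not recommendations for any particular engine.

```python
# Back-of-the-envelope capacity math for a 10 billion document index.
# The per-node figures are only the round numbers from this discussion.

TOTAL_DOCS = 10_000_000_000

for docs_per_node, label in [
    (10_000_000, "vendor guideline (10M docs per node)"),
    (50_000_000, "tuned, read-only nodes (50M docs per node)"),
    (100_000_000, "aggressively tuned (100M docs per node)"),
]:
    nodes = -(-TOTAL_DOCS // docs_per_node)   # ceiling division
    print(f"{label}: {nodes:,} computers")

# vendor guideline (10M docs per node): 1,000 computers
# tuned, read-only nodes (50M docs per node): 200 computers
# aggressively tuned (100M docs per node): 100 computers
```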

Second: Create Clusters

The next breakpoint in scalability will be encountered when you get to around 50 to 100 computers in a search cluster.

Each search cluster has a process that distributes a search to multiple machines, each of which searches its part of the database in parallel, and then merges the results. As you get to a very large number of search nodes (and remember, each machine may host multiple virtual search nodes), you begin to encounter scalability issues with these "distribution and results merging" processes.

Usually, you want to keep the number of virtual search nodes to fewer than 256 (or so). Some search engine systems allow for multiple layers of results distribution and merging, and these should be leveraged where possible. For example, the first layer can distribute and merge results across 16 nodes, and then the second layer can distribute and merge results across 16 such groups, giving a total of 16 x 16, or 256, nodes.
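To show the shape of that hierarchy, here is a rough Python sketch of two-layer distribution and merging. The node.search() call is a hypothetical stand-in for whatever RPC your engine uses internally; the value of the structure is that no single merger ever has to fan out to more than 16 children.

```python
# A sketch of two-layer distribution and merging over 16 groups of 16 virtual
# search nodes (16 x 16 = 256). The node.search() call is a hypothetical
# stand-in for whatever RPC your engine uses; real engines do this internally.

import heapq
from concurrent.futures import ThreadPoolExecutor

def merge(result_lists, k=10):
    """Merge per-node hit lists, keeping the k highest-scoring hits."""
    return heapq.nlargest(k, (hit for hits in result_lists for hit in hits),
                          key=lambda hit: hit["score"])

def search_group(query, nodes, k=10):
    """Layer 1: fan the query out to one group's nodes, merge locally."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(lambda n: n.search(query, k), nodes))
    return merge(results, k)

def search_cluster(query, groups, k=10):
    """Layer 2: fan the query out across the groups, merge their merges."""
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        results = list(pool.map(lambda g: search_group(query, g, k), groups))
    return merge(results, k)
```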

Third: Test for Reliability

Disaster recovery testing is critical for creating large, reliable clusters. One cannot depend on vendor guidelines; failure modes must be anticipated, tested, and recovery plans documented.

A key method for improving reliability is to use Storage Area Networks (SANs). A SAN can be easily reconfigured and targeted at multiple machines, and it provides built-in disk reliability through RAID 5 or RAID 6. Note that, since the large majority of our search nodes will be read-only (see above), any performance penalty for disk writes in RAID configurations is further minimized.

Search Technologies has extensively tested SAN storage and text search performance. When configured properly, the two technologies work very well together. With proper amounts of RAM and search index caching, the performance difference between SAN storage and local, direct-attached storage (LDAS) can be as low as 3-5% for most text search applications.

Using a SAN allows hardware failures to be overcome quickly. Since the indexes are all on the SAN, quick configuration changes are all that is required to replace a failed server node. Pre-configuring spare, off-line nodes further improves recovery speed, since they can be swapped into place very quickly.
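As a rough illustration of what that configuration change might look like, here is a hypothetical Python sketch. The class names, node names, and SAN LUN path are all made up for illustration; the idea is that recovery is a re-pointing exercise, not a re-indexing one.

```python
# A sketch of SAN-based recovery: because the index lives on shared storage,
# replacing a failed node is a configuration change, not a data copy or a
# re-index. The class names, node names, and LUN path are all hypothetical.

class SearchNode:
    def __init__(self, name):
        self.name = name
        self.index_lun = None

    def mount(self, lun, read_only=True):
        # In reality: attach the SAN volume and open the existing index.
        self.index_lun = lun

class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def fail_over(self, failed_name, spare, index_lun):
        self.nodes.pop(failed_name, None)        # take the dead node out of rotation
        spare.mount(index_lun, read_only=True)   # attach the existing index as-is
        self.nodes[spare.name] = spare           # queries now fan out to the spare

# Example: the server holding shard 17 dies; a pre-configured spare takes over.
cluster = Cluster([SearchNode(f"search-node-{i:02d}") for i in range(1, 51)])
cluster.fail_over("search-node-17", SearchNode("spare-node-03"),
                  "san:/indexes/shard-17")
```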

Fourth: Federation

The above configuration allows for a maximum database size of around 2.5 billion documents. Very large indeed.

But to get to 10 billion documents and beyond, one will need to deploy multiple 2.5-billion-document clusters in a federated environment.

Most search systems contain some element of federation, either built into the product, or as an add-on feature. Federation allows for a search to be sent to multiple search engine clusters — each of which will search in parallel — and then the results are merged when presented to the user. Depending on how the documents are distributed amongst the clusters, the federator may wait for all searches to complete or it may present the results as they are completed.

The primary purpose of federation is to allow for unlimited growth in database size by using multiple search clusters with familiar performance and reliability characteristics. Further, it limits the impact of cluster failure. If any single cluster fails, the other clusters are untouched and can continue to return search results, providing a graceful degradation in performance even in the face of severe hardware problems.
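Here is a minimal Python sketch of a federator along these lines. The cluster.search() call is a hypothetical stand-in for each cluster's query API; the sketch shows the parallel fan-out, the merge, and the graceful degradation when a cluster fails or times out.

```python
# A minimal federator sketch: fan the query out to every cluster in parallel,
# merge whatever comes back, and tolerate a failed or slow cluster rather than
# failing the whole search. cluster.search() is a hypothetical stand-in.

import heapq
from concurrent.futures import ThreadPoolExecutor, wait

def federated_search(query, clusters, k=10, timeout=2.0):
    pool = ThreadPoolExecutor(max_workers=max(len(clusters), 1))
    futures = [pool.submit(c.search, query, k) for c in clusters]
    done, _ = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)                # don't block on slow or dead clusters
    hits = []
    for future in done:
        try:
            hits.extend(future.result())
        except Exception:
            # A cluster failure degrades coverage, not availability: results
            # from the healthy clusters are still merged and returned.
            continue
    return heapq.nlargest(k, hits, key=lambda hit: hit["score"])
```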

Conclusions

I hope this discussion has been illuminating. Just a few years ago, I would never have considered that such large search databases could be created with off-the-shelf components.

But software, hardware, storage, and search technologies all have improved to the point that search engines for large databases can be and are being created within predictable cost and schedule estimates.

--- Paul
