Pushing the limits of Scalability
Let's talk about pushing
the limits of search engine scalability — especially for very large
databases.
There is really only one dimension of size: The total count of
documents in the system. Other dimensions of size (very large queries, very
large documents, etc.) typically do not significantly affect performance or
scalability.
I would say that 95% of systems are small — meaning that
they have less than 2 million documents. Such systems can be easily handled with
a single machine.
Next up are medium systems, those in the 10 million to
100 million document range. If these systems have any kind of query or index
performance requirements — for example, it is a public web site with 10-30
queries per second, or that new documents arrive at a rate of 10 documents per
second — then you will likely need an array of machines to handle your needs.
Search Technologies has installed and integrated dozens of such systems, and
they typically require anywhere from five to 20 machines in a search
cluster.
Finally, we have large and very large systems. Large would be
500 million to 1 billion documents, and very large would be 2 billion to 10
billion documents or more.
Search Technologies has experience with
extremely large database sizes. We've architected systems approaching 1 billion
documents, and expect to be involved with systems with 10 billion to 20 billion
documents. Such systems require large data centers with many servers, but are
do-able with today's technology, even by organizations of modest
means.
Architecting a search system for such enormous amounts of data
requires creating replicable, modular, search "units" that can be scaled
hierarchically, and it also requires paying attention to both ends of the
spectrum — the performance of a single machine, as well as how that machine
interacts with others at all levels.
To architect a system for 10 billion
documents, you must approach the problem from bottom to top, as
follows.
First: Pack as many documents onto a single node as
possible
Most search engines recommend a maximum of 10 million
documents on a single computer. For a 10 billion document database, that would
require 1,000 computers! Clearly, that approach will not be feasible for most
organizations.
However, if configured and tested properly, this number
can be dramatically increased. This requires 1) creating multiple "virtual
search nodes" on a single physical computer, and 2) creating "read only" search
nodes — which are only for search and do not perform real-time
indexing.
One of the secrets of text search is that indexing is
enormously expensive. If you can carefully control the indexing process such
that only a single indexer is running on a node at a time, then you can pack a
much larger number of documents for search onto a single search
node.
Even better, if you are able to perform indexing off-line in large
batches, not only will indexing be much more efficient, but then your entire
search node can be used for searching.
Using these techniques, one can
increase the number of documents per node from 10 million to at least 50
million, and possibly 100 million (depending on the amount of RAM, cores, and
query features required). Such increases dramatically reduce the number of
computers required, from 1,000 computers down to 100 to 200 computers, a much
more manageable number.
Second: Create
clusters
The next breakpoint in scalability will be encountered
when you get to around 50 to 100 computers in a search cluster.
Each
search cluster has a process that distributes a search to multiple machines,
each of which searches its part of the database in parallel, and then merges the
results. As you get to a very large number of search nodes (and remember, each
machine may have multiple, virtual search nodes), one begins to encounter
scalability issues with these "distribution and results merging"
processes.
Usually, you want to keep the number of virtual search nodes
to less than 256 (or so). Some search engine systems allow for multiple layers
of results distribution and merging, and these should be leveraged where
possible. For example, the first layer can distribute and merge results across
16 nodes, and then the second layer can distribute and merge results across 16
groups, giving a total of 16x16 or 256 nodes.
Third: Test for
Reliability
Disaster recovery testing is critical for creating
large, reliable clusters. One cannot depend on vendor guidelines; failure modes
must be anticipated, tested, and recovery plans documented.
A key method
for improving reliability is to use Storage Area Networks. SAN can be easily
reconfigured and targeted toward multiple machines, and contain built-in disk
reliability with RAID 5 or RAID 6. Note that, since the large majority of our
search nodes will be read-only (see above), any performance penalty for disk
writes in RAID configurations will further be minimized.
Search
Technologies has extensively tested SAN storage and text search performance.
When configured properly, the two technologies work very well together. Using
proper amounts of RAM and search index caching, the difference between using SAN
storage and Local, Direct Attached Storage (LDAS) can be as low as 3-5% for most
text search applications.
Using SAN allows for hardware failures to be
quickly overcome. Since the indexes are all on SAN, quick configuration changes
are all that's required to replace a failed server node. Pre-configuring
off-line nodes can further improve the speed of failure recovery so they can be
very quickly swapped into place.
Fourth:
Federation
The above configuration allows for a maximum database
size of around 2.5 billion documents. Very large indeed.
But to get to 10
billion documents and above, one will need to replicate multiple 2.5 billion
document clusters, in a federated environment.
Most search systems
contain some element of federation, either built into the product, or as an
add-on feature. Federation allows for a search to be sent to multiple search
engine clusters — each of which will search in parallel — and then the results
are merged when presented to the user. Depending on how the documents are
distributed amongst the clusters, the federator may wait for all searches to
complete or it may present the results as they are completed.
The primary
purpose of federation is to allow for unlimited growth in database size by using
multiple search clusters with familiar performance and reliability
characteristics. Further, it limits the impact of cluster failure. If any single
cluster fails, the other clusters are untouched and can continue to return
search results, providing a graceful degradation in performance even in the face
of severe hardware problems.
Conclusions
I
hope this discussion has been illuminating. Just a few years ago, I would never
have considered that such large search databases could be created with
off-the-shelf components.
But software, hardware, storage, and search
technologies all have improved to the point that search engines for large
databases can be and are being created within predictable cost and schedule
estimates.
--- Paul