I'm curious how to plan a deployment where i have many concurrent searches. I understand how to account for indexing, but not sure how i can plan to support 100's of customers. What do i do if i want to support 25-50 users all searching at once?
Accommodating many simultaneous searches
A common question for a large deployment is: how do I account for many concurrent users? Let's take as an example a system that may have at peak times 48 concurrent searches. The short answer is that we can accommodate 48 simultaneous searches on a cluster of indexers and search heads where each machine has enough RAM to prevent swapping. Assuming that each search takes 200MB of RAM per system, that is roughly 10GB additional RAM (beyond indexing requirements). This is because CPU will degrade gracefully with more concurrent jobs but once the working set of memory for all processes exceeds the physical RAM, performance drops catastrophically with swapping.
The caveat here is that a search's run time will be longer in proportion to the number of free cores when no searches were running. For example, suppose the indexers were doing nothing before the searches arrived and have 8 cores each. Suppose the first (of identical searches) takes 10s to complete. Then the first 8 searches will each take 10s to complete since there is no contention. However, since there are only 8 cores, if there are 48 searches running, each search will take 48/8 = 6x longer than if only 1-8 searches were running. So now, every search takes ~1 minute to complete.
This leads to the observation that the most important thing to do here is add indexers. Indexers do the bulk of the work in search (reading data off disk, decompressing it, extracting knowledge and reporting). If we want to return to the world of 10s searches, we use 6 indexers (one search head is probably still fine, though it may be appropriate to set aside a search head for summary index creation) and searches 1-8 now take 10/6 = 1.6s and with 48 searches, each takes 10s.
Unfortunately, the system isn't typically idle before searches arrive. If we are indexing 150 GB/day, at peak times, we probably are using 4 of the 8 cores doing indexing. That means that the first 4 searches take 10s, and having 48 searches running takes 48/4 = 12x longer, or 2 min to complete each.
Now one might say: let me put sixteen cores per indexer rather than eight and avoid buying some machines. That makes a little bit of sense, but is not the best choice. The number of cores doesn't help searches 1-16 in this case; they still take 10s. With 48 searches, each search will take 48/16 = 3x longer, which is indeed better than 6x. However, it's usually not too much more expensive to buy two 8 core machines, which has advantages: the first few searches will now just take 5s (which is the most common case) and we now have more aggregate I/O capacity (doubling the number of cores does nothing for I/O, adding servers does).
The lesson here is to add indexers. Doing so reduces the load on any system from indexing, to free cores for search. Also, since the performance of almost all types of search scale with the number of indexers, searches will be faster, which mitigates the effect of slowness from resource sharing. Additionally making every search faster, we will often avoid the case of concurrent searches with concurrent users. In realistic situations, with hundreds of users, each user will run a search every few minutes, though not at the exact same time as other users. By reducing the search time by a factor of 6 (by adding more indexers), the concurrency factor will be reduced (not necessarily by 6x, but by some meaningful factor). This in turn, lowers the concurrency related I/O and memory contention.
Accommodating many simultaneous searches
A common question for a large deployment is: how do I account for many concurrent users? Let's take as an example a system that may have at peak times 48 concurrent searches. The short answer is that we can accommodate 48 simultaneous searches on a cluster of indexers and search heads where each machine has enough RAM to prevent swapping. Assuming that each search takes 200MB of RAM per system, that is roughly 10GB additional RAM (beyond indexing requirements). This is because CPU will degrade gracefully with more concurrent jobs but once the working set of memory for all processes exceeds the physical RAM, performance drops catastrophically with swapping.
The caveat here is that a search's run time will be longer in proportion to the number of free cores when no searches were running. For example, suppose the indexers were doing nothing before the searches arrived and have 8 cores each. Suppose the first (of identical searches) takes 10s to complete. Then the first 8 searches will each take 10s to complete since there is no contention. However, since there are only 8 cores, if there are 48 searches running, each search will take 48/8 = 6x longer than if only 1-8 searches were running. So now, every search takes ~1 minute to complete.
This leads to the observation that the most important thing to do here is add indexers. Indexers do the bulk of the work in search (reading data off disk, decompressing it, extracting knowledge and reporting). If we want to return to the world of 10s searches, we use 6 indexers (one search head is probably still fine, though it may be appropriate to set aside a search head for summary index creation) and searches 1-8 now take 10/6 = 1.6s and with 48 searches, each takes 10s.
Unfortunately, the system isn't typically idle before searches arrive. If we are indexing 150 GB/day, at peak times, we probably are using 4 of the 8 cores doing indexing. That means that the first 4 searches take 10s, and having 48 searches running takes 48/4 = 12x longer, or 2 min to complete each.
Now one might say: let me put sixteen cores per indexer rather than eight and avoid buying some machines. That makes a little bit of sense, but is not the best choice. The number of cores doesn't help searches 1-16 in this case; they still take 10s. With 48 searches, each search will take 48/16 = 3x longer, which is indeed better than 6x. However, it's usually not too much more expensive to buy two 8 core machines, which has advantages: the first few searches will now just take 5s (which is the most common case) and we now have more aggregate I/O capacity (doubling the number of cores does nothing for I/O, adding servers does).
The lesson here is to add indexers. Doing so reduces the load on any system from indexing, to free cores for search. Also, since the performance of almost all types of search scale with the number of indexers, searches will be faster, which mitigates the effect of slowness from resource sharing. Additionally making every search faster, we will often avoid the case of concurrent searches with concurrent users. In realistic situations, with hundreds of users, each user will run a search every few minutes, though not at the exact same time as other users. By reducing the search time by a factor of 6 (by adding more indexers), the concurrency factor will be reduced (not necessarily by 6x, but by some meaningful factor). This in turn, lowers the concurrency related I/O and memory contention.