Deployment Architecture

Bandwidth necessary between distributed search indexers?

Motivator

Say I have two indexers in two different datacenters, and I want to distribute searches across the WAN/VPN/Internet between them. What kind of bandwidth is necessary for optimal search performance? For minimal performance?

I'm assuming all the work happens on the indexer, but the indexer-indexer connection does need to send the search parameters in one direction and receive the reply events in the other.

1 Solution

Splunk Employee

The real answer for this question is always "depends on your requirements." When I (as an end user) posed the question to someone at the Splunk UserCon, it was in the scenario of a small number of globally distributed datacenters with identical functions, and a desire to do a single search that will find similar events world-wide. Here is what I was told:

For a MB/sec figure, it will depend on the size of your dataset and on how much you can rely on map operations (which happen locally at the indexer) versus reduce operations (which happen on the search head). From a latency perspective, I was told that across the US (~70 ms round trip) is reasonable but pushing it, and that West Coast USA to Germany (~160 ms) would definitely be slow for most uses. Additionally, in earlier versions of Splunk (definitely before 4.0, and I believe 4.0 itself as well) the timeouts for this were set such that you'd have more issues. That has improved.
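To make the bandwidth-versus-latency trade-off above a bit more concrete, here is a rough back-of-envelope sketch. It is not how Splunk actually schedules or streams distributed search traffic; it only assumes that moving a result set costs some number of round trips plus the payload size divided by the link speed. The function name, the 50 MB result size, and the 10 Mbit/s link are all made-up values for illustration.

```python
def estimate_transfer_seconds(result_bytes, bandwidth_mbit_per_s, rtt_ms, round_trips=1):
    """Rough time to move search results from a remote indexer to the search head.

    Assumed model (not Splunk's real protocol): latency cost plus payload / bandwidth.
    """
    bandwidth_bytes_per_s = bandwidth_mbit_per_s * 1_000_000 / 8
    latency_s = round_trips * rtt_ms / 1000
    return latency_s + result_bytes / bandwidth_bytes_per_s


# Hypothetical numbers: 50 MB of results over a 10 Mbit/s WAN link,
# using the round-trip times quoted above.
for label, rtt_ms in (("across the US", 70), ("US West Coast to Germany", 160)):
    seconds = estimate_transfer_seconds(50 * 1_000_000, 10, rtt_ms)
    print(f"{label}: {seconds:.2f} s")
```

Under this simple model the payload term dominates for large result sets, so link speed matters most. For small, pre-aggregated results the round-trip term dominates instead, and that is where the 70 ms versus 160 ms difference would be felt.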

The recommendation I received was to create two different instances, USASplunk.YourDomain.com and GlobalSplunk.YourDomain.com, so that for queries that really need the full perspective, you can use the global instance, but you can use the USA instance for more usable searches. These instances could both exist on the same box, as there aren't going to be any real local performance implications (beyond the obvious).

Hopefully this will be helpful (and address your actual question). Perhaps someone who has more realistic numbers could provide a better benchmark. The other advice I got was to write your search string so that it does most of its work in the (remote) map phase rather than in the (local) reduce phase, which improves usability over high-latency links. I tried to find more documentation on this, but wasn't too successful -- perhaps someone else can chime in with more substantive facts.
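As an illustration of the map-versus-reduce advice, the sketch below simulates two indexers in plain Python and compares how many bytes would cross the WAN if every raw event were shipped to the search head, versus only per-indexer partial counts. Nothing here is Splunk code or Splunk's wire format; the indexer names, the events, and the JSON encoding are assumptions chosen only to show the idea.

```python
import json
from collections import Counter

# Hypothetical raw events held by two indexers in different datacenters.
events_by_indexer = {
    "us-indexer": [{"host": "web1", "status": 500}] * 1000
    + [{"host": "web2", "status": 500}] * 500,
    "eu-indexer": [{"host": "web1", "status": 500}] * 200
    + [{"host": "web3", "status": 500}] * 800,
}


def ship_raw(events):
    """Reduce-heavy: send every event over the WAN and count at the search head."""
    return json.dumps(events).encode()


def ship_partial_counts(events):
    """Map-heavy: count per host at the indexer and send only the small summary."""
    return json.dumps(Counter(e["host"] for e in events)).encode()


raw_payloads = [ship_raw(ev) for ev in events_by_indexer.values()]
summary_payloads = [ship_partial_counts(ev) for ev in events_by_indexer.values()]

# Search head: merge whatever arrived. Both approaches give the same totals.
totals_from_raw = Counter(e["host"] for p in raw_payloads for e in json.loads(p))
totals_from_summaries = Counter()
for payload in summary_payloads:
    totals_from_summaries.update(json.loads(payload))

raw_bytes = sum(len(p) for p in raw_payloads)
summary_bytes = sum(len(p) for p in summary_payloads)
print(f"raw events: {raw_bytes} bytes, partial counts: {summary_bytes} bytes")
```

Both paths produce identical per-host totals, but the summaries are a tiny fraction of the raw payload. That is the reason a search whose aggregation can run at the indexers should tolerate a slow or high-latency link much better than one that has to pull raw events back first.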

Splunk Employee

Gigabit Ethernet or better is optimal. A slow search peer will drag down the total performance of a distributed search. For this reason, you should take care to keep performance consistent across all peers.

Motivator

Are we talking Gigabit Ethernet in terms of low latency, or in terms of running multiple MB/sec through the pipe? I was hoping to get an answer on the bandwidth/pipe speed needed for an acceptable distributed search across multiple sites.