Deployment Architecture

Bandwidth necessary between distributed search indexers?

Jason
Motivator

Say I have two indexers in two different datacenters, and I want to distribute searches between them across the WAN/VPN/Internet. What bandwidth is needed for optimal search performance? And what is the minimum for acceptable performance?

I'm assuming all the work happens on the indexers, but the link between them still has to carry the search parameters in one direction and the resulting events in the other.

David
Splunk Employee

The real answer to this question is always "it depends on your requirements." When I (as an end user) put the question to someone at the Splunk UserCon, my scenario was a small number of globally distributed datacenters with identical functions, and a desire to run a single search that finds similar events worldwide. Here is what I was told:

In terms of Mb/s, it depends on the size of your dataset and on how much of the search can be done in map operations (which run locally on each indexer) versus reduce operations (which run on the search head). In terms of latency, I was told that a round trip across the US (~70 ms) is reasonable but pushing it, and that West Coast USA to Germany (~160 ms) would definitely be slow for most uses. Also, in earlier versions of Splunk (4.0 and prior, I believe; definitely prior to 4.0) the timeouts were set such that you'd run into more problems. That has since improved.
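To see how bandwidth and latency trade off, here is a back-of-the-envelope sketch. It assumes a fixed, hypothetical number of request/response round trips per search plus the time to push the result bytes through the link, and it ignores indexer-side search time, TCP slow start and protocol overhead. The 10 MB result size, 100 Mb/s link and 20 round trips are made-up illustration values, not Splunk measurements.

```python
def estimate_transfer_seconds(result_bytes: int, bandwidth_mbps: float,
                              rtt_ms: float, round_trips: int) -> float:
    """Rough lower bound on WAN time for one search peer's results.

    Assumption: a fixed number of round trips (hypothetical) plus
    serialization time at the link's bandwidth. Ignores search time on
    the indexer, TCP slow start and protocol overhead.
    """
    latency_s = round_trips * rtt_ms / 1000.0                     # waiting on round trips
    transfer_s = (result_bytes * 8) / (bandwidth_mbps * 1_000_000)  # bits / (bits per second)
    return latency_s + transfer_s


if __name__ == "__main__":
    # Hypothetical: 10 MB of results, 100 Mb/s link, 20 round trips.
    for rtt in (70, 160):  # the two RTTs quoted above
        t = estimate_transfer_seconds(10_000_000, 100, rtt, 20)
        print(f"RTT {rtt} ms: ~{t:.1f} s")
```

With these made-up numbers the latency term (1.4 s vs. 3.2 s) outweighs the 0.8 s of raw transfer, which is one way to see why round-trip time can matter as much as pipe size.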

The recommendation I received was to create two separate instances, USASplunk.YourDomain.com and GlobalSplunk.YourDomain.com. Use the global instance for queries that really need the worldwide view, and the USA instance for everyday searches, which stay more responsive. Both instances could live on the same box, since there aren't any real local performance implications (beyond the obvious).
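The two-instance idea can be modeled in a few lines. The sketch below is hypothetical throughout: the peer names, the per-peer round-trip times (chosen to echo the ~70 ms and ~160 ms figures above, from a West Coast search head) and the routing function are illustrations, not Splunk configuration. It only shows that the USA instance never has to wait on the transatlantic link.

```python
# Hypothetical round-trip times (ms) from a West Coast search head to each peer.
PEER_RTT_MS = {
    "us-west-idx": 2,
    "us-east-idx": 70,
    "de-idx": 160,
}

# Hypothetical peer lists for the two search-head instances.
SEARCH_HEAD_PEERS = {
    "USASplunk.YourDomain.com": ["us-west-idx", "us-east-idx"],
    "GlobalSplunk.YourDomain.com": ["us-west-idx", "us-east-idx", "de-idx"],
}


def choose_search_head(needs_global_view: bool) -> str:
    """Send only the queries that need worldwide data to the global instance."""
    if needs_global_view:
        return "GlobalSplunk.YourDomain.com"
    return "USASplunk.YourDomain.com"


def worst_peer_rtt_ms(search_head: str) -> int:
    """The highest-latency link this search head has to wait on."""
    return max(PEER_RTT_MS[p] for p in SEARCH_HEAD_PEERS[search_head])


for global_view in (False, True):
    head = choose_search_head(global_view)
    print(head, worst_peer_rtt_ms(head), "ms")
```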

Hopefully this is helpful (and addresses your actual question). Perhaps someone with more concrete numbers can provide a better benchmark. The other advice I got was to write your search string so that most of the work happens in the (remote) map phase rather than the (local) reduce phase, which improves usability over high-latency links. I tried to find more documentation on this without much success; perhaps someone else can chime in with more substantive facts.
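The map-versus-reduce advice comes down to how much data crosses the WAN. Here is a conceptual sketch of map/reduce in plain Python, not Splunk's internal implementation: each indexer collapses its raw events into small per-status counts (map), and only those partial results travel to the search head, which merges them (reduce). The synthetic events and the JSON byte count used as a stand-in for "bytes on the wire" are assumptions for illustration.

```python
import json
from collections import Counter


def make_events(n: int, every: int, rare: str) -> list[dict]:
    """Synthetic events: every `every`-th event gets the `rare` status code."""
    return [{"status": rare if i % every == 0 else "200",
             "raw": "GET /index.html HTTP/1.1 " * 4} for i in range(n)]


def map_phase(events: list[dict]) -> Counter:
    """Runs on each indexer: shrink raw events to per-status counts."""
    return Counter(e["status"] for e in events)


def reduce_phase(partials: list[Counter]) -> Counter:
    """Runs on the search head: merge the small partial results."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total


def wire_bytes(obj) -> int:
    """Stand-in for bytes sent over the WAN: size of the JSON encoding."""
    return len(json.dumps(obj).encode("utf-8"))


def simulate() -> tuple[Counter, int, int]:
    us_events = make_events(1000, every=10, rare="500")  # indexer in the US
    de_events = make_events(500, every=4, rare="404")    # indexer in Germany
    raw_bytes = wire_bytes(us_events) + wire_bytes(de_events)   # ship everything
    partials = [map_phase(us_events), map_phase(de_events)]     # map remotely
    mapped_bytes = sum(wire_bytes(dict(p)) for p in partials)   # ship only counts
    return reduce_phase(partials), raw_bytes, mapped_bytes


total, raw, mapped = simulate()
print(total, raw, mapped)
```

Both approaches produce the same totals, but the mapped version sends a few dozen bytes per indexer instead of every raw event, which is why a heavy remote map phase helps on a slow or high-latency link.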

Simeon
Splunk Employee

Gigabit Ethernet or better is optimal. A single slow search peer will drag down the performance of the entire distributed search, so make sure performance is consistent across all peers.
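A simplified model shows why one slow peer matters so much. This sketch assumes peers search in parallel and the search head cannot finish until the slowest peer has returned its results; the peer names and timings are hypothetical, and real searches can overlap streaming and merging, so treat it as a rough upper-level picture.

```python
def distributed_search_seconds(peer_seconds: dict[str, float],
                               merge_seconds: float = 0.0) -> float:
    """Simplified model: peers run in parallel, so the search completes
    only after the slowest peer returns, plus the search head's merge time."""
    if not peer_seconds:
        raise ValueError("need at least one search peer")
    return max(peer_seconds.values()) + merge_seconds


# Hypothetical per-peer times in seconds.
balanced = {"idx-a": 2.0, "idx-b": 2.2, "idx-c": 2.1}
one_slow = {"idx-a": 2.0, "idx-b": 2.2, "idx-c": 9.5}

print(distributed_search_seconds(balanced, merge_seconds=0.5))
print(distributed_search_seconds(one_slow, merge_seconds=0.5))
```

Under this model, upgrading the two fast peers does nothing for the `one_slow` case; only fixing `idx-c` shortens the search.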

Jason
Motivator

Are we talking about gigabit Ethernet for its low latency, or for its ability to push many MB/s through the pipe? I was hoping for an answer on the bandwidth (pipe speed) needed for an acceptable distributed search across multiple sites.
