Bandwidth necessary between distributed search indexers?

Jason
Motivator

Say I have two indexers in two different datacenters, and I want to distribute searches across the WAN/VPN/Internet between them. What kind of bandwidth is necessary for optimal search performance? For minimal performance?

I'm assuming all the work happens on the indexer, but the indexer-indexer connection does need to send the search parameters in one direction and receive the reply events in the other.

1 Solution

David
Splunk Employee

The real answer to this question is always "it depends on your requirements." When I (as an end user) posed the question to someone at the Splunk UserCon, the scenario was a small number of globally distributed datacenters with identical functions, and a desire to run a single search that finds similar events worldwide. Here is what I was told:

For a throughput (MB/sec) figure, it will depend on the size of your dataset and on how much you can rely on map operations (happening locally at the indexer) versus reduce operations (happening on the search head). From a latency perspective, I was told that across the US (~70 ms round trip) is reasonable but pushing it, and that West Coast USA to Germany (~160 ms) would definitely be slow for most uses. Additionally, in earlier versions of Splunk (definitely those prior to 4.0, and I believe 4.0 itself) the timeouts for this were set such that you'd have more issues. That has improved.
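To make the latency-versus-throughput trade-off concrete, here is a rough back-of-the-envelope sketch. It is not a model of how Splunk actually moves data: the function name, the 10 Mbit/s link, the payload sizes, and the figure of 20 round trips per search are all made-up assumptions for illustration. Only the ~70 ms and ~160 ms round-trip times come from the answer above.

```python
def estimated_transfer_seconds(payload_bytes, bandwidth_mbit_per_s, rtt_ms, round_trips=1):
    """Rough lower bound on transfer time: latency cost plus time on the wire.

    Ignores protocol overhead, compression, and TCP ramp-up (all assumptions).
    """
    latency_s = round_trips * rtt_ms / 1000.0
    transmit_s = payload_bytes * 8 / (bandwidth_mbit_per_s * 1_000_000)
    return latency_s + transmit_s


# Hypothetical inputs: a 10 Mbit/s WAN link and 20 request/response exchanges.
LINK_MBIT = 10
ROUND_TRIPS = 20
RAW_EVENTS_BYTES = 50 * 1_000_000   # shipping raw events back to the search head
SUMMARY_BYTES = 5_000               # shipping only summarized results

raw_cross_us = estimated_transfer_seconds(RAW_EVENTS_BYTES, LINK_MBIT, 70, ROUND_TRIPS)
raw_us_to_germany = estimated_transfer_seconds(RAW_EVENTS_BYTES, LINK_MBIT, 160, ROUND_TRIPS)
summary_cross_us = estimated_transfer_seconds(SUMMARY_BYTES, LINK_MBIT, 70, ROUND_TRIPS)
summary_us_to_germany = estimated_transfer_seconds(SUMMARY_BYTES, LINK_MBIT, 160, ROUND_TRIPS)

print(f"raw events, ~70 ms RTT:  {raw_cross_us:.2f} s")
print(f"raw events, ~160 ms RTT: {raw_us_to_germany:.2f} s")
print(f"summary,    ~70 ms RTT:  {summary_cross_us:.2f} s")
print(f"summary,    ~160 ms RTT: {summary_us_to_germany:.2f} s")
```

Under these assumed numbers, pipe speed dominates when raw events cross the WAN, while round-trip latency dominates once the peers return only small summaries.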

The recommendation I received was to create two different instances, USASplunk.YourDomain.com and GlobalSplunk.YourDomain.com, so that you can use the global instance for queries that really need the full perspective and the USA instance for more usable everyday searches. These instances could both exist on the same box, as there aren't going to be any real local performance implications (beyond the obvious).

Hopefully this will be helpful (and address your actual question). Perhaps someone who has some more realistic numbers could provide a better benchmark. The other advice I got was to write your search string so that most of the work happens in the (remote) map phase rather than in the (local) reduce phase, which improves usability over high-latency links. I tried to find some more documentation on this, but wasn't too successful -- perhaps someone else can chime in with more substantive facts.
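The map-heavy versus reduce-heavy advice can be illustrated with a small simulation. This is only a sketch of the general idea, not Splunk's wire protocol or internals: the function names, the JSON encoding, and the sample events are all hypothetical. Two "peers" answer a count-by-status question either by shipping every event to the search head or by counting locally and shipping only the partial counts.

```python
import json
from collections import Counter


def peer_ship_raw(events):
    """Peer does no aggregation: every matching event crosses the WAN."""
    return json.dumps(events).encode()


def peer_ship_partial_counts(events, field):
    """Peer does the 'map' work: counts locally and ships only the summary."""
    return json.dumps(Counter(e[field] for e in events)).encode()


def head_reduce_raw(payloads, field):
    """Search head counts every raw event itself (reduce-heavy)."""
    totals = Counter()
    for payload in payloads:
        totals.update(e[field] for e in json.loads(payload))
    return totals


def head_reduce_partials(payloads):
    """Search head only merges the per-peer partial counts."""
    totals = Counter()
    for payload in payloads:
        totals.update(json.loads(payload))
    return totals


def make_events(n):
    # Made-up sample data: one in ten requests fails with status 500.
    return [{"status": "500" if i % 10 == 0 else "200",
             "raw": "GET /index.html HTTP/1.1"} for i in range(n)]


peers = {"us": make_events(1000), "de": make_events(500)}

raw_payloads = [peer_ship_raw(ev) for ev in peers.values()]
partial_payloads = [peer_ship_partial_counts(ev, "status") for ev in peers.values()]

totals_from_raw = head_reduce_raw(raw_payloads, "status")
totals_from_partials = head_reduce_partials(partial_payloads)

raw_bytes = sum(len(p) for p in raw_payloads)
partial_bytes = sum(len(p) for p in partial_payloads)

print("same answer:", totals_from_raw == totals_from_partials)
print("bytes over the WAN, raw events:    ", raw_bytes)
print("bytes over the WAN, partial counts:", partial_bytes)
```

Both approaches give the same totals, but the second sends only a few dozen bytes per peer, which is why a search that aggregates at the indexers should tolerate a slow or high-latency link better than one that pulls raw events back.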

Simeon
Splunk Employee

Gigabit Ethernet or better is optimal. A slow search peer drags down the overall performance of a distributed search, so you should take care to keep performance consistent across all peers.

Jason
Motivator

Are we talking gigabit Ethernet in terms of low latency, or in terms of running multiple MB/sec through the pipe? I was hoping to get an answer on the bandwidth/pipe speed needed for an acceptable distributed search across multiple sites.
