How do I determine the appropriate timeout value for distributed search?

matt
Splunk Employee

I see timeouts with distributed search and currently have receiveTimeout set to 300. What metric(s) can I use to determine the correct value for this setting in order to avoid timeouts?

1 Solution

hexx
Splunk Employee

The current default value for this parameter is 600 seconds, and there is rarely a good reason to proactively raise or lower it.

If you see errors reporting search peer timeouts while searching, the right question to ask is "Why are search peers taking more than 600 seconds to respond?"

  • The first place to look for a possible explanation is the search.log file created by the search process on the search-head, in $SPLUNK_HOME/var/run/splunk/dispatch/{SID} where {SID} is the ID of the search.

  • Another good place to look is the search.log file created by the search process on the remote peer, in $SPLUNK_HOME/var/run/splunk/dispatch/remote_{SH-ServerName}_{SID} where {SH-ServerName} is the ServerName as set in server.conf for the search-head that dispatched the search, and {SID} is the ID of the search.

  • As peer timeouts are often linked with network quality issues, it is also a good idea to check that there are no issues with the network link between the search-head and the affected peer(s).
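
For anyone who wants to script the first two checks, here is a minimal Python sketch, not an official Splunk tool. It builds the two search.log paths described above and prints the lines that mention a timeout. The function names, the example SID, the /opt/splunk fallback, and the "timeout" / "timed out" match pattern are all assumptions made for illustration; the actual wording of the messages in your search.log may differ, so adjust the pattern to what you see.

```python
import os
import re
from pathlib import Path

# Assumption: relevant lines contain "timeout" or "timed out" (case-insensitive).
# The real message text in search.log may differ; adjust as needed.
TIMEOUT_PATTERN = re.compile(r"timeout|timed out", re.IGNORECASE)


def search_head_log(splunk_home, sid):
    """Path to search.log on the search-head for search ID `sid`."""
    return Path(splunk_home) / "var" / "run" / "splunk" / "dispatch" / sid / "search.log"


def peer_log(splunk_home, sh_server_name, sid):
    """Path to search.log on a peer for a search dispatched by `sh_server_name`."""
    return (Path(splunk_home) / "var" / "run" / "splunk" / "dispatch"
            / f"remote_{sh_server_name}_{sid}" / "search.log")


def timeout_lines(log_path):
    """Return (line_number, text) pairs for lines that mention a timeout."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if TIMEOUT_PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical values: replace with your own SPLUNK_HOME and SID.
    home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    sid = "1700000000.123"
    path = search_head_log(home, sid)
    if path.exists():
        for number, text in timeout_lines(path):
            print(f"{path}:{number}: {text}")
    else:
        print(f"No search.log found at {path}")
```

Run it on the search-head with `search_head_log`, and on the affected peer with `peer_log` plus the search-head's ServerName from server.conf. Comparing the timestamps of the matching lines on both sides can help show whether the peer was slow to respond or the network link dropped the connection.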

rstrong30
Loves-to-Learn

Some searches go back to "ALL TIME", so I'd imagine there are cases where changing the default is appropriate. Depending on one's environment, that could go back quite a few months. So if the default is 600 seconds, I would imagine raising it by a few more minutes should be safe, right? If not, what are the repercussions?

jdunlea_splunk
Splunk Employee

I also need to find this out
