Deployment Architecture

How do I know if my deployment server is overloaded?

Communicator

I currently run a combined search head/deployment server on a 4-core Intel Xeon server.

The following command indicates that I am serving 959 deployment clients at the moment.

splunk list deploy-clients | grep -c hostname:

959 is a whole lot more than numbers I've been seeing elsewhere.

It seems to be working okay. There are no unusual delays on the search head, and commands come back quickly.

idle load average: 0.29, 0.40, 0.43

How do I know if the deployment server is oversubscribed? Would there be errors in a log file somewhere?

Re: How do I know if my deployment server is overloaded?

Builder

You need to run two searches. First run a real-time search on some of your data, then run a typical search. If the real-time results are not processed immediately, the search is using CPU resources.


I have seen delays of up to 15 minutes on my system during the day. Note that the delay is normal and the logs are just indexed late.


We have three log entries for some of our events: a detailed start, a detailed end, and a summary end. I have written queries that look for all three. In our case, there are times when we lose logs, so I know our system is overloaded.

Re: How do I know if my deployment server is overloaded?

Motivator

I have seen Deployment Server overloaded to the point that splunkd is close to unresponsive. That is when you really know it is overloaded!

Re: How do I know if my deployment server is overloaded?

Splunk Employee

Splunk's recommendations are:

A small deployment server (30 or fewer clients) can co-reside with a Splunk instance that has other duties, such as a search head or indexer.
At moderate to large sizes (30-300 clients), the deployment server should reside on its own Splunk instance with no other duties.

Deployment server traffic can interfere with other management port activity, such as search, management, UI functionality, and distributed search.
At moderate sizes, phoneHomeIntervalInSecs should be increased from its default value of 30 seconds to a larger value that meets your business goals. Can deployment clients wait 10 minutes to receive updates? If so, 600 may be more appropriate.
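As a rough back-of-the-envelope check, the average phone-home load is the number of clients divided by the interval. The sketch below plugs in the 959 clients from the question and the 30 and 600 second intervals mentioned above. The function name is only illustrative, and it assumes clients spread their check-ins evenly across the interval, which real deployments may not do.

```shell
#!/bin/sh
# Hypothetical helper (not a Splunk command): average phone-home
# requests per second, assuming clients check in evenly.
phonehome_rate() {
    clients=$1     # number of deployment clients
    interval=$2    # phoneHomeIntervalInSecs value, in seconds
    awk -v c="$clients" -v i="$interval" 'BEGIN { printf "%.1f\n", c / i }'
}

phonehome_rate 959 30     # interval of 30 seconds, as quoted above
phonehome_rate 959 600    # 10-minute interval
```

With these numbers the management port goes from roughly 32 requests per second to under 2. The interval itself is set on each client, in the [deployment-client] stanza of deploymentclient.conf.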

At the moment Deployment Server oversubscription has to be gauged via a series of observations.

Is the splunkd HTTP server overwhelmed with the number of concurrent clients that are connecting? Are we spawning too many threads to service these clients?

This may not be apparent in splunkd.log. You would need to look at splunkd_access.log to track the rate at which HTTP requests are being served, count the sockets held by splunkd using lsof, or count the running threads using pstack.
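One way to get at the request rate is to bucket phone-home requests in splunkd_access.log by minute. This is only a sketch: it assumes the log uses an Apache-style layout with the timestamp in square brackets (for example [12/Mar/2013:10:15:02.123 -0700]) and that phone-home requests contain the string "phonehome" in the URL. Check both against your own log before relying on the numbers. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count phone-home requests per minute.
# ASSUMPTIONS: timestamp is the first [...] field on each line,
# and phone-home request URLs contain "phonehome".
requests_per_minute() {
    awk -F'[][]' '
        /phonehome/ {
            minute = substr($2, 1, 17)          # dd/Mon/yyyy:HH:MM
            if (!(minute in count)) order[++n] = minute
            count[minute]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "$1"
}

# Usage (path assumes a default install layout):
#   requests_per_minute "$SPLUNK_HOME/var/log/splunk/splunkd_access.log"
```

Each output line is a minute followed by the number of matching requests, in the order the minutes first appear in the log. A count that keeps climbing, or that sits far above what the client count and interval would predict, is worth a closer look.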

You could also try using: netstat -an | grep <port> | grep EST | wc -l
(e.g. netstat -an | grep 8089 | grep EST | wc -l)

If the value returned is high (e.g. in the hundreds), this may be an indication that the deployment server is oversubscribed.
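A plain grep on the port number also matches lines where 8089 shows up in the remote address or inside a longer port such as 58089. The sketch below is a slightly stricter variant that only counts established connections whose local address ends in the given port. It assumes Linux-style netstat output (local address in the fourth column, state in the last), and the function name is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: count ESTABLISHED connections whose local
# address ends in ":<port>". Reads netstat output on stdin.
# ASSUMPTION: Linux column layout (4th field = local address).
count_established() {
    port=$1
    awk -v p=":$port" '
        $NF == "ESTABLISHED" &&
        substr($4, length($4) - length(p) + 1) == p { n++ }
        END { print n + 0 }
    '
}

# Usage:
#   netstat -an | count_established 8089
```

Running it every few seconds and watching the trend tends to be more informative than a single reading.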