Hi,
We generally raise tickets in Prod through Splunk by saving a search query as a Report/Alert, and we now have a requirement to alert when load is not evenly distributed between the hosts. With the top command I can see the result as a percentage, but I wasn't able to use it in a where clause to calculate the deviation.
Say we have 4 hosts sharing an app. Ideally the distribution should be close to equal, but if the load in Prod is noticeably lower or higher on one of the hosts, I should get an alert.
Example search: index=data loggerName="xyzzy" threadName="thread1" appName="dataSync"
Give this a try. I've used 1 percentage point as the threshold difference between a host's percent and the average percent (100/total hosts).
index=data loggerName="xyzzy" threadName="thread1" appName="dataSync"
| top host showperc=t showcount=f
| eventstats count
| eval average=100/count
| where percent < average - 1 OR percent > average + 1
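The percent field that top produces is an ordinary numeric field, so it can be compared directly in where. If you prefer a single deviation expression, here is an equivalent sketch using eval's abs() function:

index=data loggerName="xyzzy" threadName="thread1" appName="dataSync"
| top host showperc=t showcount=f
| eventstats count
| eval average=100/count
| where abs(percent - average) > 1    ``` surviving rows are hosts more than 1 point from the even share ```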
Is the same approach possible for comparing hundreds of servers (different kinds of servers, like app servers, db servers, etc.)?
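With hundreds of hosts a fixed ±1 point band stops being meaningful, since the even share (100/count) itself shrinks toward zero, and mixing server types would compare app servers against db servers. A relative threshold within each group scales better. A sketch, assuming a hypothetical role field that identifies each server's type:

index=data
| stats count BY role, host    ``` role is a hypothetical grouping field (app, db, ...) ```
| eventstats avg(count) AS avg_count BY role
| eval pct_dev = 100 * abs(count - avg_count) / avg_count
| where pct_dev > 20    ``` flag any host more than 20% off its own group's average ```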
Hi Vicky84,
I would recommend collecting host metrics using something like collectd, the nix_ta, or nmon, rather than top, so you can get the CPU trend over time. Then you could compare the trends and calculate a deviation.
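If you go that route, the comparison pattern is the same; only the data source changes. A minimal sketch, assuming the cpu sourcetype from the Splunk Add-on for Unix and Linux (the nix_ta), which reports a pctIdle field per host; adjust the field names to whatever your collector actually emits:

sourcetype=cpu CPU=all earliest=-60m
| eval pctUsed = 100 - pctIdle    ``` CPU actually in use ```
| stats avg(pctUsed) AS host_cpu BY host
| eventstats avg(host_cpu) AS fleet_avg, stdev(host_cpu) AS fleet_sd
| where abs(host_cpu - fleet_avg) > 2 * fleet_sd    ``` hosts more than 2 standard deviations from the fleet average ```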
Maybe in a larger context what you're referring to would make more sense, for monitoring OS stats, but I'm not well versed in that, and the Splunk query above does the task for me.
Exactly as I wanted!