Hi all,
This is my first time posting here, so please be patient. I am relatively new to Splunk and am struggling to work out a search.
My manager has asked me to create an alert for Load Balancers flapping on our server.
Criteria:
- Runs every 15 mins (I assume this can be set in the "alert" settings)
- Fires if a load balancer switches from UP to DOWN and back more than 5 times
I am struggling to work out the second point. This is what I have so far:
index=xxx sourcetype="xxx" host="xxx" (State=UP OR State=DOWN) State="*"
| stats count by State
| eval state_status = if(DOWN+UP == 5, "Problem", "OK")
| stats count by state_status
Note: "State" is the field in question; it holds the UP or DOWN value for each event.
With this, I can get separate counts of the UP events and the DOWN events. However, I need to turn it into a threshold search that counts how many times the state changed from UP to DOWN and only returns a result at 5 consecutive changes.
Any and all help will be much appreciated.
If you are looking to count transitions, then use streamstats.
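Note that examples like this one that use | makeresults generate their own sample data, so they are generally designed to show you how you can achieve something rather than to run against your index.
This is a simple example you can run in the search window. It creates alternating UP and DOWN events, one per minute over 15 minutes, and then counts the UP to DOWN transitions. It achieves this by generating 15 rows with makeresults, numbering them with streamstats, spacing their _time values one minute apart, and setting state from whether the row number is odd or even: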
| makeresults count=15
| streamstats c
| eval _time=now() - (c * 60)
| eval state=if(c%2=1, "UP", "DOWN")
| sort _time
| streamstats window=2 list(state) as states
| eval transition=if(mvjoin(states,":")="UP:DOWN", 1, 0)
| stats sum(transition) as flaps
Note that the eval matches only the exact pair UP:DOWN. This is done so that if you run this example
| makeresults count=15
| streamstats c
| eval _time=now() - (c * 60)
| eval state=if(c%3!=0, "UP", "DOWN")
| sort _time
| streamstats window=2 list(state) as states
| eval transition=if(mvjoin(states,":")="UP:DOWN", 1, 0)
| stats sum(transition) as flaps
which makes a repeating UP, UP, DOWN pattern, it will only treat the change from UP to DOWN as a transition, and ignore the additional UP messages.
Note also that if you want to do this per host, then your streamstats would look like this (field names are case sensitive, so with your data use State rather than state)
| streamstats window=2 global=f list(state) as states by host
and also by host on the final stats.
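Putting those pieces together against real data, a minimal sketch might look like the search below. It is untested against your events and makes several assumptions: that the field is State as in the original post, that host identifies the load balancer (swap in whatever field does), that earliest=-15m matches the 15 minute schedule, and that "more than 5" means flaps > 5.

```spl
index=xxx sourcetype="xxx" host="xxx" (State=UP OR State=DOWN) earliest=-15m
| sort 0 _time
| streamstats window=2 global=f list(State) as states by host
| eval transition=if(mvjoin(states,":")="UP:DOWN", 1, 0)
| stats sum(transition) as flaps by host
| where flaps > 5
```

Saved as an alert that triggers when the number of results is greater than 0, this would only fire for hosts that went from UP to DOWN more than 5 times in the window.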
Hi @bowesmana
I appreciate the long and in-depth response, but as a relatively new Splunk user I'm not sure how to apply it to my scenario.
My manager has advised that this process is too complicated, and that a simple count of UP and DOWN events for the load balancer / VIP in question, plus a threshold search, is all that is needed.
If you just want to count the number of UP and DOWN messages regardless of sequence, and check whether the total is more than 5 regardless of how it splits between UP and DOWN (for example 5 UP and 1 DOWN, or vice versa), then
index=xxx sourcetype="xxx" host="xxx" (State=UP OR State=DOWN)
| stats count
will just give you a total count, but I assume you also need some logic in there that checks whether the number of DOWN events is greater than 0.
There are lots of ways to do this, but a simple one that requires at least 3 events of each state is
index=xxx sourcetype="xxx" host="xxx" (State=UP OR State=DOWN)
| stats count by State
| transpose header_field=State
| where DOWN>2 AND UP>2
or
index=xxx sourcetype="xxx" host="xxx" (State=UP OR State=DOWN)
| chart count over host by State
| where DOWN>2 AND UP>2
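or, if all you need is the total over the threshold plus at least one DOWN event, something like the sketch below. It is untested against your data; the names total and down are just illustrative, and the thresholds are my reading of your criteria.

```spl
index=xxx sourcetype="xxx" host="xxx" (State=UP OR State=DOWN)
| stats count as total count(eval(State="DOWN")) as down
| where total > 5 AND down > 0
```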
Depending on your data these may be OK as they are. If not, hopefully they give you a starting point that you can adapt to make it work for you.