Splunk Search

How to detect a flapping load balancer - up to down to back 5x times?

amoshos
Loves-to-Learn

Hi all,

This is my first time posting here, so please be patient. I am relatively new to the Splunk environment and am struggling to figure out this search.

My manager has asked me to create an alert for Load Balancers flapping on our server.

Criteria:
- Runs every 15 minutes (I assume this can be set in the "alert" settings)
- Fires if a load balancer switches from UP to DOWN and back more than 5 times

I am struggling to work out the second point. This is what I have so far:

index=xxx  sourcetype="xxx" host="xxx" (State=UP OR State=DOWN) State="*"
| stats count by State
| eval state_status = if(DOWN+UP == 5, "Problem", "OK")
| stats count by state_status

Note: "State" is the field in question; it stores the UP/DOWN value of each event.

Based on this, I can get individual counts of how often the load balancer reported UP and how often it reported DOWN. However, I need to turn this into a threshold search that only displays a count of how many times it changed from UP to DOWN, 5 consecutive times.

Any and all help will be much appreciated.


bowesmana
SplunkTrust

@amoshos 

If you are looking to count transitions, then use streamstats. 

Note that examples like this that use | makeresults generate their own sample events; they are designed to show you how you can achieve something, not to be run as-is against your data.

This is a simple example you can run in the search window. It creates alternating UP/DOWN events over 15 minutes and then counts the transitions by

  • sorting the events in time order, so the earliest one comes first
  • combining the states of two adjacent events into a new field called states, which contains the previous event's state and the current event's state
  • checking whether the two states are in the order UP->DOWN, meaning the previous state was UP and the new state is DOWN (value 1), or not (value 0)
  • summing all the value 1 results from above

| makeresults count=15
| streamstats c
| eval _time=now() - (c * 60)
| eval state=if(c%2=1, "UP", "DOWN")
| sort _time
| streamstats window=2 list(state) as states
| eval transition=if(mvjoin(states,":")="UP:DOWN", 1, 0)
| stats sum(transition) as flaps

The comparison is written this way so that if you run this example

| makeresults count=15
| streamstats c
| eval _time=now() - (c * 60)
| eval state=if(c%3!=0, "UP", "DOWN")
| sort _time
| streamstats window=2 list(state) as states
| eval transition=if(mvjoin(states,":")="UP:DOWN", 1, 0)
| stats sum(transition) as flaps

which makes a repeating UP, UP, DOWN sequence, it will only treat each UP->DOWN change as a transition and ignore the additional UP messages.

Note also if you want to start doing this by host, then your streamstats would look like this

| streamstats window=2 global=f list(state) as states by host

and you would also add by host to the final stats.
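Putting those pieces together against real events rather than makeresults might look like the sketch below. It is a sketch under assumptions, not a tested alert: it uses the State field and the xxx placeholders from the original question, drops the host filter so that by host has something to split on, assumes a 15-minute window via earliest=-15m (the alert's own time range could be used instead), and takes the "more than 5" threshold from the stated criteria.

```spl
index=xxx sourcetype="xxx" (State=UP OR State=DOWN) earliest=-15m
| sort 0 _time
| streamstats window=2 global=f list(State) as states by host
| eval transition=if(mvjoin(states, ":")="UP:DOWN", 1, 0)
| stats sum(transition) as flaps by host
| where flaps > 5
```

The sort 0 keeps every event rather than the default sort limit. With the where clause at the end, the search only returns rows for hosts that flapped more than 5 times, so an alert could be set to trigger when the number of results is greater than zero.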

amoshos
Loves-to-Learn

Hi @bowesmana 

I appreciate the long and in-depth response, but I'm not sure how to apply it to my scenario (I'm a relatively new Splunk user).

My manager has advised that this process is too complicated, and that a simple count of UP and DOWN events for the load balancer / VIP in question, combined with a threshold search, is all that is needed. I'm not too sure how to apply your examples to my scenario.

bowesmana
SplunkTrust

If you just want to count the number of UP and DOWN messages regardless of sequence, and check whether the total is more than 5 no matter how it splits between UP and DOWN, then

index=xxx  sourcetype="xxx" host="xxx" (State=UP OR State=DOWN) 
| stats count 

will just give you a count, but I assume you need some logic in there that checks whether the DOWN count is > 0.

There are lots of ways to do this, but simply

index=xxx  sourcetype="xxx" host="xxx" (State=UP OR State=DOWN) 
| stats count by State
| transpose header_field=State
| where DOWN>2 AND UP>2

or

index=xxx  sourcetype="xxx" host="xxx" (State=UP OR State=DOWN) 
| chart count over host by State
| where DOWN>2 AND UP>2

Depending on your data these may be good enough; either way, hopefully they give you a way to make it work for you.
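If you would rather combine the "total is more than 5" threshold and the "DOWN is > 0" check in a single search, one possible variant is sketched below. The field names up_count and down_count are made up for this illustration, the xxx values are the placeholders from the question, and the threshold of 5 is taken from the stated criteria, so adjust all of them to suit your data.

```spl
index=xxx sourcetype="xxx" (State=UP OR State=DOWN)
| stats count(eval(State="UP")) as up_count count(eval(State="DOWN")) as down_count by host
| where up_count + down_count > 5 AND down_count > 0
```

Like the searches above, this counts messages rather than actual UP->DOWN transitions, so a burst of repeated UP messages with a single DOWN would still match.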
