Splunk Search

How to match when a certain value/string has occurred more than x times within an interval?

martinhelgegren
Explorer

Hi!

I have various syslog clients sending me logs about their current state (a certain process), e.g.:

[timestamp] host1 is in state "yes"
[timestamp] host2 is in state "yes"
[timestamp] host1 is in state "no"
[timestamp] host2 is in state "no"

Whenever a state transition occurs randomly and at a low "rate" I don't care, but I want to find and match when more than x hosts have logged the same state transition from "yes" to "no" (or vice versa) within a certain time interval, e.g. 60s. It's kind of a "sliding window" idea, where only chunks/groups of the same value during a short period of time are of interest to me.

Of course, thus far I've only managed to visualize the "peaks", but unfortunately the peaks are rather anonymous; they don't stand out. Any ideas where to start? Which commands in Splunk search may provide such a function?


yuanliu
SplunkTrust

Starting directly from the original requirement: a transition is a transaction of sorts. But unlike how SPL's transaction command works, you want to count transitions in both directions. Apart from streamstats, I think you can construct the search using just stats.

| bin span=1m _time
| stats earliest(state) as startstate dc(state) as scount by host _time
| where scount > 1
| eval change = if(startstate == "yes", "yes_to_no", "no_to_yes")
| stats dc(host) as changed by _time change
| where changed > 100 ``` assume your threshold is 100 ```

The key here is to use state count as  a marker of state change, and preserve the initial state to determine the direction of the change.  Of course, usual caveats apply as we substitute a clock interval for sliding window.  This method also loses resolution of multiple changes within the same interval.
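A small variant of the above (just a sketch, reusing the same field names and the placeholder threshold of 100) derives the direction from both the first and last state seen in the bucket, so it also behaves sensibly when a host logs more than two events in the same interval:

| bin span=1m _time
| stats earliest(state) as startstate latest(state) as endstate by host _time
| where startstate != endstate
| eval change = startstate . "_to_" . endstate
| stats dc(host) as changed by _time change
| where changed > 100 ``` placeholder threshold ```

A host that ends the interval in the state it started in is simply not counted, rather than being classified by its intermediate flapping.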


martinhelgegren
Explorer

Thank you for your efforts! I did get somewhere near the target using this strategy, which I found on another thread yesterday:

| bin _time span=5m
| stats count AS state_changes by _time
| eval occurrence=if(state_changes!=0, 1, 0)
| where state_changes > 50

...though the eval occurrence is pretty much useless, as I do not care about the number of occurrences and it only ever shows 1 and never 2.
This produces the results as a table with _time, the number of state changes and the obsolete occurrence column. It achieves the main purpose, which is identifying all the "buckets" (intervals) with "more than x" changes. The next task is to figure out how to use the matched events in e.g. an annotation or something else on the timeline, or even to set up some kind of alert triggering.
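For the annotation idea, one option (a sketch only; `annotation_label` is the field name that Splunk dashboard event annotations look for, and 50 remains the placeholder threshold) could be a secondary search on the dashboard panel, something like:

| bin _time span=5m
| stats count AS state_changes by _time
| where state_changes > 50
| eval annotation_label = tostring(state_changes) . " state changes" ``` label shown on the timeline ```

The same thresholded search could also be saved as a scheduled alert that triggers when the number of results is greater than zero.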

I will however try out your strategy during the day and see where it leads! Will keep you posted!


martinhelgegren
Explorer

Did a quick test and I don't think this would be a better solution for my target, sorry...


ITWhisperer
SplunkTrust

This isn't so much a sliding window, but it would count occurrences within 1-minute slots:

| streamstats global=f current=f latest(state) as previous by host
| eval change = case(previous == "up" AND state == "down", -1, previous == "down" AND state == "up", 1, 1==1, null())
| bin _time span=1m
| stats count by _time change

martinhelgegren
Explorer

I have tried the suggested functions and while the first part:

| streamstats global=f current=f latest(state) as previous by host

 ...provides another column with the previous value of the matched field, the full search doesn't match anything, no matter which interval I try. I've tried 1, 5 and 60 minutes.

I also can't interpret the syntax of the commands mentioned: where does the number of hosts come in? What is the minimum number of hosts that triggers a match?

Grateful for the help!


ITWhisperer
SplunkTrust

OK, I used "up" and "down" instead of "yes" and "no", and I also assumed your field might be called "state". I also assumed you wanted to count occurrences in each minute. The commands mentioned would give you the counts you wanted; you could then use a where command to filter out any minutes that are below your threshold.
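Putting the threshold filter together with the earlier commands, it might look like this (a sketch: swap "up"/"down" for your actual values, and 10 is only a placeholder for your minimum count):

| streamstats global=f current=f latest(state) as previous by host
| eval change = case(previous == "up" AND state == "down", -1, previous == "down" AND state == "up", 1, 1==1, null())
| bin _time span=1m
| stats count by _time change
| where count > 10 ``` placeholder threshold ```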

If this still isn't making sense, perhaps you could share your query so we can see where it might be going wrong. Please use a code block </> (as I did) to post your SPL.

Also, it might be useful to see some sample, anonymised events (again, in a code block)


yuanliu
SplunkTrust

I think you need to bucket _time before running streamstats.

| bin _time span=1m
| streamstats global=f current=f latest(state) as previous
  by host _time ``` latest needs to be bucketed ```
| eval change = case(previous == "up" AND state == "down", -1,
  previous == "down" AND state == "up", 1, true(), null())
| stats count by _time change
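Since the goal is specifically "more than x hosts", counting distinct hosts instead of raw events may be closer to the requirement (again a sketch, with 10 as a placeholder threshold):

| bin _time span=1m
| streamstats global=f current=f latest(state) as previous
  by host _time ``` latest needs to be bucketed ```
| eval change = case(previous == "up" AND state == "down", -1,
  previous == "down" AND state == "up", 1, true(), null())
| stats dc(host) as hosts by _time change
| where hosts > 10 ``` placeholder threshold ```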

martinhelgegren
Explorer

Hi! Adding a bucket does something, but it seems I can't match all the data properly. I will need to reply with some details of how the events and search strings are built when I have the opportunity. Please be patient...
