Splunk Search

Can streamstats reset_before (or reset_after) be used with a by clause?

Communicator

Hi,

I have a search similar to this one:

index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats count by user
| reverse

This works like a charm: it gives the number of logins per user. But now I want to find the users with X consecutive failed logins, where a successful login should reset the count. Therefore I've added reset_before="("result==\"Success\"")":

index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats reset_before="("result==\"Success\"")" count by user
| reverse

which works ... somewhat. It resets the streamstats count when an event with result="Success" is encountered, but not just for that one user: it resets the count for all users:

login user="A" result="Failed"  count=1
login user="A" result="Success" count=1
login user="A" result="Failed"  count=1  <-- This count should be 4
login user="B" result="Success" count=1  <-- This event seems to reset the count for user A too
login user="A" result="Failed"  count=1  <-- This count should be 3
login user="B" result="Success" count=1  <-- This event seems to reset the count for user A too
login user="A" result="Failed"  count=2
login user="A" result="Failed"  count=1
login user="A" result="Success" count=1
login user="A" result="Success" count=1
login user="A" result="Failed"  count=2
login user="A" result="Failed"  count=1

Can this be prevented?
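To make the symptom concrete, here is a small Python model of the behaviour described above. It is only a sketch of what the output suggests (a reset clears the counters of every by-group), not Splunk's actual implementation; the function name and the sample events are made up for illustration.

```python
from collections import defaultdict

def streamstats_count_global_reset(events, by, reset_before):
    """Running count per `by` value; a reset wipes the counters of ALL groups."""
    counts = defaultdict(int)
    out = []
    for ev in events:
        if reset_before(ev):
            counts.clear()  # every group's counter is cleared, not just ev[by]
        counts[ev[by]] += 1
        out.append(counts[ev[by]])
    return out

# Hypothetical events, oldest first (the order streamstats sees after "reverse").
events = [
    {"user": "A", "result": "Failed"},
    {"user": "A", "result": "Failed"},
    {"user": "B", "result": "Success"},
    {"user": "A", "result": "Failed"},
]
counts = streamstats_count_global_reset(
    events, "user", lambda ev: ev["result"] == "Success"
)
print(counts)  # [1, 2, 1, 1]
```

The last event is the third consecutive failure for user A, but the model (like the search above) reports 1 because user B's success cleared A's counter as well.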

1 Solution

SplunkTrust

You don't actually need streamstats' reset arguments to accomplish this, and I think it may be more powerful to attack the problem with a more generic approach using just... streamstats! o_O

Streamstats will do its count arithmetic split out by any split-by field you give it. At a high level, your problem is that you need it to do its arithmetic not just by user but also according to this extra reset logic.

Solution -
-- we can construct that extra reset logic as a "reset_count" field that we build with a separate streamstats command. For each user it will be an integer counting the number of times they have "reset" by getting a success.
-- we can then pass that field to the main streamstats as a second split-by field.
-- it will then happily do its normal bucketing, grouping by the unique combination of both user and that user's reset_count.

It will look something like this. I recommend getting to know it pipe by pipe to understand closely how it's working (and to iron out any wrinkles I might have here). Note: you may want to use current=f on the second streamstats; I'm not sure. It depends on whether you want to count events since the reset, or events since and including the reset.

index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats count(eval(result=="Success")) as reset_count by user
| streamstats count by user reset_count
| reverse
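If it helps to see the logic outside Splunk, the two streamstats passes can be sketched in Python roughly as follows. This is a model of the idea only; the function name and the sample events are hypothetical, and the events are assumed to be in oldest-first order (as they are after the first reverse).

```python
from collections import defaultdict

def count_since_last_success(events):
    """Two passes over the events, mirroring the two streamstats commands."""
    # Pass 1: per-user running count of Success events (current event included),
    # like: streamstats count(eval(result=="Success")) as reset_count by user
    successes = defaultdict(int)
    tagged = []
    for ev in events:
        if ev["result"] == "Success":
            successes[ev["user"]] += 1
        tagged.append({**ev, "reset_count": successes[ev["user"]]})
    # Pass 2: running count per (user, reset_count) pair,
    # like: streamstats count by user reset_count
    running = defaultdict(int)
    for ev in tagged:
        key = (ev["user"], ev["reset_count"])
        running[key] += 1
        ev["count"] = running[key]
    return tagged

# Hypothetical events, oldest first.
events = [
    {"user": "A", "result": "Failed"},
    {"user": "A", "result": "Failed"},
    {"user": "B", "result": "Success"},
    {"user": "A", "result": "Failed"},
    {"user": "A", "result": "Success"},
    {"user": "A", "result": "Failed"},
]
result = count_since_last_success(events)
reset_counts = [ev["reset_count"] for ev in result]
counts = [ev["count"] for ev in result]
print(reset_counts)  # [0, 0, 1, 0, 1, 1]
print(counts)        # [1, 2, 1, 3, 1, 2]
```

User B's success no longer disturbs user A's count (the fourth event gets 3). Note that in this model the Success event itself opens the new group with count 1, so the failure right after it gets 2; that is the "since and including the reset" wrinkle mentioned above.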

And in your _internal example, a working example I was testing with is here:

index=_internal component=* 
| streamstats count(eval(log_level=="WARN")) as reset_count by component 
| streamstats current=f count by component reset_count 
| table _time component reset_count log_level count 
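To see what current=f changes, here is a rough Python model of a running count that either includes or excludes the current event. It assumes only that current=f leaves the current event out of the calculation; reporting 0 for the first event of a group is a choice of this model, so check how Splunk renders that case. The function name and the events are hypothetical.

```python
from collections import defaultdict

def running_count(events, keys, current=True):
    """Running count per combination of `keys`.

    current=True counts the current event; current=False counts only the
    events seen before it in the same group.
    """
    seen = defaultdict(int)
    out = []
    for ev in events:
        group = tuple(ev[k] for k in keys)
        out.append(seen[group] + 1 if current else seen[group])
        seen[group] += 1
    return out

# Hypothetical events for one component, in processing order, with
# reset_count already computed as "number of WARN events so far".
events = [
    {"component": "X", "log_level": "ERROR", "reset_count": 0},
    {"component": "X", "log_level": "ERROR", "reset_count": 0},
    {"component": "X", "log_level": "WARN", "reset_count": 1},
    {"component": "X", "log_level": "ERROR", "reset_count": 1},
    {"component": "X", "log_level": "ERROR", "reset_count": 1},
]
including = running_count(events, ["component", "reset_count"], current=True)
excluding = running_count(events, ["component", "reset_count"], current=False)
print(including)  # [1, 2, 1, 2, 3]
print(excluding)  # [0, 1, 0, 1, 2]
```

The two variants differ by exactly one on every event, so which one is right depends on whether the resetting event should itself be part of the count.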

BONUS: As an alternate strategy, which may be preferable if you prefer explicit eval statements to the complex eval() syntax in streamstats, you can construct an explicit marker field using a separate eval command. That would look something like this:

| eval is_reset=if(result=="Success",1,null())
| streamstats count(is_reset) as reset_count by user

In general I find that any of the nested eval() syntax in stats/eventstats/streamstats can always be deconstructed into a more explicit and more readable eval clause plus a stats/eventstats/streamstats without the eval(). But some people prefer the more compact, more "inscrutable black box" feeling of the nested eval().
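As a sanity check on that equivalence, here is a small Python sketch comparing the two formulations. It assumes that count() over a field only counts events where the field is non-null, which is why the marker uses null() rather than 0; the function names and the sample events are hypothetical.

```python
from collections import defaultdict

def reset_count_nested(events):
    """Models count(eval(result=="Success")) ... by user."""
    seen = defaultdict(int)
    out = []
    for ev in events:
        if ev["result"] == "Success":
            seen[ev["user"]] += 1
        out.append(seen[ev["user"]])
    return out

def reset_count_marker(events):
    """Models eval is_reset=if(...,1,null()) followed by count(is_reset) ... by user."""
    # Step 1: explicit marker field, None standing in for null().
    marked = [
        {**ev, "is_reset": 1 if ev["result"] == "Success" else None}
        for ev in events
    ]
    # Step 2: running count of non-null markers per user.
    seen = defaultdict(int)
    out = []
    for ev in marked:
        if ev["is_reset"] is not None:
            seen[ev["user"]] += 1
        out.append(seen[ev["user"]])
    return out

# Hypothetical events, oldest first.
events = [
    {"user": "A", "result": "Failed"},
    {"user": "A", "result": "Failed"},
    {"user": "B", "result": "Success"},
    {"user": "A", "result": "Failed"},
    {"user": "A", "result": "Success"},
    {"user": "A", "result": "Failed"},
]
nested = reset_count_nested(events)
marker = reset_count_marker(events)
print(nested == marker, nested)  # True [0, 0, 1, 0, 1, 1]
```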


Path Finder

I'm going to add this here as this gave me a good solution to a similar problem.

I was trying to use streamstats reset_after to produce a good running status of multiple services on multiple servers. Something like

  1. Server=1 Service=A Status=UP
  2. Server=1 Service=B Status=UP
  3. Server=2 Service=A Status=UP
  4. Server=2 Service=A Status=UP
  5. Server=1 Service=A Status=DOWN
  6. Server=1 Service=B Status=UP
  7. Server=2 Service=A Status=DOWN
  8. Server=2 Service=A Status=UP
  9. Server=1 Service=A Status=DOWN
  10. Server=1 Service=B Status=UP
  11. Server=2 Service=A Status=UP
  12. Server=2 Service=A Status=DOWN
  13. Server=1 Service=A Status=UP
  14. Server=1 Service=B Status=UP
  15. Server=2 Service=A Status=UP
  16. Server=2 Service=A Status=UP

then use streamstats with reset_after to reset my fail counts after a service comes back up.

I ran into the same problem as described here: reset_after and reset_before reset all statistics, not just the statistics for the by-clause stream Splunk is currently working on.

My solution was to use sort 0 Server Service _time rather than reverse. By grouping the events together by Server and Service, you create a serial listing of what we were trying to compute in parallel. Something like

  1. Server=1 Service=A UP
  2. Server=1 Service=A DOWN
  3. Server=1 Service=A DOWN
  4. Server=1 Service=A UP

And so on for each server & service.

Once you've got that, you can use streamstats to create a field that shows when Service or Server changes from one event to the next, and then reset on both the field you originally wanted to reset on and that change field.

It's a little complex, but it seems to work really nicely for the complex case I'm trying to solve: seeing the current and historical state of each service.
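Roughly, the sort-then-reset idea can be modelled in Python like this. It is a sketch under my own assumptions (a single counter, reset whenever the Server/Service pair changes or the service comes back UP); the field names, the down_streak output field, and the sample events are hypothetical.

```python
def consecutive_down_counts(events):
    """Sort by (server, service, time), then keep ONE serial counter that is
    reset on a change of server/service or when the service is UP again."""
    ordered = sorted(events, key=lambda e: (e["server"], e["service"], e["time"]))
    out = []
    prev_key = None
    down = 0
    for ev in ordered:
        key = (ev["server"], ev["service"])
        if key != prev_key:
            down = 0  # moved on to another server/service: start fresh
        if ev["status"] == "DOWN":
            down += 1
        else:
            down = 0  # service came back up: reset the fail count
        out.append({**ev, "down_streak": down})
        prev_key = key
    return out

# Hypothetical interleaved events; "time" is just a sequence number.
events = [
    {"time": 1, "server": 1, "service": "A", "status": "UP"},
    {"time": 2, "server": 2, "service": "A", "status": "DOWN"},
    {"time": 3, "server": 1, "service": "A", "status": "DOWN"},
    {"time": 4, "server": 2, "service": "A", "status": "UP"},
    {"time": 5, "server": 1, "service": "A", "status": "DOWN"},
    {"time": 6, "server": 1, "service": "B", "status": "DOWN"},
    {"time": 7, "server": 1, "service": "A", "status": "UP"},
]
result = consecutive_down_counts(events)
streaks = [ev["down_streak"] for ev in result]
print(streaks)  # [0, 1, 2, 0, 1, 1, 0]
```

Because the events for each Server/Service pair are contiguous after the sort, a global reset can no longer clobber another pair's count; the change marker only has to guard the boundary between pairs.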



Communicator

Nice workaround! Tried it myself with two streamstats but didn't think of the count(eval(...)) option. Thanks a lot!


SplunkTrust

@krdo... just try reordering the command so that the aggregation comes first, followed by reset_before or reset_after.

 | streamstats count by user reset_before="("result==\"Success\"")" 
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Communicator

Changed my search to

index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats count by user reset_before="("result==\"Success\"")"
| reverse

which gives me the same result... did you mean something different?


SplunkTrust

No, this should have worked. Which version of Splunk are you on? I think reset_after has only been available since version 6.4.


SplunkTrust

The following is the test query I ran against Splunk's _internal index, with sourcetype splunkd and log_level!="INFO" (Splunk Enterprise version 6.5.1):

| streamstats count by component reset_after="("log_level==\"WARN\"")"
| table _time component log_level count

The output follows. Notice that the ERROR count also changes when log_level changes.

_time                    component        log_level  count
2017-03-29 17:39:23.464  HttpListener     WARN       1
2017-03-29 17:22:08.704  LookupOperator   ERROR      1
2017-03-29 16:58:59.719  AppendProcessor  ERROR      1
2017-03-29 16:58:58.751  AppendProcessor  ERROR      2
2017-03-29 16:58:58.688  AppendProcessor  ERROR      3

Communicator

I'm on Splunk Enterprise 6.5.1 too, and reset_after does change the behavior of streamstats, but not in the way I want. Running your search gives me something like this:

2017-03-30 06:31:45.497     TailReader  ERROR   54
2017-03-30 06:31:45.497     WatchedFile     ERROR   54
2017-03-30 06:30:19.015     DateParserVerbose   WARN    1
2017-03-30 06:30:19.015     DateParserVerbose   WARN    1
2017-03-30 06:30:06.907     DateParserVerbose   WARN    1
2017-03-30 06:30:06.907     DateParserVerbose   WARN    1
2017-03-30 06:30:04.325     DateParserVerbose   WARN    1
2017-03-30 06:30:04.325     DateParserVerbose   WARN    1
2017-03-30 06:30:01.665     TailReader  ERROR   1 <-- Should be 55
2017-03-30 06:30:01.665     WatchedFile     ERROR   1 <-- Should be 55

You can see that the count is reset for ALL components because of the warnings from DateParserVerbose. But what I want is that the count is reset only for the component for which the warning is encountered (in this case only the count for DateParserVerbose should be reset and the counters for TailReader and WatchedFile should not be reset).
