Hi,
I have a search similar to this one:
index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats count by user
| reverse
This works like a charm; it gives the number of logins per user. But now I want to find the users with X consecutive failed logins, where a successful login should reset the count. Therefore I've added reset_before="("result==\"Success\"")":
index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats reset_before="("result==\"Success\"")" count by user
| reverse
which works ... somewhat. When an event with result="Success" is encountered, it resets the streamstats count, but not just for that one user: it resets it for all users:
login user="A" result="Failed" count=1
login user="A" result="Success" count=1
login user="A" result="Failed" count=1 <-- This count should be 4
login user="B" result="Success" count=1 <-- This event seems to reset the count for user A too
login user="A" result="Failed" count=1 <-- This count should be 3
login user="B" result="Success" count=1 <-- This event seems to reset the count for user A too
login user="A" result="Failed" count=2
login user="A" result="Failed" count=1
login user="A" result="Success" count=1
login user="A" result="Success" count=1
login user="A" result="Failed" count=2
login user="A" result="Failed" count=1
Can this be prevented?
You don't actually need streamstats' reset argument to accomplish this, and I think it may be more powerful to attack the problem with a more generic approach using just... streamstats! o_O
So streamstats will do its count arithmetic split out by any 'split by' field you give it. At a high level, your problem is that you need it to do its arithmetic not just by user, but also according to this extra reset logic.
Solution -
-- We can construct that extra reset logic as a "reset_count" field that we build with a separate streamstats command. For each user it will be an integer counting the number of times they have "reset" by getting a success.
-- We can then pass that field as a second split-by field to the main streamstats.
-- The main streamstats will then happily do its normal bucketing, grouping by the unique combination of user and that user's "reset_count".
It will look something like this. I recommend getting to know it pipe by pipe to closely understand how it's working (and to iron out any wrinkles I might have here). Note: you may want to use current=f on the second streamstats; I'm not sure. It depends on whether you want to count events since the reset, or events since and including the reset.
index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats count(eval(result=="Success")) as reset_count by user
| streamstats count by user reset_count
| reverse
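If it helps to see the mechanics outside of Splunk, here is a small Python sketch of the same logic, run on the sample events from the question in chronological order. This is only an illustration of the two passes with default current=t semantics, not how Splunk implements streamstats:

```python
# Simulating the two-streamstats approach in plain Python (an illustration
# of the logic only, not Splunk internals).
from collections import defaultdict

# (user, result) pairs in chronological order, i.e. after "| reverse"
events = [
    ("A", "Failed"), ("A", "Failed"), ("A", "Success"), ("A", "Success"),
    ("A", "Failed"), ("A", "Failed"), ("B", "Success"), ("A", "Failed"),
    ("B", "Success"), ("A", "Failed"), ("A", "Success"), ("A", "Failed"),
]

# Pass 1: streamstats count(eval(result=="Success")) as reset_count by user
success_seen = defaultdict(int)
annotated = []
for user, result in events:
    if result == "Success":
        success_seen[user] += 1
    annotated.append((user, result, success_seen[user]))

# Pass 2: streamstats count by user reset_count
group_count = defaultdict(int)
rows = []
for user, result, rc in annotated:
    group_count[(user, rc)] += 1
    rows.append((user, result, rc, group_count[(user, rc)]))

# B's successes no longer touch A's streak: A's failure right after
# B's success continues counting at 4 instead of resetting to 1.
print(rows[7])   # -> ('A', 'Failed', 2, 4)
# With current=t the Success that opened a group is counted too, which is
# why this shows 5 rather than 4; current=f on the second streamstats
# would exclude the current event and give 4 here.
print(rows[9])   # -> ('A', 'Failed', 2, 5)
```

Tracing it this way also makes the current=f question from above concrete: it decides whether the Success that opens each group is included in that group's count.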
And in your _internal example, a working example I was testing with is here:
index=_internal component=*
| streamstats count(eval(log_level=="WARN")) as reset_count by component
| streamstats current=f count by component reset_count
| table _time component reset_count log_level count
BONUS: As an alternate strategy, which may be preferable if you prefer explicit eval statements to the complex eval() syntax in streamstats, you can construct an explicit marker field with a separate eval. That would look something like this:
| eval is_reset=if(result=="Success",1,null())
| streamstats count(is_reset) as reset_count by user
In general I find that any nested eval() syntax in stats/eventstats/streamstats can be deconstructed into a more explicit, more readable eval clause plus a stats/eventstats/streamstats without the eval(). But some people prefer the more compact and more "inscrutable black box" feel of the nested eval().
I'm going to add this here as this gave me a good solution to a similar problem.
I was trying to use streamstats reset_after to produce a good running status of multiple services on multiple servers, then use reset_after to reset my fail counts after a service comes back up.
I ran into the same problem described here: reset_after and reset_before reset all statistics, not just the statistics for the by-clause stream Splunk is currently working on.
My solution was to use
| sort 0 Server Service _time
rather than reverse. By grouping the events together by Server and Service, you create a serial listing of what we were trying to compute in parallel.
Once you've got that, you can use streamstats to produce a field that shows when Server or Service changes from one event to the next, and then reset on both that change field and the field you originally wanted to reset on.
It's a little complex, but it works really nicely for the complex case I'm trying to solve: seeing both the current and the historical state.
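The original search strings didn't survive in this post, so here is a hypothetical Python sketch of the "sort serially, then scan" idea (the field names Server, Service, and status are my assumptions, not from the original searches): after sorting by Server, Service, _time, a change in the (Server, Service) pair or a recovery event both reset the running fail count.

```python
# Toy model of "| sort 0 Server Service _time" followed by a
# streamstats-style serial scan; the same logic in Python, not SPL.
events = [
    # (Server, Service, status), already sorted by Server, Service, _time
    ("srv1", "web", "fail"),
    ("srv1", "web", "fail"),
    ("srv1", "web", "up"),
    ("srv1", "web", "fail"),
    ("srv2", "web", "fail"),
]

fail_streak = []
prev_key = None
count = 0
for server, service, status in events:
    key = (server, service)
    if key != prev_key or status == "up":
        count = 0          # reset when the group changes or the service recovers
    if status == "fail":
        count += 1
    fail_streak.append(count)
    prev_key = key

print(fail_streak)  # -> [1, 2, 0, 1, 1]
```

Because the events are serialized by group before scanning, a reset in one group can never bleed into another, which is exactly what the parallel reset_before/reset_after approach couldn't guarantee.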
This was the right answer. It will work for most data sets: just sort by your aggregate fields first, then _time. Be sure to include the 0 in the sort so you don't hit the sort limit.
Did this with a massive query and it solved my reset issue.
Nice workaround! Tried it myself with two streamstats but didn't think of the count(eval(...)) option. Thanks a lot!
@krdo... just try reordering the streamstats arguments so the aggregation comes first, followed by reset_before or reset_after:
| streamstats count by user reset_before="("result==\"Success\"")"
Changed my search to
index=* login user=* (result="Success" OR result="Failed")
| reverse
| streamstats count by user reset_before="("result==\"Success\"")"
| reverse
which gives me the same result... did you mean something different?
No, this should have worked. Which version of Splunk are you on? I think reset_after is only available from version 6.4 on.
Following is the test query I ran against Splunk's _internal index with sourcetype=splunkd and log_level!="INFO" (Splunk Enterprise version 6.5.1):
| streamstats count by component reset_after="("log_level==\"WARN\"")"
| table _time component log_level count
Following is the output. Notice that the ERROR count also restarts after the log_level changes.
_time component log_level count
2017-03-29 17:39:23.464 HttpListener WARN 1
2017-03-29 17:22:08.704 LookupOperator ERROR 1
2017-03-29 16:58:59.719 AppendProcessor ERROR 1
2017-03-29 16:58:58.751 AppendProcessor ERROR 2
2017-03-29 16:58:58.688 AppendProcessor ERROR 3
I'm on Splunk Enterprise 6.5.1 too, and reset_after changes the behavior of streamstats, but not in the way I want. Running your search gives me something like this:
2017-03-30 06:31:45.497 TailReader ERROR 54
2017-03-30 06:31:45.497 WatchedFile ERROR 54
2017-03-30 06:30:19.015 DateParserVerbose WARN 1
2017-03-30 06:30:19.015 DateParserVerbose WARN 1
2017-03-30 06:30:06.907 DateParserVerbose WARN 1
2017-03-30 06:30:06.907 DateParserVerbose WARN 1
2017-03-30 06:30:04.325 DateParserVerbose WARN 1
2017-03-30 06:30:04.325 DateParserVerbose WARN 1
2017-03-30 06:30:01.665 TailReader ERROR 1 <-- Should be 55
2017-03-30 06:30:01.665 WatchedFile ERROR 1 <-- Should be 55
You can see that the count is reset for ALL components because of the warnings from DateParserVerbose. What I want is for the count to be reset only for the component that encountered the warning (here, only the count for DateParserVerbose should be reset; the counters for TailReader and WatchedFile should not be touched).
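For what it's worth, the difference between what reset_after does and what I want can be modeled in a few lines of Python. This is a toy sketch of global versus per-group reset, not Splunk internals:

```python
# Simplified model of why reset_after surprises here: the reset wipes the
# whole streamstats state, not just the state of the current by-group.
events = [
    ("TailReader", "ERROR"),
    ("DateParserVerbose", "WARN"),
    ("TailReader", "ERROR"),
]

# What reset_after actually does (observed behavior):
counts = {}
global_reset = []
for component, level in events:
    counts[component] = counts.get(component, 0) + 1
    global_reset.append((component, counts[component]))
    if level == "WARN":
        counts.clear()            # ALL groups lose their state

# What I want instead:
counts = {}
per_group = []
for component, level in events:
    counts[component] = counts.get(component, 0) + 1
    per_group.append((component, counts[component]))
    if level == "WARN":
        counts.pop(component, None)  # only this group resets

print(global_reset)  # TailReader's second ERROR restarts at 1
print(per_group)     # TailReader's second ERROR continues at 2
```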