OK, so this search reads an input file and looks for events where the ErrorCode field is populated. I am trying to count those errors, and if there are 10 or more consecutive errors I will trigger an alert.
Here is the search:
| inputlookup myfile.csv
| eval _time=strptime(RequestDatetime,"%F %T")
| search (RequestDatetime>="2020-08-19" AND RequestDatetime<"2020-08-20")
| search (InfoSourceID="3" OR InfoSourceID="4") AND ErrorCode=*
| streamstats reset_after=(isnull(errorCode)) count
|stats latest(eval(if(count>=10,_time,NULL))) as _time
The ErrorCode field may or may not contain data. The requirement is to detect 10 or more consecutive errors and trigger an alert. The problem is that when testing, I added some blank ErrorCode values to check whether the reset_after clause would reset the count, and it did not.
For example, the column on the left works fine and triggers an alert. The column on the right also triggers an alert, but I don't want it to, because its errors are not consecutive.
ErrorCode (left, alert expected) | ErrorCode (right, no alert expected)
data | data
data | null
data | data
data | null
data | data
data | null
data | data
data | null
data | data
data | null
data |
null |
data |
null |
data |
null |
data |
null |
data |
null |
data |
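To make the intended behavior concrete, here is a minimal plain-Python model (not Splunk code) of what the search is meant to do: keep a running count of events that have an ErrorCode, restart it whenever an event has none, and report the last position where the count reached 10. The two columns are transcribed from the table above with "null" modeled as None; the function name alert_index is hypothetical.

```python
from typing import List, Optional, Sequence

THRESHOLD = 10  # alert when this many errors occur back to back


def alert_index(codes: Sequence[Optional[str]], threshold: int = THRESHOLD) -> Optional[int]:
    """Index of the last event at which the streak of consecutive errors was
    at least `threshold` (the analogue of taking the latest matching _time),
    or None if no streak ever got that long."""
    streak = 0
    hit: Optional[int] = None
    for i, code in enumerate(codes):
        if code is None:
            streak = 0  # a non-error event resets the running count
        else:
            streak += 1
            if streak >= threshold:
                hit = i
    return hit


# Columns from the example table, with "null" modeled as None.
left: List[Optional[str]] = ["data"] * 11 + [None, "data"] * 5
right: List[Optional[str]] = ["data", None] * 5

print(alert_index(left))   # 10 -> alert
print(alert_index(right))  # None -> no alert
```

The model only increments on events that carry an error. As I understand the documented reset_after behavior, the statistics are reset after the matching event has been processed, so a plain count would also include the non-error event that triggers the reset; counting only events that have an ErrorCode avoids that off-by-one.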
Am I using streamstats correctly here?
Thanks.
Looks like you're confusing null here.
If you search with ErrorCode=*, then ErrorCode must exist and therefore cannot be null, so no event will ever satisfy isnull(ErrorCode) and reset_after never fires.
Also, your example shows 'errorCode' (lowercase 'e') in the test; field names are case-sensitive.
However, if ErrorCode contains the literal text 'null', then your test should be if(ErrorCode="null"...)
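The effect of the ErrorCode=* filter can be seen in a small plain-Python model: dropping the non-error events before counting removes every reset point, so scattered errors look consecutive. The sample data (ten errors, each followed by a normal message) and the function name longest_error_run are hypothetical.

```python
from typing import Optional, Sequence


def longest_error_run(codes: Sequence[Optional[str]]) -> int:
    """Length of the longest run of consecutive non-empty error codes."""
    best = run = 0
    for code in codes:
        run = run + 1 if code is not None else 0  # reset on a non-error event
        best = max(best, run)
    return best


# Hypothetical sample: 10 errors, each followed by a normal message.
events = ["E1", None] * 10

# Counting over all events: no two errors are adjacent.
print(longest_error_run(events))  # 1

# Filtering first (the analogue of "search ErrorCode=*") removes every
# reset point, so all 10 errors look consecutive.
filtered = [c for c in events if c is not None]
print(longest_error_run(filtered))  # 10
```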
Good call: removing ErrorCode=* from the search fixed the reset_after issue. Thanks!
Hmm, OK. All I am trying to do is find 10 consecutive errors in my log so I can trigger an alert. Errors always have a value in the ErrorCode field and regular messages do not. Is there a better approach?
Thanks for pointing that out; that was a typo I missed. With ErrorCode spelled correctly, I still get the same result:
| streamstats reset_after=(isnull(ErrorCode)) count
Good point about the search ErrorCode=*, @bowesmana, unless the string "null" is actually the value in the ErrorCode column.
@irishmanjb, that would change the query I provided; the eval may need updating depending on the source data.
The string "null" in a file is a completely different thing from the value null(). You could first do something like this:
| eval ErrorCode = if(isnotnull(ErrorCode), if(ErrorCode="null", null(), ErrorCode), null())
It changes ErrorCode to the value null() if it was the string "null".
r. Ismo
Please check the syntax, as I don't have Splunk at hand to test it.
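The same normalization can be mirrored in Python to inspect what the data looks like before counting. This is a sketch under assumptions: the CSV contents are made up, and because Python's csv module returns an empty string for an empty cell, the sketch also maps "" to None (an assumption about how blank cells should be treated).

```python
import csv
import io
from typing import List, Optional

# Hypothetical file contents: the literal text "null" marks a non-error row.
SAMPLE = """RequestDatetime,ErrorCode
2020-08-19 10:00:00,E42
2020-08-19 10:00:05,null
2020-08-19 10:00:10,
2020-08-19 10:00:15,E42
"""


def normalize(code: Optional[str]) -> Optional[str]:
    """Mirror the eval: turn the string "null" into a real missing value.

    Empty strings (what csv yields for a blank cell) are also mapped to None.
    """
    if code is None or code == "" or code == "null":
        return None
    return code


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
codes: List[Optional[str]] = [normalize(r["ErrorCode"]) for r in rows]
print(codes)  # ['E42', None, None, 'E42']
```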
Hi @irishmanjb
I think in this case it may be simpler to look only at the previous 10 events, using a BY clause on ErrorCode with reset_on_change set to true, as shown:
...
| streamstats reset_on_change=true window=10 count(eval(if(isnotnull(ErrorCode), 1, null()))) AS count BY ErrorCode
| where count=10
...
Hope this helps
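The window idea can be modeled outside Splunk: with window=10, a count of 10 errors means the current event and the nine before it are all errors. This plain-Python sketch ignores the BY ErrorCode grouping and treats any non-empty code as an error regardless of its value (an assumption); the function name window_alerts is hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


def window_alerts(codes: Sequence[Optional[str]], window: int = 10) -> List[int]:
    """Indices of events where the last `window` events (including the
    current one) all carry an error code."""
    recent: Deque[bool] = deque(maxlen=window)  # oldest entry drops off automatically
    hits: List[int] = []
    for i, code in enumerate(codes):
        recent.append(code is not None)
        if len(recent) == window and all(recent):
            hits.append(i)
    return hits


# Hypothetical sample: 12 errors, one normal message, then 9 more errors.
events = ["E1"] * 12 + [None] + ["E1"] * 9
print(window_alerts(events))  # [9, 10, 11]
```

One difference worth keeping in mind: grouping by ErrorCode counts streaks per code value, so ten consecutive errors with different codes would likely not reach a count of 10 in any single group.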
Also, for your original query, this may fix it:
...
| streamstats reset_after="("isnull(ErrorCode)")" count
...
Tried this; same result:
| streamstats reset_after="("isnull(ErrorCode)")" count