I have a fairly straightforward query using timechart to count the top 10 users triggering an event. ( Sanitized )
index=foobar EventCode=1234 | timechart span=1d count(EventCode) BY user WHERE max in top10 usenull=f useother=f
This returns a chart that makes good sense. Now in order to separate noise from things II'm concerned about, I want to only see users who have triggered the event 10 times or more per day. A quick look at the manual provided this :
Chart the eventypes by sourceip
For each minute, count the eventypes by sourceip, where the count is greater than 10.
So I adapted this to my existing query :
index=foobar EventCode=1234 | timechart span =1d count(EventCode) by user usenull=f WHERE count>10
This however did NOT generate what I expected. I expected to see the same chart as before, only with the users with less than 10 events triggered per day being left out. Instead I have a bizarre chart that I received a truncation warning for and that included users that had fewer than 10 per day.
I've been told I need to put spaces in the WHERE clause ( WHERE count >10 , or WHERE count > 10 ) but this made no change.
I've been told that the chart is accurate and that it reflects users who have hit more than 10 at any time ( not per day ) but that did not make sense to me, nor did it jibe with what the manual said.
After many tries to make timechart work, I gave up and explored other options , and discovered this solution in a community article :
index=foobar EventCode=1234 | bucket span=1d _time | stats count by _time user | where count > 10 | xyseries _time,user,count
This did return what I was looking for , but I'm not sure why one works and the other doesn't.
Any suggestions as to why?
user fields get turned into a matrix such that the field
count is absorbed. You need to bring it back, filter on it, then, go back to where you were; try this:
index=foobar EventCode=1234 | timechart span=1d count BY user usenull=f | untable _time user count | eventstats max(count) AS max_count BY user | where max_count>10 | timechart span=1d avg(count) AS count BY user usenull=f
It depends on what you mean; if the original answer isn't it, then maybe you mean this:
index=foobar EventCode=1234 | timechart span=1d count BY user usenull=f | untable _time user count | where count>10 | timechart span=1d avg(count) AS count BY user usenull=f
Have you tried using a simple count in your timechart command, instead of count(EventCode)? Results should be the same anyway, since you already filter for a certain EventCode in your base search so each event will have that field populated.
Edit: when I read the docs correctly, using
where count > 10 like this, actually applies over the whole period, it doesn't apply the filter for each timespan. So if the sum of the event count over all days is > 10, it will include that entry, even if single days are all below 10.
Thanks for the reply. I tried it with COUNT instead of COUNT(EventCode) and as you indicated the results were the same.
I'm not sure what you meant by " when I read the docs correctly". Which docs are you referring to , and where do they indicate that the results are the sum is over all days and not per span?
What I had read ( under Basic Examples , in the on line docs for 7.3.0 ) was :
"For each minute, count the eventypes by source_ip, where the count is greater than 10."
This satisfies my need for the count to be by each span , and not over all time specified. The only difference is that I used "span=1d" instead of 1 minute.
Are you suggesting that there is an error in the docs, or did I miss something / am I looking in the wrong place?
"Example 4: Using the where clause with the count function measures the total number of events over the period. This yields results similar to using the sum function."
OK this makes sense to me now.
So all in all, is TIMECHART the wrong approach to what I'm trying to do?
My goal is to chart all users who have tripped a specific alarm , per day, over 30 days and then a second chart that shows the same information, but only for users who have tripped the alarm more than 10 times in a day.
I guess so. Although you could still do
| timechart span=1d sum(count) as count by user instead of your xyseries. But you'll have to do some steps before to filter against that threshold indeed.
And that also makes sense if you look at what the output of
timechart is. It gets you a _time column and then 1 column for each user (in contrast to a stats count by user, which gets you a count and a user column). What you want to then do is remove individual cells from that result. I guess in theory you could do that by following the
timechart with a
foreach construct to set any cell below the threshold to 0. But doing a stats first, filter that and then do the charting is probably easier.