I have a type of event that happens about 20 times a day. Each event carries a numeric value, and the meaningful metric is the sum of this value for each day.
To be able to compare against the current day's sum, I want an average of this daily sum over the past 30 days. But not only that: I want a history of this average for each 30-day span counting back from now.
Say today is 10/05: I want to see the average of the daily sum for 10/05 -> 09/05, for 09/05 -> 08/05, for 08/05 -> 07/05, etc., for the past 6 months.
So far I've done this :
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day
| stats sum(entitiesProcessed) AS totalEntitiesProcessed BY _time
| eval kTotalEntitiesProcessed = totalEntitiesProcessed / 1000
| timechart avg(kTotalEntitiesProcessed) span=30d
Back to reality: I only have a few days of data so far, so the result should show me a single value, for the last 30 days.
But unfortunately, even though today is 10/05, it shows me a value for 2017-09-29 and one for 2017-08-30.
The time range I selected is the last 6 months from now, with no snap-to.
Try this ...
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day
| stats sum(entitiesProcessed) AS TEP BY _time
| eval KTEP = TEP / 1000
| streamstats count as DaysInPeriod avg(KTEP) as avgKTEP window=30
That gives you the 30-day average on every day. If you want to cut it down to only one record per 30-day increment, then you can do something like this:
| eventstats max(_time) as maxtime
| where ( (maxtime-_time) % (86400*30) = 0)
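The windowed-average and modulo-sampling logic above can be sketched outside Splunk; here is a minimal Python illustration using synthetic daily totals (one row per day, oldest first — not real data):

```python
# Synthetic kTEP values: 90 days of daily totals, oldest first.
daily_ktep = [float(i) for i in range(1, 91)]

# streamstats avg(KTEP) window=30: a rolling mean over the last 30 rows.
rolling = [sum(daily_ktep[max(0, i - 29):i + 1]) / min(i + 1, 30)
           for i in range(len(daily_ktep))]

# eventstats max(_time) + where (maxtime - _time) % (86400*30) == 0:
# keep only every 30th row, counting back from the most recent day.
last = len(daily_ktep) - 1
sampled = [rolling[i] for i in range(len(rolling)) if (last - i) % 30 == 0]
```

With 90 days of data this keeps exactly three rows (days 30, 60, and 90 counting from the oldest), each carrying the average of the 30 days ending there.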
I've added | timechart first(avgKTEP) to be able to draw a chart of it.
To chart the speed of each event (entitiesProcessed / duration), I've written this:
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day
| eval timeStart = strptime(date, "%+")
| eval timeEnd = strptime(endDate, "%+")
| eval duration = timeEnd - timeStart
| stats sum(entitiesProcessed) AS totalEntitiesProcessed sum(duration) AS totalDuration BY _time
| eval speed = totalEntitiesProcessed / totalDuration
| streamstats count as DaysInPeriod avg(speed) as avgSpeed window=30
| eventstats max(_time) as maxtime
| where ( (maxtime-_time) % (86400*30) = 0)
| timechart first(avgSpeed)
The query shows no error, but the average speed is wrong.
In plain words: the average speed is day-based, not event-based. The sum of entitiesProcessed divided by the total duration for one day gives the speed of that day; the average speed for the past 30 days is the average of each day's speed over those 30 days. Maybe that sounds redundant, but I hope there is no more ambiguity.
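To make the distinction concrete, here is a tiny Python sketch with made-up numbers showing why the day-based average differs from pooling all events into one ratio:

```python
# Each day: a list of (entitiesProcessed, duration_seconds) per event.
# The numbers are invented purely to illustrate the two averages.
days = [
    [(1000, 10), (3000, 10)],   # day 1: 4000 entities in 20 s -> 200/s
    [(100, 10)],                # day 2: 100 entities in 10 s  -> 10/s
]

# Day-based (what is wanted): one speed per day, then average those.
day_speeds = [sum(e for e, _ in d) / sum(t for _, t in d) for d in days]
day_based = sum(day_speeds) / len(day_speeds)   # (200 + 10) / 2 = 105.0

# Event-based (the easy mistake): pool everything into a single ratio.
all_entities = sum(e for d in days for e, _ in d)
all_duration = sum(t for d in days for _, t in d)
event_based = all_entities / all_duration       # 4100 / 30 ~= 136.7
```

The two results disagree whenever days have different total durations, which is exactly the ambiguity being ruled out here.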
I thought I had written this correctly in the query, but obviously I'm wrong somewhere, aren't I?
The span on timechart/bucket starts calculating from 01-01-1970, so it won't necessarily snap to the current day. If you always run this for the last 6 months, try this:
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day | stats sum(entitiesProcessed) AS totalEntitiesProcessed BY _time
| eval kTotalEntitiesProcessed = totalEntitiesProcessed / 1000
| eval _time=case(_time>=relative_time(now(),"@d"), _time, _time>=relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"), _time>=relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"), _time>=relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"), _time>=relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"), _time>=relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon"))
| stats avg(kTotalEntitiesProcessed) by _time
Thanks for your answer. Can you explain the eval statement, please? As far as I understand, you modify the _time value to shift it in a way that matches the need and the bucket behavior.
That is correct. If today is 10/03, you want a bucket for the current day (10/03), one for the previous month (from 09/03 to 10/02), and so on. So the eval statement updates the _time value as follows:
1) if _time is from today (greater than or equal to midnight today), use today's midnight for all those events;
2) if the above is not true and _time falls in the previous month up to yesterday (from 09/03 to 10/02), assign midnight of 09/03 to all those events (with the same _time on all of them, you can calculate the average easily);
3) if the above is not true and _time falls between 2 months ago and 1 month ago (from 08/03 to 09/02), assign midnight of 08/03 to all those events... and so on.
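The same first-match-wins bucketing can be sketched in Python. This is only an illustration of the case() logic: the month arithmetic is naive (it assumes the day-of-month exists in the target month, which Splunk's relative_time handles properly):

```python
from datetime import datetime

def months_back(day, n):
    # Midnight of the same day-of-month, n calendar months earlier.
    # Naive: assumes day.day exists in the target month.
    y, m = day.year, day.month - n
    while m < 1:
        y, m = y - 1, m + 12
    return datetime(y, m, day.day)

today = datetime(2017, 10, 3)  # "@d" in the thread's example
anchors = [months_back(today, n) for n in range(6)]  # @d, @d-1mon, ... @d-5mon

def bucket(ts):
    # First matching clause wins, like SPL's case():
    # the most recent anchor with ts >= anchor.
    for a in anchors:
        if ts >= a:
            return a
    return None  # older than 6 months: no clause matches
```

For example, any timestamp between 09/03 and 10/02 maps to midnight of 09/03, so averaging by the rewritten _time averages over that whole month-span.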
I've ended up writing this:
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day | stats sum(entitiesProcessed) AS totalEntitiesProcessed BY _time
| eval kTotalEntitiesProcessed = totalEntitiesProcessed / 1000
| eval _time = case(
_time >= relative_time(now(),"@d"), _time,
_time >= relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"),
_time >= relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"),
_time >= relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"),
_time >= relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"),
_time >= relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon")
)
| timechart avg(kTotalEntitiesProcessed)
And I have what I wanted. For a related chart, I would like the average speed for the events. I've written this, but I think the stats sum is messing up the query. Can you explain why?
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day
| stats sum(entitiesProcessed) AS totalEntitiesProcessed sum(duration) AS totalDuration BY _time date
| eval timeStart = strptime(date, "%+")
| eval timeEnd = strptime(endDate, "%+")
| eval duration = timeEnd - timeStart
| eval _time = case(
_time >= relative_time(now(),"@d"), _time,
_time >= relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"),
_time >= relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"),
_time >= relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"),
_time >= relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"),
_time >= relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon")
)
| eval speed = totalEntitiesProcessed / totalDuration | timechart avg(speed)
EDIT: after checking, the output average is wrong 😕
Your order of stats is off: it should come after your eval duration so that the field you calculated is available to it. So it should be done like this. Also check whether you really need the field date in the stats, since _time is already bucketed by day.
sourcetype=BatchLog name=OperationInsertionBatchJob entitiesProcessed > 0
| bucket _time span=day
| eval timeStart = strptime(date, "%+")
| eval timeEnd = strptime(endDate, "%+")
| eval duration = timeEnd - timeStart
| stats sum(entitiesProcessed) AS totalEntitiesProcessed sum(duration) AS totalDuration BY _time
| eval _time = case(
_time >= relative_time(now(),"@d"), _time,
_time >= relative_time(now(),"@d-1mon"), relative_time(now(),"@d-1mon"),
_time >= relative_time(now(),"@d-2mon"), relative_time(now(),"@d-2mon"),
_time >= relative_time(now(),"@d-3mon"), relative_time(now(),"@d-3mon"),
_time >= relative_time(now(),"@d-4mon"), relative_time(now(),"@d-4mon"),
_time >= relative_time(now(),"@d-5mon"), relative_time(now(),"@d-5mon")
)
| eval speed = totalEntitiesProcessed / totalDuration | timechart avg(speed)
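The ordering point can be shown outside Splunk too. In this minimal Python sketch (invented numbers), the per-event duration has to be computed before the per-day aggregation, because after summing, the per-event start/end fields are no longer available:

```python
# Synthetic events for one day; timestamps are made up for illustration.
events = [
    {"entitiesProcessed": 1000, "start": 0.0,  "end": 10.0},
    {"entitiesProcessed": 3000, "start": 20.0, "end": 30.0},
]

# 1) eval duration -- per event, so it must come before aggregation
for e in events:
    e["duration"] = e["end"] - e["start"]

# 2) stats sum(...) BY day -- collapses the per-event rows
total_entities = sum(e["entitiesProcessed"] for e in events)
total_duration = sum(e["duration"] for e in events)

# 3) eval speed on the aggregated row
speed = total_entities / total_duration   # 4000 / 20 = 200.0
```

Doing step 2 before step 1, as in the earlier query, means the sum runs over a duration field that doesn't exist yet.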
Your updated query only shows an empty avg(speed) 😕