Hi.
I'm a splunk newbie and I am trying to construct a query over multiple sources that will do a sum of points over a day in 1 hr buckets and tell me whether in the past hour I have breached the max bucket size by a factor of 10%. I have tried to do this via a subsearch, which is very slow and I can't get it to quite work, or as one query, where I got the counts, but can't figure out how to get it to tell me whether it breached the max.
Here is what I have so far:
query returning pts for each event| bin _time span=1h as hour | stats sum(pts) as sum_pts by hour sourcetype
Any help at all would be greatly appreciated!
Try this
query returning pts for each event earliest=-24h@h latest=@h
| bin _time span=1h as hour
| eval thisHour = if (hour <= relative_time(now(),"-1h@h"),0,1)
| stats sum(pts) as sum_pts by hour sourcetype thisHour
| stats max(sum_pts) as maxPts by sourcetype thisHour
| eval pts = if(thisHour==1,maxPts,0)
| eval maxPts = if(thisHour==0,maxPts,0)
| stats sum(maxPts) as maxPts sum(pts) as PtsThisHour by sourcetype
| where PtsThisHour >= (maxPts * 1.1)
This could probably be shortened, but it will be fast. Here is a line-by-line explanation
1- search - I added the earliest
and latest
because the search should to start and end on hour boundaries to be accurate.
2 - eval - sets a field that will be 1
if the data is from the last hour, and 0
otherwise. This lets us avoid a subsearch or join
3 - stats - Add up the points for each sourcetype by hour AND whether or not the thisHour is set. We should still have 24 results for each sourcetype.
4 - stats - Figure out the max points for each sourcetype. Including thisHour means we will get two results for each sourcetype - one result which reflects just the sum for the past hour, and the other result which contains the maximum of the other 23 hours.
4 & 5 - eval - These two statements effectively move the "maxPts" for the the past hour into its own field named "pts"
6 - stats - collapses the two results for each sourcetype into a single result
7 - where - excludes the results for sourcetypes that don't meet the criteria
HTH!
Sorry about the typos - I updated my answer for future reference.
Thanks for sticking with it!
Here was the final version that worked for me, for reference:
query returning pts for each event earliest=-24h@h latest=@h
| bin _time span=1h as hour
| eval thisHour = if (hour <= relative_time(now(),"-1h@h"),1,0)
| stats sum(pts) as sum_pts by hour sourcetype thisHour
| stats max(sum_pts) as maxPts by sourcetype thisHour
| eval pts = if(thisHour==0,maxPts,0)
| eval maxPts = if(thisHour==1,maxPts,0)
| stats sum(maxPts) as maxPts sum(pts) as PtsThisHour by sourcetype
| where PtsThisHour >= (maxPts * 1.1)
Ok, there were just a couple of typos. One of the sourcetypes was spelled sourceType, and the maxPts and PtsThisHour were reversed, but other than that, works like a charm! Thank you so much!
Thanks, but I'm still having some trouble. I needed to put the "-1h@h" in quotes or it wouldn't evaluate, and also, it seems to break at step 4, at which point I get no results. I can't seem to fix it.
I've been trying to edit the original text of the question to make it clearer, but I can't get past the captcha...
That's not exactly what I'm looking for, as I'm actually looking for a specific log line that returns # of points processed per query, and that is the sum I am looking at, although I guess I can used index volume as a proxy for that, but that wouldn't be very exact.
This is a much faster way to get index volume by sourcetype:
index=_internal source=*metrics.log group=per_sourcetype_thruput series!=_* | eval totalGB = (kb/1024)/1024 | bin _time span=1h as hour |convert ctime(hour) as Hour| stats sum(totalGB) as totalGB by Hour series |rename series AS Sourcetype
I'm not sure how to compare one hour to the next, but I know it can be done.