I have a query to calculate some hourly stats like
index=txndata | bucket _time span=1h | stats count as Volume, median(txnDuration) as MedianDuration, max(txnDuration) as MaximumDuration by _time, txnType
This gets put in another index called txnSummary
From this I need to know what the max Volume was per day (or other time period) and which hour it occured in. Similarly for max duration and max median duration.
Max volume I can calculate like
index=txnSummary earliest=-1d@d | bucket _time span=1h | stats count as Volume by _time, txnType | stats max(Volume) by txnType
That gets me the value but not the time it occured in.
To get the hour is the best thing to do a join like
index=txnSummary earliest=-1d@d | bucket _time span=1h | stats count as Volume by _time, txnType | join type=inner Volume,txnType [search index=txnSummary earliest=-1d@d | bucket _time span=1h | stats count as Volume by _time, txnType | stats max(Volume) as Volume by txnType]
and to pick just one then I can
| dedup txnType
This is similar to what you'd need to do in SQL. Is this the best approach in splunk?
I found another possibility by using eventstats but this may be a bit dodgy
index=txnSummary earliest=-1d@d | bucket _time span=1h | stats count as Volume by _time, txnType | eventstats max(Volume) as result by txnType | where result = Volume | dedup txnType
I guess eventstats must be doing a similar subquery behind the scenes. The advantage to me is that I don't need to repeat the whole search in the subsearch, but it does seem like a hack.
Any other options?
Hi, maybe I misunderstood something in your question, but to me the answer is much simpler;
... | bucket _time span=1h | stats count as Volume by _time, txnType | sort - Volume | dedup txnType
This will give you a table of time
, txnType
and Volume
, with only the top 1 count (Volume
) per txnType
.
time txnType Volume
12/12/2011 05:00:00 B 102
12/12/2011 11:00:00 A 89
12/12/2011 08:00:00 C 73
12/12/2011 01:00:00 D 12
Hope this helps,
Kristian
Hi, maybe I misunderstood something in your question, but to me the answer is much simpler;
... | bucket _time span=1h | stats count as Volume by _time, txnType | sort - Volume | dedup txnType
This will give you a table of time
, txnType
and Volume
, with only the top 1 count (Volume
) per txnType
.
time txnType Volume
12/12/2011 05:00:00 B 102
12/12/2011 11:00:00 A 89
12/12/2011 08:00:00 C 73
12/12/2011 01:00:00 D 12
Hope this helps,
Kristian
Thanks Kristian. That's much better. I hadn't thought of using dedup that way