I m trying to get the hour per day which gets the most hits on my application over a month but having some issues to get the right data output.
I would like to get a table report which would have:
DAY1 HOURX MaxEventNumber
DAY2 HOURX MaxEventNumber
I tried the following queries but none of them work. I think I m missing something on splunk reporting concepts:
Any help would be greatly appreciated,
| stats count as hit by date_hour, date_mday | eventstats max(hit) as maxhit by date_mday | where hit=maxhit | fields - maxhit
I am not sure it will work. But it should figure out the max hits for each day, and only keep the events with that have have the maximum number.
New and better solution, which works for for all events - not just those with
yoursearchhere | timechart count as hit span=1h | eval date=strftime(_time, "%x") | eventstats max(hit) as maxhit by date | where hit=maxhit | fields - maxhit
I've a similar requirement but having trouble joining the search and subsearch fields with the where clause.
Can you please take a look at http://splunk-base.splunk.com/answers/72928/subsearch-busy-hour-cpu-usage ?
restating your problem to help clarify. I think you want to end up with a table where each day is only represented by one row, and that row represents the busiest hour during that day.
where the fields in the table are 'day of month', 'hour of day' and 'event count'
Here's the search you need:
<your search terms> | stats count by date_hour date_mday | sort - date_mday count | streamstats count as perMdayCount by date_mday | where perMdayCount=1
it's weird so lets walk it through step by step. You can run the query piece by piece and look at what the results look like at each pipe character (this is the best way to understand every complex search).
stats clause gives you the count of events rolled up by unique combinations of day-of-month and hour-of-day.
sort clause sorts it primarily by the day of month. But then within those the highest counts will be first in that subset.
Next we use
streamstats count, with a very important
by date_mday, to actually paint little numbers on each event. If
streamstats with a by clause makes your head hurt that's totally normal. Just play with it and you'll understand what it does. In this case since within each subset of a given day-of-month we just sorted the date_hours such that the highest count hour ends up first in the set, these highest count hours all get painted with a 'perMdayCount' field whose value is "1".
Which ends up being marvelously convenient because now we can just use a
where clause to match only rows where
perMdayCount is "1", and this conveniently throws away all the hours that were not the highest count hour in that given day.
Thank you very much both of you.
It seems both solutions work. Do you know which method is best practice or best from a performance perspective?
With multiple indexers and/or lots of data, I think that eventstats may be slightly more efficient.
You can always run both and compare - use the Search Job Inspector (the 'i' button in 4.3, and under the Actions menu in earlier versions).
mmm, old post!
I like to use a slightly different approach. It is very clear, and pretty cheap too. You can pipe one timechart into another, using different time resolutions. Like:
sourcetype=access_combined GET OR POST | timechart span=1h count as count | timechart span=1d max(count) as max
The first timechart gives you a count by hour, piped into the second one which pulls out only the biggest hour per day. The part you lose is what hour of the day was the busy hour. If which hour was the busiest hour is not important to you, then this is a quick and easy approach.