Hello,
I would like to know which of my host have an increase in their event number compared to usual.
I first thought of comparing the number of events of my host during a minute with the average of events per minute on the last 30 days of these host.
However I do not know how to do it at all.
I have seen answers about:
https://answers.splunk.com/answers/226668/alert-when-there-is-a-x-increase-in-all-events-dur.html
But I can not get through,
Would you please advise me?
thank you very much
Hi there,
Example using tstats below.
| tstats count WHERE earliest=-60m@m latest=now BY _time, host span=1m
| streamstats stdev(count) as stdev by host window=5 current=t global=f
| stats max(stdev) as stdev by host
| sort - stdev | head 10
Looking at last 60 minutes indexes stats (for all default indexes for current user; you could specify index=* in WHERE instead).
First, "streamstats" is used to compute standard deviation every 5 minutes for each host (window=5 specify how many results to use per streamstats iteration).
Then, "stats" returns the maximum 'stdev' value by host.
Finally, results are sorted and we keep only 10 lines.
Example using 'tstats' but it works with normal search also.
The goal is to count number of events per hour between now and yesterday but display yesterday results as an overlay of todays' (you can change the time period if you want).
'addinfo' gets Search infos as result fields. I get "latest" value with it (info_max_time).
'eval' modifies "_time" when "date_wday" doesn't match "info_max_time" weekday.
The modification is about settings all "_time" on a "one day" time period (info_max_time day).
If my explanation doesn't make sens please don't hesitate to tell me, I'll try to rephrase.
| tstats count WHERE host=127.0.0.1 earliest=-1d latest=now BY _time, date_wday span=1h | addinfo | eval _time=if(match(date_wday, lower(strftime(info_max_time, "%A"))), _time, relative_time(info_max_time, "@d")+(_time-relative_time(_time, "@d"))) | timechart max(count) by date_wday fixedrange=f
Cheers,
@slapie
Hi,
Yes you search is much more efficient but in my case I'm comparing "volume" not "eventcount". Also I'm comparing today with the average volume in the last X days (where X can be whatever I want), comparing today with yesterday it's not really pertinent in my case especially when today=monday. And the last thing is I can specify a token to have this comparaison done by sourcetype.
How can I still do this with tstats ?
3no.
Hi,
The search I posted is meant as an example that you should adapt to your needs.
The main idea is: Use a single time period instead of sub-searches.
(sorry it wasn't clear as I replied to you in two different comments)
For your search I'm recommending using an existing report from the Monitoring Console like "DMC License Usage Data Cube".
You can activate report acceleration and instead of using "tstats" use "savedsearch" command (http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/savedsearch ).
Also you could remove weekends from your results using date_wday field as search criteria if you want.
Cheers,
Stéphane
Hi there,
Example using tstats below.
| tstats count WHERE earliest=-60m@m latest=now BY _time, host span=1m
| streamstats stdev(count) as stdev by host window=5 current=t global=f
| stats max(stdev) as stdev by host
| sort - stdev | head 10
Looking at last 60 minutes indexes stats (for all default indexes for current user; you could specify index=* in WHERE instead).
First, "streamstats" is used to compute standard deviation every 5 minutes for each host (window=5 specify how many results to use per streamstats iteration).
Then, "stats" returns the maximum 'stdev' value by host.
Finally, results are sorted and we keep only 10 lines.
Hi rflouquet,
That's a search I use to check if one of my sourcetype is sending me (in an hour) more data than it should compared to the last 3 days.
index=_internal source=*license_usage.log type=Usage st="*"
| eval h=strftime(_time, "%H:00")
| stats sum(b) as bytes by h
| eval Ko = round(bytes/1024,2)
| streamstats sum(Ko) as CKo
| chart first(CKo) as CKo over h
| join h type=left [ search index=_internal source=*license_usage.log type=Usage st="*" [ | stats count | eval earliest=strftime(relative_time(now(), "-3d"), "%m/%d/%Y:00:00:00") | return earliest ] [ | stats count | eval latest=strftime(relative_time(now(), "@d"), "%m/%d/%Y:00:00:00") | return latest ] | eval d=strftime(_time, "%Y-%m-%d") | eval h=strftime(_time, "%H:00") | stats sum(b) as bytes by h | eval Ko=round(bytes/1024,2) | streamstats sum(Ko) as CKo | eventstats sum(CKo) as CKo by h, d | stats sum(CKo) as CKo_15 by h | eval CKo_15=round(CKo_15/3, 2) ]
| fillnull value=0
| rename CKo as CGo, CKo_15 as CGo_15
| eval CGo=round(CGo/1024/1024, 2)
| eval CGo_15=round(CGo_15/1024/1024, 2)
| stats sum(*) as * by h
| rename h as Hour, CGo as "GB sum", CGo_15 as "Last 3 day GB"
What you can do is add a token to define the sourcetype.
You'll need to replace the "3 days" by "30", the "hour" by "minute", and the "sourcetype" by the "host" to match your requierement but at least you have some base to work on.
I hope this can help you build your search...
3no.
Hi,
You should avoid using "join" command. It's not very efficient.
Instead you can search on the whole time period (earliest=-3d latest=now), categorise events based on date_* fields, do the stats and then for display purposes modify _time for -3d events to match @d timeline (I'll show an example in main thread).
Also license_usage.log and metrics.log are used in the Monitoring Console within several existing reports.
You could activate acceleration on those reports and base your search on their results (using "savedsearch" command).
Cheers,
Thanks a lot @3no ! For the moment the search does not work in the current state but I try to have fun with it !
In the comments, slapie correctly points out that, if you have no other criteria than host and _time for selecting the events in question, that lines 1-3 in the examples can be replaced by the much much faster
| tstats count by _time, host span=1m
| rename _time as Minute, count as countMinute
This version calculates the average count per minute for each day for each host, and then gives you the hosts that have the highest increase...
(your query that produces the events you care about, with the fields _time and host....)
| bin _time as Minute span=1m
| stats count as countMinute by host Minute
| bin Minute as Day span=1d
| stats avg(countMinute ) as countDay by host Day
| eventstats max(Day) as lastDay, avg(countDay) as countAvg, stdev(countDay) as countStdev by host
| where ((Day==lastDay) AND (countDay>countAvg))
| eval increaseStdev=round(((countDay-countAvg)/countStdev),2)
| eval increasePct=round(100*(countDay-countAvg)/countAvg,2)
| top 10 increasePct
That will tell you increase in terms of absolute percentage, and in terms of standard deviations. You can play with those two and see which one gives you a better feel for which hosts are experiencing significant increases in traffic.
If you are more interested in peak event count, you could instead count the top 30 event counts per day, and use that for your calculations, such as the following...
(your query that produces the events you care about, with the fields _time and host....)
| bin _time as Minute span=1m
| stats count as countMinute by host Minute
| bin Minute as Day span=1d
| top 30 countMinute by host Day
| stats avg(countMinute) as countTop30Day by host Day
| eventstats max(Day) as lastDay, avg(countTop30Day) as countTop30Avg, stdev(countTop30Day) as countTop30Stdev by host
| where ((Day==lastDay) AND (countTop30Day>countTop30Avg))
| eval increaseTop30Stdev=round(((countTop30Day-countTop30Avg)/countTop30Stdev),2)
| eval increaseTop30Pct=round(100*(countTop30Day-countTop30Avg)/countTop30Avg,2)
| top 10 increaseTop30Pct
Warning - 30 days times 24 hours time 60 minutes is 43200 records per host if you are calculating at the per-minute basis. If you are going to do this a lot, then you might want to consider calculating a summary index once a day for the prior day's minute-by-minute activity, to reduce the overhead of summarizing all that data every time you run this report.
updated second "bin" command to use Minute, which exists at that point, rather than _time, which does not.
Thanks a lot, @DalJeanis ! But I tried 4 time and none of your search give me back something !
I'm trying to figured it out, thanks again
You could start your search using "tstats" instead (docs.splunk.com/Documentation/Splunk/latest/SearchReference/tstats ).
Example:
| tstats count by _time, host span=4h | streamstats stdev(count) by host window=4
Good point - assuming that event count by host per unit time
is the only criteria, and that there are no other limitations on the events selected.
I explicitly began my code with the opposite assumption:
(your query that produces the events you care about, with the fields _time and host....)
so, in that limited case, lines 1-3 can be replaced by...
| tstats count by _time, host span=1m
| rename _time as Minute, count as countMinute
By the way, I'm not sure why you selected 4h as the span, when the OP specifically refers to by minute
.
Updated code, try again. The second bin
command was again attempting to bin _time
as Day
, but _time
had been eliminated by the prior stats
command. Recoded to bin Minute
instead.
Thank you very much @DalJeanis , it works finally! The only point is that the percent does not work and still displays 100,000
LIne 10 is multiplying by 100 then rounding to 2 additional digits. You can round to 0 additional digits by removing the ",2", then concatenate a "%" onto the end of it it you'd prefer.
Btw you could use percentiles (perc eval function) instead of calculating percentages to filter the result.
But I believe calculating percentages on standard deviations is unnecessary, isn't it?
Yes, you could. However, without seeing rflouquet's data, I have no idea what "percentile" to suggest as a good line in the sand.
I provided both percentages (which rflouquet requested) and stdev (which seems more useful to me) and explained, "You can play with those two and see which one gives you a better feel for which hosts are experiencing significant increases in traffic."
Can't get much clearer than that.