I think I might be doing something conceptually wrong here; I've tried several combinations throughout the day and haven't managed to get it right.
I'm configuring a Service KPI. The intent is to count the total active and standby core routers on a network, using the presence of occasional tunnel traffic.
Here's the search I'm using for the KPI, with a "distinct count" calculation (real values masked as XXXX, YYYY, ZZZZ):
index=XXXX host=YYYY sourcetype=ZZZZ
| fields host
| bucket _time span=20m
The correct answer is "33". When I add commands like stats distinct_count(host) or timechart span=20m to normal searches, I consistently see 33 active hosts within a 20m window, and have for the past year.
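For reference, a verification search along these lines (field names matching the masked search above) consistently shows 33 per bucket:

index=XXXX host=YYYY sourcetype=ZZZZ
| timechart span=20m dc(host)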
But when I use the quoted search (and other variants) as a KPI, it always shows "0". I feel like my fundamental approach is wrong, but I can't quite put my finger on it, and could do with some pointers/ideas!
Creating a KPI search can be tricky. I recommend first building a search in the Search Bar which ends with "| stats ...", because this is essentially how the KPI functions. Then copy the search (without the "| stats ..." part) to use as the ad-hoc search for your KPI. Your example search includes "| bucket _time span=20m", which doesn't really make sense if it is followed by "| stats ...".
Let's assume that the following search would produce the results you want for the most recent 20min period:
index=XXXX host=YYYY sourcetype=ZZZZ earliest=-20m latest=now
| stats dc(host)
You could convert this into a KPI by using the following as the ad-hoc search:
index=XXXX host=YYYY sourcetype=ZZZZ earliest=-20m latest=now
And then set the "threshold field" as 'host', set the aggregate calculation to Distinct Count, and set the KPI schedule to "every 5 minutes". This would create a KPI which updates every 5 min, with overlapping 20 min search periods. Or you could use non-overlapping periods as long as you are OK with 1, 5 or 15 min periods (which are the available scheduled search intervals). In essence, the KPI is "bucketing" your results by running the search on a scheduled basis.
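If you prefer strictly non-overlapping windows, you can also snap the time range to whole periods so each scheduled run covers exactly one bucket. For example, with a 15 min schedule (using Splunk's standard @ snap-to time modifiers):

index=XXXX host=YYYY sourcetype=ZZZZ earliest=-15m@m latest=@m

Each run then counts distinct hosts over one complete 15 min period, rather than a sliding window.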
I hope this helps.
Thanks a lot, that was a very clear description. By using earliest= and latest=, I didn't need to clutter my ad-hoc search with the other elements I was trying to use.
I'm assuming you're querying the itsi_summary index and the data has already been processed by ITSI? What frequency is ITSI running for this service? If you're trying to count the host values in the itsi_summary index: since ITSI writes its data to a summary index, the host field will just show your ITSI search heads. You should use the splunk_server field instead if you want the hosts that were queried before the summary index was populated.
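As a sketch of that suggestion (the index name assumes the default ITSI summary index, and the time range is illustrative):

index=itsi_summary earliest=-20m latest=now
| stats dc(splunk_server)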