I want to check every hour whether my forwarders are constantly sending data to my indexer, so I can set up an alert.
I am using metrics.log to measure the throughput, but it only returns the hosts that recently sent data; the hosts not sending data are not even mentioned.
How can I compare that to the complete list of my hosts?
Can I use metadata, summary searches, or lookups?
Hi, you can use anything you want, but a simple method is to store a recent list of your hosts in a lookup table.
EDIT: metrics.log only stores results for the top 10 hosts/sources/sourcetypes, so this solution is not suitable for a large number of hosts/forwarders.
Schedule one search running weekly to generate a lookup table with the list of hosts (in this case, with the sum of all the traffic of the week):
index=_internal group="per_host_thruput" earliest=-14d@d latest=-7d@d | eval mb=kb/1024 | stats sum(mb) as traffic_mb_lastweek by series | outputlookup traffic_mb_lastweek.csv
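To make the aggregation in that search concrete, here is a minimal Python sketch of what it computes (the `traffic_mb_lastweek.csv` name and the kb-to-MB conversion come from the search above; the host names and kb values are synthetic):

```python
import csv
import io
from collections import defaultdict

# Synthetic per_host_thruput metric events: (series, kb) pairs as logged in metrics.log.
events = [("web01", 2048.0), ("web01", 1024.0), ("db01", 512.0)]

# Mirror `eval mb=kb/1024 | stats sum(mb) as traffic_mb_lastweek by series`.
totals = defaultdict(float)
for series, kb in events:
    totals[series] += kb / 1024

# Mirror `outputlookup traffic_mb_lastweek.csv` (written to a buffer here).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["series", "traffic_mb_lastweek"])
for series, mb in sorted(totals.items()):
    writer.writerow([series, mb])
print(buf.getvalue())
```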
Schedule a search running every hour, with a 5-minute delay (cron = 5 * * * *), that compares the current hour against the list in the lookup. Note that the comparison must be matched per host (series), so start from the lookup and left-join the live results:
| inputlookup traffic_mb_lastweek.csv | join type=left series [ search index=_internal group="per_host_thruput" earliest=-1h@h latest=@h | eval mb=kb/1024 | stats sum(mb) as traffic_mb by series ]
Then you can add conditions depending on your thresholds, for example looking for hosts with no traffic when the traffic is usually significant:
| where isnull(traffic_mb) AND traffic_mb_lastweek > 1
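The alert logic above is a per-host comparison of the current hour against the weekly baseline. Here is a small Python sketch of that logic, with made-up host names and traffic figures (the 1 MB threshold comes from the search above):

```python
# Baseline from traffic_mb_lastweek.csv: host -> MB sent last week (synthetic values).
baseline = {"web01": 120.0, "db01": 45.0, "cache01": 0.3}

# Current hour, summed from per_host_thruput: hosts absent here sent nothing.
current = {"web01": 1.7}

# Mirror `where isnull(traffic_mb) AND traffic_mb_lastweek > 1`:
# alert on hosts that are usually chatty (>1 MB/week) but silent this hour.
silent = [host for host, weekly_mb in baseline.items()
          if host not in current and weekly_mb > 1]

print(silent)  # db01 is silent and over threshold; cache01 is below it
```

Note that the comparison is keyed by host name: the lookup rows and the live results have to be matched per series, not merely placed side by side column-wise.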
This is interesting, but what about the servers that appear in the forwarder management list yet have never sent data? They have a forwarder, but since they are not generating the data we look for (via whitelists), we can never report on them. Can Splunk retrieve the forwarder management list and use that as a lookup?
You can also use the | metadata command to get the list of forwarders sending to the indexer and the last time they sent data. Note: if you are interested in the amount of data being sent in a given interval, this will not work; it only shows totals.
|metadata type=hosts index=* | convert ctime(lastTime) as lastTime ctime(recentTime) as recentTime ctime(firstTime) as firstTime
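As a rough illustration of how the metadata output could drive an alert, here is a Python sketch that flags hosts whose `lastTime` is older than an hour (the hosts, epoch times, and one-hour threshold are all synthetic assumptions):

```python
# Synthetic `| metadata type=hosts` output: host -> lastTime (epoch seconds).
now = 1_700_000_000
last_seen = {
    "web01": now - 120,   # sent data two minutes ago
    "db01": now - 7200,   # silent for two hours
}

# Flag hosts whose most recent event is older than one hour.
stale = sorted(h for h, t in last_seen.items() if now - t > 3600)
print(stale)
```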
Then do not use metrics; instead, search the actual data.
It can create a higher load, so restrict the time range carefully.
What is the solution then for a larger number of hosts?
You are right, metrics.log only stores metrics for the top 10 hosts. This search will not work for a large deployment.
Will this work as expected? According to gkanapathy, per_host_thruput only has the top 10 hosts: http://splunk-base.splunk.com/answer_link/23635/