Hey @skoelpin,
Setting up an alert will help if it's only one forwarder/host/sourcetype you have complaints about. It gets cumbersome if you have thousands of machines doing the same thing randomly (especially when you do not know which one might skip or break).
1. Use the query below to monitor the status of the forwarders reporting to your instance.
2. Run it for hosts that deviate or look suspicious.
3. Have that data written out as events, for example to a summary index (see the first sketch after the query).
4. Build alerts on those events (see the second sketch).
It is not going to be easy 🙂 I have done something similar. At least I now know that the triggered alerts denote a data-forwarding issue.
index=_internal source=*metrics.log group=tcpin_connections
| eval sourceHost=if(isnull(hostname), sourceHost, hostname)
| rename connectionType as connectType
| eval connectType=case(fwdType=="uf", "univ fwder", fwdType=="lwf", "lightwt fwder", fwdType=="full", "heavy fwder", connectType=="cooked" or connectType=="cookedSSL", "Splunk fwder", connectType=="raw" or connectType=="rawSSL", "legacy fwder")
| eval version=if(isnull(version), "pre 4.2", version)
| rename version as Ver
| fields connectType sourceIp sourceHost destPort kb tcp_eps tcp_Kprocessed tcp_KBps splunk_server Ver
| eval Indexer=splunk_server
| eval Hour=relative_time(_time, "@h")
| stats avg(tcp_KBps) sum(tcp_eps) sum(tcp_Kprocessed) sum(kb) by Hour connectType sourceIp sourceHost destPort Indexer Ver
| fieldformat Hour=strftime(Hour, "%x %H")
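For step 3, a minimal sketch: schedule the query above (hourly, say) and append collect so each run's hourly rollups are written as searchable events. The index name forwarder_summary and the marker string are placeholders for illustration; point them at whatever summary index you create.
<query above>
| collect index=forwarder_summary marker="report=fwd_status"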
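For step 4, one possible alert search over those summary events: flag any forwarder whose most recent reported hour is more than two hours old, which usually means it has stopped sending. The index name, the report marker, and the two-hour threshold are my assumptions; tune them to your environment.
index=forwarder_summary report=fwd_status earliest=-24h
| stats max(Hour) as lastSeen by sourceHost
| where lastSeen < relative_time(now(), "-2h@h")
| convert ctime(lastSeen)
Save that as an alert that runs hourly and triggers when the result count is greater than zero.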
Hope this helps, and please do not forget to post back if you find a better solution.
Thanks,
Raghav