I have a query to find missing forwarders. It is based on code I received here and it is so very close to working. Here is the code and my issue is below. (fwdType=* group=tcpin_connections guid=* index=_internal sourcetype=splunkd (connectionType=cooked OR connectionType=cookedSSL))
| stats values(fwdType) as forwarder_type, latest(version) as version, values(arch) as arch, values(os) as os, max(_time) as last_connected, sum(kb) as new_sum_kb, avg(tcp_KBps) as new_avg_tcp_kbps, avg(tcp_eps) as new_avg_tcp_eps by guid, hostname
| stats values(forwarder_type) as forwarder_type, max(version) as version, values(arch) as arch, values(os) as os, max(last_connected) as last_connected, values(new_sum_kb) as sum_kb, values(new_avg_tcp_kbps) as avg_tcp_kbps, values(new_avg_tcp_eps) as avg_tcp_eps by guid, hostname
| addinfo
| eval status=if(((isnull(sum_kb) OR (sum_kb <= 0)) OR (last_connected < (info_max_time - 60))),"missing","active"), sum_kb=round(sum_kb,2), avg_tcp_kbps=round(avg_tcp_kbps,2), avg_tcp_eps=round(avg_tcp_eps,2)
| eval age = now() - last_connected
| search age > 60
| sort age d
| convert ctime(last_connected)
| rename hostname as host
| fields host, forwarder_type, version, arch, os, status, last_connected, age, sum_kb, avg_tcp_kbps, avg_tcp_eps
| join type=left host [| tstats count where index="*" by host, index | stats values(index) as indexes by host]
| append [| inputlookup MaintenanceToggle.csv |fields MaintenanceONOFF]|reverse|filldown MaintenanceONOFF |reverse|where MaintenanceONOFF!="ON"
| search NOT [ | inputlookup PreProduction.csv | rename Index as indexes ]
| search host=* NOT [ | inputlookup Forwarders.csv | rename IgnoreForwarder as host ] Everything up to the join is code I found here and appears to work perfectly. The result is all of the hosts where age > 60. This works every time. | join type=left host [| tstats count where index="*" by host, index | stats values(index) as indexes by host] The join is meant to link a host with its index. As my goal is to exempt certain indexes. The join appears to work but occasionally fails to link a host with the index, resulting in the proper results with a few hosts missing an index. Obviously all of the hosts have an index. I am fairly confident of the remaining logic as it is working well for many other queries. So the problem is that the join fails to join 100% of the time. I have this scheduled to run every five minutes. I can replicate the results using the exact time frame in which the query occurred and minutes later the exact same search will return properly. Can anyone suggest what might be the cause of this and how to correct? Thank you in advance.
... View more