I understand how to set up an alert, but I'm having trouble writing the search to alert on. I want an alert to fire when our important clients, such as our domain controllers, fail to phone home for more than X amount of time. Can anyone give me a hand? We are using Splunk 6.1.
I took this query from the Deployment Monitor app:
index="_internal" source="*metrics.log" group=tcpin_connections hostname=prdensp |
eval sourceHost=if(isnull(hostname), sourceHost,hostname) |
eval connectionType=case(fwdType=="uf","Universal Forwarder", fwdType=="lwf", "Light Weight Forwarder",fwdType=="full", "Splunk Indexer", connectionType=="cooked" or connectionType=="cookedSSL","Splunk Forwarder", connectionType=="raw" or connectionType=="rawSSL","Legacy Forwarder") |
eval build=if(isnull(build),"n/a",build) |
eval version=if(isnull(version),"pre 4.2",version) |
eval guid=if(isnull(guid),sourceHost,guid) |
eval arch=if(isnull(arch),"n/a",arch) |
eval my_splunk_server = splunk_server |
fields connectionType sourceIp sourceHost sourcePort destPort kb tcp_eps tcp_Kprocessed tcp_KBps my_splunk_server build version os arch |
eval lastReceived = if(kb>0, _time, null()) |
stats first(sourceIp) as sourceIp first(connectionType) as connectionType first(sourcePort) as sourcePort first(build) as build first(version) as version first(os) as os first(arch) as arch max(_time) as lastConnected max(lastReceived) as lastReceived sum(kb) as kb avg(tcp_eps) as avg_eps by sourceHost |
stats first(sourceIp) as sourceIp first(connectionType) as connectionType first(sourcePort) as sourcePort first(build) as build first(version) as version first(os) as os first(arch) as arch max(lastConnected) as lastConnected max(lastReceived) as lastReceived first(kb) as KB first(avg_eps) as eps by sourceHost |
addinfo |
eval status = if(isnull(KB) or lastConnected<(info_max_time-900),"missing",if(lastConnected>(lastReceived+300) or KB==0,"quiet","active")) |
where lastConnected < relative_time(now(), "-4h")
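If all you need is the alert condition, most of the fields above can be dropped. A minimal sketch, assuming the same metrics.log tcpin_connections data and the 4-hour threshold from the last line. Run it over a time range wider than the threshold (for example the last 24 hours); otherwise a forwarder that went quiet more than 4 hours ago has no events in range and never shows up:

```spl
index="_internal" source="*metrics.log" group=tcpin_connections
| eval sourceHost=if(isnull(hostname), sourceHost, hostname)
| stats max(_time) as lastConnected by sourceHost
| where lastConnected < relative_time(now(), "-4h")
| eval lastConnected=strftime(lastConnected, "%Y/%m/%d %T")
```

Each row returned is a forwarder whose last connection is older than 4 hours, so the alert can trigger when the number of results is greater than 0.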
If you're just looking for clients whose forwarders connect to the deployment server, this search should do it:
index=_internal PhoneHome | dedup host | stats count first(_time) AS lastTime by host | eval age=now()-lastTime | where age>360 | convert timeformat="%Y/%m/%d %T" ctime(*Time) | fields - count
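To watch only specific hosts, such as the domain controllers, you can filter on host in the base search. A sketch, where dc01 and dc02 are hypothetical host names to replace with your own. Note that a host that has not phoned home at all within the search time range produces no row here:

```spl
index=_internal PhoneHome (host=dc01 OR host=dc02)
| stats latest(_time) AS lastTime by host
| eval age=now()-lastTime
| where age>360
| convert timeformat="%Y/%m/%d %T" ctime(lastTime)
```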
If you're looking for clients that are sending logs, you can use either of the two metadata searches by @somesoni2, or this one:
index=* | dedup host | stats count first(_time) AS lastTime by host | eval age=now()-lastTime | where age>360 | convert timeformat="%Y/%m/%d %T" ctime(*Time) | fields - count
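Scanning every event with index=* can be slow. An alternative sketch uses tstats, which reads the indexed host field and timestamps rather than raw events; it assumes your version supports tstats (it should be available in Splunk 6.x):

```spl
| tstats max(_time) AS lastTime where index=* by host
| eval age=now()-lastTime
| where age>360
| convert timeformat="%Y/%m/%d %T" ctime(lastTime)
```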
I'm guessing all those important servers have Splunk forwarders installed and send _internal data to Splunk. You can try either of these:
| metadata type=hosts index=_internal | eval age=now()-recentTime | where age > 1800 | eval TimeSinceLastEvent=tostring(age,"duration") | eval RecentTime=strftime(recentTime,"%+") | table host, RecentTime, age, TimeSinceLastEvent

| metasearch index=_internal sourcetype=splunkd | eval host=mvindex(split(host,"-"),0) | stats latest(_time) as _time by host | eval age=now()-_time | where age > 1800 | eval TimeSinceLastEvent=tostring(now()-_time,"duration")
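Either search can be narrowed to the important servers by adding a host filter. A sketch of the metadata version, where dc01 and dc02 are hypothetical host names:

```spl
| metadata type=hosts index=_internal
| search host=dc01 OR host=dc02
| eval age=now()-recentTime
| where age > 1800
| eval TimeSinceLastEvent=tostring(age, "duration")
| table host, age, TimeSinceLastEvent
```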
Use whatever search you have that shows they phoned home.
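Use fillnull to set the missing fields to a placeholder value, something like:
Your Search | fillnull value="NoData" clientAStatus field2 field3
Then generate the alert when clientAStatus = NoData.
Note that fillnull only fills fields on results the search actually returns; if the search returns no results at all in the time frame you specified, there is nothing to fill, so in that case the alert has to trigger on the number of results instead.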
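To catch important hosts that return no events at all in the window, a common pattern is to start from a list of the hosts you expect and add zero counts. A sketch, assuming a hypothetical lookup file important_hosts.csv with a single host column listing the domain controllers and other critical servers:

```spl
| inputlookup important_hosts.csv
| eval count=0
| append [search index=_internal earliest=-4h | stats count by host]
| stats sum(count) AS count by host
| where count=0
```

Every host left in the results is in the list but sent nothing to _internal in the last 4 hours, so the alert can trigger when the number of results is greater than 0.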
That is great! It definitely answers the question of what to alert on. The other part of my question, though, is a search to find the phone-home time of specific hosts, or of any host in general. At this point I'm basically starting from scratch on this functionality. I've found a phonehome search, but it doesn't distinguish the actual hosts; it only shows when the deployment server/indexers get phoned home to.
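On the deployment server itself, the REST API keeps track of each client's last check-in. A sketch, assuming the search runs on the deployment server and that the deployment/server/clients endpoint returns hostname, ip and lastPhoneHomeTime (epoch seconds) fields in your version; inspect the raw output of the rest command first to confirm the field names. dc01 and dc02 are hypothetical host names:

```spl
| rest /services/deployment/server/clients splunk_server=local
| search hostname=dc01 OR hostname=dc02
| eval age=now()-lastPhoneHomeTime
| where age > 360
| eval lastPhoneHome=strftime(lastPhoneHomeTime, "%Y/%m/%d %T")
| table hostname, ip, lastPhoneHome, age
```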