We recently made a copy of our production environment, which runs in AWS. Someone made copies of each prod server AMI and stood up identical hosts in another zone. As a result, none of the Splunk configurations had been changed when the test hosts first came online. I later went through and updated all of the test hosts with their own hostnames, etc., in the Splunk config files.
We have a total of 17 hosts forwarding events to the indexing server, which is also where the Deployment Monitor (DM) and the primary web console are installed. DM shows only 13 forwarders, and some of those are duplicates; if you don't count the duplicates, DM lists only 10.
I renamed every host and updated the requisite Splunk configs after the test systems were created. All hosts now use their FQDNs.
1) We used to have a host named "CoreCommandServer". I've since renamed it to use its FQDN. There is a test instance and a prod instance. DM shows only the test instance (identified by its source IP address), and DM lists that instance as "CoreCommandServer", not by its FQDN.
1a) Why is it not using the FQDN?
1b) Where is the prod instance? Its events are coming in, but DM isn't listing it.
2) Three servers each show up twice in DM's list of forwarders, with identical names and source IP addresses. Two of the three are listed by their FQDN; one is listed by its old, non-FQDN hostname.
I'm wondering how much of this comes down to timing. If that's the case, DM seems pretty fragile.
Either way, assistance would be hugely appreciated. I would like to use DM alerting to tell me when a forwarder disappears.
This is a search that I stole from the Deployment Monitor, and then modified/simplified. It identifies "missing" forwarders by comparing a list of forwarders from the past week with a list of the forwarders from today:
index=_internal source=*metrics.log group="tcpin_connections" earliest=-7d@d latest=@d | eval sourceHost=if(isnull(hostname), sourceHost, hostname) | stats sum(kb) as KB_lastweek by sourceHost | eval KB_lastweek = round(KB_lastweek) | join type=outer sourceHost [search index=_internal source=*metrics.log group="tcpin_connections" earliest=@d | eval sourceHost=if(isnull(hostname), sourceHost, hostname) | stats sum(kb) as KB_today by sourceHost | eval KB_today = round(KB_today) ] | fillnull value=0 KB_today | eval Missing = if(KB_today < 1, "Missing", " ")
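The comparison behind this search can be sketched outside Splunk. The Python below is only an illustration, not part of the Deployment Monitor: it assumes the per-host KB totals have already been summed (as `stats sum(kb) ... by sourceHost` does) into two hypothetical dicts, and the example.com hostnames are made up. The point it shows is that the comparison has to start from last week's host list, because a host that sent nothing today has no row at all in today's totals.

```python
def find_missing(kb_lastweek, kb_today):
    """Return hosts seen last week whose rounded KB total today is below 1.

    kb_lastweek, kb_today: hypothetical dicts mapping sourceHost -> total KB,
    standing in for the two `stats sum(kb) by sourceHost` result sets.
    """
    missing = []
    # Iterate over last week's hosts (the left side of the outer join),
    # so hosts with no traffic at all today are still considered.
    for host in sorted(kb_lastweek):
        # A host absent from today's totals is treated as 0 KB,
        # mirroring `fillnull value=0 KB_today`.
        today = round(kb_today.get(host, 0))
        if today < 1:
            missing.append(host)
    return missing


if __name__ == "__main__":
    # Hypothetical sample data.
    lastweek = {"web01.example.com": 5120.4, "db01.example.com": 880.0,
                "app01.example.com": 42.7}
    today = {"web01.example.com": 731.2, "app01.example.com": 0.3}
    print(find_missing(lastweek, today))
```

Here db01 is flagged because it has no entry today, and app01 because its traffic rounds to 0 KB.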
You could change this to an alert by adding
| where KB_today < 1
which would list only the "missing" forwarders; then set the alert to trigger when the number of results is greater than 0.
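The alert step (keep only the "missing" rows, then fire if at least one remains) can be sketched the same way. This Python is purely illustrative: the rows are hypothetical dicts standing in for search results, with `KB_today` set to `None` where a host had no data today. Splunk's `where` generally does not match a row whose compared field is null, so the sketch treats `None` as 0, which is what a `fillnull value=0 KB_today` step ahead of the `where` clause would achieve.

```python
def missing_rows(rows):
    """Mimic `| where KB_today < 1`, treating a null (None) KB_today as 0."""
    return [r for r in rows if (r.get("KB_today") or 0) < 1]


def should_alert(rows):
    """Trigger condition: number of results > 0."""
    return len(missing_rows(rows)) > 0


if __name__ == "__main__":
    # Hypothetical search results.
    results = [
        {"sourceHost": "web01.example.com", "KB_lastweek": 5120, "KB_today": 731},
        {"sourceHost": "db01.example.com", "KB_lastweek": 880, "KB_today": None},
    ]
    for row in missing_rows(results):
        print("Missing:", row["sourceHost"])
    print("Alert:", should_alert(results))
```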
The Deployment Monitor has some definite weaknesses, but it is a great source of alert ideas.