Hi, I'm new to Splunk and need some help.
I have two sites: a primary site with a deployment server and a heavy forwarder, and a DR site with a deployment server and a cluster master.
I'm trying to write a script that checks whether the primary site is up.
My line of thought is something like this:
Primary: if it is up, it has the TAG "primary".
DR: as long as there is a TAG called "primary", I am "standby".
If not, the DR site takes over the TAG "primary" and the service is enabled on the cluster master server.
You could create an alert that checks whether you're receiving Splunk internal logs from all the servers in the primary site, something like this:
index=_internal (host=host_primary_1 OR host=host_primary_2 OR host=host_primary_3 OR ...) | stats dc(host) AS hosts | where hosts!=<number_of_host_in_primary_site>
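The search above only tells you that the host count is off, not which hosts stopped reporting. If you want to see the missing ones (for example in a script that post-processes the search results), the comparison could look roughly like this in Python. This is only a sketch: the function name is made up for illustration, and the host names are the placeholders from the search, not real servers.

```python
from typing import Iterable, Set


def missing_primary_hosts(expected: Iterable[str], seen: Iterable[str]) -> Set[str]:
    """Return the expected hosts that did not appear in the search results.

    `expected` is the list of primary-site hosts you monitor (placeholders here),
    `seen` is the list of distinct `host` values returned for the time window.
    """
    # Compare case-insensitively so "HOST_A" and "host_a" count as the same host.
    seen_lower = {h.lower() for h in seen}
    return {h for h in expected if h.lower() not in seen_lower}


# Hypothetical example: two of three primary hosts reported in the window.
expected_hosts = ["host_primary_1", "host_primary_2", "host_primary_3"]
seen_hosts = ["host_primary_1", "host_primary_3"]
print(sorted(missing_primary_hosts(expected_hosts, seen_hosts)))
```

An empty result means every expected host reported; anything else is the list of hosts to look at.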
Hi @gcusello ,
Thanks for your reply.
Let me check that I understand. I have several Splunk servers in the primary site and several servers in the DR site. If the primary site goes down, does the DR site continue to work as usual? So I don't need to "TAG" it as primary? And what about the Splunk services that go down with the primary site: do I need to enable them on the DR site?
I apologize for all the questions, but I feel I lack information or knowledge.
I don't know how your DR is organized, but I think the recovery site starts working automatically, without manual intervention.
Probably the DR site has Indexers and Search Heads that continue the normal work, and only some management roles (e.g. Deployer or Deployment Server) are in a waiting state.
So if you receive an alert that the primary site is down, you only have to decide whether to manually start the waiting roles (if needed) or to wait for the primary site to come back up.
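If you still want a small script on the DR side for the "standby"/"primary" idea from the original question, a minimal sketch could look like the one below. It assumes the primary servers listen on Splunk's default management port 8089 (change it if yours differs), and it only reports a role; it deliberately does not start any service. The function names are hypothetical, and host names such as host_primary_1 are placeholders.

```python
import socket
from typing import Callable, Iterable

MGMT_PORT = 8089  # Splunk's default management port; an assumption about your setup


def tcp_probe(host: str, port: int = MGMT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def decide_dr_role(primary_hosts: Iterable[str],
                   probe: Callable[[str], bool] = tcp_probe) -> str:
    """Return "standby" while any primary host answers, otherwise "primary"."""
    if any(probe(host) for host in primary_hosts):
        return "standby"
    return "primary"
```

Calling decide_dr_role(["host_primary_1", "host_primary_2"]) from a scheduled job gives you the current role. The probe is passed in as a parameter so you can swap the TCP check for something else later. Keep in mind that a closed port as seen from the DR site can also mean a network problem between the sites, so treat "primary" as a prompt for a human decision rather than an automatic switch.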