script to check if the server is up

nirt00
New Member

Hi, I'm new to Splunk and need some help.

I have two sites: a primary site with a deployment server and a heavy forwarder, and a DR site with a deployment server and a cluster master.

I'm trying to write a script that checks whether the primary site is up.

My line of thought is something like this:

Primary - if it is up, it holds the tag "primary".

DR - as long as a tag called "primary" exists, it stays "standby".
If not, the DR site takes the tag "primary" and the service is enabled on the cluster master server.

gcusello
SplunkTrust

Hi @nirt00,

you could create an alert that checks whether you're still receiving Splunk internal logs from the servers in the primary site, something like this:

index=_internal (host=host_primary_1 OR host=host_primary_2 OR host=host_primary_3 OR ...)
| stats dc(host) AS hosts
| where hosts!=<number_of_host_in_primary_site>
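If you'd rather run the check from a script outside Splunk, here is a minimal sketch in Python. It assumes splunkd on each primary host listens on the default management port 8089 and that a successful TCP connection is a good enough sign that the host is up. The host names are the placeholders from the search above, and the function names are only illustrative.

```python
import socket

# Placeholder host names from the search above; replace with your own.
PRIMARY_HOSTS = ["host_primary_1", "host_primary_2", "host_primary_3"]
MGMT_PORT = 8089  # default splunkd management port; change it if yours differs


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_hosts(hosts, port=MGMT_PORT, check=port_is_open):
    """Return the hosts whose port did not accept a connection."""
    return [host for host in hosts if not check(host, port)]
```

Calling `unreachable_hosts(PRIMARY_HOSTS)` returns the list of hosts that did not answer; an empty list means every primary host accepted the connection. An open port only shows that something is listening, not that Splunk is healthy, so the internal-logs search remains the stronger check.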

Ciao.

Giuseppe

nirt00
New Member

Thanks.

How can I change the tag on the DR site (standby) to "primary" if the primary site is down?

Can I do it in a script? Is that the recommended solution?

gcusello
SplunkTrust

Hi @nirt00,

why do you need to change the tag?

Usually the requirement is continuity of service, without users needing to know which site is providing it.

Ciao.

Giuseppe

nirt00
New Member

Hi @gcusello ,

thanks for your reply.

Let me check that I understand: I have several Splunk servers in the primary site and several servers in the DR site. If the primary site goes down, does the DR site keep working as normal? Don't I need to tag it as primary? And what about the Splunk services that went down with the primary site: do I need to enable them on the DR site?

I apologize for all the questions, but I feel I'm missing some information or knowledge.

BR

Nir

 

gcusello
SplunkTrust

Hi @nirt00,

I don't know how your DR is organized, but I think the recovery site starts working automatically, without manual intervention.

In the DR site you probably have Indexers and Search Heads that continue the normal work, and probably only some management roles (e.g. Deployer or Deployment Server) in a waiting state.

So if you receive an alert that the primary site is down, you only have to decide whether to manually start the waiting roles (if needed) or to wait for the primary site to come back up.
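As a rough illustration of that decision, here is a small Python sketch. It assumes you keep a history of check results (True = primary reachable) and only consider acting after several consecutive failures, so that a short outage doesn't trigger anything. The threshold of 3 and the function name are my assumptions, not a Splunk recommendation, and the final decision stays with a person.

```python
def recommend_action(check_results, threshold=3):
    """Suggest what to do based on recent primary-site checks.

    check_results: list of booleans, oldest first, True = primary reachable.
    threshold: hypothetical number of consecutive failures required before
               suggesting that the waiting roles be started.
    """
    recent = check_results[-threshold:]
    # Act only when we have enough samples and every one of them failed.
    if len(recent) == threshold and not any(recent):
        return "consider starting the waiting roles"
    return "wait"
```

For example, `recommend_action([True, False, False])` returns "wait" because the primary answered within the last three checks, while `recommend_action([True, False, False, False])` returns "consider starting the waiting roles".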

Ciao.

Giuseppe