Deployment Architecture

Check Deployer and search head status in internal logs

snigdhasaxena
Communicator

I am trying to monitor the status of the Splunk service on the deployer and the search heads using the _internal logs.
Which fields should I look at to tell whether the Splunk service on the deployer and the SHs is up and running?

Note: I am building a dashboard to monitor Splunk service status.

gcusello
SplunkTrust

Hi @snigdhasaxena,
the simplest check you can run is to verify that the monitored hosts are still sending their internal logs:
create a lookup containing all the hosts to monitor (e.g. perimeter.csv, with at least one field: host),
then run a search like this one, which lists the hosts in the lookup that sent no events to _internal in the selected time range:

| metasearch index=_internal [ | inputlookup perimeter.csv | fields host ]
| eval host=lower(host)
| stats count BY host
| append [ | inputlookup perimeter.csv | eval host=lower(host), count=0 | fields host count ]
| stats sum(count) AS total BY host
| where total=0
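
To make the logic of this search explicit, here is a small offline Python model of it. It is a sketch, not Splunk code: the lookup is read from a CSV string, the _internal events are represented as a plain list of host names, and the function name missing_hosts is hypothetical. The key trick is the append of zero-count rows: summing the real counts plus one zero per host leaves total=0 exactly for the hosts that sent nothing.

```python
from __future__ import annotations

import csv
import io
from collections import Counter


def missing_hosts(perimeter_csv: str, internal_event_hosts: list[str]) -> list[str]:
    """Return the perimeter hosts that sent no _internal events (offline model of the SPL search)."""
    # | inputlookup perimeter.csv | eval host=lower(host), count=0
    perimeter = {row["host"].lower() for row in csv.DictReader(io.StringIO(perimeter_csv))}

    # | metasearch index=_internal [ ... ] | eval host=lower(host) | stats count BY host
    # The subsearch restricts the metasearch to perimeter hosts, so other hosts are ignored.
    counts = Counter(h.lower() for h in internal_event_hosts if h.lower() in perimeter)

    # | append [ ... count=0 ... ] | stats sum(count) AS total BY host
    # Every perimeter host gets a zero row, so hosts without events still appear with total=0.
    totals = {host: counts.get(host, 0) for host in perimeter}

    # | where total=0
    return sorted(host for host, total in totals.items() if total == 0)
```

For example, with a lookup listing a cluster master, a deployer and a search head, and events only from the search head, the function returns the cluster master and the deployer: the same situation described later in this thread.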

If you remove the last line (| where total=0), the same search returns every monitored host with its total event count, and you can use it to display the health status of your Splunk infrastructure in a table, also graphically.
Here is an example of such a dashboard, using more columns from perimeter.csv (e.g. site, ip and role):

<form stylesheet="table_decorations.css" script="table_icons_rangemap.js" hideFilters="true">
  <label>Home Page</label>
  <fieldset submitButton="false">
    <input type="time" token="Time">
      <label>Period</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Infrastructure</title>
      <table id="table1">
        <title>Total = $server_count$</title>
        <search>
          <progress>
            <set token="server_count">$job.resultCount$</set>
          </progress>
          <cancelled>
            <unset token="server_count"></unset>
          </cancelled>
          <query>| metasearch index=_internal [ | inputlookup perimeter.csv | fields host ]
            | eval host=lower(host)
            | stats count BY host
            | append [ | inputlookup perimeter.csv | eval host=lower(host), count=0 | fields host count site ip role check]
            | stats values(site) AS Site values(ip) AS IP values(role) AS Role values(check) AS check sum(count) AS total BY host
            | rangemap field=total severe=0-0 low=1-1000000000 default=severe
            | eval range=if(check="no","elevated",range)
            | rename host AS HostName
            | table HostName IP Role Site range
            | sort Role</query>
          <earliest>$Time.earliest$</earliest>
          <latest>$Time.latest$</latest>
        </search>
        <option name="count">100</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>

You can find information about "table_decorations.css" and "table_icons_rangemap.js" in the Splunk Dashboard Examples app ( https://splunkbase.splunk.com/app/1603/ ).
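
The status column of the dashboard can likewise be modeled in Python as a sketch (the function names range_for and status_table are hypothetical and not part of Splunk or of the Dashboard Examples app). Hosts with no events in the time range are "severe", hosts with at least one event are "low", and hosts whose perimeter.csv row has a check column set to "no" are shown as "elevated". That last rule assumes perimeter.csv actually has such a column and that it is carried through the fields and stats commands; the output column names follow the query's renames.

```python
from __future__ import annotations

import csv
import io


def range_for(total: int, check: str | None = None) -> str:
    """Status of one host, mirroring the dashboard's rangemap and eval lines."""
    # | rangemap field=total severe=0-0 low=1-1000000000 default=severe
    status = "low" if 1 <= total <= 1_000_000_000 else "severe"
    # | eval range=if(check="no","elevated",range)
    if check == "no":
        status = "elevated"
    return status


def status_table(perimeter_csv: str, counts: dict[str, int]) -> list[dict[str, str]]:
    """Build one row per perimeter host, sorted by role (| sort Role)."""
    # Host names are compared in lower case, as the query does with eval host=lower(host).
    totals: dict[str, int] = {}
    for host, n in counts.items():
        totals[host.lower()] = totals.get(host.lower(), 0) + n

    rows = []
    for row in csv.DictReader(io.StringIO(perimeter_csv)):
        host = row["host"].lower()
        rows.append({
            "HostName": host,
            "Role": row.get("role") or "",
            "Site": row.get("site") or "",
            "range": range_for(totals.get(host, 0), row.get("check")),
        })
    return sorted(rows, key=lambda r: r["Role"])
```

Because every perimeter host gets a row even with zero events, a server that stops logging shows up as "severe" instead of silently disappearing from the table.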

Ciao.
Giuseppe

snigdhasaxena
Communicator

@gcusello Thanks for responding. The query I built generates a status report for all the other Splunk components (indexers, search heads, deployment server), but it returns no events for the cluster master and the deployer.
I am looking for a field or value in the internal logs that would let me determine the status of the CM and the deployer.

gcusello
SplunkTrust

Hi @snigdhasaxena,
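Check whether you configured the Cluster Master and the Deployer to send their logs to the Indexers (it's a best practice for all Splunk servers to forward their logs to the Indexers!). If they keep their logs locally, their _internal events are normally not searchable from the search heads, so no search on index=_internal will find them; in that case, configure them to forward their logs.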
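
As a sketch of that configuration, here is a small Python helper that renders an outputs.conf you could deploy on the Cluster Master and on the Deployer. The stanzas follow, as far as we know, the pattern Splunk documents for forwarding search head data to the indexer layer (the same approach is recommended for the cluster manager); the function name, the target group name primary_indexers and the indexer addresses are placeholders, so check the outputs.conf reference for your version before using it.

```python
from __future__ import annotations


def render_outputs_conf(indexers: list[str], group: str = "primary_indexers") -> str:
    """Render an outputs.conf that forwards all of this node's data (including _internal) to the indexers.

    `group` is a placeholder target-group name; `indexers` are host:port receiving addresses.
    """
    if not indexers:
        raise ValueError("at least one indexer host:port is required")
    lines = [
        "# Turn off local indexing on this node",
        "[indexAndForward]",
        "index = false",
        "",
        "[tcpout]",
        f"defaultGroup = {group}",
        # Disable index filtering so that data from all indexes is forwarded.
        "forwardedindex.filter.disable = true",
        "indexAndForward = false",
        "",
        f"[tcpout:{group}]",
        # Comma-separated indexer receiving addresses (9997 is the conventional receiving port).
        "server = " + ",".join(indexers),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical indexer addresses; replace them with your own.
    print(render_outputs_conf(["idx1.example.com:9997", "idx2.example.com:9997"]))
```

After placing the file on each node (for example in an app or in $SPLUNK_HOME/etc/system/local) and restarting Splunk, the CM and Deployer _internal events should reach the indexers, and the perimeter search above should start reporting them.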

Ciao.
Giuseppe
