I am trying to monitor the status of the Splunk service on the deployer and the search heads using the _internal logs.
Which fields should I look at to tell whether the Splunk service on the deployer and the search heads is up and running?
Note: I am building a dashboard to monitor Splunk service status.
Hi @snigdhasaxena,
The main and easiest check you can run is whether the monitored hosts are sending their internal logs. For example:
Create a lookup containing all the hosts to monitor (e.g. called perimeter.csv, with at least one field: host).
Then run a search like this:
| metasearch index=_internal [ | inputlookup perimeter.csv | fields host ]
| eval host=lower(host)
| stats count BY host
| append [ | inputlookup perimeter.csv | eval host=lower(host), count=0 | fields host count ]
| stats sum(count) AS total BY host
| where total=0
If you like, you can use this search (without the last line) to display the health status of your Splunk infrastructure in a table, or graphically.
This is an example of such a dashboard, with more information in perimeter.csv (site, ip and role):
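The trick in the search above is the append: every host in the lookup gets an extra row with count=0, so a host that sent no _internal events still appears after the final stats, with total=0. The following is a minimal Python sketch (not Splunk code) that mimics that logic on in-memory data; the function names and host names are hypothetical, and it assumes host matching is case-insensitive, as the lower() calls in the search suggest.

```python
def host_totals(perimeter, internal_event_hosts):
    """Return {host: number of _internal events} for every perimeter host.

    perimeter: host names from the lookup (stand-in for perimeter.csv).
    internal_event_hosts: the host value of each _internal event in the time range.
    """
    wanted = {h.lower() for h in perimeter}
    totals = {}
    # | metasearch index=_internal [ | inputlookup perimeter.csv | fields host ]
    # | eval host=lower(host) | stats count BY host
    # (only perimeter hosts are kept; matching is case-insensitive here)
    for host in internal_event_hosts:
        h = host.lower()
        if h in wanted:
            totals[h] = totals.get(h, 0) + 1
    # | append [ ... count=0 ] | stats sum(count) AS total BY host
    # Every perimeter host gets a zero row, so silent hosts still appear.
    for h in wanted:
        totals[h] = totals.get(h, 0) + 0
    return totals


def silent_hosts(perimeter, internal_event_hosts):
    # | where total=0
    totals = host_totals(perimeter, internal_event_hosts)
    return sorted(h for h, total in totals.items() if total == 0)


if __name__ == "__main__":
    perimeter = ["IDX01", "sh01", "deployer01"]  # hypothetical host names
    events = ["idx01", "idx01", "SH01", "uf42"]  # uf42 is not in the perimeter
    print(silent_hosts(perimeter, events))  # ['deployer01']
```

Without the where step (host_totals here), you get one row per monitored host, which is what the dashboard table is built on.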
<form stylesheet="table_decorations.css" script="table_icons_rangemap.js" hideFilters="true">
  <label>Home Page</label>
  <fieldset submitButton="false">
    <input type="time" token="Time">
      <label>Period</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>Infrastructure</title>
      <table id="table1">
        <title>Total = $server_count$</title>
        <search>
          <progress>
            <set token="server_count">$job.resultCount$</set>
          </progress>
          <cancelled>
            <unset token="server_count"></unset>
          </cancelled>
          <!-- perimeter.csv columns used: host, site, ip, role and the optional check column read by the eval -->
          <query>| metasearch index=_internal [ | inputlookup perimeter.csv | fields host ]
| eval host=lower(host)
| stats count BY host
| append [ | inputlookup perimeter.csv | eval host=lower(host), count=0 | fields host count site ip role check ]
| stats values(site) AS Site values(ip) AS IP values(role) AS Role values(check) AS check sum(count) AS total BY host
| rangemap field=total severe=0-0 low=1-1000000000 default=severe
| eval range=if(check="no","elevated",range)
| rename host AS HostName
| table HostName Role Site IP range
| sort Role</query>
          <earliest>$Time.earliest$</earliest>
          <latest>$Time.latest$</latest>
        </search>
        <option name="count">100</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>
You can find information about "table_decorations.css" and "table_icons_rangemap.js" in the Splunk Dashboard Examples app ( https://splunkbase.splunk.com/app/1603/ ).
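The status colour in this dashboard comes from the rangemap command: a host with no _internal events in the selected period gets range=severe, and any host with at least one event gets range=low (rangemap ranges are inclusive, so 0-0 matches only zero). The following is a minimal Python sketch (not Splunk code) of that mapping, with a hypothetical function name; it leaves out the eval that switches range to elevated when check="no".

```python
def rangemap_total(total):
    """Map an event count to a range value, like:
    | rangemap field=total severe=0-0 low=1-1000000000 default=severe
    """
    if 0 <= total <= 0:  # severe=0-0: the host sent nothing
        return "severe"
    if 1 <= total <= 1_000_000_000:  # low=1-1000000000: the host is reporting
        return "low"
    return "severe"  # default=severe: anything outside the listed ranges


if __name__ == "__main__":
    for total in (0, 1, 12345):
        print(total, rangemap_total(total))
```

As far as I know, the range value is what the table_icons_rangemap.js example uses to choose the icon shown in each row.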
Ciao.
Giuseppe
@gcusello Thanks for responding. The query I built generates a status report for all the other Splunk components (indexers, search heads, deployment server), but it returns no events for the cluster master and the deployer.
I am looking for a field/value in the internal logs that could help me determine the status of the CM and the deployer.
Hi @snigdhasaxena,
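Check whether you configured the Cluster Master and the Deployer to send their logs to the Indexers (it's a best practice for all Splunk servers to forward their logs to the Indexers!). If not, configure them to forward their logs; otherwise their _internal events never reach the indexers, and the search above cannot find them.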
Ciao.
Giuseppe