Set up an alert on host or cluster level based on the event count

iqbalintouch
Path Finder

This is my base query:

index=myindex sourcetype=xyz host="tus" "EventLogger*" AND "Search event" "pcrState=N"

I want to set up an alert that fires when this search returns no results for any host or cluster within 10 minutes.

There are 15 hosts (tus1, tus2, tus3, ... tus15), and they are grouped into clusters of 3 hosts each.
Cluster1 consists of tus1, tus2 and tus3; cluster2 consists of tus4, tus5 and tus6; and so on.

So if the count is 0 on a single host or on any particular cluster, the alert should be triggered.

niketn
Legend

Based on the data provided, you can create a lookup file that maps hosts to clusters and upload it to Splunk. The lookup enriches the data and lets you find the hosts that did not log within the time window. The following documentation covers uploading a lookup file as CSV: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usefieldlookupstoaddinformationtoyourev...

Option 1: Use a one-to-many mapping of each cluster to several hosts, i.e. cluster_host_mapping.csv. (Advantage: the lookup file has fewer rows. Disadvantage: the host values are not easily available to a lookup definition as a base search filter, because matching them would need a wildcard pattern.)

cluster  host
cluster1    tus1,tus2,tus3
cluster2    tus4,tus5,tus6
cluster3    tus7,tus8,tus9
cluster4    tus10,tus11,tus12
cluster5    tus13,tus14,tus15

Once the cluster_host_mapping.csv lookup file is uploaded to Splunk, you can extend your existing search as follows (run it over the last 10 minutes):

index=myindex sourcetype=xyz host="tus*" "EventLogger*" "Search event" "pcrState=N"
| stats max(_time) as _time count by host
| eval dataSource="events"
| append
    [| inputlookup cluster_host_mapping.csv 
    | makemv host delim=","
    | mvexpand host
    | eval count=0, _time=now(), dataSource="lookup"]
|  stats max(_time) as triggerTime sum(count) as eventCount values(cluster) as cluster values(dataSource) as dataSource by host
|  search eventCount=0
|  sort cluster host
|  fieldformat triggerTime=strftime(triggerTime,"%Y/%m/%d %H:%M:%S %p")
|  table cluster host triggerTime eventCount
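
For readers new to SPL, the append-then-sum trick above can be sketched outside Splunk. The following is a minimal Python model, not Splunk code: the dictionary `CLUSTER_HOSTS` and the function `silent_hosts` are hypothetical names that stand in for the lookup file and the search. Every inventory host contributes a row with count 0, so a host that logged nothing still appears in the result with a total of 0.

```python
from collections import defaultdict

# Hypothetical stand-in for cluster_host_mapping.csv (one-to-many mapping).
CLUSTER_HOSTS = {
    "cluster1": "tus1,tus2,tus3",
    "cluster2": "tus4,tus5,tus6",
    "cluster3": "tus7,tus8,tus9",
    "cluster4": "tus10,tus11,tus12",
    "cluster5": "tus13,tus14,tus15",
}


def silent_hosts(event_hosts, cluster_hosts=CLUSTER_HOSTS):
    """Return sorted (cluster, host) pairs whose summed event count is 0."""
    counts = defaultdict(int)
    cluster_of = {}

    # Models "stats count by host" over the events in the time window.
    for host in event_hosts:
        counts[host] += 1

    # Models the appended lookup rows: one row per host with count=0.
    for cluster, hosts in cluster_hosts.items():
        for host in hosts.split(","):
            counts[host] += 0  # creates the key if the host had no events
            cluster_of[host] = cluster

    # Models "search eventCount=0 | sort cluster host".
    return sorted((cluster_of[h], h) for h, c in counts.items() if c == 0)


if __name__ == "__main__":
    events = ["tus1", "tus1", "tus2", "tus3", "tus4", "tus4", "tus15", "tus14"]
    for cluster, host in silent_hosts(events):
        print(cluster, host)
```

With these sample events, the hosts reported as silent are tus5 and tus6 in cluster2, every host of cluster3 and cluster4, and tus13 in cluster5.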

Option 2: Use a one-to-one mapping of hosts to their respective clusters in the lookup file, for example host_cluster_mapping.csv. (Advantage: the one-to-one mapping can also be used to create a Lookup Definition for host. Disadvantage: the lookup file has more rows.)

cluster  host
cluster1    tus1
cluster1    tus2
cluster1    tus3
cluster2    tus4
cluster2    tus5
cluster2    tus6
cluster3    tus7
cluster3    tus8
cluster3    tus9
cluster4    tus10
cluster4    tus11
cluster4    tus12
cluster5    tus13
cluster5    tus14
cluster5    tus15

Then you can try a search like the following to get the list of hosts/clusters that have not reported in the last 10 minutes:

| inputlookup host_cluster_mapping.csv where NOT [
     search index=myindex sourcetype=xyz host="tus*" "EventLogger*" "Search event" "pcrState=N" earliest=-10m latest=now
    | dedup host 
    | table host 
    | format]
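
The search above keeps only the lookup rows whose host is absent from the events. The original question also asks about whole clusters going quiet, so here is a minimal Python sketch (not SPL, and not part of the search above) of how the same missing rows could be rolled up per cluster. The names `HOST_CLUSTER`, `missing_rows` and `silent_clusters` are hypothetical, and the roll-up step is an assumption about what a cluster-level alert would need.

```python
from collections import Counter

# Hypothetical stand-in for host_cluster_mapping.csv (one-to-one mapping):
# tus1..tus3 -> cluster1, tus4..tus6 -> cluster2, and so on up to tus15.
HOST_CLUSTER = [(f"cluster{i // 3 + 1}", f"tus{i + 1}") for i in range(15)]


def missing_rows(reporting_hosts, mapping=HOST_CLUSTER):
    """Lookup rows whose host did not report (models 'where NOT [subsearch]')."""
    seen = set(reporting_hosts)  # models "dedup host | table host"
    return [(cluster, host) for cluster, host in mapping if host not in seen]


def silent_clusters(reporting_hosts, mapping=HOST_CLUSTER):
    """Clusters in which every mapped host is missing."""
    total = Counter(cluster for cluster, _ in mapping)
    gone = Counter(cluster for cluster, _ in missing_rows(reporting_hosts, mapping))
    return sorted(c for c in total if gone[c] == total[c])


if __name__ == "__main__":
    reporting = ["tus1", "tus2", "tus3", "tus4", "tus14", "tus15"]
    print(missing_rows(reporting))
    print(silent_clusters(reporting))
```

Keeping the host-level and cluster-level results separate makes it possible to alert on either condition, which is what the question describes.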

Please try out one of the approaches and confirm!

Following is a run-anywhere example based on both approaches. PS: The outputlookup commands in the two searches have been commented out. You can upload the files manually or uncomment the outputlookup command, depending on the test machine where you try out the run-anywhere dashboard. (DO NOT try out the run-anywhere dashboard with the outputlookup command in your production environment; plug in your own search as stated above.)

Following is the simple XML dashboard code for screenshot attached:

<dashboard>
  <label>Alert For Missing Host Cluster</label>
  <row>
    <panel>
      <title>One to many Cluster to Host mapping UNCOMMENT in Query `|  outputlookup cluster_host_mapping.csv`</title>
      <table>
        <search>
          <query>|  makeresults
|  eval data="cluster=cluster1;host=\"tus1,tus2,tus3\"|cluster=cluster2;host=\"tus4,tus5,tus6\"|cluster=cluster3;host=\"tus7,tus8,tus9\"|cluster=cluster4;host=\"tus10,tus11,tus12\"|cluster=cluster5;host=\"tus13,tus14,tus15\""
|  makemv data delim="|"
|  mvexpand data
|  rename data as _raw
|  KV
|  table cluster host
<!-- Uncomment the following outputlookup to create cluster_host_mapping.csv lookup file
|  outputlookup cluster_host_mapping.csv
-->
          </query>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <title>One to One Host Cluster Mapping UNCOMMENT in Query `|  outputlookup host_cluster_mapping.csv`</title>
      <table>
        <search>
          <query>|  makeresults
|  eval data="cluster=cluster1;host=\"tus1,tus2,tus3\"|cluster=cluster2;host=\"tus4,tus5,tus6\"|cluster=cluster3;host=\"tus7,tus8,tus9\"|cluster=cluster4;host=\"tus10,tus11,tus12\"|cluster=cluster5;host=\"tus13,tus14,tus15\""
|  makemv data delim="|"
|  mvexpand data
|  rename data as _raw
|  KV
|  makemv host delim=","
|  mvexpand host
|  table host cluster
<!-- Uncomment the following outputlookup to create host_cluster_mapping.csv lookup file
|  outputlookup host_cluster_mapping.csv
-->
          </query>
          <earliest>0</earliest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Option 1: Alert with one to many cluster to host mapping i.e. cluster_host_mapping.csv</title>
      <table>
        <search>
          <query>| makeresults 
| eval host="tus1,tus1,tus2,tus3,tus4,tus4,tus15" 
| makemv host delim="," 
| mvexpand host
| append 
    [| makeresults 
| eval host="tus1,tus1,tus1,tus2,tus2,tus3,tus4,tus14,tus15" 
| makemv host delim=","
| mvexpand host ]
| stats max(_time) as _time count by host
| eval dataSource="events"
| append
    [| inputlookup cluster_host_mapping.csv 
    | makemv host delim=","
    | mvexpand host
    | eval count=0, _time=now(), dataSource="lookup"]
|  stats max(_time) as triggerTime sum(count) as eventCount values(cluster) as cluster values(dataSource) as dataSource by host
|  search eventCount=0
|  sort cluster host
|  fieldformat triggerTime=strftime(triggerTime,"%Y/%m/%d %H:%M:%S %p")
|  table cluster host triggerTime eventCount</query>
          <earliest>-10m</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <title>Option 2: Alert with one to one host to cluster mapping i.e. host_cluster_mapping.csv</title>
      <table>
        <search>
          <query>| inputlookup host_cluster_mapping.csv where NOT 
    [| makeresults 
| eval host="tus1,tus1,tus2,tus3,tus4,tus4,tus15" 
| makemv host delim="," 
| mvexpand host 
| append 
    [| makeresults 
    | eval host="tus1,tus1,tus1,tus2,tus2,tus3,tus4,tus14,tus15" 
    | makemv host delim="," 
    | mvexpand host ] 
| dedup host 
| table host
| format]</query>
          <earliest>-10m</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</dashboard>
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

iqbalintouch
Path Finder

@niketnilay thank you for the detailed explanation. I don't need a .csv lookup file or a dashboard. Can you please help me with a simple search query that uses eval and triggers the alert when eventCount=0 in the last 10 mins?

If there is no way, I will go with the above solution, because I am kind of a beginner with Splunk 🙂

Appreciate your help!

niketn
Legend

You can definitely do it through a search (to be set up as an alert). The dashboard was just an example for you to try out.

You would, however, need a lookup file or some other inventory of all your hosts (it could be other indexed data, DB Connect or anything else). It is required to supply the names of the hosts (and clusters) that did not report in the last 10 minutes, because a search over events alone cannot return hosts that logged nothing. You can also create a static list of all hosts/clusters in SPL. However, that only adds complexity to the search and is harder to maintain.

Do plug your base search into the proposed solution and see if it works for you. Let me know if you need an example that uses appendpipe/append within the query for this alert instead of a lookup. You will find several such examples on Splunk Answers for alerts on hosts not reporting.

Also if you find comments useful you can definitely use the Up Arrow that shows up next to the comments (on hover) to Up Vote the comment, instead of giving away your points 🙂

iqbalintouch
Path Finder

Thank you @niketn. Will the final query look like this (below)?

index=myindex sourcetype=xyz host="tus*" "EventLogger*" "Search event" "pcrState=N"
| stats max(_time) as _time count by host
| eval dataSource="events"
| append
[
| makemv host delim=","
| mvexpand host
| eval count=0, _time=now()]
| stats max(_time) as triggerTime sum(count) as eventCount values(cluster) as cluster values(dataSource) as dataSource by host
| search eventCount=0
| sort cluster host
| fieldformat triggerTime=strftime(triggerTime,"%Y/%m/%d %H:%M:%S %p")
| table cluster host triggerTime eventCount

niketn
Legend

@iqbalintouch if you are trying the first approach without a lookup file, using a static list of hosts and clusters instead, try the following search:

index=myindex sourcetype=xyz host="tus*" "EventLogger*" "Search event" "pcrState=N"
| stats max(_time) as _time count by host
| eval dataSource="events"
| append [|  makeresults
 |  eval data="cluster=cluster1;host=\"tus1,tus2,tus3\"|cluster=cluster2;host=\"tus4,tus5,tus6\"|cluster=cluster3;host=\"tus7,tus8,tus9\"|cluster=cluster4;host=\"tus10,tus11,tus12\"|cluster=cluster5;host=\"tus13,tus14,tus15\""
 |  makemv data delim="|"
 |  mvexpand data
 |  rename data as _raw
 |  KV
 |  makemv host delim=","
 |  mvexpand host
 |  table host cluster
 | eval count=0, _time=now(), dataSource="lookup"]
 |  stats max(_time) as triggerTime sum(count) as eventCount values(cluster) as cluster values(dataSource) as dataSource by host
 |  search eventCount=0
 |  sort cluster host
 |  fieldformat triggerTime=strftime(triggerTime,"%Y/%m/%d %H:%M:%S %p")
 |  table cluster host triggerTime eventCount
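
The makeresults subsearch above builds the host inventory from a single string: it splits the string into records on "|", extracts the cluster and host fields from each record, and then expands the comma-separated host list into one row per host. A minimal Python sketch of that expansion, for illustration only (the names `DATA` and `expand_inventory` are hypothetical):

```python
# Same static string as in the search above.
DATA = (
    'cluster=cluster1;host="tus1,tus2,tus3"|'
    'cluster=cluster2;host="tus4,tus5,tus6"|'
    'cluster=cluster3;host="tus7,tus8,tus9"|'
    'cluster=cluster4;host="tus10,tus11,tus12"|'
    'cluster=cluster5;host="tus13,tus14,tus15"'
)


def expand_inventory(data):
    """Return one (host, cluster) row per host in the static string."""
    rows = []
    # Models 'makemv data delim="|"' followed by 'mvexpand data'.
    for record in data.split("|"):
        # Models the key-value extraction of cluster=... and host="...".
        fields = dict(pair.split("=", 1) for pair in record.split(";"))
        # Models 'makemv host delim=","' followed by 'mvexpand host'.
        for host in fields["host"].strip('"').split(","):
            rows.append((host, fields["cluster"]))  # 'table host cluster'
    return rows


if __name__ == "__main__":
    for host, cluster in expand_inventory(DATA):
        print(host, cluster)
```

Each of these rows then gets count=0 in the search, so every known host is present before the final stats even if it logged no events.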

iqbalintouch
Path Finder

Thank you @niketnilay, it was a great help 🙂
