Splunk IT Service Intelligence: Setting Up Constant Email Alerts for Degraded Services

tangtangtang12
Loves-to-Learn Lots

Currently, we receive a single email alert via Notable Event Aggregation Policies (NEAP) whenever our ITSI services transition from normal to high or critical. However, we need an automated process that sends recurring email alerts every 5 minutes if the service remains degraded and hasn't reverted back to normal.

From my research, many forums and documentation suggest achieving this through Correlation Searches. However, since we rely on KPI alerting, and none of our Correlation Searches (even the out-of-the-box ones) seem to function properly, this approach hasn't worked for us...

Given the critical nature of the services we monitor, we’re seeking guidance on setting up recurring alerts using NEAPs or any other reliable method within Splunk ITSI. Any assistance or insights on how to configure this would be greatly appreciated.


PrewinThomas
Motivator

@tangtangtang12 

NEAPs are primarily triggered when an episode's severity changes (e.g., Normal -> Critical, or Critical -> High) or when a new notable event matches the policy, which is why you get one email per transition rather than a repeating one.
For recurring alerts, the most robust and common approach is a scheduled Saved Search.

You need a search that identifies services currently in a critical or high state.

For example (adjust the 60 to whatever your critical/high thresholds are; typically Critical < 40 and High < 60, or similar, depending on your ITSI configuration):

| itsi_get_service_health
| search service_health_score > 0 AND service_health_score < 60
| rename title AS itsi_service_name
| fields itsi_service_name, service_health_score, severity_label
| eval alert_message = "ITSI Service " . itsi_service_name . " is still " . severity_label . " (Health: " . service_health_score . ")."


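Then save it as a Saved Search/Alert on the schedule you need (for a 5-minute repeat, a cron schedule of */5 * * * *) with an email action, so it fires on every run where the search still returns degraded services.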
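
To make the intended logic concrete outside of SPL, here is a minimal Python sketch of what each scheduled run is meant to do: take the current health scores, keep only the services still in the high or critical range, and build one message per service. The `ServiceHealth` class, the service names, and the scores are hypothetical, and the 40/60 cut-offs are only the "typical" values mentioned above, not values read from any ITSI configuration.

```python
from dataclasses import dataclass

# Assumed thresholds, taken from the "typical" ranges mentioned in this post.
CRITICAL_BELOW = 40
HIGH_BELOW = 60


@dataclass
class ServiceHealth:
    """Hypothetical snapshot of one service's health score (0-100)."""
    name: str
    score: int


def severity_label(score: int) -> str:
    """Map a health score to a severity label using the assumed thresholds."""
    if score < CRITICAL_BELOW:
        return "critical"
    if score < HIGH_BELOW:
        return "high"
    return "normal"


def degraded_alerts(services: list[ServiceHealth]) -> list[str]:
    """Return one message per service that is still degraded in this run."""
    messages = []
    for svc in services:
        label = severity_label(svc.score)
        if label != "normal":
            messages.append(
                f"ITSI Service {svc.name} is still {label} (Health: {svc.score})."
            )
    return messages


# Illustrative data only; a scheduled run would use the latest real scores.
snapshot = [
    ServiceHealth("checkout", 35),
    ServiceHealth("search", 55),
    ServiceHealth("login", 98),
]
alerts = degraded_alerts(snapshot)
for line in alerts:
    print(line)
```

Because the check is stateless, running it every 5 minutes repeats the message for as long as a service stays degraded and stops as soon as the score recovers, which is the behaviour a transition-based NEAP action does not give you.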

Regards,
Prewin

livehybrid
SplunkTrust

Hi @PrewinThomas 

Can you give more info on the itsi_get_service_health command you are referring to please? 

This seems like a hallucination as I cannot find any reference to it online or in any ITSI versions I have.
