
Incident Response: Reduce Incident Recurrence with Automated Ticket Creation

CaitlinHalla
Splunk Employee

On software engineering teams, culture extends beyond work experience and coffee roast preferences. Team culture also includes opinions on ticket creation/completion, merge request reviews, on-call rotations, and incident response. When it comes to incident response, a Don’t Repeat Incident (DRI) team or company culture can be hugely beneficial in maintaining a reliable code base, a positive user experience, and overall team happiness.

What is DRI? When an incident occurs, on-call team members are notified of triggered alerts and pulled into an incident response role to troubleshoot and resolve the incident. Traditionally, once the incident is resolved, their work is over. Contrast this with a DRI approach: when the incident is successfully mitigated, the work is not over. Post-incident, the code changes or safeguards needed to prevent the same incident from occurring again are tasked out and prioritized – every incident results in one or more actionable tickets or issues. Sidenote: post-incident reviews (or postmortems) are also a great idea and serve as a retrospective to discuss why the incident occurred, how it could have been prevented, who was impacted, and how it will be prevented going forward.

When this whole process works, it helps create a proactive culture and reduces incident frequency. However, manual steps always leave room for error – a person forgets to create a ticket, a person forgets to follow up, and soon alerts for errors that have been seen before are firing yet again. Taking manual intervention out of the equation helps guarantee that tickets are created, added to the right repositories or boards, and prioritized to prevent incidents from repeating. How can technology help us with this? Cue webhooks.

Webhooks

Webhooks are a great way for systems to communicate when specific events occur. They can be used for a range of interactions – building out CI/CD pipelines, sharing data in real time (like sending a confirmation email after an online order is completed), initiating two-factor authentication at login, and more.
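
To make that concrete, a webhook is essentially an HTTP callback: when the event of interest occurs, the sending system POSTs a JSON payload to a URL the receiving system exposes. As a rough, illustrative sketch (not part of the Splunk Observability Cloud setup that follows), a receiver could be as small as this hypothetical Flask endpoint:

```python
# Minimal sketch of a webhook receiver (illustrative only; the endpoint path and
# payload fields are hypothetical and depend entirely on the sending system).
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/alerts", methods=["POST"])
def handle_alert():
    event = request.get_json(force=True)   # the sender's JSON payload
    # React to the event: open a ticket, send a notification, kick off a job...
    print(f"Received event: {event.get('messageTitle', 'unknown')}")
    return "", 204                         # acknowledge quickly; do heavy work asynchronously
```

The receiving side is all a webhook consumer needs – the sending side is just configuration, which is exactly what we’ll set up in Splunk Observability Cloud below.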

Because they’re lightweight and efficient, many platforms support webhooks – Gmail, Slack, GitHub, Jira – the list is long. In this post, we’ll look at how webhooks in Splunk Observability Cloud can support the DRI practice described above – reducing repeat incidents, increasing code resiliency, and removing the need for manual intervention.

Splunk Observability Cloud & Webhooks

Your environment is unique, and Splunk Observability Cloud provides many integrations out-of-the-box (Slack, Jira, ServiceNow) that may suit your incident response needs. However, in situations where your environment needs aren’t met, custom webhook integrations have your back. 

Say we track our product’s code issues with GitHub Issues. We can create a webhook integration in Splunk Observability Cloud to track active incidents by automatically opening an issue anytime an alert fires. Again, this helps ensure that human eyes land on the root cause and that mitigation work is identified and recorded in an issue so the incident does not repeat.

Let’s look at how we can go about setting up a GitHub webhook in Splunk Observability Cloud. 

Configure a GitHub Webhook in Splunk Observability Cloud

First, we’ll navigate to Data Management in Splunk Observability Cloud and search for the Webhook integration: 

[Screenshot: searching for the Webhook integration in Data Management]

We can follow along with the guided setup and configure our GitHub connection: 

[Screenshot: guided setup for the GitHub webhook connection]

Select Next to customize the auto-populated payload (but first, notice how much data you can send and act on in the remote system): 

[Screenshot: the auto-populated webhook payload]

We’ll update the fields to match those required by the GitHub API to create a new issue and use the messageTitle and description variables to populate our issue: 

[Screenshot: payload customized with the GitHub issue fields]
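
For reference, here’s a rough sketch of the request our customized payload effectively produces, expressed with Python’s requests library purely for illustration. The repository owner, name, token, and labels are placeholders; the alert field names (messageTitle, description) are the variables mentioned above:

```python
# Illustrative only: once configured, the webhook integration sends this request
# for us -- this sketch just shows the mapping onto GitHub's create-issue API.
import requests

def create_issue_from_alert(alert: dict, token: str) -> None:
    # Issues are scoped to the repository in the URL (OWNER/REPO are placeholders).
    url = "https://api.github.com/repos/OWNER/REPO/issues"
    payload = {
        "title": alert["messageTitle"],   # alert title from the webhook payload
        "body": alert["description"],     # alert description / details
        "labels": ["incident", "dri"],    # optional: label for post-incident triage
    }
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    requests.post(url, json=payload, headers=headers, timeout=10).raise_for_status()
```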

After selecting Next, we can review and save our webhook:

[Screenshot: reviewing and saving the webhook]

With our GitHub webhook saved, we next need to add it as a notification recipient on the desired detectors. 

Note: to add a webhook as a detector recipient, you must have administrator access. 

We can configure webhooks by editing the detector of choice: 

[Screenshot: editing the detector]

We can add Alert recipients and select Webhook:

[Screenshot: adding alert recipients and selecting Webhook]

Then select our newly created GitHub Issue Webhook as the recipient and activate our updated alert notification: 

[Screenshot: selecting the GitHub Issue Webhook as the recipient]

Note: our GitHub webhook hits the GitHub REST API to create issues, and those issues are scoped to the repository specified in the request URL – in our case, the Worms in Space repository. That means we want to make sure any detector we attach this webhook to relates to the code repository specified in the URL. Thankfully, when we create detectors and alert rules, we can easily scope them to specific services.

That’s it! When an alert rule is triggered (and, in this case, also when an alert is cleared or resolved), we’ll see a new issue automatically appear in our repo’s GitHub Issues:

[Screenshot: a new issue created automatically in GitHub Issues]

This is a simple example, but you can imagine integrating this with a CI/CD system: monitor a critical metric, fire off a webhook that runs a script to check whether there’s been a recent release, and roll that release back if the metric goes out of bounds. Exact examples vary widely because there are so many systems out there, but the possibilities really are endless.
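
As one hedged sketch of that rollback idea – every function and field name below is a placeholder for whatever your deployment tooling and alert payload actually provide – a webhook receiver might look something like this:

```python
# Hypothetical sketch: roll back the latest release if a critical-metric alert
# fires shortly after a deploy. last_release_time() and rollback() stand in for
# whatever your CI/CD system exposes; the "status" field and its value are
# assumptions about the alert payload and should be adjusted to match yours.
from datetime import datetime, timedelta, timezone
from flask import Flask, request

app = Flask(__name__)
RECENT = timedelta(minutes=30)    # assumption: a deploy in the last 30 minutes is suspect

def last_release_time() -> datetime:
    raise NotImplementedError     # query your CI/CD system for the latest deploy time

def rollback() -> None:
    raise NotImplementedError     # trigger your CI/CD system's rollback job

@app.route("/webhooks/critical-metric", methods=["POST"])
def handle_critical_metric():
    alert = request.get_json(force=True)
    if alert.get("status") == "anomalous":                        # only act on firing alerts
        if datetime.now(timezone.utc) - last_release_time() < RECENT:
            rollback()
    return "", 204
```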

Wrap up

Now that our alerts create issues in our GitHub repository, we can prioritize and resolve the underlying root causes and eliminate repeat incidents. Rather than being pulled away by recurring alerts, our focus can stay on delivering high-priority code, our applications can stay resilient, and our users can stay happy.

Want to accomplish something similar with Jira? Check out the Splunk Observability Cloud built-in Jira integration to easily connect Jira projects and create issues based on Splunk Observability Cloud alerts. Interested in building out a custom webhook? That works too! Once you’ve built a custom webhook, follow the same steps above so it can listen for and receive Splunk Observability Cloud alert notifications. 

Don’t yet have Splunk Observability Cloud? Try it free for 14 days!

