
Almost Too Eventful Assurance: Part 1

Connor_Tye
Splunk Employee

Modern IT and network teams still struggle with too many alerts and with isolating issues before anyone reports them. By pairing ThousandEyes’ real-time network telemetry with Splunk ITSI’s event intelligence, teams can see past silos to the business impact of every performance problem and get to root cause faster, with less guesswork. In this blog, we’ll cover high-level examples and workflows of how this works in practice, plus ways to get started:

  • Turn dozens of service & network blips into a single, in-context event 
  • Unify service monitoring across network, apps, and infrastructure
  • Codify your best runbooks and increase operational agility 
  • Apply time-series forecasts and risk indices to proactively manage problems

A Morning in the Life 

No matter which side of the table you sit on at your company, combining event intelligence with network insights from owned and unowned networks brings impact-aware decision-making that shifts teams from reactive troubleshooting to proactive business assurance. Let’s take a closer look:

5:55 AM: An on‑call engineer takes a sip of coffee and glances at their phone. Overnight, ThousandEyes detected a subtle uptick in DNS timeouts across Europe. In simpler (almost nostalgic) times, they’d scour dashboards and rush to run manual traceroutes. Instead, they take another sip of coffee, because ITSI has already grouped all the timeouts into a single incident, executed a playbook, and identified the misconfigured edge node—delivering clear context before any clients even wake up and reach for their phones.

8:15 AM: Meanwhile, possibly somewhere near Reno, Nevada, a release manager preparing for a lunchtime deployment spots their ITSI forecast widget predicting a spike in API error rates at the exact moment one of their customers’ campaigns goes live. Instead of hoping for the best and paging another team to fix the issue, they step confidently through directed troubleshooting to select a “pause-and-rollout” automated workflow, delaying non‑critical updates until after peak traffic and restarting tests once the surge subsides.

Cut Through the Chatter

Even the best monitoring stack can overwhelm a team with trivial alerts. That's why we’ve enabled ITOps and NetOps teams to combine ThousandEyes’ precise network tests with Event Analytics in ITSI. Now, transient packet-loss spikes, DNS timeouts, and routing flaps surface together as a single, high-fidelity incident that only interrupts stakeholders for problems that will truly impact SLAs.

What does that look like? Perhaps your CDN provider experiences a minor hiccup in U.S. East1. As you know, a few lost packets can double or even triple the time it takes to load resources, and they are often the first sign of network congestion, flapping routes, or failing hardware. Instead of you receiving 10 calls throughout the night, ITSI takes those 10 ThousandEyes latency events, groups them into one episode, and only notifies stakeholders if the average RTT remains above the SLA for longer than five minutes. This allows both ITOps and NetOps teams to react to early warning signs before throughput completely collapses – whether it's a bad fiber span, an overloaded peering link, or a misconfigured router.
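To make that logic concrete, here is a minimal sketch in Python of the kind of rule this grouping expresses. It assumes each grouped ThousandEyes latency event carries a timestamp and a round-trip time in milliseconds; the event shape, the 250 ms SLA value, and the function names are illustrative placeholders, not the product's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class LatencyEvent:
    timestamp: datetime   # when ThousandEyes observed the sample
    rtt_ms: float         # measured round-trip time in milliseconds

SLA_RTT_MS = 250.0                       # illustrative SLA threshold
SUSTAINED_WINDOW = timedelta(minutes=5)  # how long the breach must persist

def should_notify(episode: List[LatencyEvent]) -> bool:
    """Treat the grouped events as one episode and page someone only if
    the average RTT has stayed above the SLA for more than five minutes."""
    if not episode:
        return False
    ordered = sorted(episode, key=lambda e: e.timestamp)
    breach_span = ordered[-1].timestamp - ordered[0].timestamp
    avg_rtt = sum(e.rtt_ms for e in ordered) / len(ordered)
    return avg_rtt > SLA_RTT_MS and breach_span > SUSTAINED_WINDOW
```

Ten spikes that clear within a minute never wake anyone up; a breach that stays elevated across the whole window does.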

  • Engage Adaptive Thresholding: Automatically raise or lower alert thresholds based on historical behavior. A brief packet-loss blip won’t fire off a page, but a sustained degradation will.

  • Correlation Rules: Bundle related ThousandEyes and application alerts into a single “episode.” No more juggling separate tickets for DNS timeouts and HTTP 5XX errors—they arrive as one consolidated incident (a hedged sketch of this grouping follows this list).

  • Smart Suppression: Suppress repeat alerts within a defined window. If ThousandEyes fires ten similar HTTP ping failures in rapid succession, you receive just one incident—with all ten data points attached—so you can investigate holistically.
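As a rough illustration of the bundling and suppression described above, the sketch below groups raw alerts by the service they affect and a shared time bucket, so DNS timeouts and HTTP 5XX errors for the same service collapse into one episode with every data point attached. The alert fields and the 15-minute window are assumptions made for the example, not ITSI configuration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

@dataclass
class Alert:
    service: str      # business service the alert maps to, e.g. "checkout"
    alert_type: str   # e.g. "dns_timeout", "http_5xx", "route_flap"
    time: datetime    # when the alert fired

WINDOW_SECONDS = 15 * 60  # illustrative correlation window

def correlate(alerts: List[Alert]) -> Dict[Tuple[str, int], List[Alert]]:
    """Bundle related alerts into episodes keyed by (service, time bucket).
    Repeats inside the same window become extra data points on one episode
    rather than separate pages."""
    episodes: Dict[Tuple[str, int], List[Alert]] = defaultdict(list)
    for alert in alerts:
        bucket = int(alert.time.timestamp() // WINDOW_SECONDS)
        episodes[(alert.service, bucket)].append(alert)
    return dict(episodes)
```

Each episode then surfaces as a single incident, with the individual DNS, HTTP, and routing data points preserved for investigation.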

Often-Missed Signals

All of that sounds great, right? But as we know, event correlation is only as good as the data you feed it. While most organizations are overwhelmed with logs and metrics from apps and infrastructure, network data often remains underutilized – but that's changing. Splunk ITSI’s integrations include the Content Pack for Enterprise Networks, which brings rich, contextual insights from Catalyst Center device and interface health and from Meraki-managed switches, access points, and gateways. When combined with IT service data, it becomes possible to identify the most business-critical network issues, pinpoint root cause, and quickly restore services.

See Network & Service Health, Together 

With network insights and event analytics now in place to deliver earlier, more accurate alerting and anomaly detection, it's even more important to have a live, unified view of service performance and business impact. Unlock more value for your organization by integrating ThousandEyes data into ITSI’s Service Analyzer to understand network health insights in-context, and take action with assisted workflows, AI-directed troubleshooting, or automated remediation. 

What does that look like in practice? Your checkout service map shows a yellow color-coded health indicator. A quick click reveals that while some application errors are normal, a silent uptick in BGP route flaps (reported by ThousandEyes) is driving a small increase in packet loss - and causing checkout delays. Alarming, yes, but instead of panicking you dispatch the issue straight to the right network team.

  • Enrich Your Service Map: Make sure ThousandEyes synthetic and real-user metrics (DNS, HTTP, VoIP, BGP) appear as first-class nodes alongside apps, infrastructure, and database components.

  • Calculate Health Scores: Define the composite health score per service by weighting network KPIs (latency, packet loss) and app KPIs (error rate, response time). A single color-coded indicator shows at a glance which services need attention (see the sketch after this list).

  • Drill-Down: Click into any service map node to see live ThousandEyes test results, correlated application logs, and incident history - no context-switching required.
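One way to picture that weighted health score is as a simple weighted penalty, normalized to a 0-100 scale and mapped to a color. The KPI names, weights, and "worst acceptable" values below are placeholders for illustration; in practice you tune the weights per service in ITSI.

```python
from typing import Dict

# Illustrative per-KPI weights: heavier weight on the signals that most
# directly predict customer impact for this service.
WEIGHTS = {
    "latency_ms": 0.3,
    "packet_loss_pct": 0.2,
    "error_rate_pct": 0.3,
    "response_time_ms": 0.2,
}

# Hypothetical "worst acceptable" values used to normalize each KPI to 0..1.
WORST = {
    "latency_ms": 500.0,
    "packet_loss_pct": 5.0,
    "error_rate_pct": 5.0,
    "response_time_ms": 2000.0,
}

def health_score(kpis: Dict[str, float]) -> float:
    """Return a 0-100 score: 100 means every KPI is at its best,
    0 means every KPI is at or beyond its worst acceptable value."""
    penalty = sum(
        WEIGHTS[name] * min(kpis.get(name, 0.0) / WORST[name], 1.0)
        for name in WEIGHTS
    )
    return round(100.0 * (1.0 - penalty), 1)

def indicator(score: float) -> str:
    """Map the score to the at-a-glance color on the service map."""
    return "green" if score >= 80 else "yellow" if score >= 60 else "red"
```

With latency at 180 ms, 0.5% packet loss, a 1% error rate, and 600 ms response times, this example yields a score of about 75: a yellow indicator that invites exactly the drill-down described above.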


Now that you’ve tamed alert noise and seen the value of unifying network, IT, and business service monitoring, you’re ready to explore more advanced use cases built around automated workflows and predictive analytics.

Read on with Almost Too Eventful Assurance: Part 2.
