Modern IT and network teams still struggle with too many alerts and with isolating issues before users start reporting them. By pairing ThousandEyes' real-time network telemetry with Splunk ITSI's event intelligence, teams can see past silos to the business impact of every performance problem and get to root cause faster, with less guesswork. In this blog, we'll cover high-level examples and workflows that show how this works in practice, plus ways to get started:
No matter which side of the table you sit on at your company, combining event intelligence with network insights from both owned and unowned networks enables impact-aware decision-making that shifts teams from reactive troubleshooting to proactive business assurance. Let's take a closer look:
5:55 AM: An on-call engineer takes a sip of coffee and glances at their phone. Overnight, ThousandEyes detected a subtle uptick in DNS timeouts across Europe. In simpler (almost nostalgic) times, they'd be scouring dashboards and rushing to run manual traceroutes. Instead, they take another sip of coffee, because ITSI has already grouped all the timeouts into a single incident, executed a playbook, and identified the misconfigured edge node, delivering clear context before any customers even wake up and reach for their phones.
8:15 AM: Meanwhile, possibly somewhere near Reno, Nevada, a release manager preparing for a lunchtime deployment spots their ITSI forecast widget predicting a spike in API error rates at the exact moment a customer's campaign goes live. Instead of hoping for the best and handing the problem to another team, they confidently move through directed troubleshooting to select a "pause-and-rollout" automated workflow, delaying non-critical updates until after peak traffic and restarting tests once the surge subsides.
Even the best monitoring stack can overwhelm a team with trivial alerts. That's why we've enabled ITOps and NetOps teams to combine ThousandEyes' precise network tests with Event Analytics in ITSI. Now, transient packet-loss spikes, DNS timeouts, and routing flaps surface as a single, high-fidelity incident that both teams can work from, and that only interrupts stakeholders for problems that will truly impact SLAs.
What's that look like? Perhaps your CDN provider experiences a minor hiccup in U.S. East 1. As you know, a few lost packets can double or even triple the time it takes to load resources, and they're often the first sign of network congestion, flapping routes, or failing hardware. Instead of you receiving 10 calls throughout the night, ITSI takes those 10 ThousandEyes latency events, groups them into one episode, and only notifies stakeholders if the average RTT remains above the SLA for longer than five minutes. This allows both ITOps and NetOps teams to react to early warning signs before throughput completely collapses, whether the culprit is a bad fiber span, an overloaded peering link, or a misconfigured router.
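To make that policy concrete, here's a minimal sketch, in Python rather than an actual ITSI aggregation policy, of the grouping-and-suppression logic described above. The field names (`rtt_ms`, `timestamp`, `test_name`) and the 150 ms SLA threshold are illustrative assumptions; in practice you'd configure this behavior as a notable event aggregation policy in ITSI instead of writing code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

# Illustrative thresholds; real values come from your SLA and your ITSI policy.
SLA_RTT_MS = 150.0                  # assumed SLA ceiling for round-trip time
BREACH_WINDOW = timedelta(minutes=5)

@dataclass
class LatencyEvent:
    timestamp: datetime             # when ThousandEyes observed the measurement
    test_name: str                  # e.g. "CDN US-East HTTP test" (hypothetical)
    rtt_ms: float                   # observed round-trip time in milliseconds

def should_notify(episode: list) -> bool:
    """Notify only if RTT stays above the SLA, on average, for the full window."""
    breaching = [e for e in episode if e.rtt_ms > SLA_RTT_MS]
    if not breaching:
        return False
    duration = max(e.timestamp for e in breaching) - min(e.timestamp for e in breaching)
    return duration >= BREACH_WINDOW and mean(e.rtt_ms for e in breaching) > SLA_RTT_MS

# Ten overnight latency events collapse into one episode; a single decision
# (notify or stay quiet) replaces ten separate phone calls.
```

The point of the sketch is the decision boundary: many raw events become one episode, and a notification fires only when the breach is both sustained and above the SLA.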
All of the above sounds great, right? But event correlation is only as good as the data you feed it. Most organizations are overwhelmed with logs and metrics from apps and infrastructure, while network data often remains underutilized. That's changing. Splunk ITSI's integrations include the Content Pack for Enterprise Networks, which brings rich, contextual insights from Catalyst Center device and interface health and from Meraki-managed infrastructure, spanning switches, access points, and gateways. When combined with IT service data, it becomes possible to identify the most business-critical network issues, pinpoint root cause, and quickly restore services.
With network insights and event analytics now in place to deliver earlier, more accurate alerting and anomaly detection, it's even more important to have a live, unified view of service performance and business impact. Unlock more value for your organization by integrating ThousandEyes data into ITSI’s Service Analyzer to understand network health insights in-context, and take action with assisted workflows, AI-directed troubleshooting, or automated remediation.
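If you're curious what the plumbing can look like before adopting the supported ThousandEyes add-on, here is a minimal, hedged sketch of forwarding a ThousandEyes alert payload into Splunk over the HTTP Event Collector (HEC). The HEC URL, token, sourcetype, and the shape of the alert body are all placeholder assumptions you would replace with your own configuration; a production setup would typically rely on the official integration and handle TLS verification and retries.

```python
import json
import urllib.request

# Placeholder values; replace with your own Splunk HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def forward_to_splunk(te_alert: dict) -> None:
    """Wrap a ThousandEyes alert payload and POST it to Splunk HEC."""
    payload = {
        "event": te_alert,                   # raw alert body from ThousandEyes
        "sourcetype": "thousandeyes:alert",  # assumed sourcetype naming
        "source": "thousandeyes_webhook",
    }
    req = urllib.request.Request(
        HEC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",  # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:        # raises on non-2xx responses
        resp.read()

# Example: a simplified (hypothetical) ThousandEyes alert body.
forward_to_splunk({
    "alertType": "BGP Route Flap",
    "testName": "Checkout API - Global",
    "severity": "minor",
})
```

Once those events land in Splunk, ITSI correlation searches and aggregation policies can fold them into the episodes and service health views described above.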
What does that look like in practice? Your checkout service map shows a yellow health indicator. A quick click reveals that while some application errors are normal, a silent uptick in BGP route flaps (reported by ThousandEyes) is driving a small increase in packet loss and causing checkout delays. Alarming, yes, but instead of panicking you immediately dispatch the issue to the right network team.
Now that you've tamed alert noise and seen the value of unifying network, IT, and business service monitoring, you're ready to explore more advanced use cases built around automated workflows and predictive analytics.
Read on with Almost Too Eventful Assurance: Part 2.