Splunk Observability Cloud’s analytics engine, SignalFlow, opens up a world of in-depth analysis on your incoming telemetry data. SignalFlow is the computational backbone behind every chart and detector in Splunk Observability Cloud, but its statistical computation engine also lets you write custom programs in the SignalFlow programming language (modeled after Python). Writing SignalFlow programs isn’t required for a complete observability practice, but if you want to perform advanced computations on your Splunk Observability Cloud data, SignalFlow is your friend. In this post, we’ll explore the whys and hows of SignalFlow so you can run computations, aggregations, and transformations on your data and stream the results to detectors and charts for custom, in-depth observability analytics.
Most Splunk Observability Cloud use cases don’t require complex computations; out-of-the-box charts and detectors make it easy to build a complete observability solution. But when you need more detailed and advanced insight, SignalFlow can run custom computations, aggregations, and transformations on your telemetry and stream the results to charts and detectors. Generally, if you need customized analytics, SignalFlow is the answer. So let’s see how it works!
You can define SignalFlow queries directly in the Splunk Observability Cloud UI or programmatically using the SignalFlow API. If you open up or create a chart in the UI, you’ll see the chart builder view:
If you select View SignalFlow, you can dive right into the SignalFlow that powers the chart and use it as a template for additional programs:
The same is true for detectors. If you open up a detector, you can select the kebab icon to Show SignalFlow:
SignalFlow programs outside of the Splunk Observability Cloud UI typically live within code configurations for detectors and dashboards (see our Observability as Code post). When you create a chart or detector using the API, you can specify a SignalFlow program as part of the request. Here’s an example of defining a detector using Terraform, where the program_text is the SignalFlow program:
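The Terraform example might look something like the following sketch, which uses the `signalfx_detector` resource from the signalfx Terraform provider. The detector name, metric, threshold, and rule details here are illustrative assumptions, not a definitive configuration:

```hcl
# Hypothetical detector defined with the signalfx Terraform provider.
# Metric name, threshold, and rule details are illustrative assumptions.
resource "signalfx_detector" "cpu_high" {
  name = "CPU utilization is high"

  # program_text holds the SignalFlow program that powers the detector
  program_text = <<-EOF
    signal = data('cpu.utilization').mean(by=['host'])
    detect(when(signal > 90, lasting('1m'))).publish('CPU high')
  EOF

  rule {
    detect_label = "CPU high"
    severity     = "Warning"
  }
}
```

Note that `detect_label` in the rule block matches the label passed to `publish()` in the SignalFlow program, which is how the rule and the detect stream are tied together.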
You can also use the SignalFlow API to run programs in the background and receive the results asynchronously in your client.
Let’s take a look at some SignalFlow functions and methods we can use to build out charts and detectors.
Most SignalFlow programs begin with a data() block. The data function is used to query data and is the main way to create stream objects, which are similar to a time-parameterized NumPy array or pandas DataFrame. Queries can run against both real-time incoming streams and historical data from the systems you monitor. In SignalFlow, you specify streams as queries that return data. Here’s a template for the data function:
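A minimal template for `data()` looks roughly like this; the first argument is the metric query, and the remaining keyword arguments shown here are optional and reflect commonly used parameters:

```
data('metric_name', filter=None, rollup=None, extrapolation='null')
```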
We can expand on the data function in many ways. For example, here’s what it would look like to query for the CPU utilization metric and filter by host:
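As a sketch (the host name here is a hypothetical placeholder), querying CPU utilization filtered to a single host looks like:

```
# Query CPU utilization, filtered to one (hypothetical) host
data('cpu.utilization', filter=filter('host', 'my-host')).publish()
```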
We can also add/chain methods or functions to our data block. Here are examples of using the mean method to look at mean CPU utilization, mean CPU utilization by Kubernetes cluster, and mean CPU utilization over the last hour:
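These three variants can be sketched as follows (the `kubernetes_cluster` dimension name is an assumption based on typical Kubernetes metadata):

```
# Mean CPU utilization across all time series
data('cpu.utilization').mean().publish()

# Mean CPU utilization grouped by Kubernetes cluster
data('cpu.utilization').mean(by=['kubernetes_cluster']).publish()

# Mean CPU utilization over a trailing one-hour window
data('cpu.utilization').mean(over='1h').publish()
```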
Operations like `mean()`, `variance()`, `percentile()`, `exclude()`, `ewma()`, `timeshift()`, `rateofchange()`, `stddev()`, `map(lambda)`, and others are available as methods on numerical streams.
Here’s an example where our data stream, signal, is the CPU utilization with a filter of host, and we can add functions to timeshift by a week and two weeks, and then find the max value:
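A sketch of that program might look like this (the host name is a hypothetical placeholder, and `max()` is used here as a multi-stream function across the current and timeshifted streams):

```
signal = data('cpu.utilization', filter=filter('host', 'my-host'))

# Shift the same stream back one week and two weeks
week_ago = signal.timeshift('1w')
two_weeks_ago = signal.timeshift('2w')

# Take the max across the current and timeshifted streams
max(signal, week_ago, two_weeks_ago).publish()
```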
Comparing the max CPU utilization across timeshifted copies of a time series can’t be accomplished with the chart plot editor in the Splunk Observability Cloud UI, so this is an instance where using SignalFlow is necessary.
To actually output these stream results to a chart, we need to call the publish() method:
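For example, a complete chart program might end like this (the label is optional and names the resulting plot; the host filter is a hypothetical placeholder):

```
# publish() makes the stream's results visible as a chart plot
data('cpu.utilization', filter=filter('host', 'my-host')).mean().publish(label='Mean CPU')
```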
We’ve now built out a chart using SignalFlow 🎉! We can also do this with detectors – read on.
Detectors evaluate conditions involving one or more streams, and typically compare streams over periods of time – for example, disk utilization is greater than 80% for 90% of the last 10 minutes. When building detectors using SignalFlow, we still start with our data streams, and then transform them using boolean logic:
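The disk-utilization example above can be sketched with `when()` and `lasting()`, where `lasting('10m', 0.9)` expresses “for at least 90% of the last 10 minutes”:

```
signal = data('disk.utilization')

# Fire when disk utilization exceeds 80 for 90% of the last 10 minutes
detect(when(signal > 80, lasting('10m', 0.9))).publish('Disk utilization high')
```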
Note: when setting static thresholds in the UI, thresholds can only be strictly greater than or less than a value. With SignalFlow, we can also specify greater-than-or-equal-to (or less-than-or-equal-to) static thresholds.
We can use these when statements on their own or combined with and, or, not statements to publish our alert conditions and build out our detectors:
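As a sketch, two `when()` conditions can be combined with `and` (the memory metric name and the 80% threshold here are illustrative assumptions; the CPU condition mirrors the “greater than 90 for 1 minute” statement discussed below):

```
cpu = data('cpu.utilization').mean(by=['host'])
mem = data('memory.utilization').mean(by=['host'])

# Fire only when both conditions hold at the same time
detect(when(cpu >= 90, lasting('1m')) and when(mem >= 80, lasting('1m'))).publish('CPU and memory high')
```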
The detect streams in this example are similar to data streams. Detect streams turn our boolean statement – when our signal is greater than 90 for 1 minute – into an event stream. When this statement is evaluated as true, an event will fire and be published to an event stream. This is what triggers an alert.
Note: event streams are evaluated and published in real time as metrics are ingested, enabling you to find problems faster and speed up your MTTD.
Every publish method call in a SignalFlow detect statement corresponds to a rule on the Alert Rules tab in the Splunk Observability Cloud UI. The label inside the publish block is displayed next to the number of active alerts in the Alert Rules tab:
You can create your detectors using the SignalFlow API, but if you want to use SignalFlow to build detectors directly in the Splunk Observability Cloud detector UI, you can append /#/detector/v2/new to your organization URL to do so:
While working with SignalFlow is not required, it can help customize and advance your observability practice. A great place to start is editing the SignalFlow for existing charts and detectors in the Splunk Observability Cloud UI or using observability as code with SignalFlow programs. In no time, you’ll be building out SignalFlow program background jobs and streaming customized analytics to meet all your observability and business needs.
New to Splunk Observability Cloud? Try it free for 14 days!