Splunk ITSI

ITSI how to fix KPI sparklines?

oshirnin
Path Finder

Hello, everybody!

I have some experience with Splunk Enterprise, but I'm relatively new to ITSI. I noticed that the IT Service Intelligence app sometimes (in fact often, if not always) shows KPI sparklines with wrong values. For testing, I created two KPIs based on | inputlookup, so they cannot return null results regardless of the "KPI Search Schedule" and "Calculation Window" settings.

Here is what I have (blue rectangles on screenshot):

  1. My KPI "Cluster Nodes are UP by Node" for host2 (right on the screenshot) currently always has value=1 (a constant value in the lookup), but the KPI sparkline continuously alternates between 0 and 1. That is wrong: it looks as if the measured service value is changing, but it is not. The sparkline should show a straight line at 1!

  2. My KPI "Cluster Nodes are UP by Node" (left on the screenshot) also currently always has value=1, but its sparkline likewise alternates between 0 and 1.

[screenshot: the two KPI sparklines, marked with blue rectangles]

Since Splunk ITSI is first and foremost a monitoring solution, this is a real problem for end users. These sparklines are confusing, and I wonder how I can fix this. I need help finding and reviewing the app/itsi/homeview dashboard code, and maybe changing it slightly to fix the problem.

jwiedemann_splu
Splunk Employee

The issue here is actually with the sparkline rendering engine, or with how core Splunk (not ITSI) passes the values to the sparkline for rendering. If the time range is short enough (say, the last 60 minutes), the results passed to the sparkline are bucketed into 1-minute spans, so for a KPI that runs every 5 minutes, 4 out of every 5 buckets have no data point.
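
The bucketing effect described above can be reproduced without ITSI. The following is a minimal sketch on synthetic data: `makeresults` generates one point every 300 seconds over the last hour (standing in for a 5-minute KPI), and the 1-minute span is an assumption taken from the explanation above, not a confirmed ITSI setting. The field names `synthetic_kpi_value`, `kpi_value` and `zero_filled` are made up for illustration.

```spl
| makeresults count=12
| streamstats count AS n
| eval _time = relative_time(now(), "-60m@m") + (n - 1) * 300
| eval synthetic_kpi_value = 1
| timechart span=1m max(synthetic_kpi_value) AS kpi_value
| eval zero_filled = coalesce(kpi_value, 0)
```

In the result table, `kpi_value` should be empty for the minutes between points. Charting `zero_filled` as a line should give the same 0-to-1 sawtooth seen in the question, even though the underlying value never changes.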

Interestingly enough, as per this Answers post: https://community.splunk.com/t5/Archive/Sparkline-Format/td-p/61136, sparkline rendering can be altered in SimpleXML, and I've tested it with a dashboard. It would be a total ITSI hack, but it might well be possible to identify the conf file where the sparkline rendering config is passed for the Service Analyzer and modify it to show a bar chart instead. The results might then look something like this:

[screenshot: spark.png]
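
For a custom dashboard panel (not the Service Analyzer itself), the bucket width can also be chosen in the search, because the `sparkline()` function of `stats` accepts a span as its second argument. This is a hedged sketch, not the query ITSI runs: it assumes KPI results are stored in the `itsi_summary` index with `kpiid`, `kpi` and `alert_value` fields, it reuses the KPI ID posted later in this thread, and it does not distinguish per-entity rows from the service aggregate.

```spl
index=itsi_summary kpiid="033d8644924e3d7c4e2724ec" earliest=-60m@m latest=now
| stats sparkline(max(alert_value), 5m) AS trend latest(alert_value) AS current_value BY kpi
```

With a span equal to the KPI schedule, each sparkline bucket should contain one result, so no empty buckets are left to be drawn as zero.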

VatsalJagani
SplunkTrust

@oshirnin,

After looking at your question and comments, I think you should set "Fill Data Gaps with" to "connect"; you should then see a continuous line at 1. Your current configuration appears to have "Fill Data Gaps with" set to "0".

Hope this helps!!!

oshirnin
Path Finder

@VatsalJagani, hello! Sorry for the late reply, and thank you for your patience. I tested this carefully; unfortunately, your advice does not help.

At first, my KPI was indeed set to 'Fill gaps in data with Null values', as you guessed:

[screenshots]

I changed this setting to 'Fill gaps in data with last available value of data' as you suggested:

[screenshots]

I waited some time for the KPI to be recalculated and checked the ITSI Service Analyzer; nothing changed:

[screenshot]

My KPI always has a constant value; it is built on a non-changing lookup. However, the KPI sparklines in Service Analyzer continuously alternate between 0 and 1, which is wrong.

I also noticed that changing the Service Analyzer time range from 'Last 1h' to longer values ('Last 2h', 'Last 12h') helps a little, but even then there is a problem at the right end of the line:

[screenshots]

I am quite sure that 'Fill gaps in data with' affects only the KPI value calculation when there is a gap in the data, and does not apply to how the sparkline is displayed. It looks like there is a hard-coded timechart query whose span is small relative to the KPI execution interval, so the query predictably returns some null values. The chart that follows is then hard-coded with the 'Null Values - Zero' visualization format, whereas it should use 'Null Values - Connected'. I am looking for a way to achieve this. @VatsalJagani, can you help me find where this dashboard is stored? I would like to review the XML and maybe fix it. I understand well what limitations this will bring!
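
The hypothesis above can be checked at the search level. This is only a sketch under stated assumptions: it assumes the KPI results live in the `itsi_summary` index with `kpiid` and `alert_value` fields, it reuses the KPI ID from the saved search posted in this thread, and the 1-minute span is a guess at what the Service Analyzer uses, not a value read from its code. The output field names are made up for illustration.

```spl
index=itsi_summary kpiid="033d8644924e3d7c4e2724ec" earliest=-60m@m latest=now
| timechart span=1m max(alert_value) AS raw_value
| eval nulls_as_zero = coalesce(raw_value, 0)
| eval carried_forward = raw_value
| filldown carried_forward
```

Charted as lines, `nulls_as_zero` should reproduce the sawtooth, while `carried_forward` stays flat once the first result appears. Note that `filldown` repeats the last value rather than interpolating, so it only approximates the 'Connected' chart option; for a constant KPI the two look the same.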

VatsalJagani
SplunkTrust

@oshirnin - Can you provide your full configuration: the search time range, the search time span, and the frequency of search execution?

oshirnin
Path Finder

@VatsalJagani, here is what I have. All my KPIs are based on this very simple lookup:

| inputlookup hostname_status.csv

hostname | status
---------+-------
host1    | down
host2    | up
host3    | up

My KPI "Cluster Nodes are UP by Node" runs every 5 minutes over the last 5 minutes of data (this doesn't really matter, as the underlying data is a static lookup). The complete KPI configuration is as follows:

[Indicator - 033d8644924e3d7c4e2724ec - ITSI Search]
action.indicator = 1
action.indicator._itsi_kpi_id = 033d8644924e3d7c4e2724ec
action.indicator._itsi_service_id = 1f78e07e-bbfd-4f31-8e33-67224008e498
alert.suppress = 0
alert.track = 0
cron_schedule = 2-59/5 * * * *
description = Auto created scheduled search during kpi creation
dispatch.earliest_time = -300s
dispatch.latest_time = now
enableSched = 1
search = | inputlookup hostname_status.csv | eval th = if(status="up", 1, 0) | fields hostname, status, th | eval alert_value = th | `gettime` | eval sec_grp = "default_itsi_security_group" | `match_entities(hostname, sec_grp)` | eval serviceid = "1f78e07e-bbfd-4f31-8e33-67224008e498" | `aggregate_entity_into_service(sum)` | `assess_severity(1f78e07e-bbfd-4f31-8e33-67224008e498, 033d8644924e3d7c4e2724ec, true, true, true)` | eval kpi="Cluster Nodes are UP by Node", urgency="5", alert_period="5", serviceid="1f78e07e-bbfd-4f31-8e33-67224008e498" | `assess_urgency`
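
Given the `cron_schedule` above (minutes 2, 7, 12, ... of each hour), one way to separate a data problem from a rendering problem is to bucket the stored results at the same 5-minute interval. A minimal sketch, assuming the KPI results are written to the `itsi_summary` index with a `kpiid` field; `stored_results` is just an illustrative name:

```spl
index=itsi_summary kpiid="033d8644924e3d7c4e2724ec" earliest=-60m@m latest=now
| timechart span=5m count AS stored_results
```

If every 5-minute bucket shows `stored_results` greater than zero, the KPI is producing results on schedule, which would point to the sparkline's bucketing rather than to missing data.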

Hope you have some ideas on the issue.

VatsalJagani
SplunkTrust

Please check my answer and let me know if that works.
