Splunk IT Service Intelligence

In Splunk IT Service Intelligence, why is my KPI not showing any data?

florianduhme
Path Finder

My data looks like this:

[Image: sample events]

Now I have written a search that extracts the time elapsed (per "ProcessTimestamp") between "Checkpoint 1" and "Checkpoint 2". The search looks like this:

index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="Checkpoint 1" OR CheckpointName="Checkpoint 2"
| transaction DeliveryId

The search combines the two events for each DeliveryId (which is the unique identifier) into one transaction, and each resulting event has a field called "duration", which is the difference between the "_time" timestamps of the two events.

Now, I want to create a KPI within ITSI that displays the average duration between "Checkpoint 1" and "Checkpoint 2" (which would be the average value of all "duration" values for each DeliveryId).

Unfortunately, if I set up a KPI in ITSI with the search above and select "Average" for the calculation and "duration" as the threshold field, the KPI always shows "N/A".

Any suggestions? Thanks in advance.

1 Solution

skoelpin
SplunkTrust

How are you getting data into ITSI? Have you checked the itsi_summary index to see if there's a value tied to the KPI?


florianduhme
Path Finder

And I get data in by defining an "Ad hoc search" in the KPI (which is the one from my question above).


florianduhme
Path Finder

Yes, I can definitely see events for the specified KPI, they look like this:

01/22/2019 13:39:30 +0000, search_name="Indicator - d934c0bc8df580ed637cd939 - ITSI Search", search_now=1548164400.000, info_min_time=1548163470.000, info_max_time=1548164370.000, info_search_time=1548164400.858, qf="", kpi=test2, kpiid=d934c0bc8df580ed637cd939, urgency=9, serviceid="17dc8fcf-27a2-4ec4-b1d6-38365328dd4a", itsi_service_id="17dc8fcf-27a2-4ec4-b1d6-38365328dd4a", is_service_aggregate=1, is_entity_in_maintenance=0, is_entity_defined=0, entity_key=service_aggregate, is_service_in_maintenance=0, alert_color="#FCB64E", alert_level=4, alert_value="57.111111111111114", itsi_kpi_id=d934c0bc8df580ed637cd939, is_service_max_severity_event=1, alert_severity=medium, alert_period=1, entity_title=service_aggregate, forceCsvResults="auto"

I can't really see any field that relates to the KPI value at that time. What I can see, though, is that the threshold I defined is working, because alert_severity equals "medium", which is exactly the current severity.


skoelpin
SplunkTrust

Looks like you're getting a value for your alert_value field, which is good. How long have you let the KPI sit after defining it? Where is it showing N/A? Is it showing a value in the service analyzer? Did you check the indexer lag after turning it on? Have you checked whether it's enabled?


florianduhme
Path Finder

I created the KPI around 2.5 hours ago. The N/A is showing in the service analyzer.
But I can see some data points when I try to edit the thresholds (in Configure -> Services -> "my service" -> KPIs -> test2). There I can see that ITSI has acknowledged the average duration values, but I cannot see them in the service analyzer.

Also, the service health score is moving up and down, based on the severity of my "test2" KPI. So the values are definitely recognized by ITSI, but I cannot see them in the service analyzer or in Deep Dives.

I don't really know what you mean by "checking the indexer lag"? My "monitoring lag" is now at 10 seconds for the defined KPI.


skoelpin
SplunkTrust

What is your KPI frequency and what timespan are you looking over? Can you confirm the kpiid for this service is kpiid=d934c0bc8df580ed637cd939?

You should first try to reproduce the KPI values with a search, just like ITSI does:

index=itsi_summary kpiid=d934c0bc8df580ed637cd939 earliest=-2h@h latest=now
| bin _time span=1m 
| stats avg(alert_value) AS alert_value by _time 

If you see a good-looking chart with your data but ITSI is not showing it, you should try to recreate the service with the KPI. If the above search works, then everything is working on the backend and ITSI is simply not rendering the values.


florianduhme
Path Finder

The KPI search is executed every minute and looks for a timespan of 15 minutes.
Yes, the kpiid is the one mentioned.

Executing that search does not return results for avg(alert_value). The reason is that the field "alert_value" is not extracted (it is only visible in the raw event, not in the field list when clicking on an event). All other fields seem to be extracted perfectly fine. Why is "alert_value" not extracted? Maybe because of the many decimal places?


skoelpin
SplunkTrust

Why are you executing it every minute and looking at the last 15 minutes? That's a lot of redundancy. I suspect the search runtime may not be keeping up with the schedule. Perhaps you should create a new service and set the frequency AND lookback period to 1 minute.

alert_value is the value that you put into the summary index. It's going to represent your duration field. Can you confirm this is a numeric field? If it's not numeric, then this explains why it's showing N/A. You can find out by looking at the interesting fields on the left and checking whether it has a # (numeric) or an a (string) next to it.
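
One way to check both points from the search bar is sketched below. It assumes the KPI events sit in itsi_summary under the kpiid quoted earlier in this thread; alert_value_type is just an illustrative field name. typeof() reports how Splunk interprets the value, so anything other than Number for alert_value would point to a missing or non-numeric field.

```spl
index=itsi_summary kpiid=d934c0bc8df580ed637cd939 earliest=-2h@h latest=now
| eval alert_value_type=typeof(alert_value)
| stats count BY alert_value_type
```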


florianduhme
Path Finder

I think that's exactly my problem, 'alert_value' is not shown on the left side, because it is not extracted from the raw event. Therefore, I cannot find out if it's a numeric field or not.


skoelpin
SplunkTrust

Yep, that's it then! I'm fairly confident that, given the way you're feeding the data to ITSI, it's not recognizing the duration field as a numeric value. You should try manually extracting it via rex and plotting it on a timechart to confirm. How exactly are you feeding this into ITSI?
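
A minimal sketch of that manual check, assuming the raw events in itsi_summary look like the sample posted above, where alert_value appears either quoted or unquoted. The names alert_value_rex and avg_alert_value are made up for illustration and are not ITSI fields.

```spl
index=itsi_summary kpiid=d934c0bc8df580ed637cd939 earliest=-4h latest=now
| rex field=_raw "alert_value=\"?(?<alert_value_rex>-?\d+(?:\.\d+)?)"
| timechart span=1m avg(alert_value_rex) AS avg_alert_value
```

If this chart shows values while the same chart over the automatically extracted alert_value stays empty, the data is reaching the summary index and only the search-time extraction is failing.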

Please feel free to upvote any answers which have been helpful so far 🙂


florianduhme
Path Finder

I'm feeding it to ITSI via an "Ad hoc search". What I did was:

  1. Create a new Service
  2. Create a new KPI with the following settings:
     • KPI source = Ad hoc search
     • search = index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="Checkpoint 1" OR CheckpointName="Checkpoint 2" | transaction DeliveryId
     • Threshold field = duration
     • split by entity = no
     • kpi search schedule = 1 minute
     • Service/Aggregate Calculation = Average
     • calculation window = Last 15 minutes
     • unit = secs
     • monitoring lag = 10 seconds
     • enable backfill = no

I can see data when defining the thresholds, which tells me that ITSI does recognize the values. But when they are written to the "itsi_summary" index, the field does not get extracted.
I don't really know how to fix the problem of alert_value not being extracted. Do you have any suggestions?


skoelpin
SplunkTrust

It's due to the way you're using the transaction command. You should open a search and run your query over the raw data to make sure DeliveryId is working as expected. If there is a DeliveryId field, confirm whether it's numeric by looking for a # sign next to it.


florianduhme
Path Finder

I don't really get why the DeliveryId should be the problem here, because the transaction command seems to work as expected. I also tried to generate my own "duration" field, which looked like this:

index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="100_TO_01" OR CheckpointName="100_TO_02"
| transaction DeliveryId
| eval FirstTime=mvindex(ProcessTimestamp, 0)
| eval LastTime=mvindex(ProcessTimestamp, 1)
| convert timeformat="%Y-%m-%dT%H:%M:%S.%3NZ" mktime("FirstTime") as First mktime("LastTime") as Last
| eval diff=Last-First

In this case, "diff" is definitely a numeric field; I can confirm that. If I create a new KPI with this ad hoc search, the result is the same: no values are shown in the service analyzer and the "alert_value" field is not extracted.


skoelpin
SplunkTrust

I would suggest you forgo the transaction command entirely. It doesn't scale, and you are asking for trouble by running this every minute over the last 15 minutes. A better test would be to make this as simple as possible, like this:

index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="100_TO_01" OR CheckpointName="100_TO_02" | stats count

Then use the count field that is auto-extracted and confirm that ITSI is correctly extracting it to an alert_value field. You must use a different kpiid value to see it. I'm very confident the issue is with your query and not ITSI. Once you confirm the alert_value is there, that shows it's not an ITSI issue.
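
Once the simple count test works, the duration itself can also be computed without transaction. The following is only a sketch: it assumes each DeliveryId has exactly one event per checkpoint and that _time reflects the checkpoint time, neither of which this thread confirms. range(_time) gives the seconds between the earliest and latest event per DeliveryId, and the dc() filter drops deliveries that have not reached the second checkpoint yet. The field name checkpoints is illustrative.

```spl
index="arvato_scm_telco_process_time_tracking_jt6_test" (CheckpointName="100_TO_01" OR CheckpointName="100_TO_02")
| stats range(_time) AS duration dc(CheckpointName) AS checkpoints BY DeliveryId
| where checkpoints=2
| fields DeliveryId duration
```

Like the transaction version, this returns one row per DeliveryId with a duration field, so the KPI could keep duration as the threshold field and Average as the aggregate calculation.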


florianduhme
Path Finder

I tested your query with the count field and set up a new KPI. I can see events in the "itsi_summary" index, but alert_value is still not being extracted, although the raw value looks different now:

01/22/2019 18:42:30 +0000, search_name="Indicator - f7153072c1b928feeb214c56 - ITSI Search", search_now=1548182580.000, info_min_time=1548182250.000, info_max_time=1548182550.000, info_search_time=1548182581.714, qf="", kpi=new_test3, kpiid=f7153072c1b928feeb214c56, urgency=5, serviceid="2fd77b2c-ab52-48eb-9183-9ce7452d8432", itsi_service_id="2fd77b2c-ab52-48eb-9183-9ce7452d8432", is_service_aggregate=1, is_entity_in_maintenance=0, is_entity_defined=0, entity_key=service_aggregate, is_service_in_maintenance=0, alert_color="#99D18B", alert_level=2, alert_value=5, itsi_kpi_id=f7153072c1b928feeb214c56, is_service_max_severity_event=1, alert_severity=normal, alert_period=1, entity_title=service_aggregate, forceCsvResults="auto"

Therefore, I still can't see any result (only N/A) in the service analyzer.


skoelpin
SplunkTrust

Go open a support case or try upgrading. It should definitely work.


florianduhme
Path Finder

Thank you, I will try that out. Another thing I just realized is that the 'alert_value' field is only extracted for the KPI 'ServiceHealthScore'. If I filter events for this KPI (which is only the value for the whole service), the 'alert_value' field is extracted.

This explains why I can see the ServiceHealthScore moving up and down in the service analyzer (the score changes based on the KPIs I defined), but the individual KPIs don't show any values.
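
A quick way to see this per KPI is a breakdown like the one below. It is a sketch that assumes all KPI events in itsi_summary carry the kpi field shown in the raw events above; extracted, events and with_alert_value are illustrative names.

```spl
index=itsi_summary earliest=-1h latest=now
| eval extracted=if(isnotnull(alert_value), 1, 0)
| stats count AS events sum(extracted) AS with_alert_value BY kpi
```

A KPI whose with_alert_value stays at 0 while events is non-zero is one where the field is present in the raw event but not extracted at search time.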


florianduhme
Path Finder

Hi skoelpin,
just to follow up on your suggestions: I implemented my KPI (with the search above) on ITSI version 4.0.2 and everything works fine. Previously, I had implemented the KPI in version 3.1.3. It seems to be a bug in that version or in my installation of ITSI.
Thank you for your time and suggestions.

skoelpin
SplunkTrust

Thanks for following up with this!
