My data looks like this:
I have written a search that extracts the time elapsed ("ProcessTimestamp") between "Checkpoint 1" and "Checkpoint 2". The search looks like this:
index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="Checkpoint 1" OR CheckpointName="Checkpoint 2"
| transaction DeliveryId
The search combines the two events for each DeliveryId (which is the unique identifier) into one transaction, and each transaction has a field called "duration", which is the time between the two "_time" timestamps.
Now, I want to create a KPI within ITSI that displays the average duration between "Checkpoint 1" and "Checkpoint 2" (which would be the average value of all "duration" values for each DeliveryId).
Unfortunately, if I set up a KPI in ITSI with the search above and select "Average" for the calculation and "duration" as the Threshold field, the KPI is always "N/A".
Any suggestions? Thanks in advance.
How are you getting data into ITSI? Have you checked the itsi_summary index to see if there's a value tied to the KPI?
I get data in by defining an "Ad hoc search" in the KPI (the one from my question above).
Yes, I can definitely see events for the specified KPI, they look like this:
01/22/2019 13:39:30 +0000, search_name="Indicator - d934c0bc8df580ed637cd939 - ITSI Search", search_now=1548164400.000, info_min_time=1548163470.000, info_max_time=1548164370.000, info_search_time=1548164400.858, qf="", kpi=test2, kpiid=d934c0bc8df580ed637cd939, urgency=9, serviceid="17dc8fcf-27a2-4ec4-b1d6-38365328dd4a", itsi_service_id="17dc8fcf-27a2-4ec4-b1d6-38365328dd4a", is_service_aggregate=1, is_entity_in_maintenance=0, is_entity_defined=0, entity_key=service_aggregate, is_service_in_maintenance=0, alert_color="#FCB64E", alert_level=4, alert_value="57.111111111111114", itsi_kpi_id=d934c0bc8df580ed637cd939, is_service_max_severity_event=1, alert_severity=medium, alert_period=1, entity_title=service_aggregate, forceCsvResults="auto"
I can't really see any field that relates to the KPI value at that time. What I can see, though, is that the Threshold value I defined is working, because the alert_severity equals "medium", which is exactly what it should be right now.
Looks like you're getting a value for your alert_value field, which is good. How long have you let the KPI sit after defining it? Where is it showing N/A? Is it showing a value in the service analyzer? Did you check the indexer lag after turning it on? Have you checked to see if it's enabled?
I created the KPI around 2.5 hours ago. The N/A is showing in the service analyzer.
But I can see some data points when I try to edit the thresholds (in Configure -> Services -> "my service" -> KPIs -> test2). There I can see that ITSI has acknowledged the average duration values, but I cannot see them in the service analyzer.
Also, the service health score is moving up and down, based on the severity of my "test2" KPI. So the values are definitely recognized by ITSI, but I cannot see them in the service analyzer or in Deep Dives.
I don't really know what you mean by "checking the indexer lag"? My "monitoring lag" is now at 10 seconds for the defined KPI.
What is your KPI frequency and what timespan are you looking over? Can you confirm the kpiid for this service is kpiid=d934c0bc8df580ed637cd939?
You should first try to recreate the KPI via the search, just like ITSI does.
index=itsi_summary kpiid=d934c0bc8df580ed637cd939 earliest=-2h@h latest=now
| bin _time span=1m
| stats avg(alert_value) AS alert_value by _time
If you can see a good-looking chart with your data but ITSI is not showing it, you should try to recreate the service with the KPI. If the above search works, then everything is working on the backend and ITSI is just not rendering the values.
The KPI search is executed every minute and looks for a timespan of 15 minutes.
Yes, the kpiid is the one mentioned.
Executing that search does not return results for avg(alert_value). The reason is that the field "alert_value" is not extracted (it is only visible in the raw event, but not in the field list when clicking on an event). All other fields seem to be extracted perfectly fine. Why is the field "alert_value" not extracted? Maybe because of the many decimal places?
Why are you executing it every 1 minute and looking in the last 15 minutes? Lots of redundancy. I'm suspecting the runtimes may not be able to keep up with the search. Perhaps you should create a new service and set the frequency AND lookback period to 1 minute.
alert_value is the value that you put into the summary index. It's going to represent your duration field. Can you confirm this is a numeric field? If it's not numeric, then this explains why it's showing N/A. You can find out by looking at the interesting fields on the left and seeing whether it has a # (numeric) or an a (string) next to it.
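If the sidebar is hard to read, the same check can be done in the search itself. This is only a sketch built on the search from the question; duration_type is an illustrative field name, and typeof() is expected to report "Number" for numeric values and "String" for text:

```
index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="Checkpoint 1" OR CheckpointName="Checkpoint 2"
| transaction DeliveryId
| eval duration_type=typeof(duration)
| stats count BY duration_type
```

If every row comes back as "Number", the source field is numeric and the problem is more likely on the summary-index side.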
I think that's exactly my problem, 'alert_value' is not shown on the left side, because it is not extracted from the raw event. Therefore, I cannot find out if it's a numeric field or not.
Yep, that's it then! I'm fairly confident that the way you're feeding the data to ITSI, it's not recognizing the duration field as a numeric value. You should try manually extracting it via rex and plot it on a timechart to confirm. How exactly are you feeding this into ITSI?
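A manual extraction could look roughly like this. It is a sketch, assuming the raw events keep the alert_value=... layout shown earlier in the thread (with or without quotes); alert_value_rex, alert_value_num and avg_alert_value are made-up names chosen so they don't collide with the automatic extraction:

```
index=itsi_summary kpiid=d934c0bc8df580ed637cd939 earliest=-2h@h latest=now
| rex field=_raw "alert_value=\"?(?<alert_value_rex>-?[\d.]+)"
| eval alert_value_num=tonumber(alert_value_rex)
| timechart span=1m avg(alert_value_num) AS avg_alert_value
```

If this chart shows values while the plain alert_value field stays empty, the data is reaching the summary index and only the search-time extraction is failing.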
Please feel free to upvote any answers which have been helpful so far 🙂
I'm feeding it to ITSI via an "Ad hoc search". What I did was:
index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="Checkpoint 1" OR CheckpointName="Checkpoint 2"
| transaction DeliveryId
I can see data when defining the thresholds, which tells me that ITSI does recognize the values. But once the result is written to the "itsi_summary" index, the field does not get extracted.
I don't really know how to fix the problem, that the alert_value is not being extracted. Do you have any suggestions?
It's due to the way you're using the transaction command. You should pop open a search, run your query over the raw data, and make sure DeliveryId is working as expected. If there is a DeliveryId, you should confirm whether it's a numeric field by looking for a # sign next to it.
I don't really get why the DeliveryId should be the problem here, because the transaction command seems to work as expected. I also tried to generate my own "duration" field, which looked like this:
index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="100_TO_01" OR CheckpointName="100_TO_02"
| transaction DeliveryId
| eval FirstTime=mvindex(ProcessTimestamp, 0)
| eval LastTime=mvindex(ProcessTimestamp, 1)
| convert timeformat="%Y-%m-%dT%H:%M:%S.%3NZ" mktime("FirstTime") as First mktime("LastTime") as Last
| eval diff=Last-First
In this case, "diff" is definitely a numeric field, I can confirm that. If I create a new KPI with this "Ad hoc search", the result is the same: no values are shown in the service analyzer and the "alert_value" field is not extracted.
I would suggest you forego the transaction command entirely. It doesn't scale, and you are looking for trouble by running this every 1 minute and searching the last 15 minutes. A better test would be to make this as simple as possible, like this:
index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="100_TO_01" OR CheckpointName="100_TO_02" | stats count
Then use the count field that is auto extracted and confirm that ITSI is correctly extracting it to an alert_value field. You must use a different kpiid value to see it. I'm very confident the issue is with your query and not ITSI. Once you confirm the alert_value is there, this proves it's not an ITSI issue.
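For the duration KPI itself, a transaction-free version could look roughly like the sketch below. It assumes each DeliveryId has one event per checkpoint and that _time reflects when the checkpoint was reached; first_time, last_time and checkpoints are illustrative names only:

```
index="arvato_scm_telco_process_time_tracking_jt6_test" CheckpointName="Checkpoint 1" OR CheckpointName="Checkpoint 2"
| stats min(_time) AS first_time max(_time) AS last_time dc(CheckpointName) AS checkpoints BY DeliveryId
| where checkpoints=2
| eval duration=last_time-first_time
```

The where clause drops deliveries that have only reached one checkpoint inside the search window, so they don't contribute a duration of 0 to the average. ITSI can then apply "Average" to the duration field as before.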
I tested your query with the count field and set up a new KPI. I can see events in the "itsi_summary" index, but the alert_value is still not being extracted, although it looks different now:
01/22/2019 18:42:30 +0000, search_name="Indicator - f7153072c1b928feeb214c56 - ITSI Search", search_now=1548182580.000, info_min_time=1548182250.000, info_max_time=1548182550.000, info_search_time=1548182581.714, qf="", kpi=new_test3, kpiid=f7153072c1b928feeb214c56, urgency=5, serviceid="2fd77b2c-ab52-48eb-9183-9ce7452d8432", itsi_service_id="2fd77b2c-ab52-48eb-9183-9ce7452d8432", is_service_aggregate=1, is_entity_in_maintenance=0, is_entity_defined=0, entity_key=service_aggregate, is_service_in_maintenance=0, alert_color="#99D18B", alert_level=2, alert_value=5, itsi_kpi_id=f7153072c1b928feeb214c56, is_service_max_severity_event=1, alert_severity=normal, alert_period=1, entity_title=service_aggregate, forceCsvResults="auto"
Therefore, I still can't see any result (can only see N/A) in the service analyzer.
Go open a support case or try upgrading. Should definitely work
Thank you, I will try that out. Another thing I just realized is that the 'alert_value' field is only extracted for the KPI 'ServiceHealthScore'. If I filter events for this KPI (which is the value for the whole service), the 'alert_value' field is extracted.
This explains why I can see the ServiceHealthScore moving up and down in the service analyzer (the score changes based on the KPIs I defined), but the individual KPIs don't show any values.
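A quick way to see this per KPI is to count summary events with and without an extracted alert_value. This is a sketch only; has_alert_value is an illustrative field name and the 60-minute window is arbitrary:

```
index=itsi_summary earliest=-60m
| eval has_alert_value=if(isnotnull(alert_value), "extracted", "missing")
| stats count BY kpi has_alert_value
```

With the behaviour described above, ServiceHealthScore should show up as "extracted" and the individual KPIs as "missing".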
Hi skoelpin,
just to confirm your suggestions. I tried to implement my KPI (with the search above) on ITSI version 4.0.2 and everything works fine. Previously, I implemented the KPI in version 3.1.3. Seems to be a bug in that version or in my installation of ITSI.
Thank you for your time and suggestions.
Thanks for following up with this!