I have to admit, there are certain aspects of ITSI I find limiting.
For example: I need to create a KPI that is the sum of two fields over the last 5 minutes, say fields foo and bar.
I can specify that I want the sum of either of those individually as a metric, but I cannot specify that I want the sum of both of them combined into one KPI.
Is there a way to do this that anyone is aware of?
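For concreteness, the kind of ad-hoc search I have in mind would be something along these lines (index, sourcetype, and field names are just placeholders):
index=my_index sourcetype=my_sourcetype earliest=-5m latest=now
| stats sum(foo) as sum_foo sum(bar) as sum_bar
| eval foo_plus_bar = sum_foo + sum_bar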
What's the purpose here?
Are you wanting to create a KPI with the summed values and then display it on a glass table/deep dive view?
The purpose is to create a KPI of the sum of the two other KPIs and then display it in deep dive or possibly a glass table, yes.
I think I've kind of found a workaround, but I guess backfill doesn't work for it.
sourcetype=sourcet1 xyz | eventstats sum(calls_kpi1) as calls1 sum(calls_kpi2) as calls2 | eval calls_proc = calls1 + calls2
@skoelpin
I've also found that when I use the query above to create a KPI base search, the preview for the Thresholds is all out of whack and I cannot set it properly. The value range is showing in Deep Dive as between 11 and 44, and yet the values shown in the Thresholds preview are in the millions, which makes no sense.
This is pretty discouraging, as I am being asked to convert a report into an ITSI dashboard and it requires more complex queries than just summing one KPI or taking the average of another single KPI. Was ITSI just not designed for anything other than basic statistical analysis on single KPIs?
The preview will show a sample rather than 1:1 results. Did you use a transforming command in your search? If so, that could be the culprit for why it's not showing up correctly in the preview.
How frequently are you running your KPI searches? You should recreate your desired output using the itsi_summary index and see what the results are, so you can verify it's getting written to ITSI correctly. Once you confirm this, you can then start narrowing down the problem. As for complex analysis, I have some pretty complicated queries and ITSI handles them well.
I will admit I struggle to understand how itsi_summary helps me to recreate the query to verify the results.
index=itsi_summary kpi="SCP CPU Utilization"
That returns an event such as:
10/24/2018 19:32:57 +0000, search_name="Indicator - Shared - 5bcf77972b9d44157e79157f - ITSI Search", search_now=1540409580.000, info_min_time=1540409277.000, info_max_time=1540409577.000, info_search_time=1540409582.235, qf="", kpi="SCP CPU Utilization", kpiid=80e03fe65ca6fb18fccd8fc4, urgency=5, serviceid="1e9057dc-4f5d-4abf-a773-e85349dd8a84", itsi_service_id="1e9057dc-4f5d-4abf-a773-e85349dd8a84", is_service_aggregate=1, is_entity_in_maintenance=0, is_entity_defined=0, entity_key=service_aggregate, is_service_in_maintenance=0, kpibasesearch=5bcf77972b9d44157e79157f, alert_color="#99D18B", alert_level=2, alert_value=11, itsi_kpi_id=80e03fe65ca6fb18fccd8fc4, is_service_max_severity_event=1, alert_severity=normal, alert_period=1, entity_title=service_aggregate
After digging through that, I am confused about what I can extract from it to help me understand why I'm not able to create thresholds for this query. Which of these fields is the value for that KPI at that time?
Yes, you should use itsi_summary and rebuild what you're trying to accomplish in ITSI. If you're getting the same values as you would over the raw data, then it's working as expected. It should look like this:
index=itsi_summary kpi="<YOUR KPI NAME>"
| timechart span=5m avg(alert_value)
This assumes you set a 5 minute span for KPIs to report. This will then build a timechart of what you should see in ITSI. You can also run the search over the raw values (your ad-hoc/base search) and compare the output to your itsi_summary output.
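For example, using the workaround search you posted earlier (so the sourcetype and field names are just the ones from that search), the raw-data side of the comparison would look something like this:
sourcetype=sourcet1 xyz
| eventstats sum(calls_kpi1) as calls1 sum(calls_kpi2) as calls2
| eval calls_proc = calls1 + calls2
| timechart span=5m avg(calls_proc)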
Thank you for this query. Yes, when I put that in for itsi_summary, I see values ranging from 11-13, as expected and as I see in Deep Dive. This is not reflected in the preview for ITSI Thresholds. But I think I can set a static threshold since the range is so small and see if it will at least work that way.
I think I am going to need Splunk Professional Services to troubleshoot with me why I cannot get a backfill from eventstats or eval queries.
I used to do professional services, and the ones experienced with ITSI are few and far between; it could take you months to get one on site. You could file a support case and they can help you out too.
Tomorrow I will take the same formatted query and backfill it on my system and let you know how it goes
UPDATE: I took your format and successfully backfilled the itsi_summary index for 7 days. One thing to note: you should use a separate eventstats line for each function. It should look like the example below. If you look at the SPL prior to putting it into ITSI, you will see avg_scp_cpu isn't created because the fields it depends on are missing. Your format is only pulling the last function, sum_scp4, and not creating the new field. If you follow the format below, it works because there is an avg_scp_cpu field, which makes the backfill possible.
sourcetype=cpu SCP1_CPU
| eventstats sum(SCP1_CPU) as sum_scp1
| eventstats sum(SCP2_CPU) as sum_scp2
| eventstats sum(SCP3_CPU) as sum_scp3
| eventstats sum(SCP4_CPU) as sum_scp4
| eval avg_scp_cpu = (sum_scp1 + sum_scp2 + sum_scp3 + sum_scp4) / 4
That's very odd, because when I run my query:
sourcetype=cpu SCP1_CPU | eventstats sum(SCP1_CPU) as sum_scp1 sum(SCP2_CPU) as sum_scp2 sum(SCP3_CPU) as sum_scp3 sum(SCP4_CPU) as sum_scp4 | eval avg_scp_cpu = (sum_scp1 + sum_scp2 + sum_scp3 + sum_scp4) / 4
in a regular search, I see the avg_scp_cpu field in the Interesting Fields column, so it is creating the avg_scp_cpu field for me. I also see fields for sum_scp1, sum_scp2, sum_scp3, and sum_scp4 in the Interesting Fields, so it's getting those as well... I'm not sure what you are talking about.
I will try your new version with 4 eventstats to see if the backfill works for me.
Did you see threshold previews for your new query as well?
Well, I copied your query into an Ad-hoc KPI and did a backfill of 7 hours, and it filled it in with data that was all in the 3000s, whereas the range for the data is 11-13. So it must be calculating it wrong somehow.
What did you use for your Calculation when creating the KPI? I used Maximum, since it's already doing the calculation in the query itself...
I used Average. You should recreate this in the search prior to backfilling in ITSI. Once you get the desired results, you should then consume it into ITSI. The best way to do it is to append | timechart span=1m max(<KPI value field>) to the end of your search.
Yes, I did. I ran:
sourcetype=cpu SCP1_CPU
| eventstats sum(SCP1_CPU) as sum_scp1
| eventstats sum(SCP2_CPU) as sum_scp2
| eventstats sum(SCP3_CPU) as sum_scp3
| eventstats sum(SCP4_CPU) as sum_scp4
| eval avg_scp_cpu = (sum_scp1 + sum_scp2 + sum_scp3 + sum_scp4) / 4
| timechart span=1m max(avg_scp_cpu)
in search first, and the values are all 11, which is what I'm hoping for. Running the same query in an Ad-hoc KPI for ITSI and backfilling 7 days gives me values in the 3000s, even with the Average calculation. I am running it every minute and calculating on the last 5 mins. Could that be the issue?
The span in your timechart needs to match what you're feeding into ITSI. You should open the KPI calculation, expand out the macro (ctrl + shift + e), grab the calculation, and run it against itsi_summary to see where the breakdown is.
sourcetype=cpu SCP1_CPU
| eventstats sum(SCP1_CPU) as sum_scp1
| eventstats sum(SCP2_CPU) as sum_scp2
| eventstats sum(SCP3_CPU) as sum_scp3
| eventstats sum(SCP4_CPU) as sum_scp4
| eval avg_scp_cpu = (sum_scp1 + sum_scp2 + sum_scp3 + sum_scp4) / 4
| timechart span=5m max(avg_scp_cpu)
Okay, so changing it to a 5m span, I still see 11 as the CPU utilization value.
I'm not sure how to do the "open the KPI calculation" step you mention. I see the keyboard shortcut you listed (is that the same on a Mac?) but I'm unsure where in the process I am supposed to press it. It sounds like a good path for troubleshooting, though.
Can you detail a bit more how to open the KPI calculation and expand out the macro? I tried doing so in the services configuration screen, but it's just a dropdown box for the KPI calculation and doesn't do anything when I press ctrl+shift+e.
Go to the nav bar at the top and click Services, then click your Service. Open up a KPI's "Search and Calculate" section and hit Edit, then click Generated Search at the bottom, which will open a new search with the ITSI macros appended. Hit ctrl + shift + e to expand the macro and strip off everything after the stats. This is how ITSI is seeing your data.
You can then take it one step further and change stats to timechart and add span=5m. This will show you exactly how ITSI is calculating and showing the values.
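Just as a rough illustration (the real expanded macro will include a lot more ITSI plumbing, so treat this purely as a sketch), the stripped-down search should reduce to something roughly like your base search with the aggregation swapped for a timechart, using avg() since that's the Calculation I used:
sourcetype=cpu SCP1_CPU
| eventstats sum(SCP1_CPU) as sum_scp1
| eventstats sum(SCP2_CPU) as sum_scp2
| eventstats sum(SCP3_CPU) as sum_scp3
| eventstats sum(SCP4_CPU) as sum_scp4
| eval avg_scp_cpu = (sum_scp1 + sum_scp2 + sum_scp3 + sum_scp4) / 4
| timechart span=5m avg(avg_scp_cpu)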
Darn! I thought I had it. I included the timeframe for calculation in the actual query in order to get backfill to work properly but it didn't work. In fact, it didn't backfill at all.
I created it with this:
sourcetype=cpu SCP1_CPU earliest=-5m latest=now
| eventstats sum(SCP1_CPU) as sum_scp1
| eventstats sum(SCP2_CPU) as sum_scp2
| eventstats sum(SCP3_CPU) as sum_scp3
| eventstats sum(SCP4_CPU) as sum_scp4
| eval avg_scp_cpu = (sum_scp1 + sum_scp2 + sum_scp3 + sum_scp4) / 4
Not being able to get backfill is a limitation I can work around for now. I still really wish I could see the threshold preview, but at least I can still set thresholds and then see them propagate in the deep dive. I can't imagine why it would be returning 3000+ values when it's always expected to be 11-13 within a 5-minute calculation window.
You can backfill! Get rid of that time modifier on that top line.
Go check your summary index to see how far it backfilled:
index=itsi_summary kpi="YOUR KPI NAME"
Perhaps it's still backfilling. Did you get the message saying the backfill was complete? Also, the alert_value field is going to be the value, so check that out; it should be in the range of 11-13.
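If you want to eyeball where the backfilled range diverges, something along these lines should work (the KPI name is a placeholder; alert_value and is_service_aggregate are fields from the summary event you posted earlier):
index=itsi_summary kpi="<YOUR KPI NAME>" is_service_aggregate=1
| timechart span=5m avg(alert_value) max(alert_value)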
Yes, I'm not sure if you saw where I wrote this before. I ran it without the time modifier. It did indeed backfill, but it filled in with values in the 3000s range, so it's incorrect.
I went into itsi_summary and confirmed. For the last 5 mins, it shows values in the 11-13 range. When I searched the backfilled section from yesterday, it showed values in the 3000s range.
So there is an issue with how the backfill data is being calculated.