Splunk IT Service Intelligence
Performance problem on Machine Learning ToolKit and IT Service Intelligence

Explorer

I'm trying to predict the health score of one of my services in ITSI using the Machine Learning Toolkit (MLTK), but the search I use takes a very long time to run. It is the search from this tutorial, applied to my data: https://www.splunk.com/blog/2017/08/28/itsi-and-sophisticated-machine-learning.html

Here is the search:

index=itsi_summary
| join kpiid
    [| inputlookup service_kpi_lookup
    | rename _key as serviceid title as service_name
    | eval kpi_info = mvzip('kpis._key', 'kpis.title', "==@@==")
    | fields kpi_info service_name serviceid
    | mvexpand kpi_info
    | rex field=kpi_info "(?<kpiid>.+)==@@==(?<kpi_name>.+)"
    | fields - kpi_info]
| search service_name=XXXXXXXXXXXXXXXXXXXXXXXXXX
| timechart span=5m max(alert_value) AS max_value min(alert_value) AS min_value avg(alert_value) AS avg_value median(alert_value) AS mean_value BY kpi_name
| eval this_date_hour = strftime(_time, "%H")
| eval this_date_day = strftime(_time, "%w")
| eval this_date_day = this_date_day."_"
| eval this_date_hour = this_date_hour."_"
| reverse
| streamstats window=6 current=f first(max_value: ServiceHealthscore) as ServiceHealthScoreFromFuture
| reverse

If I load 5 days of data in an MLTK Experiment, it takes 30 minutes. If I try to load 30 days, for example, Splunk crashes after 2 to 3 hours, so I have never been able to load data over a long period.
However, for my predictions to work well, I need a longer time frame. This is causing me a lot of performance problems, and I would like to find a solution.

Re: Performance problem on Machine Learning ToolKit and IT Service Intelligence

Splunk Employee

Hi there.
When you paste that search into a Splunk search bar, outside of the MLTK, with the XXXXXX part replaced by your service name, what performance do you get? How long does it take to load the data? It should be pretty fast; that isn't a lot of data to pull back!
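If you want a rough idea of how much data the base search has to pull back, one option is a quick count per hour on the summary index. This is only a sketch: it assumes your KPI data lives in the itsi_summary index (as in your search) and that you set the time range with the time picker.

| tstats count where index=itsi_summary by _time span=1h

Comparing that count with the run time shown in the Job Inspector should help tell whether the time goes into reading events or into the join and the commands after it.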
Also, I'm guessing you may be running on a system that is already stressed, and running |reverse twice may be causing problems. In my section of the Conf talk, https://conf.splunk.com/conf-online.html?search=there%20is%20no%20spoon#/ on slide 22, I outline how you can shift the target through time by changing _time.
Assuming you want to shift 7 days, and every row is a day:

| streamstats window=7 current=f first(*) as *FromNow 
| rename ValueFromNow AS ValueFromTheFuture 
| rename *FromNow AS * 
| eval _time=strptime(_time,"%Y-%m-%d")+(24*60*60*7)

You may be using a different span of time (it looks like 5m), so make the appropriate changes. Shout-out to Shaun McIntyre for figuring that out.
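
For a 5m span, the number of rows to shift is the horizon divided by the span, for example 30 minutes / 5 minutes = 6 rows, which matches the window=6 in your search. If you want to check what the shift does before running it on real data, here is a toy search you can paste into the search bar. It is only a sketch: the field names row, health and health_30m_ahead are made up for the example, and it applies your reverse / streamstats / reverse pattern to 12 generated rows spaced 5 minutes apart.

| makeresults count=12
| streamstats count AS row
| eval _time = relative_time(now(), "@h") + (row - 1) * 300
| eval health = row * 10
| reverse
| streamstats window=6 current=f first(health) AS health_30m_ahead
| reverse
| table _time row health health_30m_ahead

For every row that has six later rows, health_30m_ahead should hold the health value from six rows (30 minutes) later. The last rows have no full window ahead of them, so treat their values with care when training.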

Also, in ITSI 4.0 a Predictive Analytics dashboard was added, with these searches more formally structured and some extra features; you may want to look at that work.

Re: Performance problem on Machine Learning ToolKit and IT Service Intelligence

Explorer

Hi! Thank you for your answer!

If I run the search over 24 hours, I get 9575 matching events. It takes 4 minutes outside MLTK and also 4 minutes in MLTK. If I replace the double reverse with your code, it still takes 4 minutes.

About ITSI: I have version 3.1.4, and I am looking at 4.0+ to see if it helps! If I understand you correctly, there is no need for MLTK with that version?

Thank you so much @astein_splunk

Re: Performance problem on Machine Learning ToolKit and IT Service Intelligence

Splunk Employee

ITSI 4.0 leverages the MLTK in its out-of-the-box predictive analytics workflow. I am confused by the timing problems you started this thread about, where it took 30 minutes to process the events and crashed after 2 to 3 hours of loading. Are you still experiencing this, or is it resolved?

Re: Performance problem on Machine Learning ToolKit and IT Service Intelligence

Explorer

I still have the same problem, and I also don't understand why it's taking so long. I managed to load 20 days of data, but that's the maximum. Is there a timeout for Splunk searches? The last time I tried to run a very long search, Splunk stopped completely and had to be restarted the next morning.
