I have created a Splunk query that prints the error code by time. I want to forecast the error code, but to use the MLTK time series forecast I have to use the timechart command, and with timechart I have to apply an aggregate function to the error code field.
Also, the MLTK time series forecast won't forecast the error code itself; it just shows a predicted value, which won't match the error codes we actually have.
Can someone help with this? How can I achieve my goal?
Thanks a lot !!
I'm confused here.. You want to forecast something without any time series data? Do you understand the difference between forecasting and predicting?
Forecasting uses time to "predict" a value relative to past time. Predicting doesn't care about time (past, present, or future); it simply takes inputs and predicts an output. So you will need to PREDICT error codes in the MLTK.
You're still missing the concept.. You are NOT forecasting the error code relative to time, you are predicting the error code at a point in time. When the search runs and tries to predict, it is not taking _time into account, it's using other variables. So if this runs at a certain frequency, you're plotting the prediction over time. I'd highly recommend you brush up on these topics before moving forward.
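To make the distinction concrete, here is a rough sketch of what predicting (rather than forecasting) the error code could look like with the MLTK fit and apply commands. The feature fields response_time and host and the model name error_code_model are placeholders I made up for illustration; swap in whatever fields actually drive your errors. First train a model on past events:

```
host=* sourcetype=abc* summary NOT ("cda" OR "fgh")
| fit LogisticRegression Error_Code from response_time host into error_code_model
```

Then score new events with the saved model:

```
host=* sourcetype=abc* summary NOT ("cda" OR "fgh")
| apply error_code_model
```

By default the prediction should land in a field named predicted(Error_Code). Note that _time is not one of the inputs; the model only sees the feature fields.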
Below is the scenario.
host=* sourcetype=abc* summary NOT ("cda" OR "fgh" )
| eval error= if(Error_Code=0, 0, 1)
| timechart values(error) as error
Above is the search, where 0 represents "Success" and 1 represents "Error". I am using the MLTK Forecast Time Series assistant on it. I want the predicted values to be only 0 or 1, but when I use the Forecast Time Series assistant the predicted values fall between 0 and 1, like 0.23, 0.34, 0.56.
If I use the Predict Numeric Fields or Predict Categorical Fields assistants of MLTK, they don't show future predicted values.
It seems what I want to achieve is not possible this way. Can you suggest any other way to predict the future value of the error field with respect to time?
Thanks a lot 🙂
Yeah, that's giving you a 0-100% likelihood of an error or not. So you would have to post-process the results with an eval:
| eval predicted_error=if(predicted<0.5,0,1) | timechart max(predicted_error)
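If you want to see the thresholding on its own before wiring it into your search, here is a small standalone sketch using makeresults. The first three predicted values are the ones from your post and the fourth is one I made up; the field n is just a row counter for the demo.

```
| makeresults count=4
| streamstats count as n
| eval predicted=case(n=1, 0.23, n=2, 0.34, n=3, 0.56, n=4, 0.91)
| eval predicted_error=if(predicted<0.5, 0, 1)
| table n predicted predicted_error
```

Rows 1 and 2 come out as 0 and rows 3 and 4 come out as 1. The 0.5 cutoff is only a starting point; raise or lower it depending on how costly a missed error is for you.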
As for future predicted values, it takes some elbow grease to timeshift future values. I spoke about this at CONF last year and used a similar technique for timeshifting this year. What you need to do is create future empty buckets with the SPL below, append them onto your search, and use the apply command to fill those future empty buckets with the applied values.
| makeresults count=100000
| streamstats count as count
| eval earliest_time=now()
| eval time=case(count=100000,relative_time(earliest_time,"+100000d"),count=1,earliest_time)
| makecontinuous time span=1d
| eval timeAsANumber=time
| eval _time=time
| eval time_human=strftime(time, "%Y-%m-%d %H:%M:%S")
| fields + time
| append [| search index=blah host= sourcetype=abc summary NOT ("cda" OR "fgh" ) | eval predicted_error=if(predicted<0.5,0,1) | timechart max(predicted_error) by _time | apply <model_name>]
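If the bucket generator is hard to follow, here is a stripped-down sketch of just the "future empty buckets" part. It assumes you only want the next 7 daily buckets; the count of 7, the daily spacing, and the field n are my choices for illustration, and it ignores daylight saving shifts.

```
| makeresults count=7
| streamstats count as n
| eval _time=relative_time(now(), "@d") + n*86400
| fields _time
```

Each row is an empty bucket stamped with midnight of one of the next 7 days. Those are the rows whose prediction fields get filled in once a model is applied.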
We may have more than one event in a one-second span, or no events at all. But the timechart command groups events by time; even if I use span=1s it shows only one value per span.
Is there any way I can get each event on the timechart?
Yeah, you can use values(field) to do the trick.. But IMO, it's much better to have an aggregated number there. Do you really need 1-second granularity?
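For example, an aggregated version of the search from earlier in the thread could look something like this. The span=1m bucket size and the output names error_count and total_events are my assumptions; pick whatever span matches how often you actually need an answer.

```
host=* sourcetype=abc* summary NOT ("cda" OR "fgh")
| eval error=if(Error_Code=0, 0, 1)
| timechart span=1m sum(error) as error_count count as total_events
```

That gives you one number per minute for errors and one for total events, which is a much friendlier input for a forecast than a multivalue field of 0s and 1s.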
(If this has been helpful to you so far, please upvote/accept, or clarify what you're looking for)