Hi, I'm new to ML in Splunk. As a POC I'm trying to forecast expected call volumes for a service, and then alert when we are under or over the expected volume. I'm training the model on 30-minute chunks of historic data, which goes back about 7 months. Call volumes are periodic based on both the time of day and the day of week, so I thought I would use a period of 336 (the number of half hours in a week):

| mstats sum(_value) as call_count WHERE metric_name="myServiceCalls" span=30m@w index=my_metrics
| makecontinuous _time span=30m@h
| fillnull value=0 call_count
| fit StateSpaceForecast "call_count" output_metadata=true holdback=1week forecast_k=2week conf_interval=50 period=336 into "service_call_count"

I am trying to experiment with using "apply" on the previous half hour of live data. Maybe "apply" is the wrong tool here.

index=myliveIndex earliest="-30m@h" latest="@h" host="p*" sourcetype="p*" "my service string"
| bin _time span="30m" aligntime="@h"
| stats count(_raw) AS call_count BY _time
| apply "service_call_count"

The error I'm getting is (I believe) because I am not supplying 336 data points to the apply command:

Error in 'apply' command: holdback value equates to too many events being withheld (336 >= 2).

I now understand that apply expects to see an entire "period" of data, so I'm guessing this is the wrong approach for my use case. Can anyone point me in the right direction? Really, I want to look up the predicted range of counts for a given half hour and then alert when we're out of range.
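For what it's worth, one workaround I've been considering (a sketch only, not verified): instead of applying the model to a single half-hour bucket, feed apply a full window of recent history from the metrics index, then keep only the latest bucket and compare it against the confidence bounds. The output field names here (predicted(call_count), lower50(...), upper50(...)) are my assumption based on MLTK's usual naming for conf_interval=50, so they may need adjusting:

```
| mstats sum(_value) as call_count WHERE metric_name="myServiceCalls" span=30m@h index=my_metrics earliest=-7d@h latest=@h
| makecontinuous _time span=30m@h
| fillnull value=0 call_count
| apply "service_call_count"
| tail 1
| eval out_of_range=if(call_count < 'lower50(predicted(call_count))' OR call_count > 'upper50(predicted(call_count))', 1, 0)
| where out_of_range=1
```

The idea is that the alert search always hands the model enough context (here, a week of half-hour buckets), and the alert fires only when the newest bucket falls outside the predicted range.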
Hi, I'm relatively new to Splunk. I'm building searches for mcollect to parse and store metrics into a metrics index. My intention is to later use the metrics to train ML models for alerting. I have a set of endpoints where I have hit counts for each endpoint, and average response time for the endpoint, sliced into 5-minute intervals. At specific times of day I might have zero hits on a specific endpoint. Importantly, I don't have "missing data" here; there were legitimately no hits at certain times. I'm successfully using timechart | fillnull value=0 | untable to make sure I have a count for each endpoint for each timeslice. I understand not having gaps is important for at least some of the ML algorithms.

Where I'm uncertain is the response time values. It seems incorrect to say that the endpoint responded in 0ms during a timeslice where there were no hits, and this could skew things, since the response time will never be 0ms when there is any hit. I could use fillnull value=NULL for these values, which seems more "correct". However, I'm unclear whether I'm going to regret those null values later when I get into ML. What is best practice for fillnull when you're backfilling performance values?

My search so far; note I need to end with _time, metric_name, _value for mcollect:

index=my_index earliest="-1d@d" latest="@d" host="prod*" "MYSTRING|*"
| eval all=split(_raw,"|")
| eval Application=mvindex(all,2)
| eval Service=mvindex(all,4)
| eval Actual=mvindex(all,8)
| eval metric_name=Application.".".Service.".actual.avg"
| bin _time span=5m
| stats avg(Actual) AS _value BY _time metric_name
| eval _value=round(_value)
| timechart limit=0 span=5m min(_value) AS _value by metric_name
| fillnull value=NULL
| untable _time metric_name _value
| mcollect index=my_index_metrics
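One variant I've considered (a sketch, under the assumption that metric data points need a numeric _value, so a literal "NULL" string would not be usable in a metrics index anyway): keep the zero-fill for the hit-count metric, but for the response-time metric simply omit the empty timeslices rather than writing a placeholder, by filtering to numeric values before mcollect:

```
index=my_index earliest="-1d@d" latest="@d" host="prod*" "MYSTRING|*"
| eval all=split(_raw,"|")
| eval metric_name=mvindex(all,2).".".mvindex(all,4).".actual.avg"
| eval Actual=mvindex(all,8)
| bin _time span=5m
| stats avg(Actual) AS _value BY _time metric_name
| eval _value=round(_value)
| untable _time metric_name _value
| where isnum(_value)
| mcollect index=my_index_metrics
```

That treats "no data point" as the honest representation of "no hits", and defers gap handling to the later ML search, where the gaps could be filled per-algorithm (e.g. interpolated or carried forward) instead of being baked into the stored metrics. I don't know if that is the accepted best practice, which is really my question.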