ML - best practice fillnull value to use for avg response time when count=0

nathanwray · 11-02-2020

Hi, I'm relatively new to Splunk. I'm building searches that use mcollect to parse and store metrics in a metrics index. I plan to use these metrics later to train ML models for alerting.

For a set of endpoints, I have the hit count and the average response time for each endpoint, sliced into 5-minute intervals. At certain times of day a given endpoint may get zero hits. Importantly, this is not "missing data": there were legitimately no hits at those times.

I'm successfully using timechart | fillnull value=0 | untable to make sure I have a count for every endpoint in every time slice. As I understand it, having no gaps matters for at least some of the ML algorithms.
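The idea behind that pipeline can be sketched in Python for comparison. This is a rough analogue only, not how Splunk implements timechart: the input shape (a list of (epoch_seconds, endpoint) pairs), the function name counts_per_slice, and passing the endpoint list explicitly are all hypothetical. Every endpoint gets a row for every 5-minute slice, and slices with no hits get a count of 0.

```python
from collections import Counter

BUCKET = 300  # 5-minute span, in seconds


def counts_per_slice(events, endpoints, start, end):
    """Rough analogue of `timechart span=5m count by endpoint | fillnull value=0 | untable`.

    events: iterable of (epoch_seconds, endpoint) pairs (hypothetical input shape).
    Returns long-format rows (bucket_start, endpoint, count), one per endpoint per slice.
    """
    # Count hits per (5-minute bucket, endpoint)
    hits = Counter((t - t % BUCKET, ep) for t, ep in events)
    rows = []
    for bucket in range(start - start % BUCKET, end, BUCKET):
        for ep in endpoints:
            # Counter returns 0 for absent keys, playing the role of fillnull value=0
            rows.append((bucket, ep, hits[(bucket, ep)]))
    return rows


if __name__ == "__main__":
    events = [(0, "/a"), (10, "/a"), (320, "/b")]
    print(counts_per_slice(events, ["/a", "/b"], 0, 600))
    # → [(0, '/a', 2), (0, '/b', 0), (300, '/a', 0), (300, '/b', 1)]
```

For counts, filling with 0 is safe because zero hits is a true measurement, not a placeholder.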

Where I'm uncertain is the response time values. It seems wrong to record that an endpoint responded in 0 ms during a time slice with no hits, and this could skew things, since the response time is never 0 ms when there is at least one hit. I could use fillnull value=NULL for these values, which seems more "correct", but I'm not sure whether I'll regret those null values later when I get to the ML stage.
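To make the skew concrete, here is a small Python illustration with made-up numbers for a single endpoint; the slices list and all function names are hypothetical. It compares three ways of summarizing average response time: treating empty slices as 0 ms, ignoring empty slices, and weighting each slice's average by its hit count. The weighted variant is simply one common way to combine averages, shown as an option rather than as a recommendation.

```python
from statistics import mean

# Hypothetical 5-minute slices for one endpoint: (hit_count, avg_response_ms).
# avg_response_ms is None where hit_count == 0: no requests, so no response time exists.
slices = [(120, 210.0), (95, 190.0), (0, None), (0, None), (110, 205.0)]


def zero_filled_mean(rows):
    # fillnull value=0 style: empty slices contribute a fake 0 ms
    return mean(avg if avg is not None else 0.0 for _, avg in rows)


def observed_mean(rows):
    # Leave empty slices null and skip them; None if there were no hits at all
    observed = [avg for cnt, avg in rows if cnt > 0]
    return mean(observed) if observed else None


def weighted_mean(rows):
    # Weight each slice's average by its hit count; empty slices add nothing
    total = sum(cnt for cnt, _ in rows)
    if total == 0:
        return None
    return sum(cnt * avg for cnt, avg in rows if cnt > 0) / total


if __name__ == "__main__":
    print(round(zero_filled_mean(slices), 1))  # → 121.0
    print(round(observed_mean(slices), 1))     # → 201.7
    print(round(weighted_mean(slices), 1))     # → 202.5
```

The zero-filled mean lands far below every response time that was actually observed, which is exactly the distortion described above; the other two stay close to the real values.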

What is the best practice for fillnull when backfilling performance values?

My search so far (note that it needs to end with _time, metric_name, _value for mcollect):
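As a sketch of the target output shape only (not the search itself), the Python below assumes per-slice rows of (epoch, endpoint, count, avg_ms), where avg_ms is None when count is 0, and emits (_time, metric_name, _value) tuples. The function name and the metric naming scheme (<endpoint>.hit_count, <endpoint>.avg_response_ms) are hypothetical. It shows one option: always write the count, since 0 hits is a real measurement, but skip the response-time point for empty slices instead of writing 0 or a placeholder.

```python
def to_metric_rows(slices):
    """slices: iterable of (epoch, endpoint, count, avg_ms_or_None) tuples (hypothetical shape).

    Returns (_time, metric_name, _value) tuples. Counts are always emitted;
    response-time points are omitted for slices with no hits.
    """
    rows = []
    for t, ep, cnt, avg_ms in slices:
        rows.append((t, f"{ep}.hit_count", cnt))
        # No hits means no response time was measured, so emit no point at all
        if cnt > 0 and avg_ms is not None:
            rows.append((t, f"{ep}.avg_response_ms", avg_ms))
    return rows


if __name__ == "__main__":
    print(to_metric_rows([(0, "api_login", 3, 150.0), (300, "api_login", 0, None)]))
    # → [(0, 'api_login.hit_count', 3), (0, 'api_login.avg_response_ms', 150.0), (300, 'api_login.hit_count', 0)]
```

Whether skipping points or storing nulls works better later still depends on how the chosen ML algorithm treats absent values.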
