Splunk Search

Incorrect results when using PERC with TSTATS?

cramasta
Builder

Has anyone had any luck using PERC with TSTATS on a tsidx file created from a data model?

Here is my tstats search:

| tstats PERC90("PerformanceMetricBaseSearch.duration") AS count from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m
| eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass'
| timechart span=1m perc90(count) by ownerClass limit=100

Here is the equivalent regular search:

index=perf PerformanceMetric | timechart span=1m PERC90(duration) by ownerClass limit=100

When I compare the two timecharts in a line chart, they look almost the same; however, the values returned from tstats are always a bit higher.

If I run the same two searches with perc90 changed to avg, I get the exact same result set.

I'm on 6.0.3.

1 Solution

cramasta
Builder

Thanks to Brian M. at Splunk for pointing me to this answers post
http://answers.splunk.com/answers/44336/percentile-implementation.html

After reading that post, it seems my raw-data search, which uses perc99, is computing a percentile over more than 1000 distinct values, at which point Splunk switches to an algorithm that approximates the final percentile values.

The accelerated data model created for the same dataset, when run over the same time frame, does not approximate the final percentile values and gives me an exact percentile, even though I am still using perc99.

This explains why the results were slightly off from each other when I compared them.
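To see why an approximate percentile lands slightly off the exact one, here is a toy sketch in Python. It is NOT Splunk's actual implementation; it just contrasts an exact rank-based percentile with a simple histogram-bucket approximation over a dataset with more than 1000 distinct values.

```python
def exact_percentile(values, p):
    """Exact nearest-rank percentile: sort and index directly."""
    s = sorted(values)
    k = int(round(p / 100 * (len(s) - 1)))
    return s[k]

def approx_percentile(values, p, buckets=100):
    """Toy bucketed approximation (illustrative only, not Splunk's
    algorithm): histogram the values, then return the midpoint of the
    bucket where the target rank falls."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1
    counts = [0] * buckets
    for v in values:
        counts[min(buckets - 1, int((v - lo) / width))] += 1
    target = p / 100 * len(values)
    seen = 0
    for i, c in enumerate(counts):
        seen += c
        if seen >= target:
            return lo + (i + 0.5) * width  # bucket midpoint, not exact value
    return hi

data = list(range(5000))  # more than 1000 distinct values
print(exact_percentile(data, 99))   # 4949 (exact)
print(approx_percentile(data, 99))  # close to 4949, but not equal
```

The approximate answer is consistently near, but not identical to, the exact one, which matches the small but persistent gap between the two timecharts.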

I was able to confirm this by using exactperc99 in my raw search and perc99 in my data model search. The result sets came out identical!
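For reference, the raw search rewritten with the exact variant looks roughly like this (a sketch based on the searches above, with the percentile function swapped as described):

index=perf PerformanceMetric | timechart span=1m exactperc99(duration) by ownerClass limit=100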

Being new to data models, I wanted to verify 100% that I was getting the same results as the raw search, so it was a bit concerning to see differences in the results. I understand there are huge differences between data models and raw data, and that the limitations on calculations can differ between the two. It would be helpful to know at what point data model searches switch to this approximation approach for percentiles.


cramasta
Builder

Something I am finding now is that when using perc with tstats, I start to see all the perc results being populated, but as soon as the search gets about halfway through, all the perc fields disappear. Could this be an issue when the dataset gets too large?

If I use the exactperc function instead, the results remain. It seems like there may be an issue with the perc function working with tstats.
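In other words, the workaround is to swap the approximate function for its exact variant in the tstats search from my original post (a sketch only; exact percentiles can cost more memory on large datasets):

| tstats exactperc90("PerformanceMetricBaseSearch.duration") AS count from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m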


cramasta
Builder

This is not limited to timechart; stats does the same thing.
