Has anyone had any luck using PERC with TSTATS on a tsidx file created from a data model?
Here is my tstats search:
| tstats PERC90("PerformanceMetricBaseSearch.duration") AS count from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m | eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass' | timechart span=1m perc90(count) by ownerClass limit=100
Here is the equivalent regular search:
index=perf PerformanceMetric | timechart span=1m PERC90(duration) by ownerClass limit=100
When I compare the two timecharts in a line chart they look almost the same; however, I'm finding the values returned from tstats are always a bit higher.
If I run the same two searches and change perc90 to avg, I get the exact same result set.
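To be concrete, the avg versions are just the searches above with the function swapped, e.g. the raw one becomes:
index=perf PerformanceMetric | timechart span=1m avg(duration) by ownerClass limit=100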
I'm on Splunk 6.0.3.
Thanks to Brian M. at Splunk for pointing me to this answers post
http://answers.splunk.com/answers/44336/percentile-implementation.html
After reading that answer, it seems my raw data search, which uses perc99, is computing a percentile over more than 1000 distinct values, at which point an algorithm is used to approximate the final percentile values.
The accelerated data model created for the same dataset, when run over the same time frame, does not approximate the final percentile values and gives me an exact percentile, even though I am still using perc99.
This explains why the results were just slightly off from each other when I compared them.
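If you want to sanity-check whether any given 1-minute span really goes over that 1000 distinct value mark, something along these lines should show it (field names as in my searches above, nothing tuned):
index=perf PerformanceMetric | bin _time span=1m | stats dc(duration) AS distinct_durations by _time ownerClass | stats max(distinct_durations) AS max_distinct_in_a_span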
I was able to confirm the approximation explanation by using exactperc99 in my raw search and perc99 in my data model search. The result sets came out identical!
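For anyone wanting to reproduce the check, the comparison looked roughly like this pair, modeled on my searches at the top with the percentile functions swapped (your filters will differ):
index=perf PerformanceMetric | timechart span=1m exactperc99(duration) by ownerClass limit=100
| tstats perc99("PerformanceMetricBaseSearch.duration") AS duration_p99 from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m | eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass' | timechart span=1m perc99(duration_p99) by ownerClass limit=100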
Being new to data models, I wanted to verify 100% that I was getting the same results as the raw search, so it was a bit concerning to see the differences in the results. However, I can understand there are HUGE differences between data models and raw data, and the limitations around doing calculations on each can be very different. It would be helpful to know at what point data models will use this approximation approach to percentiles.
Something I am finding now: when using PERC with tstats, I initially see all the perc results being populated, but as soon as the search gets about halfway through, all the perc fields disappear. Could this be an issue when the dataset gets too large?
If I use the exactperc function, the results remain. It seems like there might be an issue with the perc function working with tstats.
This is not limited to timechart; stats does the same thing.
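For now my workaround is just swapping the function in the original tstats search, roughly:
| tstats exactperc90("PerformanceMetricBaseSearch.duration") AS count from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m | eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass' | timechart span=1m perc90(count) by ownerClass limit=100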