Has anyone had any luck using PERC with TSTATS on a tsidx file created from a data model?
Here is my tstats search:
| tstats PERC90("PerformanceMetricBaseSearch.duration") AS count from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m | eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass' | timechart span=1m perc90(count) by ownerClass limit=100
Here is the equivalent regular search:
index=perf PerformanceMetric | timechart span=1m PERC90(duration) by ownerClass limit=100
When I compare the two timecharts in a line chart they look almost the same; however, I'm finding the values returned from tstats are always a bit higher.
If I run the same two searches and change perc90 to avg, I get the exact same result set.
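To be concrete, the avg versions are just the searches above with the function swapped, e.g. the raw one becomes:
index=perf PerformanceMetric | timechart span=1m avg(duration) by ownerClass limit=100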
I'm on Splunk 6.0.3.
Thanks to Brian M. at Splunk for pointing me to this answers post
http://answers.splunk.com/answers/44336/percentile-implementation.html
After reading that answer, it seems my raw data search, which uses perc99, is computing a percentile over more than 1000 distinct values, at which point an algorithm is used to approximate the final percentile values.
The accelerated data model created for the same dataset, when run over the same time frame, does not approximate the final percentile values and gives me an exact percentile, even though I am still using perc99.
This explains why the results were just slightly off from each other when I compared them.
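If you want to sanity-check whether any given 1-minute span really goes over that 1000 distinct value mark, something along these lines should show it (field names as in my searches above, nothing tuned):
index=perf PerformanceMetric | bin _time span=1m | stats dc(duration) AS distinct_durations by _time ownerClass | stats max(distinct_durations) AS max_distinct_in_a_span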
I was able to confirm the approximation explanation by using exactperc99 in my raw search and perc99 in my data model search. The result sets came out identical!
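For anyone wanting to reproduce the check, the comparison looked roughly like this pair, modeled on my searches at the top with the percentile functions swapped (your filters will differ):
index=perf PerformanceMetric | timechart span=1m exactperc99(duration) by ownerClass limit=100
| tstats perc99("PerformanceMetricBaseSearch.duration") AS duration_p99 from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m | eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass' | timechart span=1m perc99(duration_p99) by ownerClass limit=100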
Being new to data models, I wanted to verify 100% that I was getting the same results as the raw search, so it was a bit concerning to see the differences in the results. However, I can understand there are HUGE differences between data models and raw data, and the limitations around doing calculations on each can be very different. It would be helpful to know at what point data models will use this approximation approach to percentiles.
Something I am finding now: when using PERC with tstats, I initially see all the perc results being populated, but as soon as the search gets about halfway through, all the perc fields disappear. Could this be an issue when the dataset gets too large?
If I use the exactperc function, the results remain. It seems like there might be an issue with the perc function working with tstats.
This is not limited to timechart; stats does the same thing.
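For now my workaround is just swapping the function in the original tstats search, roughly:
| tstats exactperc90("PerformanceMetricBaseSearch.duration") AS count from datamodel="PerformanceMetrics" where (nodename="PerformanceMetricBaseSearch") groupby "PerformanceMetricBaseSearch.ownerClass" "_time" span=1m | eval "ownerClass"='PerformanceMetricBaseSearch.ownerClass' | timechart span=1m perc90(count) by ownerClass limit=100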