I need approximate statistics/metrics and am currently using event sampling, which drastically speeds up my queries. For queries that calculate averages this works great, but I also need to do counts. If I set event sampling to, for example, 1:100, Splunk appears to sample roughly one in every 100 events, which is also reflected in how many events are matched when running 1:100 vs 1:1.
Example count with and without sampling:
1:100 = 26311
1:1 = 2623658
1:100 scaled up to 1:1 = 2631100
Diff = 7442, which is 0.3%
The time period was a previous hour (not the latest hour) so that incoming events wouldn't affect the count.
0.3% difference is perfectly ok for my purpose.
Am I thinking about this correctly, or is there a risk of much bigger differences in the count (after scaling it up)?
I think you're correct, at least with regard to event counts.
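To put a rough number on the expected error: if sampling keeps each event independently with probability 1/100 (an assumption; if Splunk samples more deterministically, the variance would only be lower), the sampled count is approximately Binomial(N, 0.01), and the relative standard error of the scaled-up count follows from that. A small sketch using the numbers from the question:

```python
import math

TOTAL = 2_623_658   # unsampled count from the question
RATE = 100          # 1:100 event sampling
p = 1 / RATE

# Assuming each event is kept independently with probability p, the sampled
# count is ~ Binomial(TOTAL, p). Scaling it back up by RATE doesn't change
# the *relative* error, which is sqrt((1 - p) / (TOTAL * p)).
expected_sampled = TOTAL * p
std_sampled = math.sqrt(TOTAL * p * (1 - p))
rel_err = std_sampled / expected_sampled

print(f"expected sampled count  : {expected_sampled:.0f}")
print(f"one-sigma relative error: {rel_err:.2%}")
print(f"three-sigma bound       : {3 * rel_err:.2%}")
```

With these inputs the one-sigma relative error comes out around 0.6%, so the observed 0.3% difference is well within normal sampling noise, and deviations beyond roughly 2% (three sigma) would be rare at this event volume. The relative error shrinks as the true count grows, so bigger time ranges should be even more accurate; very small counts are where upscaling gets risky.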