Hello again.
I am testing a "light" version of an index completely compatible with the tstats + PREFIX() method (selecting only the fields I work with and removing all major breakers of field values from the _raw) as an alternative to datamodels, since it's waaaay faster.
My first test has been computing the distinct count value of a field (sessionid) with extremely high cardinality but without major breakers (so prefix compatible) both in the original index and my summary index for a given hour.
ORIGINAL INDEX: 48.692.463 distinct session ids
SUMMARY INDEX: 6.016.022 distinct session ids
However, if I do the alternative way of doing DC (count by sessionid so for each different sessionid it generates a row and then I count all the rows) it gives me the correct result.
SUMMARY INDEX with count of counts method: 48.692.463 distinct session ids
So the problem is in the DC function. It seems the issue occurs when splunk gathers the DC chunks to generate the final result, but tuning chunk_size parameter has no effect whatsoever. When I do the same test with smaller time ranges so distinct sessions >1.000.000 both original index and summary index DCs give me the same result.
How can I solve the problem? Is this a Splunk bug?