Hey
I've been working in a distributed Splunk environment where one of our indexes has a very high-cardinality "source" field (essentially unique per event).
I've noticed that when I use tstats with dc() to count the number of distinct sources, I get an incorrect result (far below the expected roughly one per event).
The query looks something like:
| tstats dc(source) where index=my_index
When I search over a smaller number of events (~100,000 instead of ~5,000,000), the result is correct.
In addition, estdc gets much closer to the true value than dc (which is wildly wrong).
Finally, when using stats instead of tstats, I get the correct value:
index=my_index | stats dc(source)
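For reference, here is how I've been comparing the three counts side by side (the AS field names are just mine):

```spl
index=my_index
| stats dc(source) AS exact_dc
| appendcols
    [| tstats dc(source) AS tstats_dc where index=my_index]
| appendcols
    [| tstats estdc(source) AS tstats_estdc where index=my_index]
```

On the full ~5,000,000 events, exact_dc looks right while tstats_dc is far too low.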
Any ideas? My guess is that I'm hitting some memory barrier, but there is no indication of this.
There is a flag you can pass to tstats - chunk_size - see the docs here:
https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchReference/tstats
The docs mention high-cardinality distinct counts specifically - you could experiment to see whether it makes a difference.
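For example, something like this - the value here is just illustrative, and note that chunk_size goes before the aggregation function:

```spl
| tstats chunk_size=5000000 dc(source) where index=my_index
```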
Sadly setting chunk_size doesn't make a difference.
I've since tried playing around with limits.conf on both search heads and indexers to no avail.
Also, the query does seem to work on the indexers (when querying them directly rather than going through the search head).
Another note that might be helpful - the query works on Splunk 7.3 but not on 8.2.2.
Interesting - it sounds like you have the energy to dig a little deeper. Take a look at these links:
https://www.splunk.com/en_us/blog/tips-and-tricks/splunk-clara-fication-job-inspector.html
https://conf.splunk.com/files/2020/slides/TRU1143C.pdf
which show how you can dive into debug logging and the search log - maybe that will throw up something useful.
Ended up looking at the search.log and finding the following ERROR:
"SRSSerializer - max str len exceeded - probably corrupt"
After looking at the known issues page, I found SPL-166001, which states that this happens with events larger than 16MB. Even though that isn't the case here, I tried the workaround offered there:
[search]
results_serial_format=csv
This did fix the issue; sadly, though, it's supposed to affect the performance of all searches.
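For anyone else hitting this: the setting lives in limits.conf, and I applied it on the search head (the path below is just where I put it - adjust for your deployment):

```
# $SPLUNK_HOME/etc/system/local/limits.conf (my location, adjust as needed)
[search]
results_serial_format = csv
```

A restart was needed for it to take effect.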
Kudos for digging - glad you found a solution - could you quantify the performance hit?