Hey
I've been working in a distributed Splunk environment where one of our indexes has a very high-cardinality "source" field (essentially unique per event).
I've noticed that when I use tstats with dc() to count the number of distinct sources, I get an incorrect result (far below the expected roughly one per event).
The query looks something like:
| tstats dc(source) where index=my_index
When I search over a smaller number of events (~100,000 instead of ~5,000,000), the result is correct.
In addition, estdc gets much closer to the true value than dc (which is wildly wrong).
Finally, when using stats instead of tstats, I get the correct value:
index=my_index | stats dc(source)
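For reference, here is how I've been comparing the three counts side by side (the AS field names are just mine):

```spl
index=my_index
| stats dc(source) AS exact_dc
| appendcols
    [| tstats dc(source) AS tstats_dc where index=my_index]
| appendcols
    [| tstats estdc(source) AS tstats_estdc where index=my_index]
```

On the full ~5,000,000 events, exact_dc looks right while tstats_dc is far too low.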
Any ideas? My guess is that I'm hitting some memory barrier, but there is no indication of this.
There is a flag you can pass to tstats - chunk_size - see the docs here:
https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchReference/tstats
The docs mention high-cardinality distinct counts specifically - you could experiment to see whether it makes a difference.
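For example, something like this - the value here is just illustrative, and note that chunk_size goes before the aggregation function:

```spl
| tstats chunk_size=5000000 dc(source) where index=my_index
```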
Sadly setting chunk_size doesn't make a difference.
I've since tried playing around with limits.conf on both search heads and indexers to no avail.
Also, the query does seem to work on the indexers (when querying them directly rather than going through the search head).
Another note that might be helpful - the query works on Splunk 7.3 but not on 8.2.2.
Interesting - it sounds like you have the energy to dig a little deeper. Take a look at these links:
https://www.splunk.com/en_us/blog/tips-and-tricks/splunk-clara-fication-job-inspector.html
https://conf.splunk.com/files/2020/slides/TRU1143C.pdf
which show how you can dive into debug logging and the search log - maybe that will throw up something useful.
Ended up looking at the search.log and finding the following ERROR:
"SRSSerializer - max str len exceeded - probably corrupt"
After looking at the known issues page, I found SPL-166001, which states that this happens with events larger than 16MB. Even though that isn't the case here, I tried the workaround offered there:
[search]
results_serial_format=csv
This did fix the issue; sadly, though, it's supposed to affect the performance of all searches.
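For anyone else hitting this: the setting lives in limits.conf, and I applied it on the search head (the path below is just where I put it - adjust for your deployment):

```
# $SPLUNK_HOME/etc/system/local/limits.conf (my location, adjust as needed)
[search]
results_serial_format = csv
```

A restart was needed for it to take effect.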
Kudos for digging - glad you found a solution - could you quantify the performance hit?