Splunk Search

What's the difference between dc (distinct count) and estdc (estimated distinct count)?

khourihan_splun
Splunk Employee

I have a search that returns a unique-visitor count over 30 days' worth of logs.

With dc() it was a lot slower than with estdc(). Here is the comparison:

estdc: 3300 seconds, 15351270 unique visitors
dc: 17700 seconds, 15134261 unique visitors

estdc looks good enough, given that it's fairly accurate (roughly 1.4% difference) and MUCH faster. Any information on how the two differ would be appreciated.
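
For reference, a quick check of the figures above, written as a small Python sketch. The variable names are just for illustration; the numbers are the ones from the comparison.

```python
# Figures taken from the comparison above.
estdc_count, estdc_seconds = 15351270, 3300
dc_count, dc_seconds = 15134261, 17700

# Relative difference of the estimate, measured against the exact dc() result.
relative_diff_pct = (estdc_count - dc_count) / dc_count * 100

# How many times faster the estdc() search ran.
speedup = dc_seconds / estdc_seconds

print(f"relative difference: {relative_diff_pct:.2f}%")
print(f"speedup: {speedup:.2f}x")
```

So in this run the estimate came out about 1.4% above the exact count, and the search finished a little over five times faster.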

1 Solution

khourihan_splun
Splunk Employee

Basically, the technique is based on hashing and hash collisions. You can estimate how many distinct items you have tried to hash based on the number of hash collisions and the size of the hash bucket.

More or less, it will use constant time and resources regardless of the number of unique values. The technique is accurate to about 1-2%, although it may overcount or undercount.
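
To make the idea concrete, here is a minimal sketch of one hashing-based estimator, usually called linear counting. It is only an illustration of the general technique and is not necessarily what estdc does internally; the function name `estimate_distinct` and the `num_slots` parameter are made up for this example. Each value is hashed into a fixed-size table, and the number of distinct values is estimated from how many slots are still empty, so the table size stays the same no matter how many unique values show up. Every value still has to be hashed once, but that is a small, fixed amount of work per value.

```python
import hashlib
import math


def estimate_distinct(values, num_slots=1 << 16):
    """Estimate the number of distinct values using a fixed-size table.

    Illustrative linear-counting sketch; memory use depends only on
    num_slots, not on how many distinct values are seen.
    """
    table = bytearray(num_slots)  # one byte per slot, 0 = empty, 1 = hit
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % num_slots
        table[slot] = 1  # colliding values simply land on the same slot

    empty = num_slots - sum(table)
    if empty == 0:
        raise ValueError("table is full; use more slots")
    # The fewer empty slots remain, the more distinct values were hashed.
    return -num_slots * math.log(empty / num_slots)


# 100,000 events, but only 20,000 distinct values among them.
events = [i % 20000 for i in range(100000)]
estimate = estimate_distinct(events)
print(round(estimate))
```

With a table much larger than the true count the estimate lands close to the real number, and it drifts as the table fills up. Because it is an estimate based on collisions, it can come out slightly high or slightly low.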

VatsalJagani
Champion

@khourihan_splunk - Could you please elaborate on how it uses constant time and resources regardless of the number of values? As I understand it, if I search for estdc(bytes), it needs to calculate the hash for each value of bytes, and then it must go through all the hashes and count the number of collisions.
