topic Re: Counting Unique Events Over 30 Days in Reporting

Counting Unique Events Over 30 Days

dodge27 — Tue, 13 Jul 2021 20:29:28 GMT

Hello,

How can I improve on my Splunk query so that only one event is counted over a 30-day span where we have 500,000,000 events matched?

This is the query I have so far:

| tstats count WHERE (index=<my_index> sourcetype=json_data earliest=-30d latest=-0h) BY _time span=1mon, host, address, server

This query returns approximately 600,000,000 events, but I only need to count just one of these unique events at the host-level. Since I'm using the tstats command first to retrieve data, I made sure that indeces exist on _time, host, address, and server. My problem here is that Splunk first retreives all of the matching events and then it removes the duplicates. Is there a way to just retreive unique events by host, address, and server?

For example, a host could have the following events over the past 30 days:

_time	host	address	server
2021-07-13 12:55:08	testenv1	10.10.10.10	store1
2021-07-13 12:55:08	testenv1	10.10.10.10	store1
2021-07-13 12:55:08	testenv1	10.10.10.10	store1
2021-07-13 12:55:08	testenv2	10.10.10.11	store2
2021-07-13 12:55:08	testenv2	10.10.10.11	store2
2021-07-13 12:55:08	testenv2	10.10.10.11	store2

And I want my query to do this:

_time	host	address	server
2021-07	testenv1	10.10.10.10	store1
2021-07	testenv2	10.10.10.11	store2

This is just a sample of my data. In several cases, we have unique hosts that repeat 20,000 times over a hour time span. I need my Splunk query to display this record just once, without having to retreive all other 20,000 events.

I also tried to use disctinct_counts like this, but this still retrieves all of the duplicated events under the Events tab:

| tstats distinct_count WHERE (index=<my_index> sourcetype=json_data earliest=-30d latest=-0h) BY _time span=1mon, host, address, server

I've browsed multiple Splunk threads and I'm just stumped.

Thank you.

Re: Counting Unique Events Over 30 Days

codebuilder — Wed, 14 Jul 2021 19:28:54 GMT

You can use dedup to filter your results.

https://docs.splunk.com/Documentation/Splunk/8.2.1/SearchReference/Dedup

Re: Counting Unique Events Over 30 Days

dodge27 — Wed, 14 Jul 2021 21:26:26 GMT

With the dedup command, I tried this:

| tstats count WHERE index=<my_index> sourcetype=json_data earliest=-30d latest=-0h BY host, address, server | dedup host, address, server | where isnotnull(address) AND host!="(none)" | stats sum BY host, address, server

But the issue persists where Splunk goes through more than 600,000,000 events first and then it proceeds to dedup the data. My issue is very similar to:

https://community.splunk.com/t5/Splunk-Search/How-can-I-limit-the-number-of-events-returned-from-a-s...

Somebody else at work mentioned that Splunk has the capability for distributed search jobs but I'm not quite sure how to do that.

Re: Counting Unique Events Over 30 Days

codebuilder — Wed, 14 Jul 2021 21:53:50 GMT

"But the issue persists where Splunk goes through more than 600,000,000 events first and then it proceeds to dedup the data. My issue is very similar to:"

I guess I'm not sure what you are trying to resolve then. Your search returns 600M matching events and you're running a dedup on the result set, so that is expected behavior. Splunk can't dedup fields before getting the fields/events to evaluate.

Also, tstats are extremely fast so this shouldn't be a performance concern.