Reporting

Counting Unique Events Over 30 Days

dodge27
Loves-to-Learn Lots

Hello,

How can I improve my Splunk query so that each unique event is counted only once over a 30-day span, given that roughly 500,000,000 events match?

This is the query I have so far:

| tstats count WHERE (index=<my_index> sourcetype=json_data earliest=-30d latest=-0h) BY _time span=1mon, host, address, server

This query returns approximately 600,000,000 events, but I only need to count each unique event once at the host level. Since I'm using the tstats command to retrieve data, I made sure that indexed fields exist for _time, host, address, and server. My problem is that Splunk first retrieves all of the matching events and only then removes the duplicates. Is there a way to retrieve only the unique events by host, address, and server?

For example, a host could have the following events over the past 30 days:

_time                host      address      server
2021-07-13 12:55:08  testenv1  10.10.10.10  store1
2021-07-13 12:55:08  testenv1  10.10.10.10  store1
2021-07-13 12:55:08  testenv1  10.10.10.10  store1
2021-07-13 12:55:08  testenv2  10.10.10.11  store2
2021-07-13 12:55:08  testenv2  10.10.10.11  store2
2021-07-13 12:55:08  testenv2  10.10.10.11  store2

And I want my query to do this:

_time    host      address      server
2021-07  testenv1  10.10.10.10  store1
2021-07  testenv2  10.10.10.11  store2

This is just a sample of my data. In several cases, a unique host repeats 20,000 times over an hour-long span. I need my Splunk query to display that record just once, without having to retrieve the other 20,000 events.

I also tried to use distinct_count like this, but it still retrieves all of the duplicated events under the Events tab:

| tstats distinct_count WHERE (index=<my_index> sourcetype=json_data earliest=-30d latest=-0h) BY _time span=1mon, host, address, server

I've browsed multiple Splunk threads and I'm just stumped.

Thank you.


codebuilder
Influencer

You can use dedup to filter your results.

https://docs.splunk.com/Documentation/Splunk/8.2.1/SearchReference/Dedup
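For example, appended to your existing search (a sketch using the field names from your post; <my_index> stays a placeholder):

| tstats count WHERE (index=<my_index> sourcetype=json_data earliest=-30d latest=-0h) BY _time span=1mon, host, address, server
| dedup host, address, server

Keep in mind that dedup runs after the events are retrieved, so it filters the result set rather than the initial fetch.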

----
An upvote would be appreciated and Accept Solution if it helps!

dodge27
Loves-to-Learn Lots

With the dedup command, I tried this:

| tstats count WHERE index=<my_index> sourcetype=json_data earliest=-30d latest=-0h BY host, address, server
| dedup host, address, server
| where isnotnull(address) AND host!="(none)"
| stats sum(count) BY host, address, server

But the issue persists: Splunk first goes through more than 600,000,000 events and then dedups the data. My issue is very similar to:

https://community.splunk.com/t5/Splunk-Search/How-can-I-limit-the-number-of-events-returned-from-a-s...

Somebody else at work mentioned that Splunk supports distributed search jobs, but I'm not quite sure how to set that up.


codebuilder
Influencer

"But the issue persists where Splunk goes through more than 600,000,000 events first and then it proceeds to dedup the data. My issue is very similar to:"

I guess I'm not sure what you are trying to resolve, then. Your search returns 600M matching events and you're running dedup on the result set, so that is expected behavior. Splunk can't dedup fields before retrieving the events it needs to evaluate.

Also, tstats searches are extremely fast, so this shouldn't be a performance concern.
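
For what it's worth, if you only need one row per combination, dropping _time from the BY clause makes tstats return a single aggregated row per host/address/server rather than raw events (a sketch using your field names and the <my_index> placeholder):

| tstats count WHERE (index=<my_index> sourcetype=json_data earliest=-30d latest=-0h) BY host, address, server

The count column then shows how many raw events each unique combination matched over the 30 days.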
