
Why is my query for DC(Device) by Version returning duplicates?

bmcfar000
Engager

I am working with a large amount of data covering over 8 million devices. I am trying to get a distinct count of devices by version number. Unfortunately, the query returns duplicates because a device can appear with more than one version.

For example, the data may look like this at 9 am:
Device: A Version 1
Device: B Version 1
Device: C Version 1
Device: D Version 1
Device: E Version 1

but on a deployment day, by 3 pm, it may look like this:
Device: A Version 2
Device: B Version 2
Device: C Version 1
Device: D Version 1
Device: E Version 1

So my dc(device) by version over 24 hours returns:
Version 1: 5 devices
Version 2: 2 devices
That is a total of 7 devices, even though there are actually only 5.
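To see where the extra 2 come from, here is a small Python simulation of the example (an illustration, not SPL), assuming each event reduces to a (device, version) pair taken from the 9 am and 3 pm snapshots. The distinct count is computed separately inside each version group, so a device seen under both versions is counted once in each group.

```python
# Example events from the two snapshots: (device, version) pairs.
events = [
    ("A", 1), ("B", 1), ("C", 1), ("D", 1), ("E", 1),  # 9 am
    ("A", 2), ("B", 2), ("C", 1), ("D", 1), ("E", 1),  # 3 pm
]

def dc_by_version(rows):
    """Distinct devices per version, mimicking stats dc(device) by version."""
    per_version = {}
    for device, version in rows:
        per_version.setdefault(version, set()).add(device)
    return {v: len(ds) for v, ds in sorted(per_version.items())}

counts = dc_by_version(events)
print(counts)                       # {1: 5, 2: 2}
print(sum(counts.values()))         # 7: A and B are counted under both versions
print(len({d for d, _ in events}))  # 5 distinct devices overall
```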

Without using dedup, how do I eliminate those duplicates?


bmcfar000
Engager

I noticed that stats has a dedup_splitvals option, but I can't get it to work correctly.


Vijeta
Influencer

@bmcfar000 - Try this:

| stats latest(version) AS version BY device | stats count(device) AS devices BY version
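This search first reduces the data to one row per device, keeping only its most recent version, and only then counts by version. As a rough illustration, here is the same two-stage logic in Python, assuming hypothetical events of (seconds since midnight, device, version) and that the newest event by time wins, as latest() does with _time.

```python
from collections import Counter

# Hypothetical events: (seconds since midnight, device, version).
NINE_AM, THREE_PM = 9 * 3600, 15 * 3600
events = [(NINE_AM, d, 1) for d in "ABCDE"] + [
    (THREE_PM, "A", 2), (THREE_PM, "B", 2),
    (THREE_PM, "C", 1), (THREE_PM, "D", 1), (THREE_PM, "E", 1),
]

def latest_version_per_device(events):
    """Mimics stats latest(version) AS version BY device: newest event wins."""
    newest = {}  # device -> (time, version)
    for t, device, version in events:
        if device not in newest or t > newest[device][0]:
            newest[device] = (t, version)
    return {device: version for device, (_, version) in newest.items()}

def count_by_version(device_versions):
    """Mimics stats count BY version over exactly one row per device."""
    return dict(sorted(Counter(device_versions.values()).items()))

per_device = latest_version_per_device(events)
print(count_by_version(per_device))  # {1: 3, 2: 2}, which totals 5 devices
```

Because every device contributes exactly one row to the second stage, the per-version counts always sum to the true number of devices.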


Yorokobi
SplunkTrust

Which version is (more) correct, the earlier one or the later one?

Newest events only:

base search here ... | stats max(_time) AS latest BY Device Version | stats dc(Device) BY Version
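The "newest events only" idea depends on which fields the grouping uses. Below is a Python sketch of the two stages, assuming hypothetical (hour, device, version) events. Grouping by both Device and Version keeps one row per (device, version) pair, so a device seen under two versions still yields two rows. The sketch therefore adds a step that is not in the posted search: it keeps only each device's newest row, roughly what an SPL eventstats max(...) BY Device followed by a where filter would do (an assumption, not tested against Splunk).

```python
# Hypothetical events: (hour of day, device, version) for the 9 am and 3 pm snapshots.
events = ([(9, d, 1) for d in "ABCDE"]
          + [(15, d, 2) for d in "AB"]
          + [(15, d, 1) for d in "CDE"])

# Stage 1: newest time per (device, version), like stats max(_time) BY Device Version.
latest = {}
for t, device, version in events:
    key = (device, version)
    latest[key] = max(t, latest.get(key, t))

def dc_by_version(rows):
    """Distinct devices per version over (device, version) rows."""
    per_version = {}
    for device, version in rows:
        per_version.setdefault(version, set()).add(device)
    return {v: len(ds) for v, ds in sorted(per_version.items())}

# Counting directly still sees one row per (device, version) pair.
print(dc_by_version(latest))  # {1: 5, 2: 2}

# Added step (assumption): keep only each device's newest (device, version) row.
newest_time = {}
for (device, _), t in latest.items():
    newest_time[device] = max(t, newest_time.get(device, t))
kept = [key for key, t in latest.items() if t == newest_time[key[0]]]
print(dc_by_version(kept))  # {1: 3, 2: 2}
```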

Or be inclusive with values():

base search here ... | stats values(Version) BY Device
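The values() variant produces one row per device and lists every distinct version seen for it, so the row count is the true device total and devices caught mid-upgrade show both versions. A Python sketch under the same example data, assuming (device, version) pairs; note that Splunk's values() orders values lexicographically, while this sketch sorts the integer versions numerically, which agrees here only because the versions are single digits.

```python
# Example events: (device, version) pairs from both snapshots.
events = [(d, 1) for d in "ABCDE"] + [("A", 2), ("B", 2)] + [(d, 1) for d in "CDE"]

def values_by_device(events):
    """Mimics stats values(Version) BY Device: distinct versions per device."""
    seen = {}
    for device, version in events:
        seen.setdefault(device, set()).add(version)
    return {d: sorted(vs) for d, vs in sorted(seen.items())}

rows = values_by_device(events)
print(rows)       # {'A': [1, 2], 'B': [1, 2], 'C': [1], 'D': [1], 'E': [1]}
print(len(rows))  # 5, one row per device
```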

bmcfar000
Engager

The goal is for the results to show how many devices there are per version without counting any device twice. The total for my example should be 5, not 7. Both snapshots are among the events found in a 24-hour window, which is why stats dc(device) by version ends up with duplicates.

I used your first example, and although it works, it takes a long time: 192 seconds for a 4-hour window. I have to stay under 300 seconds for a 24-hour window.

My current query takes only about 150 seconds, but it returns duplicates during version releases. Here is my query:

base search | fields device, version | stats dc(device) by version
