Deployment Architecture

Why is my query for DC(Device) by Version returning duplicates?

bmcfar000
Engager

I am working with a large amount of data with over 8 million devices. I am trying to distinct count the number of devices by their version number. Unfortunately, the query is returning duplicates because the devices can be found with multiple versions.

For example, the data may look like this at 9 am:
Device: A Version 1
Device: B Version 1
Device: C Version 1
Device: D Version 1
Device: E Version 1

But on a deployment day, by 3 pm, it may look like this:
Device: A Version 2
Device: B Version 2
Device: C Version 1
Device: D Version 1
Device: E Version 1

So, my dc(device) by version over 24 hours returns:
Version 1 5 Devices
Version 2 2 Devices
For a total of 7 devices, even though there are actually only 5.
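The double counting can be reproduced outside Splunk. The sketch below is only a Python model of the example data, not SPL; the event tuples and the function name are made up for illustration. It groups devices into one set per version, which is roughly what a distinct count split by version does, and shows that a device seen under two versions lands in both sets.

```python
from collections import defaultdict

# Hypothetical (time, device, version) events: the 9 am and 3 pm snapshots above.
events = [
    ("09:00", "A", 1), ("09:00", "B", 1), ("09:00", "C", 1),
    ("09:00", "D", 1), ("09:00", "E", 1),
    ("15:00", "A", 2), ("15:00", "B", 2), ("15:00", "C", 1),
    ("15:00", "D", 1), ("15:00", "E", 1),
]

def distinct_devices_by_version(events):
    """Count distinct devices per version, ignoring when each event occurred."""
    devices = defaultdict(set)
    for _time, device, version in events:
        devices[version].add(device)
    return {version: len(found) for version, found in devices.items()}

naive_counts = distinct_devices_by_version(events)
naive_total = sum(naive_counts.values())                   # A and B are counted twice
true_total = len({device for _t, device, _v in events})    # distinct devices overall

print(naive_counts, naive_total, true_total)
```

Devices A and B appear under both versions within the window, so the per-version counts add up to more than the number of distinct devices.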

Without using dedup, how do I eliminate those duplicates?

bmcfar000
Engager

I noticed that stats has a dedup_splitvals option, but I can't seem to get it to work correctly.

Vijeta
Influencer

@bmcfar000 - Try this

| stats latest(version) as version by device | stats count(device) as device by version

Yorokobi
SplunkTrust

Which version is (more) correct? The earlier one or the later?

Newest events only:

base search here ... | stats latest(Version) AS Version BY Device | stats dc(Device) BY Version

Or be inclusive with values():

base search here ... | stats values(Version) BY Device
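The "newest events only" idea amounts to two passes: first reduce each device to its most recent version, then count devices per version. The sketch below models that in Python on the asker's example. It is not SPL, and the event tuples and function names are assumptions made for illustration.

```python
# Hypothetical (time, device, version) events: the 9 am and 3 pm snapshots
# from the question.
events = [
    ("09:00", "A", 1), ("09:00", "B", 1), ("09:00", "C", 1),
    ("09:00", "D", 1), ("09:00", "E", 1),
    ("15:00", "A", 2), ("15:00", "B", 2), ("15:00", "C", 1),
    ("15:00", "D", 1), ("15:00", "E", 1),
]

def latest_version_per_device(events):
    """Pass 1: keep only the version from each device's most recent event."""
    latest = {}
    for time, device, version in events:
        # Zero-padded HH:MM strings sort chronologically.
        if device not in latest or time >= latest[device][0]:
            latest[device] = (time, version)
    return {device: version for device, (_time, version) in latest.items()}

def count_devices_by_version(device_versions):
    """Pass 2: each device now has exactly one version, so a plain count suffices."""
    counts = {}
    for version in device_versions.values():
        counts[version] = counts.get(version, 0) + 1
    return counts

device_versions = latest_version_per_device(events)
counts = count_devices_by_version(device_versions)
print(counts, sum(counts.values()))
```

Because the first pass leaves one row per device, the per-version counts in the second pass sum to the number of distinct devices. The cost is that the first pass has to track every device, which is likely why this approach is slower on millions of devices.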

bmcfar000
Engager

The goal is for the results to show how many devices there are per version without counting any device twice. The total for my example should be 5, not 7. Both snapshots in my example come from events within the same 24-hour window, which is why stats dc(device) by version ends up with duplicates.

I used your first example, and although it works, it takes a long time (192 seconds for a 4-hour window). I have to stay under 300 seconds for a 24-hour window.

My current query takes only about 150 seconds, but it returns duplicates during version releases.
Here is my query: base search | fields device, version | stats dc(device) by version
