
Why is my query for DC(Device) by Version returning duplicates?

Engager

I am working with a large data set covering more than 8 million devices, and I am trying to get a distinct count of devices by version number. Unfortunately, the query returns duplicates because a device can appear with more than one version within the search window.

For example, the data may look like this at 9 am:
Device: A Version 1
Device: B Version 1
Device: C Version 1
Device: D Version 1
Device: E Version 1

but on a deployment day, by 3 pm, it may look like this:
Device: A Version 2
Device: B Version 2
Device: C Version 1
Device: D Version 1
Device: E Version 1

So, my dc(device) by version over 24 hours returns:
Version 1 5 Devices
Version 2 2 Devices
For a total of 7 devices, even though there are actually only 5.

Without using dedup, how do I eliminate those duplicates?
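
The double count described above can be reproduced without any indexed data. The following is only a sketch: it builds the seven example events with makeresults, and the helper field n plus the output field name devices are made up for illustration.

```spl
| makeresults count=7
| streamstats count AS n
``` rows 1-5 are the 9 am events, rows 6-7 are the 3 pm upgrades of A and B ```
| eval device=case(n==1 OR n==6, "A", n==2 OR n==7, "B", n==3, "C", n==4, "D", n==5, "E")
| eval version=if(n>5, "2", "1")
| stats dc(device) AS devices BY version
| addcoltotals labelfield=version label=Total devices
```

Devices A and B contribute to both version groups, so the column total comes to 7 even though only 5 distinct devices exist. dc() is distinct only within each BY group, not across groups.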


Re: Why is my query for DC(Device) by Version returning duplicates?

SplunkTrust

Which version is (more) correct? The earlier one or the later?

Newest events only:

base search here ... | stats max(_time) AS latest BY Device Version | stats dc(Device) BY Version

Or be inclusive with values():

base search here ... | stats values(Version) BY Device

Re: Why is my query for DC(Device) by Version returning duplicates?

Engager

The goal is for the results to show how many devices there are per version without counting any device twice. The total for my example should equal 5, not 7. Both snapshots are among the events found during a 24-hour window, which is why stats dc(device) by version ends up with duplicates.

I used your first example, and although it works, it takes a long time (192 seconds for a 4-hour window). I have to stay under 300 seconds for a 24-hour window.

My current query takes only about 150 seconds, but it returns the duplicates during version releases.
Here is my query: base search | fields device, version | stats dc(device) by version


Re: Why is my query for DC(Device) by Version returning duplicates?

Engager

I noticed that stats has an option dedup_splitvals but I can't seem to get it to work correctly.
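
For context, dedup_splitvals appears to address a different case. As far as the stats documentation describes it, the option removes repeated values inside a multivalued BY field of a single event, so it would not merge a device that shows up in separate events under different versions. A minimal sketch, using one synthetic event with a made-up multivalue version field:

```spl
| makeresults
``` one event whose version field holds the values 1, 1 and 2 ```
| eval device="A", version=split("1,1,2", ",")
| stats count BY version dedup_splitvals=true
```

Comparing the output with and without dedup_splitvals=true should show whether the repeated value inside that one event is counted once or twice. Neither setting changes how events for the same device are grouped across versions.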


Re: Why is my query for DC(Device) by Version returning duplicates?

Influencer

@bmcfar000 - Try this:

| stats latest(version) as version by device | stats count(device) as device by version
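
To check this approach against the example from the question, here is a sketch on synthetic events. The makeresults scaffolding, the helper field n, the 9 am and 3 pm timestamps set through relative_time, and the output field name devices are all assumptions for illustration; in practice the first lines would be replaced by the base search.

```spl
| makeresults count=7
| streamstats count AS n
``` rows 1-5 are the 9 am events, rows 6-7 are the 3 pm upgrades of A and B ```
| eval device=case(n==1 OR n==6, "A", n==2 OR n==7, "B", n==3, "C", n==4, "D", n==5, "E")
| eval version=if(n>5, "2", "1")
| eval _time=if(n>5, relative_time(now(), "@d+15h"), relative_time(now(), "@d+9h"))
``` keep one row per device, carrying its most recent version ```
| stats latest(version) AS version BY device
``` one row per device remains, so count() equals a distinct count here ```
| stats count AS devices BY version
| addcoltotals labelfield=version label=Total devices
```

On this sample, version 1 gets 3 devices and version 2 gets 2, for a total of 5. Each device is assigned only to its most recent version in the window, so no device can land in two groups.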
