Splunk Search

Distinct Count Query

shonky
New Member

I've looked around for answers on this, but unfortunately I've not found an answer to date. I have a list of data, but some of this is duplicate and as such I need to remove a whole row rather than just that value itself, ie:

Store Value 2 Value 3

Store 1 Device_1 Black

Store 1 Device_2 Black

Store 1 Device_3 ----

Store 1 Device_1 Black

What I would like to do is chart this so that I have:

Store Count of Value 2 (Unique) Count of Value 3
Store 1 3 2

When I use

... | chart dc(value 2), count (value 3) by store

It omits only the duplicate value in Value 2, but my count in value 3 is too high, ie:

Store Count of Value 2 (Unique) Count of Value 3
Store 1 3 3


Hi - Sorry, I think the "---" is actually empty in my data. So that part should be okay.

If I extend the values for two more line:

Store Value 2 Value 3
Store 1 Device_1 Black

Store 1 Device_2 Black

Store 1 Device_3

Store 1 Device_1 Black

Store 1 Device_1 Black

Store 1 Device_4 Black

If I exported the data to Excel, I would initially filter all of the results so that I only had unique values in Value 2:

Store Value 2 Value 3
Store 1 Device_1 Black

Store 1 Device_2 Black

Store 1 Device_3

Store 1 Device_4 Black

I would then run a pivot table which would show me the following:

           Value 2      Value 3

Store 1 4 3

So essentially, I want to remove any duplicate "lines" based on Value 2.

I hope that makes more sense. Sorry though.

Tags (2)
0 Karma

emiller42
Motivator

So you're telling Splunk to give you a distinct count of Value 2, which is does. (There are 3 distinct values) and a count of all items in Value 3, which is does. (I'm assuming the '----' is actually NULL in your records, so again there are 3 values)

What I'm not sure about is what you want the count to be for Value 3. Do you want a count of all records (what your query asks for) or a distinct count? (What your expected result set shows) I'm not sure why you're not using dc for both counts.

Now, if the '----' is actually an empty field, it won't be included in the count. So perhaps you're expecting it to be included? (That would cause a dc(Value 3) to return 2 like your expected results) If so, you can fillnull to give all nulls some value which would then be counted.

... | fillnull value=NULL "value 3" | chart dc(value 2) dc(value 3) by store
0 Karma

emiller42
Motivator

If it can have multiple values, then you can simply

... | dedup store "value 2" "value 3" | chart count(value 2) count(value 3) by store

and that should get what you're looking for.

0 Karma

shonky
New Member

I was going to say no, that it can have more than 1 answer. But thinking about it, I think you may be right if I limit my time span to a short enough period. I will give it a go and come back and rate your answer. I think it should work. My own fault for not figuring this one out.

0 Karma

emiller42
Motivator

So I'm assuming each Value 2 can only have one possible Value 3? (Including null) If so, the dedup command would fix this for you.

... | dedup "value 2" | chart count(value 2) count(value 3)

Would get you what you're looking for in the example.

0 Karma

shonky
New Member

I updated my question. I think I need to do a filter on my data before I count it - ie remove the unique values based on value 2 and then do the count. Just not sure how to remove the duplicate values first.

0 Karma

shonky
New Member

Because the the Value 2 is a duplicate.

Essentially I only want to count the unique devices. Line 4 - Store 1, Device 1, Black - has already showed in my logs.

Think of it as being like a repeat visitor to a website. It might show his IP address and then other information about the visitor. I don't want to count the other information twice as the visitor has already been there.

0 Karma

Ayn
Legend

In your sample data, I see 3 lines with a value for Value 3, so I don't follow why you would like the count to be 2?

0 Karma
Get Updates on the Splunk Community!

Dashboards: Hiding charts while search is being executed and other uses for tokens

There are a couple of features of SimpleXML / Classic dashboards that can be used to enhance the user ...

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

This is the fourth post in the Splunk Observability Cloud’s AI Assistant in Action series that digs into how ...

Brains, Bytes, and Boston: Learn from the Best at .conf25

When you think of Boston, you might picture colonial charm, world-class universities, or even the crack of a ...