Splunk Search

How to get count of unique values by group?

indusbull
Explorer

Hi

I am working on query to retrieve count of unique host IPs by user and country. The country has to be grouped into Total vs Total Non-US. The final result would be something like below -

UserId, Total Unique Hosts, Total Non-US Unique Hosts
user1, 42, 54
user2, 23, 95

So far I have below query which works but its very slow. Is there any better and faster way to achieve desired result ? Thanks

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST | search Country!=US | stats estdc(HOST) as total_non_us by USERID
| join USERID type="left"
    [
       search index=customindex sourcetype=custom src
         | iplocation allfields=true lang=code HOST | search Country=US | stats estdc(HOST) as total_us by USERID
    ] 
| fillnull
| eval total = total_non_us + total_us
0 Karma
1 Solution

elliotproebstel
Champion

This should run more efficiently by avoiding the join command and duplicate searching:

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST 
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us

View solution in original post

elliotproebstel
Champion

This should run more efficiently by avoiding the join command and duplicate searching:

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST 
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us

niketn
Legend

@elliotproebstel, I would perform stats first and then apply iplocation to aggregated fields.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Geostats#Usage
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

index=customindex sourcetype=custom src 
| stats count BY USERID HOST 
| iplocation allfields=true lang=code HOST 
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL) 
| stats dc(us_host) AS total_us, dc(non_us_host) AS total_non_us BY USERID
| addtotals row=t col=f

@indusbull, can you explain why you are trying to use estdc() and not dc()?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

milanpatel78
New Member

Pandas nunique() is used to get a count of unique values. It returns the Number of pandas unique values in a column. Pandas DataFrame groupby() method is used to split data of a particular dataset into groups based on some criteria. The groupby() function split the data on any of the axes.

0 Karma

indusbull
Explorer

@niketnilay I was using dc initially but since it was taking long time I decided to try estdc since splunk doc mentions that estdc can give better performance.

0 Karma

rushabh92
New Member

Pandas nunique() is used to get a count of unique values. It returns the Number of pandas unique values in a column.  Pandas DataFrame groupby() method is used to split data of a particular dataset into groups based on some criteria. The groupby() function split the data on any of the axes. 

Tags (2)
0 Karma

elliotproebstel
Champion

Good point, thanks!

0 Karma
Get Updates on the Splunk Community!

Demo Day: Strengthen Your SOC with Splunk Enterprise Security 8.1

Today’s threat landscape is more complex than ever. Security operation centers (SOCs) are overwhelmed with ...

Dashboards: Hiding charts while search is being executed and other uses for tokens

There are a couple of features of SimpleXML / Classic dashboards that can be used to enhance the user ...

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

This is the fourth post in the Splunk Observability Cloud’s AI Assistant in Action series that digs into how ...