Solved: How to get count of unique values by group?

indusbull · 03-16-2018

Hi

I am working on a query to retrieve the count of unique host IPs per user, split by country. Countries have to be grouped into Total (all countries) vs. Total Non-US. The final result would look something like this:

UserId, Total Unique Hosts, Total Non-US Unique Hosts
user1, 42, 54
user2, 23, 95

So far I have the query below, which works but is very slow. Is there a better and faster way to achieve the desired result? Thanks

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST
| search Country!=US
| stats estdc(HOST) as total_non_us by USERID
| join type=left USERID
    [ search index=customindex sourcetype=custom src
      | iplocation allfields=true lang=code HOST
      | search Country=US
      | stats estdc(HOST) as total_us by USERID ]
| fillnull
| eval total = total_non_us + total_us
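To make the slow query's behavior concrete, here is a plain-Python model of it. This is a sketch, not Splunk code; the sample events and helper names are hypothetical. It also shows a side effect of the left join with the non-US search as the main search: a user whose hosts are all in the US never appears in the output.

```python
# Plain-Python model of the two-pass query (a sketch, not Splunk code).
# Sample events are hypothetical: (USERID, HOST, Country) after iplocation.
EVENTS = [
    ("user1", "10.0.0.1", "US"),
    ("user1", "10.0.0.2", "DE"),
    ("user1", "10.0.0.2", "DE"),  # repeated event for the same host
    ("user2", "10.0.0.3", "US"),  # user2 only has US hosts
]


def distinct_hosts_by_user(events, keep):
    """Distinct HOST count per USERID over events whose Country passes keep()."""
    hosts = {}
    for user, host, country in events:
        if keep(country):
            hosts.setdefault(user, set()).add(host)
    return {user: len(seen) for user, seen in hosts.items()}


def two_pass_left_join(events):
    # Main search: Country!=US, which only matches events that have a Country.
    non_us = distinct_hosts_by_user(events, lambda c: c is not None and c != "US")
    # Subsearch: Country=US.
    us = distinct_hosts_by_user(events, lambda c: c == "US")
    rows = {}
    # join type=left keeps only the USERIDs produced by the main search.
    for user, total_non_us in non_us.items():
        total_us = us.get(user, 0)  # fillnull: no subsearch match -> 0
        rows[user] = {
            "total_non_us": total_non_us,
            "total_us": total_us,
            "total": total_non_us + total_us,
        }
    return rows


if __name__ == "__main__":
    print(two_pass_left_join(EVENTS))
```

Besides scanning the data twice, this shape silently drops US-only users such as user2.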

elliotproebstel · 03-16-2018

This should run more efficiently because it avoids the join command and searches the data only once instead of twice:

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST
| eval us_host=if(Country="US", HOST, null()), non_us_host=if(Country!="US", HOST, null())
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us
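Here is the same hypothetical data run through a plain-Python model of this single-pass version (a sketch, not Splunk code). Each event is scanned once, eval routes HOST into one of two fields, and stats ... BY USERID yields a row for every user, so the US-only user is kept. The model assumes that an event with no Country value lands in neither bucket, because both if() conditions compare against a null field.

```python
# Plain-Python model of the single-pass query (a sketch, not Splunk code).
# Same hypothetical events: (USERID, HOST, Country) after iplocation.
EVENTS = [
    ("user1", "10.0.0.1", "US"),
    ("user1", "10.0.0.2", "DE"),
    ("user1", "10.0.0.2", "DE"),
    ("user2", "10.0.0.3", "US"),
]


def single_pass(events):
    us_hosts, non_us_hosts = {}, {}
    users = set()  # every USERID seen gets a row (stats ... BY USERID)
    for user, host, country in events:
        users.add(user)
        if country == "US":
            # eval us_host=if(Country="US", HOST, null())
            us_hosts.setdefault(user, set()).add(host)
        elif country is not None:
            # eval non_us_host=if(Country!="US", HOST, null())
            # Assumption: a missing Country falls into neither bucket.
            non_us_hosts.setdefault(user, set()).add(host)
    rows = {}
    for user in users:
        # Distinct counts ignore nulls; fillnull turns a missing count into 0.
        total_us = len(us_hosts.get(user, set()))
        total_non_us = len(non_us_hosts.get(user, set()))
        rows[user] = {
            "total_us": total_us,
            "total_non_us": total_non_us,
            "total": total_non_us + total_us,
        }
    return rows


if __name__ == "__main__":
    print(single_pass(EVENTS))
```

Because each host resolves to exactly one country, the two distinct counts never overlap, so adding them gives the total unique hosts per user.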

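niketn · 03-16-2018

@elliotproebstel, I would run stats first, which reduces the events to one row per USERID and HOST pair, and then apply iplocation to those aggregated rows, so the lookup runs once per pair instead of once per event. See: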

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Geostats#Usage
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

index=customindex sourcetype=custom src
| stats count BY USERID HOST
| iplocation allfields=true lang=code HOST
| eval us_host=if(Country="US", HOST, null()), non_us_host=if(Country!="US", HOST, null())
| stats dc(us_host) AS total_us, dc(non_us_host) AS total_non_us BY USERID
| addtotals row=t col=f
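Here is a plain-Python sketch of why the order matters. The data is hypothetical, and CountingLookup is a made-up stand-in for iplocation that just counts how often it is called. Both orders give the same counts, but aggregating first means the lookup runs once per distinct USERID/HOST pair instead of once per event.

```python
# Sketch: aggregate to distinct (USERID, HOST) pairs before the lookup.
# CountingLookup is a hypothetical stand-in for iplocation; GEO is made-up data.
GEO = {"10.0.0.1": "US", "10.0.0.2": "DE", "10.0.0.3": "US"}

EVENTS = [  # (USERID, HOST) per raw event
    ("user1", "10.0.0.1"),
    ("user1", "10.0.0.1"),
    ("user1", "10.0.0.2"),
    ("user1", "10.0.0.2"),
    ("user2", "10.0.0.3"),
]


class CountingLookup:
    """Maps HOST to Country and records how many lookups were made."""

    def __init__(self, table):
        self.table = table
        self.calls = 0

    def __call__(self, host):
        self.calls += 1
        return self.table.get(host)


def tally(rows):
    """eval + stats dc() BY USERID + addtotals over (USERID, HOST, Country) rows."""
    us, non_us, users = {}, {}, set()
    for user, host, country in rows:
        users.add(user)
        if country == "US":
            us.setdefault(user, set()).add(host)
        elif country is not None:
            non_us.setdefault(user, set()).add(host)
    result = {}
    for user in users:
        total_us = len(us.get(user, set()))
        total_non_us = len(non_us.get(user, set()))
        # addtotals row=t col=f appends the row sum as a Total column
        result[user] = (total_us, total_non_us, total_us + total_non_us)
    return result


def lookup_then_count(events, geolocate):
    # iplocation on every raw event
    return tally((user, host, geolocate(host)) for user, host in events)


def count_then_lookup(events, geolocate):
    # stats count BY USERID HOST first: one row per distinct pair
    pairs = set(events)
    return tally((user, host, geolocate(host)) for user, host in pairs)


if __name__ == "__main__":
    per_event, per_pair = CountingLookup(GEO), CountingLookup(GEO)
    same = lookup_then_count(EVENTS, per_event) == count_then_lookup(EVENTS, per_pair)
    print("same result:", same, "| lookups:", per_event.calls, "vs", per_pair.calls)
```

The saving grows with how often each user hits the same host, which is typical for access logs.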

@indusbull, can you explain why you are trying to use estdc() and not dc()?

milanpatel78 · 10-23-2021

The pandas nunique() method counts unique values: it returns the number of distinct values in a column. The pandas DataFrame groupby() method splits a dataset into groups based on some criteria, on either axis.
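With pandas, the per-group distinct count for this thread's data would be df.groupby("USERID")["HOST"].nunique(). For readers without pandas, here is a standard-library sketch of the same idea. The column names and rows are hypothetical. Missing values (None) are skipped, in the spirit of nunique()'s default dropna=True and of how Splunk's dc() ignores null fields.

```python
from collections import defaultdict


def groupby_nunique(rows, key, col):
    """Count distinct non-None values of `col` within each `key` group.

    Plain-Python analogue of pandas df.groupby(key)[col].nunique().
    """
    seen = defaultdict(set)
    for row in rows:
        values = seen[row[key]]  # creating the group keeps all-missing groups at 0
        if row[col] is not None:  # skip missing values, like dropna=True
            values.add(row[col])
    return {group: len(values) for group, values in seen.items()}


if __name__ == "__main__":
    rows = [  # hypothetical rows
        {"USERID": "user1", "HOST": "10.0.0.1"},
        {"USERID": "user1", "HOST": "10.0.0.1"},
        {"USERID": "user1", "HOST": "10.0.0.2"},
        {"USERID": "user2", "HOST": None},
    ]
    print(groupby_nunique(rows, "USERID", "HOST"))
```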

indusbull · 03-16-2018

@niketnilay I was using dc initially, but since it was taking a long time I decided to try estdc, because the Splunk docs mention that estdc can give better performance.
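The Splunk docs describe estdc as returning an estimated count of distinct values; this thread does not say how that estimate is computed. As a generic illustration of the trade-off (a fixed-size summary instead of remembering every value), here is linear counting, a textbook approximate distinct-count technique swapped in for illustration. It is not Splunk's implementation, and the bitmap size m is an arbitrary choice.

```python
import hashlib
import math


def linear_count(values, m=4096):
    """Estimate the number of distinct values using an m-slot bitmap.

    Generic linear counting, not Splunk's estdc implementation.
    """
    bitmap = bytearray(m)  # one byte per slot for clarity; 0 = empty
    for value in values:
        # Deterministic hash so the same value always hits the same slot.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % m
        bitmap[slot] = 1
    empty = bitmap.count(0)
    if empty == 0:
        raise ValueError("bitmap saturated; use a larger m")
    # After n distinct values the empty fraction is roughly exp(-n/m),
    # so n is estimated as -m * ln(empty / m).
    return -m * math.log(empty / m)


if __name__ == "__main__":
    # 1000 distinct hypothetical host strings, each repeated three times.
    hosts = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)] * 3
    print(len(set(hosts)), round(linear_count(hosts)))
```

Each distinct value only ever sets one slot, so repeated hosts do not inflate the estimate, and memory stays at m slots no matter how many values arrive. Accuracy degrades as the bitmap fills, which is why the function refuses to answer once every slot is set.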

elliotproebstel · 03-16-2018

Good point, thanks!
