Splunk Search

How to dedup after | stats list(blah)?

Contributor

Scenario: I am extracting sender domains with the following code:

index=mail sourcetype=xemail

    [search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID] 

        |stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID 

            |search status= "Message done" |rex field=sender "[@\.](?<domain>\w+\.\w+)$" | stats list(domain) 

Problem: I am not able to find the correct dedup syntax so I only list one unique domain once.
Currently I see
domainA.tld
domain
A.tld
domainA.tld
domain
B.tld
domainC.tld
domain
A.tld

When I want just
domainA.tld
domain
B.tld
domain_C.tld

Please provide an example.

Thank you

Tags (3)
0 Karma
1 Solution

SplunkTrust
SplunkTrust

Hi packet_hunter,

just use values() instead of list() in the last stats:

index=mail sourcetype=xemail
[search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID] 
|stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID 
|search status= "Message done" |rex field=sender "[@\.](?<domain>\w+\.\w+)$" | stats values(domain) 

This will return uniq values for domain http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/CommonStatsFunctions .

values(X)   Returns the list of all distinct values of the field X as a multivalue entry. The order of the values is lexicographical.

Also, you may want to look at this answer https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... and try to use stats in your search, because subsearch have limits and are expensive and slow to run over a large data set.

Hop this helps ...

cheers, MuS

View solution in original post

SplunkTrust
SplunkTrust

Hi packet_hunter,

just use values() instead of list() in the last stats:

index=mail sourcetype=xemail
[search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID] 
|stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID 
|search status= "Message done" |rex field=sender "[@\.](?<domain>\w+\.\w+)$" | stats values(domain) 

This will return uniq values for domain http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/CommonStatsFunctions .

values(X)   Returns the list of all distinct values of the field X as a multivalue entry. The order of the values is lexicographical.

Also, you may want to look at this answer https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... and try to use stats in your search, because subsearch have limits and are expensive and slow to run over a large data set.

Hop this helps ...

cheers, MuS

View solution in original post