Splunk Dev

dedup on wildcard field from a lookup

Merryvor
Explorer

Hello, 

I would like to obtain a list of all domains that did NOT match my lookup file which is composed of wildcard domain

here is an example :

Lookup file

domain

*adobe.com*

*perdu.com*

Events 

index=proxy | table dest

dest

acrobat.adobe.com

geo2.adobe.com

Result wanted 

*perdu.com*

My request looks like this 

index=proxy 
| dedup dest
| table dest 
| eval Observed=1
| append 
    [| inputlookup domain.csv 
    | rename domain as dest
    | eval Observed=0]
| stats max(Observed) as Observed by dest 
| where Observed=0

Obtained results :

*adobe.com*

*perdu.com*

because the request didn't count the lines acrobat.adobe.com and geo2.adobe.com as duplicates of *adobe.com*

So what I need is a way to dedup the events based on the dest field matched from the lookup, and rename the dedup dest value like the wildcard domain field in the lookup. This way, mid request I would have these results :

dest                              observed

*adobe.com*             1                                               ==> from the events

*adobe.com*              0                                              ==> from the lookup

*perdu.com*             0                                                ==> from the lookup

then stats(max) and where to get only the wildcard domains that never matched.

how could I achieve that ?

0 Karma
1 Solution

PickleRick
SplunkTrust
SplunkTrust

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

View solution in original post

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

0 Karma

Merryvor
Explorer

@PickleRick 

I have nothing created in the new field "match". I suspect this is because of the wildcard in the lookup field somehow.

I tried to had a lookup definition on the domain.csv to have WILDCARD(domain), no luck either

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Without the WILDCARD match type the lookup won't work of course.

My test lookup contains just one column in which I have just one row so it's kinda trivial example but it works:

PickleRick_0-1721295647116.png

EDIT: I'm not sure if it will work if you call the lookup by the file name. It should work when you call it by lookup definition name.

0 Karma

Merryvor
Explorer

yes seems fine now ^^

I used the lookup definition in the lookup command

thank you !

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

.conf25 Global Broadcast: Don’t Miss a Moment

Hello Splunkers, .conf25 is only a click away.  Not able to make it to .conf25 in person? No worries, you can ...

Observe and Secure All Apps with Splunk

 Join Us for Our Next Tech Talk: Observe and Secure All Apps with SplunkAs organizations continue to innovate ...

What's New in Splunk Observability - August 2025

What's New We are excited to announce the latest enhancements to Splunk Observability Cloud as well as what is ...