Splunk Dev

dedup on wildcard field from a lookup

Merryvor
Explorer

Hello, 

I would like to obtain a list of all domains that did NOT match my lookup file which is composed of wildcard domain

here is an example :

Lookup file

domain

*adobe.com*

*perdu.com*

Events 

index=proxy | table dest

dest

acrobat.adobe.com

geo2.adobe.com

Result wanted 

*perdu.com*

My request looks like this 

index=proxy 
| dedup dest
| table dest 
| eval Observed=1
| append 
    [| inputlookup domain.csv 
    | rename domain as dest
    | eval Observed=0]
| stats max(Observed) as Observed by dest 
| where Observed=0

Obtained results :

*adobe.com*

*perdu.com*

because the request didn't count the lines acrobat.adobe.com and geo2.adobe.com as duplicates of *adobe.com*

So what I need is a way to dedup the events based on the dest field matched from the lookup, and rename the dedup dest value like the wildcard domain field in the lookup. This way, mid request I would have these results :

dest                              observed

*adobe.com*             1                                               ==> from the events

*adobe.com*              0                                              ==> from the lookup

*perdu.com*             0                                                ==> from the lookup

then stats(max) and where to get only the wildcard domains that never matched.

how could I achieve that ?

0 Karma
1 Solution

PickleRick
SplunkTrust
SplunkTrust

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

View solution in original post

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

0 Karma

Merryvor
Explorer

@PickleRick 

I have nothing created in the new field "match". I suspect this is because of the wildcard in the lookup field somehow.

I tried to had a lookup definition on the domain.csv to have WILDCARD(domain), no luck either

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Without the WILDCARD match type the lookup won't work of course.

My test lookup contains just one column in which I have just one row so it's kinda trivial example but it works:

PickleRick_0-1721295647116.png

EDIT: I'm not sure if it will work if you call the lookup by the file name. It should work when you call it by lookup definition name.

0 Karma

Merryvor
Explorer

yes seems fine now ^^

I used the lookup definition in the lookup command

thank you !

0 Karma
Get Updates on the Splunk Community!

Uncovering Multi-Account Fraud with Splunk Banking Analytics

Last month, I met with a Senior Fraud Analyst at a nationally recognized bank to discuss their recent success ...

Secure Your Future: A Deep Dive into the Compliance and Security Enhancements for the ...

What has been announced?  In the blog, “Preparing your Splunk Environment for OpensSSL3,”we announced the ...

New This Month in Splunk Observability Cloud - Synthetic Monitoring updates, UI ...

This month, we’re delivering several platform, infrastructure, application and digital experience monitoring ...