Splunk Dev

dedup on wildcard field from a lookup

Merryvor
Explorer

Hello, 

I would like to obtain a list of all domains that did NOT match my lookup file which is composed of wildcard domain

here is an example :

Lookup file

domain

*adobe.com*

*perdu.com*

Events 

index=proxy | table dest

dest

acrobat.adobe.com

geo2.adobe.com

Result wanted 

*perdu.com*

My request looks like this 

index=proxy 
| dedup dest
| table dest 
| eval Observed=1
| append 
    [| inputlookup domain.csv 
    | rename domain as dest
    | eval Observed=0]
| stats max(Observed) as Observed by dest 
| where Observed=0

Obtained results :

*adobe.com*

*perdu.com*

because the request didn't count the lines acrobat.adobe.com and geo2.adobe.com as duplicates of *adobe.com*

So what I need is a way to dedup the events based on the dest field matched from the lookup, and rename the dedup dest value like the wildcard domain field in the lookup. This way, mid request I would have these results :

dest                              observed

*adobe.com*             1                                               ==> from the events

*adobe.com*              0                                              ==> from the lookup

*perdu.com*             0                                                ==> from the lookup

then stats(max) and where to get only the wildcard domains that never matched.

how could I achieve that ?

0 Karma
1 Solution

PickleRick
SplunkTrust
SplunkTrust

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

View solution in original post

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

0 Karma

Merryvor
Explorer

@PickleRick 

I have nothing created in the new field "match". I suspect this is because of the wildcard in the lookup field somehow.

I tried to had a lookup definition on the domain.csv to have WILDCARD(domain), no luck either

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Without the WILDCARD match type the lookup won't work of course.

My test lookup contains just one column in which I have just one row so it's kinda trivial example but it works:

PickleRick_0-1721295647116.png

EDIT: I'm not sure if it will work if you call the lookup by the file name. It should work when you call it by lookup definition name.

0 Karma

Merryvor
Explorer

yes seems fine now ^^

I used the lookup definition in the lookup command

thank you !

0 Karma
Get Updates on the Splunk Community!

New This Month - Splunk Observability updates and improvements for faster ...

What’s New? This month, we’re delivering several enhancements across Splunk Observability Cloud for faster and ...

What's New in Splunk Cloud Platform 9.3.2411?

Hey Splunky People! We are excited to share the latest updates in Splunk Cloud Platform 9.3.2411. This release ...

Buttercup Games: Further Dashboarding Techniques (Part 6)

This series of blogs assumes you have already completed the Splunk Enterprise Search Tutorial as it uses the ...