Solved: dedup on wildcard field from a lookup

Merryvor · ‎07-18-2024

Hello,

I would like to obtain a list of all domains that did NOT match my lookup file which is composed of wildcard domain

here is an example :

Lookup file

domain

*adobe.com*

*perdu.com*

Events

index=proxy | table dest

dest

acrobat.adobe.com

geo2.adobe.com

Result wanted

*perdu.com*

My request looks like this

index=proxy 
| dedup dest
| table dest 
| eval Observed=1
| append 
    [| inputlookup domain.csv 
    | rename domain as dest
    | eval Observed=0]
| stats max(Observed) as Observed by dest 
| where Observed=0

Obtained results :

*adobe.com*

*perdu.com*

because the request didn't count the lines acrobat.adobe.com and geo2.adobe.com as duplicates of *adobe.com*

So what I need is a way to dedup the events based on the dest field matched from the lookup, and rename the dedup dest value like the wildcard domain field in the lookup. This way, mid request I would have these results :

dest observed

*adobe.com* 1 ==> from the events

*adobe.com* 0 ==> from the lookup

*perdu.com* 0 ==> from the lookup

then stats(max) and where to get only the wildcard domains that never matched.

how could I achieve that ?

PickleRick · ‎07-18-2024

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

View solution in original post

PickleRick · ‎07-18-2024

Small remark - doing dedup on whole events before displaying only one field seems like a waste of resources.

But to the point.

<your search>
| lookup domain.csv domain AS dest OUTPUT domain AS match

Will give you a field called "match" containing the pattern from your lookup that matched your event. Now you can do your append inputlookup and stats.

EDIT: One caveat though - by default you will get _first_ match. Not all rows from the lookup that could potentially match your value.

Merryvor · ‎07-18-2024

@PickleRick

I have nothing created in the new field "match". I suspect this is because of the wildcard in the lookup field somehow.

I tried to had a lookup definition on the domain.csv to have WILDCARD(domain), no luck either

PickleRick · ‎07-18-2024

Without the WILDCARD match type the lookup won't work of course.

My test lookup contains just one column in which I have just one row so it's kinda trivial example but it works:

EDIT: I'm not sure if it will work if you call the lookup by the file name. It should work when you call it by lookup definition name.

Merryvor · ‎07-18-2024

yes seems fine now ^^

I used the lookup definition in the lookup command

thank you !

dedup on wildcard field from a lookup

New This Month - Splunk Observability updates and improvements for faster ...

What's New in Splunk Cloud Platform 9.3.2411?

Buttercup Games: Further Dashboarding Techniques (Part 6)