Solved: What is the best concept for enrichment from multiple columns through multiple indices within one SPL?

grobendg · 12-02-2019

I want to enrich the result set of one SPL search with multiple columns pulled from other indexes.
I know map or join can be used.
The problem with map is that it further limits the result set (instead of enriching it), so you cannot use it multiple times in one SPL search.
The problem with join is that it takes a lot of code for something this simple, whereas map would be a one-liner.
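For comparison, the join-based approach mentioned above might look like the sketch below. The index names user1_index and user2_index are placeholders, and it assumes both indexes store the IP in src and the username in user. Each join runs its subsearch once (rather than once per row, as map does), but join subsearches are subject to result-count and runtime limits, so treat this as an illustration rather than a scalable pattern.

```spl
index=events BAD
| join type=left BADFIELD
    [ search index=user1_index
    | stats latest(user) AS User1 BY src
    | rename src AS BADFIELD ]
| join type=left BADFIELD
    [ search index=user2_index
    | stats latest(user) AS User2 BY src
    | rename src AS BADFIELD ]
| eval EnrichmentUser=coalesce(User1, User2)
| table _time BADFIELD EnrichmentUser
```

type=left keeps BAD events that have no match in either index, and coalesce() then takes User1 when it exists and falls back to User2.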

Let me describe what I am trying to build: a generic macro, function, or SPL search that automatically enriches particular fields.

For example, getting the username from an IP address.
I want one username, and I have two different ways to get it from the IP.

I cannot just chain two map commands in the same SPL:
| map search="first spl1 enrich"
| map search="first spl2 enrich"

Either of these two searches might return no results.

What is the best concept for enrichment from multiple columns through multiple indices within one SPL?

What I am looking for is something like this (pseudocode):

index=events BAD
| table _time BADFIELD EnrichmentUser
| eval User1 = | map search="search index=.... src=\"$BADFIELD$\" | eval EnrichedUsername = user | sort -_time | head 1 | table EnrichedUsername"
| eval User2 = | map search="search index=.... src=\"$BADFIELD$\" | eval EnrichedUsername = user | sort -_time | head 1 | table EnrichedUsername"
if User1 != EMPTY
  EnrichmentUser = User1
if User2 != EMPTY
  EnrichmentUser = User2
| table _time EnrichmentUser

How do I achieve this in SPL?

to4kawa · 12-02-2019

index=user1_index OR index=user2_index OR (index=events BAD)
| eval src=coalesce(src,BADFIELD)
| stats first(_time) as _time, values(eval(if(index=="user1_index",user,null()))) as User1, values(eval(if(index=="user2_index",user,null()))) as User2 by src
| eval EnrichmentUser = coalesce(User1, User2)
| table _time EnrichmentUser

Hi, @grobendg

Maybe this is okay.


woodcock · 12-02-2019

I am very unclear about exactly what you are trying to do, but probably the way to do it is to create 2 separate searches. The first one is scheduled to run every X minutes and creates (or trues up) a lookup mapping the 2 fields whose relationship changes over time (perhaps this is DHCP data?). Expose this lookup through a time-based lookup definition. The second, ad-hoc search then uses the time-based lookup to enhance the original data set, resolving the variable field mapping based on the value of _time.
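A minimal sketch of that two-search setup, assuming the IP-to-user events live in user1_index and user2_index with fields src and user, and a hypothetical lookup file ip_user_history.csv (all placeholders). The first search would be scheduled; it appends new mappings to the file and removes exact duplicates left by overlapping runs.

```spl
index=user1_index OR index=user2_index
| where isnotnull(src) AND isnotnull(user)
| table _time src user
| inputlookup append=true ip_user_history.csv
| dedup _time src
| outputlookup ip_user_history.csv
```

A lookup definition (hypothetically named ip_user_history) would then point at that file in transforms.conf with time_field = _time, so that the ad-hoc search matches each event to the mapping that was valid at the event's _time:

```spl
index=events BAD
| lookup ip_user_history src AS BADFIELD OUTPUT user AS EnrichmentUser
| table _time BADFIELD EnrichmentUser
```

As written, the file grows without bound; in practice the scheduled search would also drop rows older than some retention window before outputlookup.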

P.S. Instead of eval EnrichedUsername = user | sort -_time | head 1, it is much more efficient to use sort 1 -_time | rename user AS EnrichedUsername.
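A set-based alternative to picking the latest user per IP with sort and head (one search per row via map) is a single stats pass. This is a sketch; user1_index is a placeholder index, and src and user are the fields from the original question.

```spl
index=user1_index
| stats latest(user) AS EnrichedUsername BY src
```

This returns one row per src with its most recent user, which can then be combined with the BAD events in the same search instead of being looked up row by row.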

grobendg · 12-03-2019

This is going in the right direction: get everything, then filter down, adding all the metadata along the way.
My base search is:
index=Z BAD
| table Z_BAD_SRC
I have index A with the field "User1"; it needs to be matched on A.src = Z_BAD_SRC.

I have index B with the field "User2"; it needs to be matched on B.srcip = Z_BAD_SRC.

How can I do that with eval when the field names are different, so that I can then run:
| eval EnrichmentUser = coalesce(User1, User2, "unable to enrich")

I cannot figure out how to do the comparison above so that it catches the correctly matching events.
I think the approach from the second reply (woodcock's) is what Splunk would like you to do.

The idea is this: I want to write hundreds of different SPL searches for cases like:
1) GetUserName FromIP
2) GetMac FromIP
3) GetASN FromIP
etc. There are thousands of possible "enrichments", and they come from various sources.

How do I build the ideal metadata enrichment model without going straight to data models? (You cannot just combine data models by field names; you have to use relations such as time, explicit relationships, or matching text.)

I want to have many enrichment searches that add fields to existing saved searches, at scale.

Or, put better:
How do I build auto-enriching Splunk saved searches that draw on enrichments from hundreds of other saved searches and allow for generic, overall enrichment? I know another product can do that, but it is another product.

Has anyone done this with Splunk, building all the enrichments into it? Or is it better to use a separate platform, e.g. because of the limitations of SPL?

Automatic field lookups are not the way to go, because of performance problems with knowledge bundle replication.

How do I build the best enrichment system or framework for Splunk?
I need many saved searches to be enriched automatically, either via fields or via SPL changes.
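One way to make a single enrichment reusable across many saved searches is to wrap it in a search macro that takes the field to enrich as an argument. This is a sketch under assumptions: the macro name enrich_user_from_ip(1), its argument ipfield, and the lookup definition ip_user_history (mapping src to user) are all hypothetical. The macro could be created under Settings > Advanced search > Search macros, or in macros.conf as a stanza [enrich_user_from_ip(1)] with args = ipfield, and this definition:

```spl
lookup ip_user_history src AS $ipfield$ OUTPUT user AS EnrichmentUser
| eval EnrichmentUser=coalesce(EnrichmentUser, "unable to enrich")
```

Any saved search could then call it as index=Z BAD | `enrich_user_from_ip(Z_BAD_SRC)`, and GetMac, GetASN, and similar enrichments could follow the same pattern, each backed by a lookup that a scheduled search keeps up to date. Whether this avoids the bundle-replication concern raised above depends on the lookup type and size, so that part is an assumption to verify.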

to4kawa · 12-06-2019

Hi, @grobendg

index=A OR index=B OR (index=Z BAD)
| eval src=coalesce(src,srcip,Z_BAD_SRC)
| eval User=coalesce(User1,User2)
| stats values(User) as EnrichmentUser count(eval(index=="Z")) as flag by src
| where flag>0

I think this solves the original problem.
Please ask any other questions in a separate post.
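A variant of this search, sketched below, makes two choices explicit. It picks the match key per index with case() instead of coalesce(), so that a src field that happens to exist in index Z cannot shadow Z_BAD_SRC, and it keeps the time of the latest BAD event, as the original pseudocode's table _time EnrichmentUser asked for. Index and field names are the ones from this thread; match_ip and bad_events are illustrative names.

```spl
index=A OR index=B OR (index=Z BAD)
| eval match_ip=case(index=="A", src, index=="B", srcip, index=="Z", Z_BAD_SRC)
| eval User=coalesce(User1, User2)
| stats values(User) AS EnrichmentUser max(eval(if(index=="Z", _time, null()))) AS _time count(eval(index=="Z")) AS bad_events BY match_ip
| where bad_events > 0
| eval EnrichmentUser=coalesce(EnrichmentUser, "unable to enrich")
| table _time match_ip EnrichmentUser
```

The where clause keeps only IPs that appear in at least one BAD event, and the final eval fills in the "unable to enrich" default from the follow-up question when neither index matched.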

grobendg · 12-09-2019

Thank you very much, accepting the answer!!!

grobendg · 12-03-2019

I think the best option would be to use DB Connect 3 with a separate platform.
Does anyone else have ideas or tips?
