Splunk Search

How to left join ext data to event and perform rowwise eval?

krbalaji77
Explorer

I have a lookup table with an event name with min max thresholds

I need to join this (left on the lookup) with the event count by with null fill on events not present in search

lastly - I need rowwise comparison of event count against min / max and conditional format  coloring rows with counts out of band.

I am able to left join the data but I am unable to proceed beyond that as I am not able to reference the attributes for any additional evals

Any help or direction would be greatly appreciate:

| inputlookup bk_lookup.csv
| join type=left left=L right=R where L.alertCode = R.alertCode [search index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| stats count as invokes by alertCode]
| table L.alertCode, R.invokes, L.min, L.max
| fillnull value=0 R.invokes

Labels (5)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

So, let's start with @woodcock's code and build in SPL.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| lookup bk_lookup.csv alertCode ``` output min_threshold max_threshold ```
| stats count as invokes by alertCode min_threshold max_threshold
| eval in_threshold = if(min_threshold > invokes OR invokes > max_threshold, "no", "yes")

This accomplishes the first half of your requirement, namely


Then I would like to do a count on the eventName and check if it is outside the min/max threshold for that particular eventName from the lookup file

(I think eventName is the same as alertCode in your code illustration.) The second half is


Additionally, I would like my count table to display eventCount as "0" and not meeting threshold for eventNames in the look up data that is not available in source events.

You are correct that some sort of join is necessary here.  But a simple left join is insufficient.  A common trick is to append, then do another stats.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| stats count by alertCode
| append
    [| inputlookup bk_lookup.csv]
| stats values(count) as invokes values(*_threshold) as *_threshold by alertCode
| eval in_threshold = case(isnull(invokes), "n/a", min_threshold > invokes OR invokes > max_threshold, "no", true(), "yes")
| fillnull invokes value=0 ``` any alertCode not in events are given value 0 ```

When using append + stats with inputlookup, lookup command is often unnecessary because stats groupby takes care of the matching.

Note: The in_threshold calculation is valid only if every alertCode has only one entry in bk_lookup.csv.

Hope this helps.

View solution in original post

Tags (2)

krbalaji77
Explorer

Thank you @woodcock  for your response. I am still unable to reference other fields(like min and max threshold for each alertcode) from the excel in the context of evaluation against count by alertcode. This is where I have been struggling quite a bit. 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

@woodcock is right.  Do not think in terms of join, or any SQL operation.  Maybe you can describe the actual use case/application with illustrative data and desired output.  Splunk usually has a better way than emulating SQL.

0 Karma

krbalaji77
Explorer

Thank you @yuanliu . I am clear from what @woodcock mentioned in terms of use of avoiding use of joins. 

Here's is what I am looking for.

I have external data in lookup with a list of eventName, min_threshold, max_threshold

From splunk source events, I am doing inline rex to extract the eventName field

Then I would like to do a count on the eventName and check if it is outside the min/max threshold for that particular eventName from the lookup file

Additionally, I would like my count table to display eventCount as "0" and not meeting threshold for eventNames in the look up data that is not available in source events.

This is why I was looking at left Join, but even beyond that - I am struggling on how to perform the rowwise comparison on the count stats

0 Karma

yuanliu
SplunkTrust
SplunkTrust

So, let's start with @woodcock's code and build in SPL.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| lookup bk_lookup.csv alertCode ``` output min_threshold max_threshold ```
| stats count as invokes by alertCode min_threshold max_threshold
| eval in_threshold = if(min_threshold > invokes OR invokes > max_threshold, "no", "yes")

This accomplishes the first half of your requirement, namely


Then I would like to do a count on the eventName and check if it is outside the min/max threshold for that particular eventName from the lookup file

(I think eventName is the same as alertCode in your code illustration.) The second half is


Additionally, I would like my count table to display eventCount as "0" and not meeting threshold for eventNames in the look up data that is not available in source events.

You are correct that some sort of join is necessary here.  But a simple left join is insufficient.  A common trick is to append, then do another stats.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| stats count by alertCode
| append
    [| inputlookup bk_lookup.csv]
| stats values(count) as invokes values(*_threshold) as *_threshold by alertCode
| eval in_threshold = case(isnull(invokes), "n/a", min_threshold > invokes OR invokes > max_threshold, "no", true(), "yes")
| fillnull invokes value=0 ``` any alertCode not in events are given value 0 ```

When using append + stats with inputlookup, lookup command is often unnecessary because stats groupby takes care of the matching.

Note: The in_threshold calculation is valid only if every alertCode has only one entry in bk_lookup.csv.

Hope this helps.

Tags (2)

krbalaji77
Explorer

Thanks a bunch @yuanliu . This is perfect. I was not aware of the values function.  Also I had the threshold named min and max that kind a of added to the confusion as well.

@woodcock  - Thank you for the clear direction as well. 

Appreciate both your help.

0 Karma

woodcock
Esteemed Legend

Read my answer.  Run the searches.  It is ALL there.  If you need `min()` or `max()` then just add those to the `stats` command.

0 Karma

woodcock
Esteemed Legend

You are thinking about it wrong (like SQL).  See this post for the full nitty-gritty (don't forget the karma):
https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi...

So in your case, you would start EITHER like this:

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| lookup bk_lookup.csv alertCode

OR like this:

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| inputlookup append=true bk_lookup.csv
| eval sourcetype=coalesce(sourcetype, "bk_lookup")

Then complete like shown in the other post.
DO NOT EVER USE JOIN.

Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...