Solved: How to left join ext data to event and perform row...

krbalaji77 · ‎04-05-2023

I have a lookup table with an event name with min max thresholds

I need to join this (left on the lookup) with the event count by with null fill on events not present in search

lastly - I need rowwise comparison of event count against min / max and conditional format coloring rows with counts out of band.

I am able to left join the data but I am unable to proceed beyond that as I am not able to reference the attributes for any additional evals

Any help or direction would be greatly appreciate:

| inputlookup bk_lookup.csv
| join type=left left=L right=R where L.alertCode = R.alertCode [search index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| stats count as invokes by alertCode]
| table L.alertCode, R.invokes, L.min, L.max
| fillnull value=0 R.invokes

yuanliu · ‎04-12-2023

So, let's start with @woodcock's code and build in SPL.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| lookup bk_lookup.csv alertCode ``` output min_threshold max_threshold ```
| stats count as invokes by alertCode min_threshold max_threshold
| eval in_threshold = if(min_threshold > invokes OR invokes > max_threshold, "no", "yes")

This accomplishes the first half of your requirement, namely

Then I would like to do a count on the eventName and check if it is outside the min/max threshold for that particular eventName from the lookup file

(I think eventName is the same as alertCode in your code illustration.) The second half is

Additionally, I would like my count table to display eventCount as "0" and not meeting threshold for eventNames in the look up data that is not available in source events.

You are correct that some sort of join is necessary here. But a simple left join is insufficient. A common trick is to append, then do another stats.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| stats count by alertCode
| append
    [| inputlookup bk_lookup.csv]
| stats values(count) as invokes values(*_threshold) as *_threshold by alertCode
| eval in_threshold = case(isnull(invokes), "n/a", min_threshold > invokes OR invokes > max_threshold, "no", true(), "yes")
| fillnull invokes value=0 ``` any alertCode not in events are given value 0 ```

When using append + stats with inputlookup, lookup command is often unnecessary because stats groupby takes care of the matching.

Note: The in_threshold calculation is valid only if every alertCode has only one entry in bk_lookup.csv.

Hope this helps.

View solution in original post

krbalaji77 · ‎04-10-2023

Thank you @woodcock for your response. I am still unable to reference other fields(like min and max threshold for each alertcode) from the excel in the context of evaluation against count by alertcode. This is where I have been struggling quite a bit.

yuanliu · ‎04-11-2023

@woodcock is right. Do not think in terms of join, or any SQL operation. Maybe you can describe the actual use case/application with illustrative data and desired output. Splunk usually has a better way than emulating SQL.

krbalaji77 · ‎04-11-2023

Thank you @yuanliu . I am clear from what @woodcock mentioned in terms of use of avoiding use of joins.

Here's is what I am looking for.

I have external data in lookup with a list of eventName, min_threshold, max_threshold

From splunk source events, I am doing inline rex to extract the eventName field

Then I would like to do a count on the eventName and check if it is outside the min/max threshold for that particular eventName from the lookup file

Additionally, I would like my count table to display eventCount as "0" and not meeting threshold for eventNames in the look up data that is not available in source events.

This is why I was looking at left Join, but even beyond that - I am struggling on how to perform the rowwise comparison on the count stats

yuanliu · ‎04-12-2023

So, let's start with @woodcock's code and build in SPL.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| lookup bk_lookup.csv alertCode ``` output min_threshold max_threshold ```
| stats count as invokes by alertCode min_threshold max_threshold
| eval in_threshold = if(min_threshold > invokes OR invokes > max_threshold, "no", "yes")

This accomplishes the first half of your requirement, namely

Then I would like to do a count on the eventName and check if it is outside the min/max threshold for that particular eventName from the lookup file

(I think eventName is the same as alertCode in your code illustration.) The second half is

Additionally, I would like my count table to display eventCount as "0" and not meeting threshold for eventNames in the look up data that is not available in source events.

You are correct that some sort of join is necessary here. But a simple left join is insufficient. A common trick is to append, then do another stats.

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| stats count by alertCode
| append
    [| inputlookup bk_lookup.csv]
| stats values(count) as invokes values(*_threshold) as *_threshold by alertCode
| eval in_threshold = case(isnull(invokes), "n/a", min_threshold > invokes OR invokes > max_threshold, "no", true(), "yes")
| fillnull invokes value=0 ``` any alertCode not in events are given value 0 ```

When using append + stats with inputlookup, lookup command is often unnecessary because stats groupby takes care of the matching.

Note: The in_threshold calculation is valid only if every alertCode has only one entry in bk_lookup.csv.

Hope this helps.

krbalaji77 · ‎04-12-2023

Thanks a bunch @yuanliu . This is perfect. I was not aware of the values function. Also I had the threshold named min and max that kind a of added to the confusion as well.

@woodcock - Thank you for the clear direction as well.

Appreciate both your help.

woodcock · ‎04-11-2023

Read my answer. Run the searches. It is ALL there. If you need `min()` or `max()` then just add those to the `stats` command.

woodcock · ‎04-05-2023

You are thinking about it wrong (like SQL). See this post for the full nitty-gritty (don't forget the karma):
https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi...

So in your case, you would start EITHER like this:

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| lookup bk_lookup.csv alertCode

OR like this:

index=my_index log_group="/my/log/group" "*cache*"
| rex field=event.message "alertCode: (?<alertCode>.*), version: (?<version>.*)"
| inputlookup append=true bk_lookup.csv
| eval sourcetype=coalesce(sourcetype, "bk_lookup")

Then complete like shown in the other post.
DO NOT EVER USE JOIN.

How to left join ext data to event and perform rowwise eval?

eval

join

lookup

stats

table

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

How to Monitor Google Kubernetes Engine (GKE)

Index This | How can you make 45 using only 4?