Splunk Search

Joining 2 Lookup Tables

weidertc
Communicator

I'm trying to join 2 lookup tables. To make the logic easy to read, I want the first table to be the one whose data is higher up in hierarchy.

| inputlookup Applications.csv 
| fields AppNo, Application 
| join type=inner AppNo 
    [| inputlookup Functionalities.csv 
    | fields AppNo, FuncNo, Functionality]

This will pull all 4 rows in Applications.csv, and only 4 rows in Functionalities.csv. This will only ever pull 4 total rows, even if Functionalities.csv contains 30 matching AppNo field values. There are many more AppNo field values with a match in both tables, but only 4 are pulled. Since join and type are dangerously similar to SQL, why does this not behave like SQL? I can only get all 30 rows when i switch the two lookup tables around, which shouldn't matter since it's an inner join.

| inputlookup Functionalities.csv 
| fields AppNo, FuncNo, Functionality 
| join type=inner AppNo 
    [| inputlookup Applications.csv 
    | fields AppNo, Application ]

The first 1 is much easier to read for anyone who comes after me, especially since I have 2 more input tables to join. Trying to read the lowest layer first is weird. Am I doing something incorrectly, or does this really not work?

Applications.csv

AppNo, Application
1, app1
2, app2
3, app3

Functionalities.csv

AppNo,FuncNo,Functionality
1, 1, func1
1, 2, func2
2, 1, func3
2, 2, func4
2, 3, func5

Desired output

AppNo, Application, FuncNo, Functionality
1, app1, 1, func1
1, app1, 2, func2
2, app2, 1, func3
2, app2, 2, func4
2, app2, 3, func5

Chris

Tags (2)
0 Karma
1 Solution

woodcock
Esteemed Legend

I still do not get exactly why the lookup is not adequate, but here is another way, without lookup in run-anywhere code:

| makeresults 
| eval raw="1,app1 2,app2 3,app3"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<Application>[^,]+)$"
| eval which="Applications.csv"
| appendpipe [
|makeresults
| eval raw = "1,1,func1 1,2,func2 2,1,func3 2,2,func4 2,3,func5"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
    | strcat FuncNo "," Functionality multifield
    | fields - Application
| eval which="Functionalities.csv"]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

Yours would be like this:

| inputlookup Applications.csv
| appendpipe [
| inputlookup Functionalities.csv
| strcat FuncNo "," Functionality multifield
| fields - Application]

| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

View solution in original post

woodcock
Esteemed Legend

I still do not get exactly why the lookup is not adequate, but here is another way, without lookup in run-anywhere code:

| makeresults 
| eval raw="1,app1 2,app2 3,app3"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<Application>[^,]+)$"
| eval which="Applications.csv"
| appendpipe [
|makeresults
| eval raw = "1,1,func1 1,2,func2 2,1,func3 2,2,func4 2,3,func5"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
    | strcat FuncNo "," Functionality multifield
    | fields - Application
| eval which="Functionalities.csv"]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

Yours would be like this:

| inputlookup Applications.csv
| appendpipe [
| inputlookup Functionalities.csv
| strcat FuncNo "," Functionality multifield
| fields - Application]

| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

weidertc
Communicator

This is it. Thanks! It works. As fast as my other method too, albeit a bit more complex, it avoids the join, which is good.

0 Karma

woodcock
Esteemed Legend

You don't need to merge, you just need a vanilla lookup like this:

inputlookup Functionalities.csv
| lookup Applications.csv AppNo

weidertc
Communicator

This is interesting. Like the first solution, however, this places all functionalities on a single row. i need them all on separate rows. Can we use lookup but create one row per functionality? The result set here feeds into another query, and that result set into another, so they need to retain their independence.

0 Karma

woodcock
Esteemed Legend

Like this:

|inputlookup Applications.csv 
| appendpipe [ |inputlookup Functionalities.csv ]
| stats values(*) AS * BY AppNo

Or perhaps you might like to end with either

| stats first(*) AS * BY AppNo

Or:

| stats list(*) AS * BY AppNo

weidertc
Communicator

This is clever. I didn't know about appendpipe. Unfortunately, though, this places all functionalities in a single row. I need them all in each their own rows.

learned something new. thanks!

0 Karma

woodcock
Esteemed Legend

You were doing an inner join with AppNo; that is exactly what my solution does. Whatever you are trying to do, using join is the wrong approach in Splunk. Show a few lines of each file and the desired merged output and I will tweak my answer.

0 Karma

weidertc
Communicator

I have updated my answer to give sample data for the first 2 tables. The other 2 are linearly below these in the hierarchy but not necessary to get this working.

Why do you state join is a wrong approach? Under what circumstances would it be a correct approach?

0 Karma

woodcock
Esteemed Legend

Join is never the correct approach. It has inescapable limits that means it always breaks down (silently) under scale/load. Never ever use it.

weidertc
Communicator

Would the same apply to anything that has inescapable limits, like map too? That has limits that I have reached.

I need some solution that maintains the lower table's row-independence.

0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...