I'm trying to join 2 lookup tables. To make the logic easy to read, I want the first table to be the one whose data is higher up in hierarchy.
| inputlookup Applications.csv
| fields AppNo, Application
| join type=inner AppNo
[| inputlookup Functionalities.csv
| fields AppNo, FuncNo, Functionality]
This will pull all 4 rows in Applications.csv, and only 4 rows in Functionalities.csv. This will only ever pull 4 total rows, even if Functionalities.csv contains 30 matching AppNo field values. There are many more AppNo field values with a match in both tables, but only 4 are pulled. Since join and type are dangerously similar to SQL, why does this not behave like SQL? I can only get all 30 rows when i switch the two lookup tables around, which shouldn't matter since it's an inner join.
| inputlookup Functionalities.csv
| fields AppNo, FuncNo, Functionality
| join type=inner AppNo
[| inputlookup Applications.csv
| fields AppNo, Application ]
The first 1 is much easier to read for anyone who comes after me, especially since I have 2 more input tables to join. Trying to read the lowest layer first is weird. Am I doing something incorrectly, or does this really not work?
Applications.csv
AppNo, Application
1, app1
2, app2
3, app3
Functionalities.csv
AppNo,FuncNo,Functionality
1, 1, func1
1, 2, func2
2, 1, func3
2, 2, func4
2, 3, func5
Desired output
AppNo, Application, FuncNo, Functionality
1, app1, 1, func1
1, app1, 2, func2
2, app2, 1, func3
2, app2, 2, func4
2, app2, 3, func5
Chris
I still do not get exactly why the lookup
is not adequate, but here is another way, without lookup
in run-anywhere code:
| makeresults
| eval raw="1,app1 2,app2 3,app3"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<Application>[^,]+)$"
| eval which="Applications.csv"
| appendpipe [
|makeresults
| eval raw = "1,1,func1 1,2,func2 2,1,func3 2,2,func4 2,3,func5"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| strcat FuncNo "," Functionality multifield
| fields - Application
| eval which="Functionalities.csv"]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield
Yours would be like this:
| inputlookup Applications.csv
| appendpipe [
| inputlookup Functionalities.csv
| strcat FuncNo "," Functionality multifield
| fields - Application]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield
I still do not get exactly why the lookup
is not adequate, but here is another way, without lookup
in run-anywhere code:
| makeresults
| eval raw="1,app1 2,app2 3,app3"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<Application>[^,]+)$"
| eval which="Applications.csv"
| appendpipe [
|makeresults
| eval raw = "1,1,func1 1,2,func2 2,1,func3 2,2,func4 2,3,func5"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| strcat FuncNo "," Functionality multifield
| fields - Application
| eval which="Functionalities.csv"]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield
Yours would be like this:
| inputlookup Applications.csv
| appendpipe [
| inputlookup Functionalities.csv
| strcat FuncNo "," Functionality multifield
| fields - Application]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield
This is it. Thanks! It works. As fast as my other method too, albeit a bit more complex, it avoids the join, which is good.
You don't need to merge
, you just need a vanilla lookup
like this:
inputlookup Functionalities.csv
| lookup Applications.csv AppNo
This is interesting. Like the first solution, however, this places all functionalities on a single row. i need them all on separate rows. Can we use lookup but create one row per functionality? The result set here feeds into another query, and that result set into another, so they need to retain their independence.
Like this:
|inputlookup Applications.csv
| appendpipe [ |inputlookup Functionalities.csv ]
| stats values(*) AS * BY AppNo
Or perhaps you might like to end with either
| stats first(*) AS * BY AppNo
Or:
| stats list(*) AS * BY AppNo
This is clever. I didn't know about appendpipe. Unfortunately, though, this places all functionalities in a single row. I need them all in each their own rows.
learned something new. thanks!
You were doing an inner join with AppNo
; that is exactly what my solution does. Whatever you are trying to do, using join
is the wrong approach in Splunk. Show a few lines of each file and the desired merged output and I will tweak my answer.
I have updated my answer to give sample data for the first 2 tables. The other 2 are linearly below these in the hierarchy but not necessary to get this working.
Why do you state join is a wrong approach? Under what circumstances would it be a correct approach?
Join is never the correct approach. It has inescapable limits that means it always breaks down (silently) under scale/load. Never ever use it.
Would the same apply to anything that has inescapable limits, like map too? That has limits that I have reached.
I need some solution that maintains the lower table's row-independence.