Solved: Re: Joining 2 Lookup Tables

weidertc · ‎01-16-2019

I'm trying to join 2 lookup tables. To make the logic easy to read, I want the first table to be the one whose data is higher up in hierarchy.

| inputlookup Applications.csv 
| fields AppNo, Application 
| join type=inner AppNo 
    [| inputlookup Functionalities.csv 
    | fields AppNo, FuncNo, Functionality]

This will pull all 4 rows in Applications.csv, and only 4 rows in Functionalities.csv. This will only ever pull 4 total rows, even if Functionalities.csv contains 30 matching AppNo field values. There are many more AppNo field values with a match in both tables, but only 4 are pulled. Since join and type are dangerously similar to SQL, why does this not behave like SQL? I can only get all 30 rows when i switch the two lookup tables around, which shouldn't matter since it's an inner join.

| inputlookup Functionalities.csv 
| fields AppNo, FuncNo, Functionality 
| join type=inner AppNo 
    [| inputlookup Applications.csv 
    | fields AppNo, Application ]

The first 1 is much easier to read for anyone who comes after me, especially since I have 2 more input tables to join. Trying to read the lowest layer first is weird. Am I doing something incorrectly, or does this really not work?

Applications.csv

AppNo, Application
1, app1
2, app2
3, app3

Functionalities.csv

AppNo,FuncNo,Functionality
1, 1, func1
1, 2, func2
2, 1, func3
2, 2, func4
2, 3, func5

Desired output

AppNo, Application, FuncNo, Functionality
1, app1, 1, func1
1, app1, 2, func2
2, app2, 1, func3
2, app2, 2, func4
2, app2, 3, func5

Chris

woodcock · ‎01-21-2019

I still do not get exactly why the lookup is not adequate, but here is another way, without lookup in run-anywhere code:

| makeresults 
| eval raw="1,app1 2,app2 3,app3"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<Application>[^,]+)$"
| eval which="Applications.csv"
| appendpipe [
|makeresults
| eval raw = "1,1,func1 1,2,func2 2,1,func3 2,2,func4 2,3,func5"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
    | strcat FuncNo "," Functionality multifield
    | fields - Application
| eval which="Functionalities.csv"]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

Yours would be like this:

| inputlookup Applications.csv
| appendpipe [
| inputlookup Functionalities.csv
| strcat FuncNo "," Functionality multifield
| fields - Application]

| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

View solution in original post

woodcock · ‎01-21-2019

I still do not get exactly why the lookup is not adequate, but here is another way, without lookup in run-anywhere code:

| makeresults 
| eval raw="1,app1 2,app2 3,app3"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<Application>[^,]+)$"
| eval which="Applications.csv"
| appendpipe [
|makeresults
| eval raw = "1,1,func1 1,2,func2 2,1,func3 2,2,func4 2,3,func5"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex "^(?<AppNo>[^,]+),(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
    | strcat FuncNo "," Functionality multifield
    | fields - Application
| eval which="Functionalities.csv"]
| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

Yours would be like this:

| inputlookup Applications.csv
| appendpipe [
| inputlookup Functionalities.csv
| strcat FuncNo "," Functionality multifield
| fields - Application]

| stats values(multifield) AS multifield values(Application) AS Application BY AppNo
| mvexpand multifield
| rex field=multifield "^(?<FuncNo>[^,]+),(?<Functionality>[^,]+)$"
| fields - multifield

weidertc · ‎01-23-2019

This is it. Thanks! It works. As fast as my other method too, albeit a bit more complex, it avoids the join, which is good.

woodcock · ‎01-18-2019

You don't need to merge, you just need a vanilla lookup like this:

inputlookup Functionalities.csv
| lookup Applications.csv AppNo

weidertc · ‎01-21-2019

This is interesting. Like the first solution, however, this places all functionalities on a single row. i need them all on separate rows. Can we use lookup but create one row per functionality? The result set here feeds into another query, and that result set into another, so they need to retain their independence.

woodcock · ‎01-16-2019

Like this:

|inputlookup Applications.csv 
| appendpipe [ |inputlookup Functionalities.csv ]
| stats values(*) AS * BY AppNo

Or perhaps you might like to end with either

| stats first(*) AS * BY AppNo

Or:

| stats list(*) AS * BY AppNo

weidertc · ‎01-16-2019

This is clever. I didn't know about appendpipe. Unfortunately, though, this places all functionalities in a single row. I need them all in each their own rows.

learned something new. thanks!

woodcock · ‎01-16-2019

You were doing an inner join with AppNo; that is exactly what my solution does. Whatever you are trying to do, using join is the wrong approach in Splunk. Show a few lines of each file and the desired merged output and I will tweak my answer.

weidertc · ‎01-17-2019

I have updated my answer to give sample data for the first 2 tables. The other 2 are linearly below these in the hierarchy but not necessary to get this working.

Why do you state join is a wrong approach? Under what circumstances would it be a correct approach?

woodcock · ‎01-18-2019

Join is never the correct approach. It has inescapable limits that means it always breaks down (silently) under scale/load. Never ever use it.

weidertc · ‎01-21-2019

Would the same apply to anything that has inescapable limits, like map too? That has limits that I have reached.

I need some solution that maintains the lower table's row-independence.

Joining 2 Lookup Tables

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!