(I am reposting this question from email, with permission from the person who emailed)
I need to basically join 3 indexes where the ‘join’ info is in a 4th index. The 3 former indexes have around 50000 entries while the 4th index around 500000.
The fields in the indexes are:
Indexes containing the data: Index A, B, C, …: name status id
Index containing relationships: Index REL: id parent child
The parent and child values in REL match the id value in A, B and C.
And also note that the id values in A, B and C never collide, in other words the "single id space" of the REL events has no overlaps across the entities in A, B, C
From A to B is normally a 1 to many relation, and from B to C is a many-to-many relation.
The specific use case here is that A represents Applications, B represents Deployments of those applications, and C represents Servers implementing those deployments.
A server can be used for several deployments hence the many-to-many relation here.
The obvious way would be to do something like:
index=A
| rename id as A_id, name as Aname, status as Astatus
| join max=0 type=left A_id
[ | search index=REL | rename parent as A_id, child as B_id ]
| join max=0 type=left B_id
[ | search index=B
| rename is as B_id, name as Bname, status as Bstatus ]
| join max=0 type=left B_id
[ | search index=REL | rename parent as B_id, child as C_id ]
| join max=0 type=left C_id
[ | search index=C
| rename id as C_id, name as Cname, status as Cstatus ]
| table Aname Astatus Bname Bstatus Cname Cstatus
This, of course, fails miserably because the join only returns 10000 results while the REL index has 400000 events…
I can rewrite the first join as:
index IN(A REL)
| eval parent=if(index=REL, parent, id),
child =if(index=REL, child, id)
| stats values(name) as Aname values(statu) as Astatus values(child) as childs
by parent
| table Aname Astatus childs
But I’m at a loss how to squeeze in the other indexes and relation…
And I also have some hope that there's a way to avoid join entirely.
UPDATE: Here is a run-anywhere search to fabricate some sample input rows
| makeresults
| fields - _time
| eval data="1,A,appl1,,;2,A,appl2,,;3,D,depl1,,;4,D,depl2,,;5,D,depl3,,;6,S,serv1,,;7,S,serv2,,;8,S,serv3,,;9,S,serv4,,;10,R,,1,3;11,R,,2,4;12,R,,2,5;13,R,,3,6;14,R,,4,7;15,R,,5,8;16,R,,5,9",
data=split(data, ";")
| mvexpand data
| rex field=data "(?<sys_id>[^,]*),(?<type>[^,]*),(?<name>[^,]*),(?<parent>[^,]*),(?<child>[^,]*)"
| fields sys_id type name parent child
| eval parent=if(parent=="",null(),parent), child=if(child=="",null(),child)
And the desired output, is rows that map app1 to depl1 and serv1 and that map app2 to depl2,depl3 and serv2,serv3,serv4
(I'm the person that originally emailed the question to sideview.)
Thanks all for the good discussion.
I did do an implementation using lookup tables. It looks much simpler but you depend on creating the lookups in the first place:
index=A ...
| outputlookup A.csv
before combining the data:
| inputlookup A.csv
| rename id as A_id, name as A_name
| lookup REL.csv parent as A_id output child as B_id
| where isnotnull(B_id)
| mvexpand B_id
| lookup B.csv id as B_id output name as B_name
| where isnotnull(B_name)
| lookup REL.csv parent as B_id output child as C_id
| where isnotnull(C_id)
| mvexpand C_id
| lookup C.csv id as C_id output name as C_name
| mvexpand C_name
| fields - A_id B_id C_id
| sort 0 A_name B_name C_name
Although it duplicates possibly a lot of data, this seems easier to understand and to extend to even more relationships.
Simple to understand and simple to support are good decisions.
Well done.
I see this as a nontrivial version of Splunk soup. I'd proceed like this...
index=A OR index=REL or index=B OR index=C
| fields index parent child name status id
| rename COMMENT as "double the REL records, levaing the others single"
| eval myfan=mvrange(0,if(index="REL",2,1))
| mvexpand myfan
| rename COMMENT as "set up match keys and data fields"
| eval A_id=case(index="A",id, index="REL" AND myfan=0,parent)
| eval B_id=case(index="B",id, index="REL" AND myfan=0,child, index="REL" AND myfan=1,parent)
| eval C_id=case(index="C",id, index="REL" AND myfan=1,child)
| eval A_name=case(index="A",name)
| eval A_status=case(index="A",status)
| eval B_name_status=case(index="B",name."!!!!".status)
| eval C_name_status=case(index="C",name."!!!!".status)
At this point records look like this
index=A id name status A_id A_name A_status
index=B id name status B_id B_name_status
index=C id name status C_id C_name_status
index=REL myfan=0 parent child A_id B_id
index=REL myfan=1 parent child B_id C_id
Then I'd proceed like this...
| rename COMMENT as "reduce to required fields with one of these two"
| fields - id name status parent child
| fields index myfan A_id A_name A_status B_id B_name_status C_id C_name_status
| rename COMMENT as "roll data from REL myfan 0 to A, then myfan=1 to A, tehn drop REL"
| eventstats values(eval(case(myfan=0,B_id)) as B_id by A_id
| eventstats values(eval(case(myfan=1,C_id)) as C_id by B_id
| where index!="REL"
| rename COMMENT as "now we have only A, B, C records, and the A records have all relevant keys."
| rename COMMENT as "Roll B record to A then drop B"
| eventstats values(B_name_status) as B_name_status by B_id
| where index!="B"
| rename COMMENT as "Roll C record to A then drop C"
| eventstats values(C_name_status) as C_name_status by C_id
| where index!="C"
| rename COMMENT as "Above could be a stats"
| rename COMMENT as "Add placeholders to handle potential NULLS"
| eval B_name_status=coalesce(B_name_status,"N/A!!!!N/A")
| eval C_name_status=coalesce(C_name_status,"N/A!!!!N/A")
| rename COMMENT as "split up the records, then the fields"
| mvexpand C_name_status
| mvexpand B_name_status
| eval B_name=mvindex(split(B_name_status,"!!!!"),0)
| eval B_status=mvindex(split(B_name_status,"!!!!"),1)
| eval C_name=mvindex(split(C_name_status,"!!!!"),0)
| eval C_status=mvindex(split(C_name_status,"!!!!"),1)
| rename COMMENT as "drop unneeded fields"
| table A_name A_status B_name B_status C_name C_status
That's all air code, so you'd have to shake it down with a small subset of the records before running the whole data set.
| makeresults
| fields - _time
| eval data="1,A,appl1,,;2,A,appl2,,;3,D,depl1,,;4,D,depl2,,;5,D,depl3,,;6,S,serv1,,;7,S,serv2,,;8,S,serv3,,;9,S,serv4,,;10,R,,1,3;11,R,,2,4;12,R,,2,5;13,R,,3,6;14,R,,4,7;15,R,,5,8;16,R,,5,9",
data=split(data, ";")
| mvexpand data
| rex field=data "(?<sys_id>[^,]*),(?<type>[^,]*),(?<name>[^,]*),(?<parent>[^,]*),(?<child>[^,]*)"
| fields sys_id type name parent child
| eval parent=if(parent=="",null(),parent), child=if(child=="",null(),child)
| eval application_id=case(type="A",sys_id, type="R",parent),
server_id=case(type=="S",sys_id,type="R",child),
application_name=case(type="A",name),
deployment_name=case(type="D",name),
server_name = case(type="S",name)
| eventstats values(server_name) as server_name by server_id
| eval deployment_id=case(type=="D",sys_id,type=="R" AND isnull(server_name), child, true(),parent)
| eventstats values(server_id) as server_id values(deployment_id) as deployment_id by application_id
| fillnull deployment_id value=""
| mvexpand deployment_id
| eventstats values(deployment_name) as deployment_name values(server_name) as server_name by deployment_id
| eventstats values(application_name) as application_name by application_id
| fields application_name deployment_name server_name
| stats values(*) as * by application_name