Solved: Recursively join events on child to parent fields to build chains

oshirnin · ‎04-21-2020

Hello, everybody!

I want to ask something that has already been asked several times but still has no clear solution. My initial query gives me a set of events, each of which has child_id and parent_id fields. Sample data looks like this:

child_id | parent_id
********************
 null    |   A1
 null    |   B1
 A1      |   A2
 B1      |   B2
 A2      |   C1
 B2      |   C1
 C1      |   C2
 C2      |   D1
 C2      |   E1

So, the elements at the bottom of the hierarchy have child_id = null. The depth of the parent-child relationships is not known in advance. How can I restore these events into the hierarchy, so that when I pick a specific event my search returns only that event and all events which are parent events? For example:

If I search child_id=B2, I need to get two events as results: child_id=B2 (root) and child_id=B1 (1 child).
If I search child_id=C1, I need to get five events as results: child_id=C1 (root) and child_id=A2, child_id=B2, child_id=A1, child_id=B1 (4 children), etc.

In other words, I need to get these chains from the initial data:

child_id | chain
****************
 A1      |   A1
 A2      |   A2 -> A1
 B1      |   B1
 B2      |   B2 -> B1
 C1      |   C1 -> A2 -> A1
 C1      |   C1 -> B2 -> B1
 C2      |   C2 -> C1 -> A2 -> A1
 C2      |   C2 -> C1 -> B2 -> B1
 D1      |   D1 -> C2 -> C1 -> A2 -> A1
 D1      |   D1 -> C2 -> C1 -> B2 -> B1
 E1      |   E1 -> C2 -> C1 -> A2 -> A1
 E1      |   E1 -> C2 -> C1 -> B2 -> B1

I tried to achieve this with transaction and map, but have had no luck so far. It looks like I need some kind of recursion. Is it perhaps possible to implement recursion with a search macro that calls itself?
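
For comparison, the recursion being asked about is short when written outside SPL. The following is a minimal Python sketch, not SPL and not something Splunk provides: the helper names build_links and chains are made up for illustration, and the rows are copied from the sample table with None standing in for null. It follows each parent_id value to the child_id values listed against it until it reaches a null.

```python
# (child_id, parent_id) pairs copied from the sample table; None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"),
    ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"),
    ("C1", "C2"),
    ("C2", "D1"), ("C2", "E1"),
]


def build_links(rows):
    """Map each parent_id value to the child_id values listed against it."""
    links = {}
    for child_id, parent_id in rows:
        links.setdefault(parent_id, [])
        if child_id is not None:
            links[parent_id].append(child_id)
    return links


def chains(node, links, path=()):
    """Return every chain that starts at `node` and ends where child_id is null."""
    if node in path:
        # Guard added for this sketch: the sample data has no cycles.
        raise ValueError(f"cycle detected at {node!r}")
    next_ids = links.get(node, [])
    if not next_ids:
        return [[node]]
    found = []
    for next_id in next_ids:
        for tail in chains(next_id, links, path + (node,)):
            found.append([node] + tail)
    return found


if __name__ == "__main__":
    links = build_links(ROWS)
    for node in sorted(links):
        for chain in chains(node, links):
            print(node, "|", " -> ".join(chain))
```

On the sample rows this yields the twelve chains listed in the table above, two each for C1, C2, D1 and E1 because of the branch at C1.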

AlekseiVasiliev · ‎04-23-2020

From your data and the attached picture, it looks like you are trying to restore an inheritance hierarchy from relationships of the form "parent->child". I can suggest an implementation based on the |map command, with each iteration cached in a .csv lookup:

|makeresults |fields - _time |eval child_id="null", parent_id="A1"
|append [|makeresults |fields - _time |eval child_id="null", parent_id="B1"]
|append [|makeresults |fields - _time |eval child_id="A1", parent_id="A2"]
|append [|makeresults |fields - _time |eval child_id="B1", parent_id="B2"] 
|append [|makeresults |fields - _time |eval child_id="A2", parent_id="C1"] 
|append [|makeresults |fields - _time |eval child_id="B2", parent_id="C1"] 
|append [|makeresults |fields - _time |eval child_id="C1", parent_id="C2"] 
|append [|makeresults |fields - _time |eval child_id="C2", parent_id="D1"]
|append [|makeresults |fields - _time |eval child_id="C2", parent_id="E1"]
|rename child_id as parent parent_id as child
|eval line=child."<-".parent
|eventstats values(parent) as parents by child
|eval depth=1
|outputlookup tree.csv
|map maxsearches=100 search="|inputlookup tree.csv
|eval con=mvindex(split(line, \"<-\"), -1)
|join type=left con [|inputlookup tree.csv |rename child as con parents as parents_2 |fields con parents_2]
|fillnull parents_2 value=\"null\"
|makemv parents_2
|mvexpand parents_2
|eval line=line.\"<-\".parents_2
|eval depth=depth+1
|outputlookup tree.csv"
|eventstats max(depth) as max_depth
|where depth==max_depth
|eval line=replace(line, "(<-null)+$", "")."<-null"
|stats values(line) as lines by child
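
The idea behind the search above (extend every line by one level per pass, keep the intermediate state between passes) can be sketched in plain Python. This is only an illustration under assumptions, not a translation of the SPL: the names build_links, extend_once and expand_all are hypothetical, a list in memory plays the role of tree.csv, and the loop stops as soon as every line has reached null instead of running a fixed number of searches.

```python
# (child_id, parent_id) pairs copied from the sample data; None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"),
    ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"),
    ("C1", "C2"),
    ("C2", "D1"), ("C2", "E1"),
]


def build_links(rows):
    """Map each parent_id value to the child_id values listed against it."""
    links = {}
    for child_id, parent_id in rows:
        links.setdefault(parent_id, [])
        if child_id is not None:
            links[parent_id].append(child_id)
    return links


def extend_once(lines, links):
    """One pass: extend each unfinished line by every possible next id."""
    extended = []
    for line in lines:
        last = line[-1]
        if last is None:
            # Already terminated, carry it over unchanged.
            extended.append(line)
            continue
        next_ids = links.get(last) or [None]
        for next_id in next_ids:
            extended.append(line + [next_id])
    return extended


def expand_all(rows):
    """Repeat passes until every line ends in None; return (lines, passes)."""
    links = build_links(rows)
    lines = [[node] for node in links]
    passes = 0
    limit = 2 * len(rows) + 1  # more passes than any acyclic chain can need
    while any(line[-1] is not None for line in lines):
        if passes >= limit:
            raise ValueError("lines keep growing; the data probably contains a cycle")
        lines = extend_once(lines, links)
        passes += 1
    return lines, passes


if __name__ == "__main__":
    lines, passes = expand_all(ROWS)
    print("passes needed:", passes)
    for line in lines:
        print("<-".join("null" if x is None else x for x in line))
```

For the sample data the loop finishes after 5 passes, which is the length of the longest chain (D1 down to A1), so the number of passes actually required is governed by chain depth rather than by the number of rows.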


to4kawa · ‎04-23-2020

| makeresults 
| eval _raw="child_id parent_id
  null     A1
  null       B1
  A1         A2
  B1         B2
  A2         C1
  B2         C1
  C1         C2
  C2         D1
  C2         E1" 
| multikv forceheader=1 
| table parent_id child_id
| eval AP1=parent_id,C1=child_id
| eval data=AP1.",".C1
| eventstats values(data) as data
| streamstats count as session
| mvexpand data
| eval C2=if(mvindex(split(data,","),0)=C1,mvindex(split(data,","),1),null())
| stats values(*) as * by session
| mvexpand data
| eval C3=if(mvindex(split(data,","),0)=C2,mvindex(split(data,","),1),null())
| stats values(*) as * by session
| mvexpand data
| eval C4=if(mvindex(split(data,","),0)=C3,mvindex(split(data,","),1),null())
| stats values(*) as * by session
| mvexpand data
| eval C5=if(mvindex(split(data,","),0)=C4,mvindex(split(data,","),1),null())
| stats values(*) as * by session
| mvexpand data
| stats values(*) as * by session
| fields - data
| table session parent_id C*

I brute-forced my way through it.
I couldn't use foreach to calculate the steps,
so it's hard to do this in SPL.

rmmiller · ‎04-27-2020

This is a brilliant solution to the problem!
I had been experimenting with trying to make the unrolling of the recursion more intelligent, but ran into limitation after limitation. As @to4kawa mentioned, foreach is nearly impossible to use in this situation.

Bravo, @AlekseiVasiliev19 !!

AlekseiVasiliev · ‎04-28-2020

@rmmiller thank you for your kind words

AlekseiVasiliev · ‎04-23-2020

A few thoughts to follow:
1. The maxsearches parameter can be set exactly by counting the rows in the source data and passing code generated from that count to the |map command via a subsearch.
2. This assumes that the iterations of the |map command run sequentially, but how |map parallelizes on a large amount of data still needs to be investigated. If the order of the |map iterations cannot be controlled, you could consider launching them on a schedule of some kind instead.

to4kawa · ‎04-23-2020

It's a good idea to use a CSV to save the results.

ktugwell_splunk · ‎04-22-2020

Similar question asked here today.

I posted how it can be achieved on a small dataset using a scheduled lookup, but I'm not sure how it would scale to larger datasets.

oshirnin · ‎04-23-2020

Hello! I checked your solution at the link provided. It's interesting, but it only helps if the chains are no more than 3 levels deep. I need to build chains of unknown depth from my initial data. I posted a picture of the table visualisation above; would you be so kind as to take a look to see what I mean?

to4kawa · ‎04-22-2020

If you just want to look at the chains, try a visualization app:
https://docs.splunk.com/Documentation/SankeyDiagram/1.3.0/SankeyDiagramViz/SankeyIntro

It's hard to do in SPL.

oshirnin · ‎04-23-2020

Hello! Yes, at the visualisation stage there are several Splunk controls that can build a tree. I personally like Network Diagram Viz: https://splunkbase.splunk.com/app/4438/

But what if I need this tree inside the SPL to filter data? For example, if my users click a leaf on the tree, I want to drill them down to the same dashboard but show only the leaves (children) on the selected level and below. That is the question. It seems to me the only way is to write a custom Python command, which has no restrictions on loops or recursion. My initial table is not very large, maybe up to 1000 rows.

Would that work?
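
A minimal sketch of the traversal such a custom command would need, in plain Python. It deliberately leaves out the wiring into Splunk's custom search command interface, and the name reachable_ids is hypothetical. An explicit stack is used instead of recursion, so the depth of the hierarchy is not limited by Python's recursion limit, and each row is looked at only once when the links are built.

```python
# (child_id, parent_id) pairs copied from the sample table; None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"),
    ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"),
    ("C1", "C2"),
    ("C2", "D1"), ("C2", "E1"),
]


def reachable_ids(selected, rows):
    """Return the selected id plus every id on the levels below it."""
    links = {}
    for child_id, parent_id in rows:
        if child_id is not None:
            links.setdefault(parent_id, []).append(child_id)

    seen = {selected}
    stack = [selected]
    while stack:
        node = stack.pop()
        for next_id in links.get(node, []):
            if next_id not in seen:  # also protects against cycles
                seen.add(next_id)
                stack.append(next_id)
    return seen


if __name__ == "__main__":
    print(sorted(reachable_ids("B2", ROWS)))
    print(sorted(reachable_ids("C1", ROWS)))
```

With the sample rows, selecting B2 gives the ids B2 and B1, and selecting C1 gives C1, A2, B2, A1 and B1, which matches the two examples in the original question. The resulting set could then be used to filter the events on the dashboard.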

rmmiller · ‎04-22-2020

For the table displaying your final result, you have a column named child_id, but shouldn't that be parent_id instead?

parent_id | chain
 ****************
  A1      |   A1
  A2      |   A2 -> A1
  B1      |   B1
  B2      |   B2 -> B1

oshirnin · ‎04-23-2020

Actually, it doesn't matter what the first column in this table is called. The main idea is to restore the whole chain (or chains, in case of branching) of parent objects from the selected object.

rmmiller · ‎04-22-2020

Also, it looks like C2 is a child of both D1 and E1 in your example. Is that your intent?

oshirnin · ‎04-23-2020

Hello! No, I consider C2 to be the parent of both D1 and E1. Please look at the table visualisation.
