Splunk Search

Recursively join events on child to parent fields to build chains

oshirnin
Path Finder

Hello, everybody!

I want to ask something that has already been asked several times, but there is still no clear solution. My initial query gives me a set of events, each of which has child_id and parent_id fields. Sample data looks like this:

child_id | parent_id
********************
 null    |   A1
 null    |   B1
 A1      |   A2
 B1      |   B2
 A2      |   C1
 B2      |   C1
 C1      |   C2
 C2      |   D1
 C2      |   E1

So, the elements at the bottom of the hierarchy have child_id = null. The depth of the parent-child relationships is not known in advance. How can I restore the hierarchy from these events so that, when I pick a specific event, my search returns only this event and all of its parent events? For example:

  1. If I search child_id=B2, I need to get two events as results: child_id=B2 (root) and child_id=B1 (1 child)
  2. If I search child_id=C1, I need to get five events as results: child_id=C1 (root) and child_id=A2, child_id=B2, child_id=A1, child_id=B1 (4 children), etc.

In other words, I need to get chains from the initial data:

child_id | chain
****************
 A1      |   A1
 A2      |   A2 -> A1
 B1      |   B1
 B2      |   B2 -> B1
 C1      |   C1 -> A2 -> A1
 C1      |   C1 -> B2 -> B1
 C2      |   C2 -> C1 -> A2 -> A1
 C2      |   C2 -> C1 -> B2 -> B1
 D1      |   D1 -> C2 -> C1 -> A2 -> A1
 D1      |   D1 -> C2 -> C1 -> B2 -> B1
 E1      |   E1 -> C2 -> C1 -> A2 -> A1
 E1      |   E1 -> C2 -> C1 -> B2 -> B1

I tried to achieve this with transaction and map, but no luck so far. It looks like I need some kind of recursion. Is it possible to implement recursion with a search macro that points to itself?
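For reference, the recursion being asked about can be sketched outside SPL. The following is a minimal Python sketch, not a Splunk feature: the names ROWS, build_links, and chains are hypothetical, and it assumes the sample rows above, where the value in the parent_id column is the element itself and the value in the child_id column is the next element in its chain.

```python
# Sample rows copied from the question as (child_id, parent_id); None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def build_links(rows):
    """Map each element (parent_id column) to the elements it points to (child_id column)."""
    links = {}
    for child_id, parent_id in rows:
        successors = links.setdefault(parent_id, [])
        if child_id is not None:
            successors.append(child_id)
    return links


def chains(node, links, seen=()):
    """Return every chain that starts at node, one list per branch."""
    if node in seen:
        # Guard against cyclic data, which would otherwise recurse forever.
        raise ValueError(f"cycle detected at {node}")
    successors = links.get(node, [])
    if not successors:
        return [[node]]
    result = []
    for nxt in successors:
        for tail in chains(nxt, links, seen + (node,)):
            result.append([node] + tail)
    return result


if __name__ == "__main__":
    links = build_links(ROWS)
    for node in sorted(links):
        for chain in chains(node, links):
            print(node, "|", " -> ".join(chain))
```

On the sample data, chains("C1", links) yields the two branches C1 -> A2 -> A1 and C1 -> B2 -> B1, which matches the table above.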

0 Karma
1 Solution

AlekseiVasiliev
Explorer

According to your data and the picture that you attached, you are trying to restore an inheritance hierarchy from relationship data of the form "parent->child". I can suggest an implementation based on the |map command, caching the iterations in a .csv lookup:

|makeresults |fields - _time |eval child_id="null", parent_id="A1"
|append [|makeresults |fields - _time |eval child_id="null", parent_id="B1"]
|append [|makeresults |fields - _time |eval child_id="A1", parent_id="A2"]
|append [|makeresults |fields - _time |eval child_id="B1", parent_id="B2"] 
|append [|makeresults |fields - _time |eval child_id="A2", parent_id="C1"] 
|append [|makeresults |fields - _time |eval child_id="B2", parent_id="C1"] 
|append [|makeresults |fields - _time |eval child_id="C1", parent_id="C2"] 
|append [|makeresults |fields - _time |eval child_id="C2", parent_id="D1"]
|append [|makeresults |fields - _time |eval child_id="C2", parent_id="E1"]
|rename child_id as parent parent_id as child
|eval line=child."<-".parent
|eventstats values(parent) as parents by child
|eval depth=1
|outputlookup tree.csv
|map maxsearches=100 search="|inputlookup tree.csv
|eval con=mvindex(split(line, \"<-\"), -1)
|join type=left con [|inputlookup tree.csv |rename child as con parents as parents_2 |fields con parents_2]
|fillnull parents_2 value=\"null\"
|makemv parents_2
|mvexpand parents_2
|eval line=line.\"<-\".parents_2
|eval depth=depth+1
|outputlookup tree.csv"
|eventstats max(depth) as max_depth
|where depth==max_depth
|eval line=replace(line, "(<-null)+$", "")."<-null"
|stats values(line) as lines by child
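
The loop above can be read as: start every line at one element plus its direct predecessor, then on each pass extend every line by one more hop, padding finished lines with null. A minimal Python sketch of that idea follows. It is an illustration only, not a translation of the SPL, and the names ROWS and iterate_chains are hypothetical. It assumes one pass per input row, since the |map command runs its search once per input row.

```python
# Sample rows from the question as (child_id, parent_id), with "null" kept as a string.
ROWS = [
    ("null", "A1"), ("null", "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def iterate_chains(rows):
    """Extend every line by one hop per pass, then keep one trailing null per line."""
    # Mirrors the rename in the search: the parent_id column is the element,
    # and the child_id column is what it points to.
    parents = {}
    for child_id, parent_id in rows:
        parents.setdefault(parent_id, []).append(child_id)

    # Pass 0: one line per input row, "element<-predecessor".
    lines = [[parent_id, child_id] for child_id, parent_id in rows]

    # One pass per input row; a line that already ended in null just gains another null.
    for _ in range(len(rows)):
        lines = [
            line + [nxt]
            for line in lines
            for nxt in parents.get(line[-1], ["null"])
        ]

    # Collapse the padding to a single trailing null and group the lines by first element.
    result = {}
    for line in lines:
        while line[-1] == "null":
            line = line[:-1]
        result.setdefault(line[0], set()).add("<-".join(line + ["null"]))
    return result


if __name__ == "__main__":
    for element, found in sorted(iterate_chains(ROWS).items()):
        print(element, sorted(found))
```

Because every pass rewrites the whole set of lines, the passes have to run one after another, which is the role the tree.csv lookup plays in the search.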


to4kawa
Ultra Champion
| makeresults 
| eval _raw="child_id parent_id
  null     A1
  null       B1
  A1         A2
  B1         B2
  A2         C1
  B2         C1
  C1         C2
  C2         D1
  C2         E1" 
| multikv forceheader=1 
| table parent_id child_id
| eval AP1=parent_id,C1=child_id
| eval data=AP1.",".C1
| eventstats values(data) as data
| streamstats count as session
| mvexpand data
| eval C2=if(mvindex(split(data,","),0)=C1,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| eval C3=if(mvindex(split(data,","),0)=C2,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| eval C4=if(mvindex(split(data,","),0)=C3,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| eval C5=if(mvindex(split(data,","),0)=C4,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| stats values(*) as * by session
| fields - data
| table session parent_id C*

I brute-forced my way through it.
I couldn't use foreach to calculate the steps,
so this is hard to do in SPL.

0 Karma

rmmiller
Contributor

This is a brilliant solution to the problem!
I had been experimenting with trying to make the unrolling of the recursion more intelligent, but ran into limitation after limitation. As @to4kawa mentioned, foreach is nearly impossible to use in this situation.

Bravo, @AlekseiVasiliev19 !!

0 Karma

AlekseiVasiliev
Explorer

@rmmiller thank you for your kind words

0 Karma

AlekseiVasiliev
Explorer

A few follow-up thoughts:
1. The maxsearches parameter can be set exactly by counting the rows in the source data and passing the code generated from that value to the |map command via a subsearch.
2. This assumes that the |map iterations run sequentially; in general, how |map parallelizes on a large amount of data needs to be investigated. If the order of the |map iterations cannot be controlled, you could consider launching them on some kind of schedule instead.
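
As a rough illustration of the first point, the number of passes actually required depends on the longest chain, and for acyclic data the row count is only an upper bound. The sketch below is a hypothetical Python helper (passes_needed is not a Splunk function) that counts the passes on the sample data from the question and treats exceeding the row count as a sign of a cycle.

```python
# Sample rows from the question as (child_id, parent_id), with "null" kept as a string.
ROWS = [
    ("null", "A1"), ("null", "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def passes_needed(rows):
    """Count how many one-hop passes it takes until every line ends in null."""
    parents = {}
    for child_id, parent_id in rows:
        parents.setdefault(parent_id, []).append(child_id)

    # Only the last element of each line matters for deciding when to stop.
    frontier = {child_id for child_id, _ in rows}
    passes = 0
    while frontier - {"null"}:
        if passes >= len(rows):
            raise ValueError("more passes than rows: the data probably contains a cycle")
        frontier = {
            nxt
            for last in frontier
            for nxt in parents.get(last, ["null"])
        }
        passes += 1
    return passes


if __name__ == "__main__":
    print(passes_needed(ROWS), "passes for", len(ROWS), "rows")
```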

0 Karma

to4kawa
Ultra Champion

It's a good idea to use a CSV to save the results.

0 Karma

ktugwell_splunk
Splunk Employee

A similar question was asked here today.

I posted how it can be achieved on a small dataset using a scheduled lookup, but I'm not sure how it would scale to larger datasets.

0 Karma

oshirnin
Path Finder

Hello! I checked your solution at the link provided. It's interesting, but it only helps if you build chains that are exactly 3 levels deep. I need to build chains of undefined depth from my initial data. I put the table visualisation picture above; would you be so kind as to take a look to see what I mean?

0 Karma

to4kawa
Ultra Champion

If you just want to look at the chains, try a visualization app:
https://docs.splunk.com/Documentation/SankeyDiagram/1.3.0/SankeyDiagramViz/SankeyIntro

It's hard to make SPL.

0 Karma

oshirnin
Path Finder

Hello! Yes, at the visualisation stage there are several Splunk controls for building a tree; I personally like Network Diagram Viz https://splunkbase.splunk.com/app/4438/

But what if I need this tree inside the SPL to filter data? For example, if my users click a leaf on the tree, I want to drill them down to the same dashboard but show only the leaves (children) on the selected level and below. That is the question. It seems to me the only way is to write a custom Python command, which is not limited in its use of loops or recursion. My initial table is not very large, maybe up to 1000 rows.

Should it work?
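
As a hedged sketch of the traversal such a custom command would need (this is plain Python, not the Splunk custom command API, and the names ROWS and selected_and_below are hypothetical), a breadth-first walk with a visited set collects the selected element and everything linked below it, whatever the depth:

```python
from collections import deque

# Sample rows from the question as (child_id, parent_id); None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def selected_and_below(rows, start):
    """Return start followed by every element reachable from it, in breadth-first order."""
    links = {}
    for child_id, parent_id in rows:
        if child_id is not None:
            links.setdefault(parent_id, []).append(child_id)

    order = [start]      # result, in the order elements are discovered
    visited = {start}    # prevents revisiting shared elements or looping on cycles
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    print(selected_and_below(ROWS, "B2"))
    print(selected_and_below(ROWS, "C1"))
```

For the sample data this returns B2 and B1 for B2, and C1, A2, B2, A1, B1 for C1, the two cases listed in the original question. Each element is visited once, so a table of around 1000 rows should not be a concern for this kind of walk.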

0 Karma

rmmiller
Contributor

For the table displaying your final result, you have a column named child_id, but shouldn't that be parent_id instead?

parent_id | chain
 ****************
  A1      |   A1
  A2      |   A2 -> A1
  B1      |   B1
  B2      |   B2 -> B1
0 Karma

oshirnin
Path Finder

Actually, it doesn't matter what the first column in this table is named. The main idea is to restore the whole chain (or chains, in case of branching) of parent objects from the selected object.

0 Karma

rmmiller
Contributor

Also, it looks like C2 is a child of both D1 and E1 in your example. Is that your intent?

0 Karma

oshirnin
Path Finder

Hello! No, I consider C2 to be the parent of both D1 and E1. Please look at the table visualisation.

0 Karma