Hello, everybody!
I want to ask something that has already been asked several times but still has no clear solution. My initial query returns a set of events, each of which has child_id and parent_id fields. Sample data looks like this:
child_id | parent_id
********************
null | A1
null | B1
A1 | A2
B1 | B2
A2 | C1
B2 | C1
C1 | C2
C2 | D1
C2 | E1
So the elements at the bottom of the hierarchy have child_id = null. The depth of the parent-child relationships is not known in advance. How can I restore these events into a hierarchy, so that when I select a specific event my search returns only that event and all of its parent events? For example:
search child_id=B2
I need to get two events as results: child_id=B2 (root) and child_id=B1 (1 child).
search child_id=C1
I need to get five events as results: child_id=C1 (root) and child_id=A2, child_id=B2, child_id=A1, child_id=B1 (4 children), etc.
In other words, I need to get chains from the initial data:
child_id | chain
****************
A1 | A1
A2 | A2 -> A1
B1 | B1
B2 | B2 -> B1
C1 | C1 -> A2 -> A1
C1 | C1 -> B2 -> B1
C2 | C2 -> C1 -> A2 -> A1
C2 | C2 -> C1 -> B2 -> B1
D1 | D1 -> C2 -> C1 -> A2 -> A1
D1 | D1 -> C2 -> C1 -> B2 -> B1
E1 | E1 -> C2 -> C1 -> A2 -> A1
E1 | E1 -> C2 -> C1 -> B2 -> B1
I tried to achieve this with transaction and map, but no luck so far. It looks like I need some kind of recursion. Is it possible to implement recursion with a search macro that points to itself?
Judging by your data and the picture you attached, you are trying to restore the inheritance hierarchy from relationship data of the form "parent->child". I can suggest an implementation based on the |map command, with each iteration cached in a .csv lookup:
|makeresults |fields - _time |eval child_id="null", parent_id="A1"
|append [|makeresults |fields - _time |eval child_id="null", parent_id="B1"]
|append [|makeresults |fields - _time |eval child_id="A1", parent_id="A2"]
|append [|makeresults |fields - _time |eval child_id="B1", parent_id="B2"]
|append [|makeresults |fields - _time |eval child_id="A2", parent_id="C1"]
|append [|makeresults |fields - _time |eval child_id="B2", parent_id="C1"]
|append [|makeresults |fields - _time |eval child_id="C1", parent_id="C2"]
|append [|makeresults |fields - _time |eval child_id="C2", parent_id="D1"]
|append [|makeresults |fields - _time |eval child_id="C2", parent_id="E1"]
|rename child_id as parent parent_id as child
|eval line=child."<-".parent
|eventstats values(parent) as parents by child
|eval depth=1
|outputlookup tree.csv
|map maxsearches=100 search="|inputlookup tree.csv
|eval con=mvindex(split(line, \"<-\"), -1)
|join type=left con [|inputlookup tree.csv |rename child as con parents as parents_2 |fields con parents_2]
|fillnull parents_2 value=\"null\"
|makemv parents_2
|mvexpand parents_2
|eval line=line.\"<-\".parents_2
|eval depth=depth+1
|outputlookup tree.csv"
|eventstats max(depth) as max_depth
|where depth==max_depth
|eval line=rtrim(line, "<-null")."<-null"
|stats values(line) as lines by child
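The general idea of extending every chain by one level per pass and stopping once nothing grows can also be sketched outside Splunk in plain Python. This is not a translation of the SPL above, only an illustration of the iteration; the names ROWS and build_chains are made up for the example, and the column order follows the sample table in the question (child_id first, parent_id second, None standing in for null).

```python
# Sample rows from the question as (child_id, parent_id); None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def build_chains(rows):
    """Return every chain, one list of IDs per chain, built pass by pass."""
    next_ids = {}   # parent_id value -> child_id values it links to
    starts = []     # distinct parent_id values, in order of appearance
    for child_id, parent_id in rows:
        if parent_id not in starts:
            starts.append(parent_id)
        if child_id is not None:
            next_ids.setdefault(parent_id, []).append(child_id)

    chains = [[start] for start in starts]
    # The number of passes is capped by the row count, so a cycle in the
    # data cannot make the loop run forever.
    for _ in range(len(rows)):
        extended = []
        grew = False
        for chain in chains:
            followers = next_ids.get(chain[-1], [])
            if followers:
                grew = True
                # A branching ID produces one new chain per follower.
                extended.extend(chain + [f] for f in followers)
            else:
                extended.append(chain)  # chain is already complete
        chains = extended
        if not grew:
            break
    return chains


if __name__ == "__main__":
    for chain in build_chains(ROWS):
        print(chain[0], "|", " -> ".join(chain))
```

On the sample data this yields the 12 chains listed in the question, including both branches for C1, C2, D1 and E1.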
| makeresults
| eval _raw="child_id parent_id
null A1
null B1
A1 A2
B1 B2
A2 C1
B2 C1
C1 C2
C2 D1
C2 E1"
| multikv forceheader=1
| table parent_id child_id
| eval AP1=parent_id,C1=child_id
| eval data=AP1.",".C1
| eventstats values(data) as data
| streamstats count as session
| mvexpand data
| eval C2=if(mvindex(split(data,","),0)=C1,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| eval C3=if(mvindex(split(data,","),0)=C2,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| eval C4=if(mvindex(split(data,","),0)=C3,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| eval C5=if(mvindex(split(data,","),0)=C4,mvindex(split(data,","),1),NULL)
| stats values(*) as * by session
| mvexpand data
| stats values(*) as * by session
| fields - data
| table session parent_id C*
I forced my way through it.
I couldn't use foreach to work out the steps, so it's hard to write this in SPL.
This is a brilliant solution to the problem!
I had been experimenting with trying to make the unrolling of the recursion more intelligent, but ran into limitation after limitation. As @to4kawa mentioned, foreach is nearly impossible to use in this situation.
Bravo, @AlekseiVasiliev19 !!
@rmmiller thank you for your kind words.
A few thoughts to follow:
1. The maxsearches parameter can be set exactly by counting the rows in the source data and passing the code generated from that value to the |map command via a subsearch.
2. The solution assumes that the iterations of the |map command run sequentially; how |map parallelizes on a large amount of data still needs to be investigated. If the order of the |map iterations cannot be controlled, you could consider launching them on some kind of schedule.
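As a rough illustration of the first point, the number of IDs in the longest chain can be compared with the row count. The sketch below is plain Python, not SPL; longest_chain and ROWS are hypothetical names, the data is the sample from the question, and it assumes the data contains no cycles (a cycle would make the recursion loop forever). How the result maps onto maxsearches has not been verified here.

```python
# Sample rows from the question as (child_id, parent_id); None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def longest_chain(rows):
    """Return the number of IDs in the longest chain (assumes no cycles)."""
    next_ids = {}
    for child_id, parent_id in rows:
        next_ids.setdefault(parent_id, [])
        if child_id is not None:
            next_ids[parent_id].append(child_id)

    depth = {}  # memo: ID -> length of the longest chain starting there

    def depth_of(node):
        if node not in depth:
            followers = next_ids.get(node, [])
            depth[node] = 1 + max((depth_of(n) for n in followers), default=0)
        return depth[node]

    return max(depth_of(node) for node in next_ids)


if __name__ == "__main__":
    print("rows:", len(ROWS), "longest chain:", longest_chain(ROWS))
```

For the sample data the longest chain has 5 IDs (for example D1 -> C2 -> C1 -> A2 -> A1) against 9 rows, so for this sample the row count is a safe but loose upper bound.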
It's a good idea to use a CSV to save the results.
A similar question was asked here today.
I posted how it can be achieved on a small dataset using a scheduled lookup, but I'm not sure how it would scale to larger datasets.
Hello! I checked your solution at the link provided. It's interesting, but it only helps when the chains are 3 levels deep. I need to build chains of undefined depth from my initial data. I put the table visualisation picture above; would you be so kind as to take a look to see what I mean?
If you just want to look at the chains, try an app:
https://docs.splunk.com/Documentation/SankeyDiagram/1.3.0/SankeyDiagramViz/SankeyIntro
It's hard to do in SPL.
Hello! Yes, at the visualisation stage there are several Splunk controls to build a tree, I personally like Network Diagram Viz https://splunkbase.splunk.com/app/4438/
But what if I need this tree inside the SPL to filter data? For example, if my users click a leaf on the tree, I want to drill them down to the same dashboard but show only the leaves (children) on the selected level and below. That is the question. It seems to me the only way is to write a custom Python command, which has no restrictions on loops or recursion. My initial table is not very large, maybe up to 1000 rows.
Would that work?
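The core of such a command could be quite small. Below is a minimal sketch in plain Python of the traversal only; wiring it into Splunk as a custom search command is not shown, and the names related_ids and ROWS are hypothetical. It follows the links in the same direction as the chains in the question (from a parent_id value to its child_id values) and keeps a list of visited IDs, so cyclic data would not cause an endless loop.

```python
from collections import deque

# Sample rows from the question as (child_id, parent_id); None stands for "null".
ROWS = [
    (None, "A1"), (None, "B1"), ("A1", "A2"), ("B1", "B2"),
    ("A2", "C1"), ("B2", "C1"), ("C1", "C2"), ("C2", "D1"), ("C2", "E1"),
]


def related_ids(rows, selected):
    """Return the selected ID followed by every ID reachable from it."""
    next_ids = {}
    for child_id, parent_id in rows:
        if child_id is not None:
            next_ids.setdefault(parent_id, []).append(child_id)

    seen = [selected]            # keeps discovery order for the result
    queue = deque([selected])    # breadth-first traversal, level by level
    while queue:
        current = queue.popleft()
        for nxt in next_ids.get(current, []):
            if nxt not in seen:  # skip IDs reached via another branch
                seen.append(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(related_ids(ROWS, "B2"))  # ['B2', 'B1']
    print(related_ids(ROWS, "C1"))  # ['C1', 'A2', 'B2', 'A1', 'B1']
```

The resulting list of IDs could then be used as a filter for the drill-down. With a table of up to 1000 rows, the linear membership check on the list should not be a concern; a set could replace it for larger inputs.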
For the table displaying your final result, you have a column named child_id, but shouldn't that be parent_id instead?
parent_id | chain
****************
A1 | A1
A2 | A2 -> A1
B1 | B1
B2 | B2 -> B1
Actually, it doesn't matter what the first column in this table is called. The main idea is to restore the whole chain (or chains, in case of branching) of parent objects from the selected object.
Also, it looks like C2 is a child of both D1 and E1 in your example. Is that your intent?
Hello! No, I consider C2 to be the parent of both D1 and E1. Please, look at the table visualisation