Splunk Search

recursively join the same table?

Splunk_Shinobi
Splunk Employee
Splunk Employee

Hi

I have a data set with parent ID and child ID in a same table.
I am looking for a search that produce the following.

A sample data:

 p c
--- ---
111 222
222 333
111 444
555 666

and, the output I want to get is :

 p   c
--- ---
111 222,333
111 444
555 666

I was looking at the search documentation and find selfjoin command, but I still can not get it work..

Any comment would be appreciated!
Thank you in advance.

1 Solution

rmasuoka
Explorer

The solution given by somesoni2, I am afraid, is not recursive. Though not a completely recursive one,
I have managed to come up with a decent solution as follows.
You need repeat a line (n - 2) times in order to get n-deep tree from parent-child pairs.
I needed to start from the end nodes instead of arbitrary nodes
in order to avoid (partially) duplicate branches.
(Please pardon my formatting below. This is my first post.)

Splunk search to build trees from parent-child pairs

sourcetype=csv 
  | join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
  | eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e 
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e 
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e 
  | sort t | table t | outputcsv "result.csv"

Variables

p: parent
c: child
c2: temporary child
r: root
k: (join) key
e: end
t: tree

Search Explanation

# --------------------
# [Initial Data Processing]
# --------------------
sourcetype=csv 
  | join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
  | eval t=p." -> ".c | eval e=c | eval r=p | table t, r, e 
# --------------------
  | join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
# (1) Leave only parent-child pairs where the child is an end node
  | eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e 
# (2) Set p(arent) to r(oot), c(hild) to e(nd), and output t(ree), r(oot), e(nd)

# --------------------
# [Repeatable Part]
# --------------------
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e 
# You repeat this line n - 2 times to get n-deep tree
# --------------------
  | eval k=r 
# (1) Set r(oot) to k(ey)
  | join type=left k [search sourcetype=csv | rename c as k ] 
# (2) Join the output with the csv with c(hild) renamed as k(ey) based on k(ey)
  | eval t=if(isnull(p), t, p." -> ".t)
# (3) Add p(arent), " -> " in front of t(ree) if p(arent) is not null
  | eval r=p | table t, r, e 
# (4) Set p(arent) to r(oot) and output t(ree), r(oot), e(nd)
# --------------------

# --------------------
# [Finishing Touch]
# --------------------
  | sort t | table t | outputcsv "result.csv"
# --------------------
# Sort by t(ree) and output the result to result.csv
# --------------------

Sample Tree structure

111 --> 222 --> 777 --> 555 --> 321
            |       +-> 789
            +-> 999
333 --> 444 --> 888 --> 234
    +-> 666
123 --> 456 --> 567
987 --> 654

parent-child.csv

p,c
111,222
222,777
333,444
777,555
333,666
444,888
222,999
123,456
987,654
777,789
555,321
888,234
456,567

result.csv

t
"111 -> 222 -> 777 -> 555 -> 321"
"111 -> 222 -> 777 -> 789"
"111 -> 222 -> 789"
"111 -> 222 -> 999"
"123 -> 456 -> 567"
"333 -> 444 -> 888 -> 234"
"333 -> 666"
"987 -> 654"

View solution in original post

rmasuoka
Explorer

The solution given by somesoni2, I am afraid, is not recursive. Though not a completely recursive one,
I have managed to come up with a decent solution as follows.
You need repeat a line (n - 2) times in order to get n-deep tree from parent-child pairs.
I needed to start from the end nodes instead of arbitrary nodes
in order to avoid (partially) duplicate branches.
(Please pardon my formatting below. This is my first post.)

Splunk search to build trees from parent-child pairs

sourcetype=csv 
  | join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
  | eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e 
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e 
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e 
  | sort t | table t | outputcsv "result.csv"

Variables

p: parent
c: child
c2: temporary child
r: root
k: (join) key
e: end
t: tree

Search Explanation

# --------------------
# [Initial Data Processing]
# --------------------
sourcetype=csv 
  | join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
  | eval t=p." -> ".c | eval e=c | eval r=p | table t, r, e 
# --------------------
  | join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
# (1) Leave only parent-child pairs where the child is an end node
  | eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e 
# (2) Set p(arent) to r(oot), c(hild) to e(nd), and output t(ree), r(oot), e(nd)

# --------------------
# [Repeatable Part]
# --------------------
  | eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e 
# You repeat this line n - 2 times to get n-deep tree
# --------------------
  | eval k=r 
# (1) Set r(oot) to k(ey)
  | join type=left k [search sourcetype=csv | rename c as k ] 
# (2) Join the output with the csv with c(hild) renamed as k(ey) based on k(ey)
  | eval t=if(isnull(p), t, p." -> ".t)
# (3) Add p(arent), " -> " in front of t(ree) if p(arent) is not null
  | eval r=p | table t, r, e 
# (4) Set p(arent) to r(oot) and output t(ree), r(oot), e(nd)
# --------------------

# --------------------
# [Finishing Touch]
# --------------------
  | sort t | table t | outputcsv "result.csv"
# --------------------
# Sort by t(ree) and output the result to result.csv
# --------------------

Sample Tree structure

111 --> 222 --> 777 --> 555 --> 321
            |       +-> 789
            +-> 999
333 --> 444 --> 888 --> 234
    +-> 666
123 --> 456 --> 567
987 --> 654

parent-child.csv

p,c
111,222
222,777
333,444
777,555
333,666
444,888
222,999
123,456
987,654
777,789
555,321
888,234
456,567

result.csv

t
"111 -> 222 -> 777 -> 555 -> 321"
"111 -> 222 -> 777 -> 789"
"111 -> 222 -> 789"
"111 -> 222 -> 999"
"123 -> 456 -> 567"
"333 -> 444 -> 888 -> 234"
"333 -> 666"
"987 -> 654"

Federica_92
Communicator

Sorry, It's possible instead of write N times the same query, insert a loop?

Thank you

0 Karma

rmasuoka
Explorer

Is there a loop function or functionality available in Splunk? I looked around, but could not find loop when I worked on my answer above.

0 Karma

melonman
Motivator

Unfortunately, there is no loop function. Maybe you could use "map" command, but probably it would be better to write a custom search command that does loop operation.

http://dev.splunk.com/view/python-sdk/SP-CAAAEU2

0 Karma

somesoni2
Revered Legend

Try something like this

your base search | table parent, child | join type=left child [search your base search | rename child as child2 | rename parent as child] 
| eval child=child.coalesce(",".child2,"") | table parent, child

Updated answer

Try this

sourcetype=csv |join type= left c [search sourcetype=csv |stats delim="," values(c) as cvals by p |rename p as c]|eval c=c.coalesce(",".cvals,"") | table p,c
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...