Splunk Search

recursively join the same table?

Splunk Employee

Hi

I have a data set with parent ID and child ID in a same table.
I am looking for a search that produce the following.

A sample data:

`````` p c
--- ---
111 222
222 333
111 444
555 666
``````

and, the output I want to get is :

`````` p   c
--- ---
111 222,333
111 444
555 666
``````

I was looking at the search documentation and find selfjoin command, but I still can not get it work..

Any comment would be appreciated!

Tags (4)
1 Solution
Explorer

The solution given by somesoni2, I am afraid, is not recursive. Though not a completely recursive one,
I have managed to come up with a decent solution as follows.
You need repeat a line (n - 2) times in order to get n-deep tree from parent-child pairs.
I needed to start from the end nodes instead of arbitrary nodes
in order to avoid (partially) duplicate branches.
(Please pardon my formatting below. This is my first post.)

Splunk search to build trees from parent-child pairs

``````sourcetype=csv
| join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
| eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
| sort t | table t | outputcsv "result.csv"
``````

Variables

``````p: parent
c: child
c2: temporary child
r: root
k: (join) key
e: end
t: tree
``````

Search Explanation

``````# --------------------
# [Initial Data Processing]
# --------------------
sourcetype=csv
| join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
| eval t=p." -> ".c | eval e=c | eval r=p | table t, r, e
# --------------------
| join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
# (1) Leave only parent-child pairs where the child is an end node
| eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e
# (2) Set p(arent) to r(oot), c(hild) to e(nd), and output t(ree), r(oot), e(nd)

# --------------------
# [Repeatable Part]
# --------------------
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
# You repeat this line n - 2 times to get n-deep tree
# --------------------
| eval k=r
# (1) Set r(oot) to k(ey)
| join type=left k [search sourcetype=csv | rename c as k ]
# (2) Join the output with the csv with c(hild) renamed as k(ey) based on k(ey)
| eval t=if(isnull(p), t, p." -> ".t)
# (3) Add p(arent), " -> " in front of t(ree) if p(arent) is not null
| eval r=p | table t, r, e
# (4) Set p(arent) to r(oot) and output t(ree), r(oot), e(nd)
# --------------------

# --------------------
# [Finishing Touch]
# --------------------
| sort t | table t | outputcsv "result.csv"
# --------------------
# Sort by t(ree) and output the result to result.csv
# --------------------
``````

Sample Tree structure

``````111 --> 222 --> 777 --> 555 --> 321
|       +-> 789
+-> 999
333 --> 444 --> 888 --> 234
+-> 666
123 --> 456 --> 567
987 --> 654
``````

parent-child.csv

``````p,c
111,222
222,777
333,444
777,555
333,666
444,888
222,999
123,456
987,654
777,789
555,321
888,234
456,567
``````

result.csv

``````t
"111 -> 222 -> 777 -> 555 -> 321"
"111 -> 222 -> 777 -> 789"
"111 -> 222 -> 789"
"111 -> 222 -> 999"
"123 -> 456 -> 567"
"333 -> 444 -> 888 -> 234"
"333 -> 666"
"987 -> 654"
``````
Explorer

The solution given by somesoni2, I am afraid, is not recursive. Though not a completely recursive one,
I have managed to come up with a decent solution as follows.
You need repeat a line (n - 2) times in order to get n-deep tree from parent-child pairs.
I needed to start from the end nodes instead of arbitrary nodes
in order to avoid (partially) duplicate branches.
(Please pardon my formatting below. This is my first post.)

Splunk search to build trees from parent-child pairs

``````sourcetype=csv
| join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
| eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
| sort t | table t | outputcsv "result.csv"
``````

Variables

``````p: parent
c: child
c2: temporary child
r: root
k: (join) key
e: end
t: tree
``````

Search Explanation

``````# --------------------
# [Initial Data Processing]
# --------------------
sourcetype=csv
| join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
| eval t=p." -> ".c | eval e=c | eval r=p | table t, r, e
# --------------------
| join type=left c [search sourcetype=csv | rename c as c2 | rename p as c ] | where isnull (c2)
# (1) Leave only parent-child pairs where the child is an end node
| eval t=p." -> ".c | eval r=p | eval e=c | table t, r, e
# (2) Set p(arent) to r(oot), c(hild) to e(nd), and output t(ree), r(oot), e(nd)

# --------------------
# [Repeatable Part]
# --------------------
| eval k=r | join type=left k [search sourcetype=csv | rename c as k ] | eval t=if(isnull(p), t, p." -> ".t) | eval r=p | table t, r, e
# You repeat this line n - 2 times to get n-deep tree
# --------------------
| eval k=r
# (1) Set r(oot) to k(ey)
| join type=left k [search sourcetype=csv | rename c as k ]
# (2) Join the output with the csv with c(hild) renamed as k(ey) based on k(ey)
| eval t=if(isnull(p), t, p." -> ".t)
# (3) Add p(arent), " -> " in front of t(ree) if p(arent) is not null
| eval r=p | table t, r, e
# (4) Set p(arent) to r(oot) and output t(ree), r(oot), e(nd)
# --------------------

# --------------------
# [Finishing Touch]
# --------------------
| sort t | table t | outputcsv "result.csv"
# --------------------
# Sort by t(ree) and output the result to result.csv
# --------------------
``````

Sample Tree structure

``````111 --> 222 --> 777 --> 555 --> 321
|       +-> 789
+-> 999
333 --> 444 --> 888 --> 234
+-> 666
123 --> 456 --> 567
987 --> 654
``````

parent-child.csv

``````p,c
111,222
222,777
333,444
777,555
333,666
444,888
222,999
123,456
987,654
777,789
555,321
888,234
456,567
``````

result.csv

``````t
"111 -> 222 -> 777 -> 555 -> 321"
"111 -> 222 -> 777 -> 789"
"111 -> 222 -> 789"
"111 -> 222 -> 999"
"123 -> 456 -> 567"
"333 -> 444 -> 888 -> 234"
"333 -> 666"
"987 -> 654"
``````
Communicator

Sorry, It's possible instead of write N times the same query, insert a loop?

Thank you

Explorer

Is there a loop function or functionality available in Splunk? I looked around, but could not find loop when I worked on my answer above.

Motivator

Unfortunately, there is no loop function. Maybe you could use "map" command, but probably it would be better to write a custom search command that does loop operation.

http://dev.splunk.com/view/python-sdk/SP-CAAAEU2

SplunkTrust

Try something like this

``````your base search | table parent, child | join type=left child [search your base search | rename child as child2 | rename parent as child]
| eval child=child.coalesce(",".child2,"") | table parent, child
``````

``````sourcetype=csv |join type= left c [search sourcetype=csv |stats delim="," values(c) as cvals by p |rename p as c]|eval c=c.coalesce(",".cvals,"") | table p,c