Hi team,
I have user behavior data like below in splunk, and I want to create a query which can aggregate the user behavior data in a httpsessionID , and then visualize the data by Sankey chart for understanding the user flow.
timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Serach
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report
From above data, I want to create a Sankey chart like below, Please advise how to write the query. Thank you!
| makeresults
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report"
| multikv forceheader=1
| table timestamp, sessionID, Customer, UserID, module, page, behavior
| rename COMMENT as "from here, the logic"
| eventstats list(behavior) as list_behavior by sessionID
| eventstats list(page) as list_page by sessionID
| appendpipe
[| eventstats values(module) as start values(page) as end by UserID]
| appendpipe
[| eventstats first(eval(mvindex(list_page,0))) as start first(eval(mvindex(list_behavior,0))) as end by UserID]
| appendpipe
[| eventstats first(eval(mvindex(list_behavior,0))) as start first(eval(mvindex(list_behavior,1))) as end by UserID]
| appendpipe
[| eventstats first(eval(mvindex(list_behavior,1))) as start first(eval(mvindex(list_behavior,2))) as end by UserID]
| streamstats count
| eval tmp=mvzip(count,mvzip(start,end))
| stats count by tmp
| sort tmp
| eval tmp=replace(tmp,"\d+\,","")
| dedup tmp
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1)
| eventstats count(end) as count by end
| eval end=if(start==end,end.".",end)
| table start end count
Hi, @cheriemilk
with | eval end=if(start==end,end.".",end)
, since the start and end are the same, the indications is consolidated.
You have to count a little more in order to display beautifully like the example.
| makeresults
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report"
| multikv forceheader=1
| table timestamp, sessionID, Customer, UserID, module, page, behavior
| rename COMMENT as "from here, the logic"
| eventstats list(behavior) as list_behavior by sessionID
| eventstats list(page) as list_page by sessionID
| appendpipe
[| eventstats values(module) as start values(page) as end by UserID]
| appendpipe
[| eventstats first(eval(mvindex(list_page,0))) as start first(eval(mvindex(list_behavior,0))) as end by UserID]
| appendpipe
[| eventstats first(eval(mvindex(list_behavior,0))) as start first(eval(mvindex(list_behavior,1))) as end by UserID]
| appendpipe
[| eventstats first(eval(mvindex(list_behavior,1))) as start first(eval(mvindex(list_behavior,2))) as end by UserID]
| streamstats count
| eval tmp=mvzip(count,mvzip(start,end))
| stats count by tmp
| sort tmp
| eval tmp=replace(tmp,"\d+\,","")
| dedup tmp
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1)
| eventstats count(end) as count by end
| eval end=if(start==end,end.".",end)
| table start end count
Hi, @cheriemilk
with | eval end=if(start==end,end.".",end)
, since the start and end are the same, the indications is consolidated.
You have to count a little more in order to display beautifully like the example.
Hi, @to4kawa ,
Thank you for the proposal. I add prefix for each step actions to avoid situation that the start and end action could be the same.
I met another problem now. When I run the query with real data in splunk and append 10 steps in a sessionID. I found the nodes are discorded which make the Sankey chart hard to read, instead of the expected order: step1->step2->step3 etc.. Please refer to my screenshot. Is there any way to avoid this?
@to4kawa , Please refer the to attachment "Sankey nodes disordered" in the top of this thread.
hi @cheriemilk
Screen confirmed.
The order has been adjusted in various ways this time.
I won't know without actual queries and results.
I found that adding prefix for each step like below doesn't fix the data backflow issue actually
. for example below query doesn't create the correct sankey. (I removed the last event for userC)
| makeresults
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12:00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report"
| multikv forceheader=1
| table timestamp, sessionID, Customer, UserID, module, page, behavior
| eventstats list(behavior) as list_behavior by sessionID
| appendpipe
[| eventstats first(eval("step1".mvindex(list_behavior,0))) as start first(eval("step2".mvindex(list_behavior,1))) as end by UserID]
| appendpipe
[| eventstats first(eval("step2".mvindex(list_behavior,1))) as start first(eval("step3".mvindex(list_behavior,2))) as end by UserID]
| appendpipe
[| eventstats first(eval("step3".mvindex(list_behavior,2))) as start first(eval(if(isnull(mvindex(list_behavior,3), "step4".mvindex(list_behavior,3), "exit"))))) as end by UserID]
| streamstats count
| eval tmp=mvzip(count,mvzip(start,end))
| stats count by tmp
| sort tmp
| eval tmp=replace(tmp,"\d+\,","")
| dedup tmp
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1)
| eventstats count(end) as count by end
| eval end=if(start==end,end.".",end)
| table tmp start end count