All Apps and Add-ons

How to search and aggregate user behavior data in a httpsessionID and visualize the data by Sankey

cheriemilk
Path Finder

alt textHi team,

I have user behavior data like below in splunk, and I want to create a query which can aggregate the user behavior data in a httpsessionID , and then visualize the data by Sankey chart for understanding the user flow.

timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Serach
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report

From above data, I want to create a Sankey chart like below, Please advise how to write the query. Thank you!
alt text

0 Karma
1 Solution

to4kawa
Ultra Champion
| makeresults 
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report" 
| multikv forceheader=1 
| table timestamp, sessionID, Customer, UserID, module, page, behavior 
| rename COMMENT as "from here, the logic" 
| eventstats list(behavior) as list_behavior by sessionID 
| eventstats list(page) as list_page by sessionID 
| appendpipe 
    [| eventstats values(module) as start values(page) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_page,0))) as start first(eval(mvindex(list_behavior,0))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,0))) as start first(eval(mvindex(list_behavior,1))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,1))) as start first(eval(mvindex(list_behavior,2))) as end by UserID] 
| streamstats count 
| eval tmp=mvzip(count,mvzip(start,end)) 
| stats count by tmp 
| sort tmp 
| eval tmp=replace(tmp,"\d+\,","") 
| dedup tmp 
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1) 
| eventstats count(end) as count by end 
| eval end=if(start==end,end.".",end) 
| table start end count

Hi, @cheriemilk
with | eval end=if(start==end,end.".",end) , since the start and end are the same, the indications is consolidated.
You have to count a little more in order to display beautifully like the example.

View solution in original post

0 Karma

to4kawa
Ultra Champion
| makeresults 
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report" 
| multikv forceheader=1 
| table timestamp, sessionID, Customer, UserID, module, page, behavior 
| rename COMMENT as "from here, the logic" 
| eventstats list(behavior) as list_behavior by sessionID 
| eventstats list(page) as list_page by sessionID 
| appendpipe 
    [| eventstats values(module) as start values(page) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_page,0))) as start first(eval(mvindex(list_behavior,0))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,0))) as start first(eval(mvindex(list_behavior,1))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,1))) as start first(eval(mvindex(list_behavior,2))) as end by UserID] 
| streamstats count 
| eval tmp=mvzip(count,mvzip(start,end)) 
| stats count by tmp 
| sort tmp 
| eval tmp=replace(tmp,"\d+\,","") 
| dedup tmp 
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1) 
| eventstats count(end) as count by end 
| eval end=if(start==end,end.".",end) 
| table start end count

Hi, @cheriemilk
with | eval end=if(start==end,end.".",end) , since the start and end are the same, the indications is consolidated.
You have to count a little more in order to display beautifully like the example.

0 Karma

cheriemilk
Path Finder

Hi, @to4kawa ,

Thank you for the proposal. I add prefix for each step actions to avoid situation that the start and end action could be the same.

I met another problem now. When I run the query with real data in splunk and append 10 steps in a sessionID. I found the nodes are discorded which make the Sankey chart hard to read, instead of the expected order: step1->step2->step3 etc.. Please refer to my screenshot. Is there any way to avoid this?

0 Karma

cheriemilk
Path Finder

@to4kawa , Please refer the to attachment "Sankey nodes disordered" in the top of this thread.

0 Karma

to4kawa
Ultra Champion

hi @cheriemilk
Screen confirmed.
The order has been adjusted in various ways this time.
I won't know without actual queries and results.

0 Karma

cheriemilk
Path Finder

I found that adding prefix for each step like below doesn't fix the data backflow issue actually
. for example below query doesn't create the correct sankey. (I removed the last event for userC)

| makeresults
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12:00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report"
| multikv forceheader=1
| table timestamp, sessionID, Customer, UserID, module, page, behavior
| eventstats list(behavior) as list_behavior by sessionID
| appendpipe
[| eventstats first(eval("step1".mvindex(list_behavior,0))) as start first(eval("step2".mvindex(list_behavior,1))) as end by UserID]
| appendpipe
[| eventstats first(eval("step2".mvindex(list_behavior,1))) as start first(eval("step3".mvindex(list_behavior,2))) as end by UserID]
| appendpipe
[| eventstats first(eval("step3".mvindex(list_behavior,2))) as start first(eval(if(isnull(mvindex(list_behavior,3), "step4".mvindex(list_behavior,3), "exit"))))) as end by UserID]
| streamstats count
| eval tmp=mvzip(count,mvzip(start,end))
| stats count by tmp
| sort tmp
| eval tmp=replace(tmp,"\d+\,","")
| dedup tmp
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1)
| eventstats count(end) as count by end
| eval end=if(start==end,end.".",end)
| table tmp start end count

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...