Solved: How to search and aggregate user behavior data in ...

cheriemilk · ‎01-12-2020

Hi team,

I have user behavior data like below in splunk, and I want to create a query which can aggregate the user behavior data in a httpsessionID , and then visualize the data by Sankey chart for understanding the user flow.

timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Serach
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report

From above data, I want to create a Sankey chart like below, Please advise how to write the query. Thank you!

to4kawa · ‎01-12-2020

| makeresults 
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report" 
| multikv forceheader=1 
| table timestamp, sessionID, Customer, UserID, module, page, behavior 
| rename COMMENT as "from here, the logic" 
| eventstats list(behavior) as list_behavior by sessionID 
| eventstats list(page) as list_page by sessionID 
| appendpipe 
    [| eventstats values(module) as start values(page) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_page,0))) as start first(eval(mvindex(list_behavior,0))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,0))) as start first(eval(mvindex(list_behavior,1))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,1))) as start first(eval(mvindex(list_behavior,2))) as end by UserID] 
| streamstats count 
| eval tmp=mvzip(count,mvzip(start,end)) 
| stats count by tmp 
| sort tmp 
| eval tmp=replace(tmp,"\d+\,","") 
| dedup tmp 
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1) 
| eventstats count(end) as count by end 
| eval end=if(start==end,end.".",end) 
| table start end count

Hi, @cheriemilk
with | eval end=if(start==end,end.".",end) , since the start and end are the same, the indications is consolidated.
You have to count a little more in order to display beautifully like the example.

View solution in original post

to4kawa · ‎01-12-2020

| makeresults 
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12: 00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report
12:00:03, SID3, CustomerC, UserC, moduleA, pageC, Update Report" 
| multikv forceheader=1 
| table timestamp, sessionID, Customer, UserID, module, page, behavior 
| rename COMMENT as "from here, the logic" 
| eventstats list(behavior) as list_behavior by sessionID 
| eventstats list(page) as list_page by sessionID 
| appendpipe 
    [| eventstats values(module) as start values(page) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_page,0))) as start first(eval(mvindex(list_behavior,0))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,0))) as start first(eval(mvindex(list_behavior,1))) as end by UserID] 
| appendpipe 
    [| eventstats first(eval(mvindex(list_behavior,1))) as start first(eval(mvindex(list_behavior,2))) as end by UserID] 
| streamstats count 
| eval tmp=mvzip(count,mvzip(start,end)) 
| stats count by tmp 
| sort tmp 
| eval tmp=replace(tmp,"\d+\,","") 
| dedup tmp 
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1) 
| eventstats count(end) as count by end 
| eval end=if(start==end,end.".",end) 
| table start end count

Hi, @cheriemilk
with | eval end=if(start==end,end.".",end) , since the start and end are the same, the indications is consolidated.
You have to count a little more in order to display beautifully like the example.

cheriemilk · ‎01-13-2020

Hi, @to4kawa ,

Thank you for the proposal. I add prefix for each step actions to avoid situation that the start and end action could be the same.

I met another problem now. When I run the query with real data in splunk and append 10 steps in a sessionID. I found the nodes are discorded which make the Sankey chart hard to read, instead of the expected order: step1->step2->step3 etc.. Please refer to my screenshot. Is there any way to avoid this?

cheriemilk · ‎01-13-2020

@to4kawa , Please refer the to attachment "Sankey nodes disordered" in the top of this thread.

to4kawa · ‎01-14-2020

hi @cheriemilk
Screen confirmed.
The order has been adjusted in various ways this time.
I won't know without actual queries and results.

cheriemilk · ‎01-14-2020

I found that adding prefix for each step like below doesn't fix the data backflow issue actually
. for example below query doesn't create the correct sankey. (I removed the last event for userC)

| makeresults
| eval _raw="timestamp, sessionID, Customer, UserID, module, page, behavior
12:00:00, SID1, CustomerA, UserA, moduleA, pageA, Open PageA
12:00:01, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:02, SID1, CustomerA, UserA, moduleA, pageA, Search
12:00:01, SID2, CustomerB, UserB, moduleA, pageA, Open PageA
12:00:02, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:03, SID2, CustomerB, UserB, moduleA, pageA, Search
12:00:01, SID3, CustomerC, UserC, moduleA, pageC, Open PageC
12:00:02, SID3, CustomerC, UserC, moduleA, pageC, Generate Report"
| multikv forceheader=1
| table timestamp, sessionID, Customer, UserID, module, page, behavior
| eventstats list(behavior) as list_behavior by sessionID
| appendpipe
[| eventstats first(eval("step1".mvindex(list_behavior,0))) as start first(eval("step2".mvindex(list_behavior,1))) as end by UserID]
| appendpipe
[| eventstats first(eval("step2".mvindex(list_behavior,1))) as start first(eval("step3".mvindex(list_behavior,2))) as end by UserID]
| appendpipe
[| eventstats first(eval("step3".mvindex(list_behavior,2))) as start first(eval(if(isnull(mvindex(list_behavior,3), "step4".mvindex(list_behavior,3), "exit"))))) as end by UserID]
| streamstats count
| eval tmp=mvzip(count,mvzip(start,end))
| stats count by tmp
| sort tmp
| eval tmp=replace(tmp,"\d+\,","")
| dedup tmp
| eval start=mvindex(split(tmp,","),0), end=mvindex(split(tmp,","),1)
| eventstats count(end) as count by end
| eval end=if(start==end,end.".",end)
| table tmp start end count

How to search and aggregate user behavior data in a httpsessionID and visualize the data by Sankey

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor