Modified Sankey visualization for path analysis

jmurata · ‎05-11-2022

Hi, I don't know if this question was already addressed by the users who asked about multi-stage Sankey diagrams or user-flow displays (the classic marketing scenario of web users navigating a webshop from the start page to the final cart page, where you want to spot the drop-out locations). It is nevertheless a valid diagramming scenario, made popular by analytics platforms such as Teradata and Qlik, and various developers call it a Path Sankey (https://github.com/DaltonRuer/PathSankey). The Qlik Sense implementations are also based on modified versions of d3.js, similar to the Splunk app. The closest request posted here is probably https://community.splunk.com/t5/All-Apps-and-Add-ons/How-to-search-and-aggregate-user-behavior-data-... (since I am a beginner Splunker, it might be the same topic).

Basically, it extends the two-layer source-target concept of the standard d3.js Sankey to source-multiple_layers-target. The best example is https://bl.ocks.org/jeinarsson/e37aa55c3b0e11ae6fa1, and one can imagine that the number of layers and nodes is limited only by CPU power and RAM (although JavaScript limitations exist in almost all browsers).

A practical example (from my field of interest): suppose we have a hospital with five units through which a patient passes (not necessarily all of them), and we want to see the patient referral flow between the doctors of these units. We would have, for example, 1000 patient IDs, each with its own flow of referrals from the first unit doctor to the last one the patient sees (not necessarily in alphabetical order, and not always five referrals). So we would display 5 layers in the Sankey chart, each layer showing the doctors of one unit vertically as nodes, with node thickness given by the number of incoming links from the previous layer, i.e. count(patient_id).

It would be the same as https://bl.ocks.org/jeinarsson/e37aa55c3b0e11ae6fa1 but with 5 layers and a variable number of nodes depending on the inputlookup set. Does anybody know how to tweak the current Sankey app search, something like this:

| inputlookup referrals.csv

| stats count(patient_id) by 1st_Referring_Layer 2nd_Referring_Layer

---maybe ??---

| stats count(patient_id) by 2nd_Referring_Layer 3rd_Referring_Layer

???

| stats count(patient_id) by 3rd_Referring_Layer 4th_Referring_Layer

???

| stats count(patient_id) by 4th_Referring_Layer 5th_Referring_Layer
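
To make the intended aggregation concrete, here is a minimal Python sketch (not SPL, and not the Sankey app's actual logic) of the link table I am trying to produce. The column names (unit1_doctor ... unit5_doctor), the sample doctors, and the one-row-per-patient layout are assumptions made up for illustration. Linking a patient's consecutive visited units when a unit is skipped is also my assumption about how skipped units should be drawn. Node names are prefixed with their layer number so that a name appearing in two layers stays two distinct nodes.

```python
from collections import Counter

# Hypothetical layer columns, one per hospital unit (not real field names).
LAYERS = ["unit1_doctor", "unit2_doctor", "unit3_doctor",
          "unit4_doctor", "unit5_doctor"]


def sankey_links(rows, layers=LAYERS):
    """Count patients per (source, target) pair between consecutive visited layers.

    Assumes one row per patient, so each increment corresponds to one patient_id.
    """
    counts = Counter()
    for row in rows:
        # Keep only the units this patient actually visited, in layer order.
        path = [(i, row[col]) for i, col in enumerate(layers) if row.get(col)]
        # Pair each visited unit with the next visited one.
        for (i, src), (j, dst) in zip(path, path[1:]):
            # Prefix with the layer number to keep nodes distinct per layer.
            counts[(f"L{i + 1}:{src}", f"L{j + 1}:{dst}")] += 1
    return counts


# Made-up sample data: three patients, the third one skips unit 2.
rows = [
    {"patient_id": 1, "unit1_doctor": "Adams", "unit2_doctor": "Baker", "unit3_doctor": "Chen"},
    {"patient_id": 2, "unit1_doctor": "Adams", "unit2_doctor": "Baker"},
    {"patient_id": 3, "unit1_doctor": "Adams", "unit3_doctor": "Chen"},
]

links = sankey_links(rows)
for (source, target), count in sorted(links.items()):
    print(source, target, count)
```

Each printed row is one source / target / count triple, which is the shape a Sankey chart needs as input; what I am missing is how to get the same stacked result from a single Splunk search.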

If the solution from https://community.splunk.com/t5/All-Apps-and-Add-ons/How-to-search-and-aggregate-user-behavior-data-... is exactly what I am asking for above, please advise.

Thank you
