I have a dataset where each event summarizes a workflow through the fields Foo -> Bar -> Baz, and I'd like to create a Sankey diagram to visualize the flow. The only way I've found to get the output I want is to run one search with one stats call and then append the same search with a different stats call, like:
index=myIndex
| stats count BY Foo, Bar
| rename Foo AS source, Bar AS target
| append [search index=myIndex | stats count BY Bar, Baz | rename Bar AS source, Baz AS target]
This works, but it's incredibly inefficient and MUCH slower than it needs to be, since the index is searched twice. Is there a way to get this output with a single search that I'm missing?
The output table would look something like:
source | target | count
foo1 | bar1 | 3
foo1 | bar2 | 12
bar1 | baz1 | 1
bar1 | baz2 | 2
bar2 | baz1 | 12
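If you can count by all three fields, using appendpipe might be less resource-intensive than append, because appendpipe runs its subpipeline over the results already in the pipeline instead of searching the index a second time: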
sourcetype="access_combined"
| stats count by host categoryId product_name
| appendpipe [stats count by host categoryId | rename host as source, categoryId as target]
| appendpipe [stats count by categoryId product_name | rename categoryId as source, product_name as target]
| search source=*
| fields source target count
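Applied to the fields from the original question, a minimal sketch might look like the following. It assumes the same index=myIndex and the Foo, Bar and Baz fields. One caveat: because the first stats has already collapsed the events into one row per Foo/Bar/Baz combination, a plain stats count inside appendpipe counts those rows rather than the underlying events, so this version sums the existing count field instead. Note also that the first stats drops events missing any of the three fields, which the original append query did not do.

```spl
index=myIndex
| stats count BY Foo, Bar, Baz
| appendpipe [stats sum(count) AS count BY Foo, Bar | rename Foo AS source, Bar AS target]
| appendpipe [stats sum(count) AS count BY Bar, Baz | rename Bar AS source, Baz AS target]
| search source=*
| fields source, target, count
```

The search source=* filter works because stats removes the default source field from the three-field rows, so only the renamed edge rows survive. The second appendpipe ignores the rows added by the first one, since those no longer have a Bar field to group by.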
gives me
Hi @doweaver, @aljohnson_splun, @fulldanad. A newbie question: I posted a thread at https://community.splunk.com/t5/Dashboards-Visualizations/Modified-Sankey-visualization-for-path-ana... regarding (IMHO) the same issue as described above. I would like to replicate the final solution to check whether I could apply it to my task, but I can't create the dataset (external or inline) required for this search:
sourcetype="access_combined"
| table host categoryId product_name
| appendpipe [stats count by host categoryId | rename host as source, categoryId as target]
| appendpipe [stats count by categoryId product_name | rename categoryId as source, product_name as target]
| search source=*
| fields source target count
Could you help me reassemble it with a minimal number of lines so I can replicate the solution? Also, does it work with version 1.6.0 of the Sankey app (the latest version)?
Thanks a lot
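To try the search without the access_combined data, a minimal sketch is to generate a few events inline with makeresults and pipe them into the amended query, with the sourcetype filter dropped. The host, categoryId and product_name values below are made up purely for illustration:

```spl
| makeresults count=10
| streamstats count AS n
| eval host=if(n<=6, "www1", "www2")
| eval categoryId=if(n%2=0, "ARCADE", "STRATEGY")
| eval product_name=if(n<=3, "Product A", "Product B")
| table host categoryId product_name
| appendpipe [stats count by host categoryId | rename host as source, categoryId as target]
| appendpipe [stats count by categoryId product_name | rename categoryId as source, product_name as target]
| search source=*
| fields source target count
```

The counts in each stage (host to categoryId, and categoryId to product_name) should add up to the 10 generated events.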
Hi aljohnson. Thank you very much for this solution. I applied it to my problem and it worked very well. Well done.
Hmm, I tried to post your comment as the answer, but Splunk says I can't make more than 2 posts per day until I reach 40 points. I'm pretty sure I've only made one post today, but...
/shrug
If you post the same thing as an answer, I'll mark it as solved 🙂
Hi aljohnson,
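Thanks for your answer; it would be a great help to have it included in the documentation...
Below is a small amendment that sizes the lines correctly (the first stats command is replaced by table, so each appendpipe counts events rather than already-aggregated rows):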
sourcetype="access_combined"
| table host categoryId product_name
| appendpipe [stats count by host categoryId | rename host as source, categoryId as target]
| appendpipe [stats count by categoryId product_name | rename categoryId as source, product_name as target]
| search source=*
| fields source target count
Glad it worked. Converted 🙂
Yes! Perfect!
Didn't realize appendpipe was a thing. Thanks for your help!
...I have no idea why a random "5." is showing up in the middle of the table...
Cool question @doweaver. How many distinct values are there of foo, bar, and baz? As a solution for dc(foo) = 2 might be a lot simpler than all of those distinct values being an unknown variable.
There are probably ~5 distinct values for each.
I'm not sure I understand what you're getting at here:
As a solution for dc(foo) = 2 might be a lot simpler than all of those distinct values being an unknown variable.
Sorry, that wasn't well worded. I just meant that if there is a smaller number of distinct values, you might be able to get a simpler answer (I'm mostly thinking out loud, haha, sorry).
So obviously foo and bar occur together, and bar and baz occur together, but do foo and baz NOT occur together? That is, is there a reason you can't search:
index=myIndex | stats count by foo bar baz
No worries 🙂
Unfortunately, all three occur in a single event 😞 Technically, it's a transaction that links A -> B, with A containing Foo, and B containing Bar and Baz. I don't THINK there's a way to split things up that would make that work... but I'll keep thinking about it.
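Since all three fields end up on the same event, another single-search sketch (an alternative to appendpipe, not something proposed in this thread) is to reduce the data with one stats, build both edges for each Foo/Bar/Baz combination as a multivalue field, expand it, and sum the counts. It assumes the field values never contain the | character used here as a separator:

```spl
index=myIndex
| stats count BY Foo, Bar, Baz
| eval edge=mvappend(Foo."|".Bar, Bar."|".Baz)
| mvexpand edge
| eval source=mvindex(split(edge, "|"), 0), target=mvindex(split(edge, "|"), 1)
| stats sum(count) AS count BY source, target
```

mvexpand is subject to memory limits on large result sets, but with only ~5 distinct values per field the first stats keeps the number of rows it has to expand small.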
Hi @doweaver
That's just the automatic line numbering applied to anything in a code block, so people can point out where they've spotted syntax errors when someone shares multiple lines of sample data or code.
Oh, that makes sense 🙂 That was the best way I could figure out to put in a table (HTML table markup didn't seem to work).
heh yeah, that's the best way to display a table format on here. you're doin it right 🙂