Splunk Search

Process flow tracing, point to point latency calculation, visualisation (swim lanes?). Is it possible?


Say, I have three events.

2014/04/16 23:54:00,000 id=aaaaa doing thing A

2014/04/16 23:54:00,021 id=aaaaa doing thing B

2014/04/16 23:54:00,046 id=aaaaa doing thing C

Is there a way to join these all up into a process flow, and calculate the latencies between each stage? I know I could use transaction to join them into a single event, where duration is calculated automatically as the total time for the flow, but that is not sufficient for situations where you need the durations/latencies between each event.

An example of where this would be useful is flow analysis of a software system, to identify which parts of the flow are causing delays.

Ideally, the solution would be able to run over large datasets, so that the flow can be monitored statistically for trend/historical analysis, i.e. not just for tracing the flow of a single ID.

Bonus question! What if the same ID is not present in all events (although they can be correlated via some events that carry both IDs):

2014/04/16 23:54:00,000 id=aaaaa doing thing A

2014/04/16 23:54:00,021 id=aaaaa Xid=bbbbb doing thing B

2014/04/16 23:54:00,046 Xid=bbbbb doing thing C

How can you do the common field correlation over large data sets?


Splunk Employee

Hi Glenn,

There are two approaches to joining the information together, stats or transaction, assuming you can extract your "thing A" into a field called step:

  • sourcetype=process | stats values(step) as step values(_time) as times by id
  • sourcetype=process | eval time2=_time | transaction Xid id mvlist="step,time2"
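To make the idea behind correlating on both id and Xid concrete, here is a minimal Python sketch, run outside Splunk, of grouping events that share an identifier either directly or through a chain of other events. It is only an illustration of transitive correlation, not how transaction is implemented; the function name `correlate` and the dict-shaped events are assumptions made for the example.

```python
from collections import defaultdict

FIELDS = ("id", "Xid")  # correlation fields, as in the question's sample events


def correlate(events):
    """Group events linked by a shared id or Xid, directly or transitively."""
    parent = {}

    def find(key):
        # Union-find lookup with path halving.
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def keys_of(event):
        # Tag each value with its field name so id=x and Xid=x stay distinct.
        return [(f, event[f]) for f in FIELDS if f in event]

    # First pass: an event carrying several identifiers links them together.
    for event in events:
        keys = keys_of(event)
        for other in keys[1:]:
            parent[find(keys[0])] = find(other)

    # Second pass: bucket every event under the root of its first identifier.
    groups = defaultdict(list)
    for event in events:
        keys = keys_of(event)
        if keys:
            groups[find(keys[0])].append(event)
    return list(groups.values())


events = [
    {"id": "aaaaa", "step": "A"},
    {"id": "aaaaa", "Xid": "bbbbb", "step": "B"},
    {"Xid": "bbbbb", "step": "C"},
    {"id": "ccccc", "step": "A"},  # unrelated flow
]
flows = correlate(events)
```

With the sample above, the first three events end up in one group because the middle event carries both identifiers, and the `ccccc` event forms a group of its own.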

If you can't extract a field, an alternative is to define eventtypes which you can use as a step description or numbering.

Next we can start looking at analysis. One common use case is to look at the flow paths or steps taken, to determine whether there are stuck IDs, and a neat way of doing so is to use mvcombine:

  • | mvcombine step | stats count by step
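As a rough illustration of what counting flow paths means, the following Python sketch (outside Splunk) tallies how many IDs followed each sequence of steps. The function name `count_paths` and the input shape, a dict mapping each ID to its ordered list of steps, are assumptions for the example.

```python
from collections import Counter


def count_paths(flows):
    """Count how many IDs followed each distinct sequence of steps."""
    return Counter(" -> ".join(steps) for steps in flows.values())


paths = count_paths({
    "aaaaa": ["A", "B", "C"],
    "bbbbb": ["A", "B"],       # never reached step C: possibly stuck
    "ccccc": ["A", "B", "C"],
})
# paths["A -> B -> C"] is 2 and paths["A -> B"] is 1
```

Paths that stop short of the final step stand out in the counts, which is what makes this view useful for spotting stuck IDs.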

Additionally, you may want to compute step durations, which is hard to do in pure SPL but can be achieved using a custom search script helper, like stepstats:

  • From the transaction example above | stepstats step,time2

which will compute the durations.
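The stepstats script itself is not shown here, so the following is only a sketch of the kind of calculation involved, not its actual implementation: given the parallel step and time lists for one flow, pair each step with the next one and take the time difference. The names `step_latencies` and `TIME_FORMAT` are made up for the example, and the timestamps are the three sample events from the question.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y/%m/%d %H:%M:%S,%f"  # matches the sample log timestamps


def step_latencies(steps, times):
    """Return (from_step, to_step, latency_ms) for each consecutive pair."""
    ordered = sorted(zip(times, steps))  # order the flow by timestamp
    one_ms = timedelta(milliseconds=1)
    return [
        (s0, s1, (t1 - t0) / one_ms)
        for (t0, s0), (t1, s1) in zip(ordered, ordered[1:])
    ]


times = [
    datetime.strptime(raw, TIME_FORMAT)
    for raw in (
        "2014/04/16 23:54:00,000",
        "2014/04/16 23:54:00,021",
        "2014/04/16 23:54:00,046",
    )
]
latencies = step_latencies(["A", "B", "C"], times)
# latencies == [("A", "B", 21.0), ("B", "C", 25.0)]
```

Collecting these tuples across many IDs and averaging by (from_step, to_step) would give the statistical view of where the flow spends its time.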

If you need the stepstats command, drop me an email at dart@splunk.com.

Additionally, if you're looking to make different expected duration steps comparable, Apdex may be of interest.