Hi,
We have to monitor some jobs where one job can have multiple sub-tasks, and the dependencies can be nested: one task depends on another, which in turn depends on yet another.
I am looking to correlate these dependencies and see how much time a job took end to end.
In the example below, task 1 depends on task 2, task 2 depends on task 3, and task 4 is the end of the job.
There is going to be some homework here, but I will give you an analogous problem with a solution that you can adapt to your data. First, install the "CIM Toolkit" app (it has a TON of great stuff in it):
https://splunkbase.splunk.com/app/6243
Next, run this search:
| `CIM_Toolkit_Generate_Renames(Authentication)`
Then press Shift-Ctrl-E to expand the macro into SPL. Tear it apart and take what you need.
Here is why it will do what you need. It starts out with data like this:
title | child1 | objectName | parentName |
Authentication | Authentication | BaseEvent | |
Authentication | Default_Authentication | Authentication | BaseEvent |
Authentication | Failed_Authentication | Authentication | BaseEvent |
Authentication | Insecure_Authentication | Authentication | BaseEvent |
Authentication | Privileged_Authentication | Authentication | BaseEvent |
Authentication | Successful_Authentication | Authentication | BaseEvent |
And transforms it into a final output like this:
| rename Authentication.Successful_Authentication.* AS *
| rename Authentication.Privileged_Authentication.Successful_Privileged_Authentication.* AS *
| rename Authentication.Privileged_Authentication.Failed_Privileged_Authentication.* AS *
| rename Authentication.Privileged_Authentication.* AS *
| rename Authentication.Insecure_Authentication.* AS *
| rename Authentication.Failed_Authentication.* AS *
| rename Authentication.Default_Authentication.Successful_Default_Authentication.* AS *
| rename Authentication.Default_Authentication.Failed_Default_Authentication.* AS *
| rename Authentication.Default_Authentication.* AS *
| rename Authentication.* AS *
I don't think there's an easy way to do that. It's one of the anti-patterns in data: trying to normalize everything. A similar case is when someone implements a tree structure (for example, filesystem-like objects) with just a "pointer" to the parent in an SQL database.
It requires recursive processing to find the path to the root.
And while in some flavours of SQL you can write a stored procedure that will do that for you (it will still not be very efficient, mind you), Splunk has no way of "looping" over its own results and "re-searching" based on them.
The only, very ugly, "quasi-solution" would be to assume that your jobs cannot be nested more than N levels deep and just do N iterations of mapping a job to its parent.
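To make the bounded-depth workaround concrete, here is a minimal Python sketch (for illustration only; Splunk itself cannot loop like this in SPL). The `key`/`parent_key` field names mirror the thread, and `N_LEVELS` is an assumed maximum nesting depth:

```python
N_LEVELS = 3  # assumed maximum nesting depth

# job key -> parent job key (None for a root job); sample values only
parents = {"12345": "67890", "67890": None}

def resolve_root(key: str) -> str:
    """Walk at most N_LEVELS parent links; whatever we reach is the 'root'.

    This is exactly the 'N iterations of mapping a job to its parent'
    idea: if the real tree is deeper than N_LEVELS, the result is wrong.
    """
    current = key
    for _ in range(N_LEVELS):
        parent = parents.get(current)
        if parent is None:
            break
        current = parent
    return current

print(resolve_root("12345"))  # -> 67890
```

In SPL this corresponds to N explicit lookup/join stages, one per nesting level, which is why it gets ugly fast.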
Good pointer. In fact, I started with two levels (as the screenshot shows):
| eval u_key = if(isnull(key), parent_key, key)
| transaction u_key startswith="eval(isnull(key))" endswith="eval(isnull(parent_key) OR key == u_key)" keepevicted=true
| foreach parent_key mode=multivalue
[eval uu_key = mvappend(uu_key, if(mvcount(parent_key) < 2, <<ITEM>>, if(key == <<ITEM>>, null(), <<ITEM>>)))]
| eval _time = if(u_key == uu_key, _time, _time + duration)
| rename duration as duration1
| transaction uu_key startswith="eval(key!=uu_key)" endswith="eval(key==uu_key)" keepevicted=true
| eval duration = sum(duration1) + duration
Then I realized that each additional nesting level requires an additional explicit compaction. Very ugly indeed.
Anyway, here is an emulated dataset to test the above:
_time | key | parent_key |
2023-03-29 07:26:00 | 67890 | |
2023-03-29 07:06:00 | 67890 | |
2023-03-29 06:30:00 | 12345 | 67890 |
2023-03-29 06:04:00 | 12345 | |
Speaking of test datasets, the OP should always provide them as text. Screenshots are very unhelpful.
The output would look like
_time | closed_txn | duration | duration1 | eventcount | field_match_sum | key | parent_key | u_key | uu_key |
2023-03-29 06:30:00 | 0 | 4920 | 1200 1560 | 2 | 2 | 12345 67890 | 12345 67890 | 12345 67890 | 67890 |
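As a sanity check on that output, here is a small Python sketch (illustration only, not SPL) that computes the end-to-end duration of the emulated dataset. Since the sample contains a single job chain (12345 → 67890), the span from the earliest to the latest event is the end-to-end duration:

```python
from datetime import datetime

# (timestamp, key, parent_key) rows from the emulated dataset above
events = [
    ("2023-03-29 07:26:00", "67890", None),
    ("2023-03-29 07:06:00", "67890", None),
    ("2023-03-29 06:30:00", "12345", "67890"),
    ("2023-03-29 06:04:00", "12345", None),
]

def to_epoch(ts: str) -> float:
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").timestamp()

times = [to_epoch(t) for t, _, _ in events]
duration = max(times) - min(times)  # overall span of the chain, in seconds
print(int(duration))  # -> 4920, matching the duration in the output above
```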
Where supported, common table expressions make trees manageable in relational databases. Graph databases make trees trivial. The problem isn't intractable in Splunk; it's just different.
CTEs are not your common SQL. 😉
And with basic SQL it's very annoying to operate on tree-like structures. I had to do it in ASE 11.9.2, so I know what I'm talking about. Good thing it had stored procedures; MySQL at that time didn't 🙂
I used CTEs in ad hoc T-SQL (and less often, CONNECT BY in Oracle) for critical path analysis. It worked very well on indexed identifier and dependency columns. Translating relational SQL solutions to streaming Splunk solutions made up the bulk of my early Splunk experience. I do much less of that now, although the background helps when working with clients on DB Connect solutions, particularly with "UPSERT" style audit tables and normalized, often undocumented third-party schemas.
As I said, sometimes it can be done and sometimes it cannot. CTEs are not part of SQL-92; recursive queries were added in SQL:1999, and some databases started implementing them relatively recently. But we're getting sidetracked here. 😉
With Splunk you have no recursion unless you try some very, very ugly (and hardly maintainable) stuff. So I'd say the original requirement is not fulfillable by Splunk in its raw form.
Hi,
If the start and finish times are in the search results and you only need the total duration, you can use the stats range function to calculate the duration in seconds:
...
| stats min(_time) as start_time max(_time) as finish_time range(_time) as duration
| fieldformat start_time=strftime(start_time, "%F %T")
| fieldformat finish_time=strftime(finish_time, "%F %T")
| fieldformat duration=tostring(duration, "duration")
If you need to know the critical/longest path through the job network, the solution is more involved.
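As a hedged sketch of that more involved case, here is a Python illustration (not SPL) of finding the critical-path length through a job dependency DAG, given per-task durations. The task names and durations are made up for the example:

```python
from functools import lru_cache

# task -> (duration_seconds, list of prerequisite tasks); hypothetical values
tasks = {
    "task3": (600, []),
    "task2": (900, ["task3"]),
    "task1": (300, ["task2"]),
    "task4": (120, ["task1"]),
}

@lru_cache(maxsize=None)
def critical_path_length(task: str) -> int:
    """Longest chain of durations ending at `task` (memoized DAG recursion)."""
    duration, deps = tasks[task]
    return duration + max((critical_path_length(d) for d in deps), default=0)

print(critical_path_length("task4"))  # -> 1920 (600 + 900 + 300 + 120)
```

This is the kind of recursion the thread notes Splunk lacks natively; in SPL you would need a bounded number of explicit join stages instead.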
Thanks, but this is not going to work. It's not that simple; we can't combine everything. The example is just a subset, and there are multiple nested jobs. Also, there is no end time.
In that case, we'll need to see a larger sample dataset (deidentified if necessary). If neither the finish times nor the durations are known, what method have you been using to manually calculate the makespan / end-to-end duration?