Splunk Search

How to correlate nested parent child jobs?

AKG11
Path Finder

Hi,

we have to monitor some jobs in which One Job could have multiple sub task.  It could be nested dependency as well.  One task depended on other and that other dependent on some other....

I am looking to correlate these dependency and want to see how much time job took end to end.

In below example. 
1st task dependent on 2nd, 2nd dependent on 3rd, 4th is End of task.  

AKG11_0-1680770961762.png

 

Labels (1)

woodcock
Esteemed Legend

There is going to be homework here but I will give you an analogous problem with solution that you can adapt to your data.  First, install The "CIM Toolkit" app (it has a TON of great stuff in it):
https://splunkbase.splunk.com/app/6243

Next, run this search:
| `CIM_Toolkit_Generate_Renames(Authentication)`

Then do Shift-Ctrl-E to Expand the macro into SPL.  Tear it apart and take what you need.

Here is why it will do what you need  It starts out with data like this:

title child1 objectName parentName

Authentication AuthenticationBaseEvent
AuthenticationDefault_AuthenticationAuthenticationBaseEvent
AuthenticationFailed_AuthenticationAuthenticationBaseEvent
AuthenticationInsecure_AuthenticationAuthenticationBaseEvent
AuthenticationPrivileged_AuthenticationAuthenticationBaseEvent
AuthenticationSuccessful_AuthenticationAuthenticationBaseEvent


And transforms it into a final output like this:
| rename Authentication.Successful_Authentication.* AS *
| rename Authentication.Privileged_Authentication.Successful_Privileged_Authentication.* AS *
| rename Authentication.Privileged_Authentication.Failed_Privileged_Authentication.* AS *
| rename Authentication.Privileged_Authentication.* AS *
| rename Authentication.Insecure_Authentication.* AS *
| rename Authentication.Failed_Authentication.* AS *
| rename Authentication.Default_Authentication.Successful_Default_Authentication.* AS *
| rename Authentication.Default_Authentication.Failed_Default_Authentication.* AS *
| rename Authentication.Default_Authentication.* AS *
| rename Authentication.* AS *

0 Karma

PickleRick
SplunkTrust
SplunkTrust

I don't think there's an easy way to do that. It's one of the anti-patterns in data - to try to normalize everything. The similar case is when someone implements a tree structure (for example to have a filesystem-like objects) with simply a "pointer" to a parent in SQL database.

It requires a recursive processing to find the path to the root.

And while in some flavours of SQL you can make a strored procedure which will do that for you (it will still not be very effective mind you), Splunk has no way of "looping" over own results and "re-searching" based on those results.

The only - very ugly - "quasi-solution" would be to assume that your jobs cannot be nested more than N-levels and just do N iterations of mapping a job to its parent.

yuanliu
SplunkTrust
SplunkTrust

Good pointer.  In fact, I started with two levels (as the screenshot shows):

 

| eval u_key = if(isnull(key), parent_key, key)
| transaction u_key startswith="eval(isnull(key))" endswith="eval(isnull(parent_key) OR key == u_key)" keepevicted=true
| foreach parent_key mode=multivalue
    [eval uu_key = mvappend(uu_key, if(mvcount(parent_key) < 2, <<ITEM>>, if(key == <<ITEM>>, null(), <<ITEM>>)))]
| eval _time = if(u_key == uu_key, _time, _time + duration)
| rename duration as duration1
| transaction uu_key startswith="eval(key!=uu_key)" endswith="eval(key==uu_key)" keepevicted=true
| eval duration = sum(duration1) + duration

 

Then, I realize that each additional nest requires an additional explicit compaction.  Very ugly indeed.

Anyway, here is an emulated dataset to test the above:

_timekeyparent_key
2023-03-29 07:26:0067890 
2023-03-29 07:06:00 67890
2023-03-29 06:30:001234567890
2023-03-29 06:04:00 12345

Speaking of test dataset, the OP should always provide that in text.  Screenshot is very unhelpful.

The output would look like

_timeclosed_txnduration
duration1
eventcountfield_match_sum
key
parent_key
u_key
uu_key
2023-03-29 06:30:0004920
1200
1560
22
12345
67890
12345
67890
12345
67890
67890

tscroggins
Influencer

Where supported, common table expressions make trees manageable in relational databases. Graph databases make trees trivial. The problem isn't intractable in Splunk; it's just different.

PickleRick
SplunkTrust
SplunkTrust

CTEs are not your common SQL. 😉

And with your basic SQL it's very annoying to operate on tree-like structures in SQL. I had to do it in ASE 11.9.2 so I know what I'm talking about. Good thing it has stored procedures. MySQL at that time didnt 🙂

tscroggins
Influencer

I used CTEs in ad hoc T-SQL (and less often, CONNECT BY in Oracle) for critical path analysis. It worked very well on indexed identifier and dependency columns. Translating relational SQL solutions to streaming Splunk solutions made up the bulk of my early Splunk experience. I do much less of that now, although the background helps when working with clients on DB Connect solutions, particularly with "UPSERT" style audit tables and normalized, often undocumented third-party schemas.

PickleRick
SplunkTrust
SplunkTrust

As I said - sometimes it can be done, sometimes it cannot. CTEs are not part of the SQL-92, recursive queries were added in SQL:1999. Some DBs started implementing them relatively recently. But we're getting sidetracked here. 😉

With Splunk you have no recursion unless you try to do some very very very ugly (and hardly maintainable) stuff. So I'd say that the original requirement is not fulfillable by Splunk in its raw form.

tscroggins
Influencer

Hi,

If the start and finish times are in the search results and you only need the total duration, you can use the stats range function to calculate the duration in seconds:

...
| stats min(_time) as start_time max(_time) as finish_time range(_time) as duration
| fieldformat start_time=strftime(start_time, "%F %T")
| fieldformat finish_time=strftime(finish_time, "%F %T")
| fieldformat duration=tostring(duration, "duration")

If you need to know the critical/longest path through the job network, the solution is more involved.

AKG11
Path Finder
 

Thanks but this is not going to work.  Its not that simple.  we can't combine everything. Example is just subset there are multiple nested jobs. Also there is not end time.

tscroggins
Influencer

In that case, we'll need to see a larger sample dataset (deidentified if necessary). If neither the finish times nor the durations are known, what method have you been using to manually calculate the makespan / end-to-end duration?

Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...