Splunk Search

Getting totals of child processes at the parent

kiwiglen
Observer

I have an index with a list of transactions. Each transaction in the system starts as one process with a transaction number (TRANNO), and that transaction can start a number of sub-tasks. Each sub-task has its own transaction number (TRANNO) but also carries its parent's transaction number (PHTRANNO). All the tasks (parents and children) record an amount of CPU consumed (USRCPUT_MICROSEC).

All tasks have an id field (USER) which tells us what type of task it was.

I want to be able to create a report listing all the types of tasks (USER) and the average and max CPU consumed (USRCPUT_MICROSEC).

I've managed to create the report with the sum of the parents' CPU, but most of the CPU is consumed by the children.

Any suggestions on how to do this?  I've been searching and trying things for hours and I'm not getting anywhere.


livehybrid
SplunkTrust

Hi @kiwiglen 

To aggregate CPU metrics across all tasks (parents and children) by USER type, make sure your SPL includes all events without filtering on the parent/child relationship. Use stats directly on USER to calculate avg, max and sum - I wasn't sure if you wanted avg/max or sum:

index=your_index | stats avg(USRCPUT_MICROSEC) as avg_cpu, sum(USRCPUT_MICROSEC) as total_cpu max(USRCPUT_MICROSEC) as max_cpu by USER
 

This aggregates CPU for all tasks (parents + children) by their USER type. If you need hierarchical sums (e.g., parent CPU + child CPU per root transaction), clarify your data structure, as recursive aggregation would require a different approach (e.g., transaction grouping or hierarchical data modeling).

If you need it per transaction rather than per USER, then the following might work:

index=your_index 
| eval txID = COALESCE(PHTRANNO,TRANNO) 
| stats avg(USRCPUT_MICROSEC) as avg_cpu, sum(USRCPUT_MICROSEC) as total_cpu max(USRCPUT_MICROSEC) as max_cpu by txID

Obviously this depends a lot on what the raw data actually looks like - let me know if this is close to what you need, and if possible provide some anonymised sample data.

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will


PickleRick
SplunkTrust

Your requirements and the description of your data are a bit unclear.

If you have your data as a tree structure represented only as separate vertices without a full path to the root... you can't do it reasonably using SPL alone. It's similar to the anti-patterns in SQL - since you don't have recursion at your disposal, you can't reconstruct an arbitrarily-long path just from parent-child node pairs.
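To illustrate why recursion matters here: outside SPL, a language with recursion can roll CPU up from children to the root at any depth. A purely illustrative Python sketch with made-up parent/child pairs (not the poster's data, and not something SPL can express):

```python
from collections import defaultdict

# (TRANNO, PHTRANNO or None for a root, cpu) - made-up sample values
rows = [
    (1, None, 100),   # root
    (2, 1, 50),       # child of 1
    (3, 2, 25),       # grandchild of 1
    (4, None, 10),    # second root
]

children = defaultdict(list)
cpu = {}
roots = []
for tranno, parent, usrcput in rows:
    cpu[tranno] = usrcput
    if parent is None:
        roots.append(tranno)
    else:
        children[parent].append(tranno)

def total_cpu(tranno):
    """Recursively sum a transaction's CPU plus all of its descendants'."""
    return cpu[tranno] + sum(total_cpu(c) for c in children[tranno])

totals = {r: total_cpu(r) for r in roots}
print(totals)  # {1: 175, 4: 10}
```

The recursive call is the part SPL has no equivalent for, which is why SPL-only solutions end up hard-coded to a fixed depth.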

So be a bit more verbose about your problem.


bowesmana
SplunkTrust

How many levels deep can the parent/child relationship be?

Anyway, you can do this with a couple of lines of SPL. See this example, which creates some dummy data; the final two lines then compute the sum of time per USER/TRANNO.

| makeresults count=4
| eval TRANNO=random()
| eval USRCPUT_MICROSEC=random() % 10000
| streamstats c as USER
| eval USER="TASK ".USER
| appendpipe [
  | eval children=mvrange(1,5,1)
  | mvexpand children
  | rename TRANNO as PHTRANNO
  | eval USRCPUT_MICROSEC=random() % 1000000
]
``` The above is just creating some example data ```
| eval TRANNO=coalesce(TRANNO, PHTRANNO)
| stats sum(USRCPUT_MICROSEC) as USRCPUT_MICROSEC by USER TRANNO

If this is not what your data looks like, please post some anonymised examples of your data.


kiwiglen
Observer

Hi everybody, thanks for the replies. Sorry I didn't provide enough detail in the original post - I'm still trying to work out the data structure as we have lots of data. I've pulled out some data that shows 2 examples.

The first example starts with USER=GPDCFC26; all the rows above it are sub-tasks or sub-sub-tasks. The second example is USER=GOTLIS12, and the 4 rows above it are again the sub- and sub-sub-tasks.

What I want, by USER (which is present at the task level only), is a few bits of information: the average and max total CPU time (USRCPUT_MICROSEC), the average number of sub-tasks per task (so 10 for GPDCFC26 and 4 for GOTLIS12), and a count of the number of times each task has executed (1 for both in this case).

START | STOP | USER | JOBNAME | TRAN | TRANNUM | PHAPPLID | PHTRAN | PHTRANNO | USRCPUT_MICROSEC
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43853 | APP7 | QZ81 | 70322 | 76
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43850 | APP7 | QZ81 | 70322 | 64
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43848 | APP7 | QZ81 | 70322 | 64
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43846 | APP7 | QZ81 | 70322 | 74
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43845 | APP7 | QZ81 | 70322 | 68
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43844 | APP7 | QZ81 | 70322 | 71
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43857 | APP7 | QZ81 | 70322 | 65
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43856 | APP7 | QZ81 | 70322 | 72
2:10:30 p.m. | 2:10:30 p.m. |  | APP5 | CSMI | 20634 | APP7 | QZ81 | 70322 | 8860
2:10:30 p.m. | 2:10:30 p.m. |  | APP7 | QZ81 | 70322 | APP3 | QZ81 | 43836 | 16043
2:10:30 p.m. | 2:10:30 p.m. | GPDCFC26 | APP3 | QZ81 | 43836 |  |  | 0 | 897
2:10:17 p.m. | 2:10:17 p.m. |  | APP3 | CSMI | 41839 | APP5 | QZ61 | 15551 | 51
2:10:17 p.m. | 2:10:17 p.m. |  | APP3 | CSMI | 41838 | APP5 | QZ61 | 15551 | 64
2:10:17 p.m. | 2:10:17 p.m. |  | APP3 | CSMI | 41837 | APP5 | QZ61 | 15551 | 79
2:10:17 p.m. | 2:10:17 p.m. |  | APP5 | QZ61 | 15551 | APP3 | QZ61 | 41835 | 5232
2:10:17 p.m. | 2:10:17 p.m. | GOTLIS12 | APP3 | QZ61 | 41835 |  |  | 0 | 778

PickleRick
SplunkTrust

Ok. So you have only the edges of your tree, as I suspected. @bowesmana 's solution will probably work (haven't checked, but that's what I'd expect from experience ;-)), but be aware that with such data you're limited to a fixed level of "nesting". Since SPL cannot do recursion, you cannot "unpack" an arbitrary depth of sub-sub-sub...tasks with a general solution. You can write a search for two levels, and you could extend it to three or four, but it will always be a fixed level.


bowesmana
SplunkTrust

As @PickleRick says, it should work on the data you gave us, but one assumption it makes is that the PHTRAN field always refers to the ROOT transaction QZ81 (for USER=GPDCFC26), even though your example shows 2 sub-levels of task, i.e. USER=GPDCFC26 calls APP7 with TRANNUM 70322, and APP7 is the parent for all of its own subtasks through APP3 and APP5.

IFF this is the case, then it will probably handle all the levels you have. But as he also states, you can only program "recursion" to a fixed level, so if all you have is the parent/child transaction numbers, it's more difficult.
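To picture what "recursion to a fixed level" means here: each pass propagates the root transaction number one level further down, so N passes cover at most N levels of nesting. A hypothetical Python sketch of that idea (made-up values; the actual SPL equivalent is repeated eventstats/eval blocks):

```python
# Each pass copies the root TRANNUM to any row whose parent already knows its root.
rows = [
    {"TRANNUM": 100, "PHTRANNO": 0},    # root
    {"TRANNUM": 110, "PHTRANNO": 100},  # level 1
    {"TRANNUM": 111, "PHTRANNO": 110},  # level 2
]

# Initialise: roots know their own root, everyone else doesn't yet.
for r in rows:
    r["root"] = r["TRANNUM"] if r["PHTRANNO"] == 0 else None

MAX_LEVELS = 5  # fixed unroll depth - trees deeper than this are missed
for _ in range(MAX_LEVELS):
    known = {r["TRANNUM"]: r["root"] for r in rows if r["root"] is not None}
    for r in rows:
        if r["root"] is None and r["PHTRANNO"] in known:
            r["root"] = known[r["PHTRANNO"]]

print([r["root"] for r in rows])  # [100, 100, 100]
```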


bowesmana
SplunkTrust

If I understand you correctly, this example should give you what you want. The first makeresults section is crafting your data, so on your own data you only need the part from the eval statement that follows the data setup.

| makeresults format=csv data="START,STOP,USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43853,APP7,QZ81,70322,76
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43850,APP7,QZ81,70322,64
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43848,APP7,QZ81,70322,64
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43846,APP7,QZ81,70322,74
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43845,APP7,QZ81,70322,68
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43844,APP7,QZ81,70322,71
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43857,APP7,QZ81,70322,65
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43856,APP7,QZ81,70322,72
2:10:30 p.m.,2:10:30 p.m., ,APP5,CSMI,20634,APP7,QZ81,70322,8860
2:10:30 p.m.,2:10:30 p.m., ,APP7,QZ81,70322,APP3,QZ81,43836,16043
2:10:30 p.m.,2:10:30 p.m.,GPDCFC26,APP3,QZ81,43836, , ,0,897
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41839,APP5,QZ61,15551,51
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41838,APP5,QZ61,15551,64
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41837,APP5,QZ61,15551,79
2:10:17 p.m.,2:10:17 p.m., ,APP5,QZ61,15551,APP3,QZ61,41835,5232
2:10:17 p.m.,2:10:17 p.m.,GOTLIS12,APP3,QZ61,41835, , ,0,778"
``` In the task case, PHTRAN is empty, so this will copy the TRAN to PHTRAN giving you correlation ```
| eval PHTRAN=coalesce(PHTRAN,TRAN)

``` This counts all occurrences of the PHTRAN and joins the USER field into the child events ```
| eventstats count as subTasks values(USER) as USER by PHTRAN 

``` Now count the executions of each USER and evaluate the timings ```
| stats count(eval(PHTRANNO=0)) as Executions sum(USRCPUT_MICROSEC) as tot_USRCPUT_MICROSEC avg(USRCPUT_MICROSEC) as avg_USRCPUT_MICROSEC max(subTasks) as subTasks by USER

``` And adjust the subtask count, as we treated the main task as a subtask and then calculate the average subtask count ```
| eval subTasks=subTasks-1, avg_subTasks=subTasks/Executions

 


kiwiglen
Observer

Thanks, that is really close, but the problem is I can't use PHTRAN to group them - that ID appears all over the data. The grouping needs to be based on TRANNUM, so 43836 is the initial transaction; it starts 70322, which then runs the first 9. This could be 4-5 layers deep.

How do I handle that and get totals at the highest level?


ITWhisperer
SplunkTrust

Please provide new sample data that better represents your actual situation so we can better assist you.


kiwiglen
Observer

Sorry, I should have included updated data. Here is a sample:

USER | JOBNAME | TRAN | TRANNUM | PHAPPLID | PHTRAN | PHTRANNO | USRCPUT_MICROSEC
 | APP3 | CSMI | 43856 | APP7 | QZ81 | 70322 | 72
 | APP5 | CSMI | 20634 | APP7 | QZ81 | 70322 | 8860
 | APP7 | QZ81 | 70322 | APP3 | QZ81 | 43836 | 16043
GPDCFC26 | APP3 | QZ81 | 43836 |  |  | 0 | 897
 | APP3 | CSMI | 41839 | APP5 | QZ61 | 15551 | 51
 | APP3 | CSMI | 41838 | APP5 | QZ61 | 15551 | 64
 | APP3 | CSMI | 41837 | APP5 | QZ61 | 15551 | 79
 | APP5 | QZ61 | 15551 | APP3 | QZ61 | 41835 | 5232
GOTLIS12 | APP3 | QZ61 | 41835 |  |  | 0 | 778
 | APP5 | QZ61 | 12 | APP3 | QZ61 | 1 | 5232
GOTLIS12 | APP3 | QZ61 | 1 |  |  | 0 | 778
 | APP5 | CSMI | 111 | APP7 | QZ81 | 110 | 8860
 | APP7 | QZ81 | 110 | APP3 | QZ81 | 100 | 16043
ABCDEF | APP3 | QZ81 | 100 |  |  | 0 | 897

bowesmana
SplunkTrust

@kiwiglen 

As @PickleRick has said, there is no recursion here, so this example handles up to 5 levels of subtask.

| makeresults format=csv data="USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
 ,APP3,CSMI,43856,APP7,QZ81,70322,72
 ,APP5,CSMI,20634,APP7,QZ81,70322,8860
 ,APP7,QZ81,70322,APP3,QZ81,43836,16043
GPDCFC26,APP3,QZ81,43836, , ,0,897
 ,APP3,CSMI,41839,APP5,QZ61,15551,51
 ,APP3,CSMI,41838,APP5,QZ61,15551,64
 ,APP3,CSMI,41837,APP5,QZ61,15551,79
 ,APP5,QZ61,15551,APP3,QZ61,41835,5232
GOTLIS12,APP3,QZ61,41835, , ,0,778
 ,APP5,QZ61,12,APP3,QZ61,1,5232
GOTLIS12,APP3,QZ61,1, , ,0,778
 ,APP5,CSMI,111,APP7,QZ81,110,8860
 ,APP7,QZ81,110,APP3,QZ81,100,16043
ABCDEF,APP3,QZ81,100, , ,0,897"
| fields USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC

``` Now initialise level 0 numbers ```
| eval level=if(PHTRANNO=0, 0, null()), root_trannum=if(PHTRANNO=0, TRANNUM, null())

``` This logic handles 5 levels of "recursion" - as discussed earlier, it's not true recursion as you have to specify the operations for the max level count you need.
    The logic works by cascading the USER and root TRANNUM down the events for the related subtasks. It performs the following actions
    
    - Create a parent id field containing TRANNUM, root TRANNUM USER and level for the specific level wanted - in this case level 0 is PHTRANNO=0 in the IF test 
    - Collect all the values of those ids across all events
    - Find the PHTRANNO value of the event in the list of parents
    - Extract the user and root TRANNUM from the result if found 
```


``` Get level 1 ids ```
| eval parent_id=if(PHTRANNO=0, TRANNUM.":".root_trannum.":".USER.":".1, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 2 ids ```
| eval parent_id=if(level=1, TRANNUM.":".root_trannum.":".USER.":".2, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 3 ids ```
| eval parent_id=if(level=2, TRANNUM.":".root_trannum.":".USER.":".3, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 4 ids ```
| eval parent_id=if(level=3, TRANNUM.":".root_trannum.":".USER.":".4, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 5 ids ```
| eval parent_id=if(level=4, TRANNUM.":".root_trannum.":".USER.":".5, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

| fields - root_user parents parent_id data


``` This counts all occurrences of the PHTRAN and joins the USER field into the child events ```
| eventstats count(eval(PHTRANNO!=0)) as subTasks by USER root_trannum

``` Now count the executions of each USER and evaluate the timings ```
| stats count(eval(PHTRANNO=0)) as Executions sum(USRCPUT_MICROSEC) as tot_USRCPUT_MICROSEC avg(USRCPUT_MICROSEC) as avg_USRCPUT_MICROSEC sum(eval(if(PHTRANNO=0,subTasks, 0))) as subTasks by USER

``` And adjust the subtask count, as we treated the main task as a subtask and then calculate the average subtask count ```
| eval avg_subTasks=subTasks/Executions

Hopefully the comments help explain what's going on. However, without knowing exactly what you want from the averages and totals, you may need to tweak it. Note too that this is an EXPENSIVE search, so if you're dealing with a large data set it may be slow.

If you have questions about the implementation just ask.

Note, as with most things Splunk, there may be a way to improve this, but with no correlation information other than the TRANNUM, it's tricky.
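If the depth really can vary arbitrarily, one option is to do the roll-up outside SPL, where true recursion is available (for example in an external script fed by an exported search). A purely illustrative Python sketch over the sample rows above (same field names; not Splunk code) that computes executions, total CPU and average subtasks per USER at any depth:

```python
from collections import defaultdict

# (USER, TRANNUM, PHTRANNO, USRCPUT_MICROSEC) from the sample; PHTRANNO=0 marks a root
rows = [
    ("", 43856, 70322, 72), ("", 20634, 70322, 8860), ("", 70322, 43836, 16043),
    ("GPDCFC26", 43836, 0, 897),
    ("", 41839, 15551, 51), ("", 41838, 15551, 64), ("", 41837, 15551, 79),
    ("", 15551, 41835, 5232), ("GOTLIS12", 41835, 0, 778),
    ("", 12, 1, 5232), ("GOTLIS12", 1, 0, 778),
    ("", 111, 110, 8860), ("", 110, 100, 16043), ("ABCDEF", 100, 0, 897),
]

children = defaultdict(list)
cpu = {}
for user, tranno, parent, usrcput in rows:
    cpu[tranno] = usrcput
    if parent != 0:
        children[parent].append(tranno)

def rollup(tranno):
    """Return (total CPU, descendant count) for a transaction's whole subtree."""
    total, count = cpu[tranno], 0
    for c in children[tranno]:
        t, n = rollup(c)
        total, count = total + t, count + n + 1
    return total, count

per_user = defaultdict(lambda: [0, 0, 0])  # executions, total cpu, total subtasks
for user, tranno, parent, _ in rows:
    if parent == 0:
        t, n = rollup(tranno)
        per_user[user][0] += 1
        per_user[user][1] += t
        per_user[user][2] += n

for user, (execs, total, subs) in sorted(per_user.items()):
    print(user, execs, total, subs / execs)
# ABCDEF 1 25800 2.0
# GOTLIS12 2 12214 2.5
# GPDCFC26 1 25872 3.0
```

On the sample this matches what the fixed-level SPL produces, but it keeps working however deep the tree goes.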
