Splunk Search

Getting totals of child processes at the parent

kiwiglen
Observer

I have an index with a list of transactions. Each transaction in the system starts as one process with a transaction number (TRANNO), and that transaction can start a number of sub-tasks. Each sub-task has its own transaction number (TRANNO) but also carries its parent's transaction number (PHTRANNO). All the tasks (parents and children) record an amount of CPU consumed (USRCPUT_MICROSEC).

All tasks have an id field (USER) which tells us what type of task it was.

I want to be able to create a report listing all the types of tasks (USER) and the average and max CPU consumed (USRCPUT_MICROSEC).

I've managed to create the report with the sum of the parents' CPU, but most of the CPU is consumed by the children.

Any suggestions on how to do this?  I've been searching and trying things for hours and I'm not getting anywhere.


livehybrid
SplunkTrust

Hi @kiwiglen 

To aggregate CPU metrics across all tasks (parents and children) by USER type, make sure your SPL includes all events without filtering on the parent/child relationship. Use stats directly on USER to calculate avg, max and sum - I wasn't sure if you wanted avg/max or sum:

index=your_index | stats avg(USRCPUT_MICROSEC) as avg_cpu, sum(USRCPUT_MICROSEC) as total_cpu max(USRCPUT_MICROSEC) as max_cpu by USER
 

This aggregates CPU for all tasks (parents + children) by their USER type. If you need hierarchical sums (e.g., parent CPU + child CPU per root transaction), clarify your data structure, as recursive aggregation would require a different approach (e.g., transaction grouping or hierarchical data modeling).

If you need it per transaction rather than per USER, then the following might work:

index=your_index 
| eval txID = COALESCE(PHTRANNO,TRANNO) 
| stats avg(USRCPUT_MICROSEC) as avg_cpu, sum(USRCPUT_MICROSEC) as total_cpu max(USRCPUT_MICROSEC) as max_cpu by txID

Obviously this depends a lot on what the raw data actually looks like - let me know if this is close to what you need, and if possible provide some anonymised sample data.

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will


PickleRick
SplunkTrust

Your requirements and the description of your data are a bit unclear.

If you have your data as a tree structure represented only as separate vertices without a full path to the root... you can't do it reasonably using SPL alone. It's similar to the anti-patterns in SQL - since you don't have recursion at your disposal, you can't reconstruct an arbitrarily-long path just from parent-child node pairs.
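To illustrate why recursion matters here: outside SPL, a language with recursion can roll CPU up from children to the root at any depth. A purely illustrative Python sketch with made-up parent/child pairs (not the poster's data, and not something SPL can express):

```python
from collections import defaultdict

# (TRANNO, PHTRANNO or None for a root, cpu) - made-up sample values
rows = [
    (1, None, 100),   # root
    (2, 1, 50),       # child of 1
    (3, 2, 25),       # grandchild of 1
    (4, None, 10),    # second root
]

children = defaultdict(list)
cpu = {}
roots = []
for tranno, parent, usrcput in rows:
    cpu[tranno] = usrcput
    if parent is None:
        roots.append(tranno)
    else:
        children[parent].append(tranno)

def total_cpu(tranno):
    """Recursively sum a transaction's CPU plus all of its descendants'."""
    return cpu[tranno] + sum(total_cpu(c) for c in children[tranno])

totals = {r: total_cpu(r) for r in roots}
print(totals)  # {1: 175, 4: 10}
```

The recursive call is the part SPL has no equivalent for, which is why SPL-only solutions end up hard-coded to a fixed depth.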

So be a bit more verbose about your problem.


bowesmana
SplunkTrust

How many levels deep can the parent/child relationship be?

Anyway, you can do this with a couple of lines of SPL. See this example, which creates some dummy data; the final two lines then compute the sum of time per USER/TRANNO.

| makeresults count=4
| eval TRANNO=random()
| eval USRCPUT_MICROSEC=random() % 10000
| streamstats c as USER
| eval USER="TASK ".USER
| appendpipe [
  | eval children=mvrange(1,5,1)
  | mvexpand children
  | rename TRANNO as PHTRANNO
  | eval USRCPUT_MICROSEC=random() % 1000000
]
``` The above is just creating some example data ```
| eval TRANNO=coalesce(TRANNO, PHTRANNO)
| stats sum(USRCPUT_MICROSEC) as USRCPUT_MICROSEC by USER TRANNO

If this is not what your data looks like, please post some anonymised examples of your data.


kiwiglen
Observer

Hi everybody, thanks for the replies. Sorry I didn't provide enough detail in the original post - I'm still trying to work out the data structure as we have lots of data. I've pulled out some data that shows 2 examples.

The first example starts with USER=GPDCFC26; all the rows above it are sub-tasks or sub-sub-tasks. The second example is USER=GOTLIS12, and the 4 rows above it are again the sub- and sub-sub-tasks.

What I want, by USER (which is present at the task level only), is a few bits of information: the average and max total CPU time (USRCPUT_MICROSEC), the average number of sub-tasks per task (so 10 for GPDCFC26 and 4 for GOTLIS12), and a count of the number of times each task has executed (1 for both in this case).

START | STOP | USER | JOBNAME | TRAN | TRANNUM | PHAPPLID | PHTRAN | PHTRANNO | USRCPUT_MICROSEC
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43853 | APP7 | QZ81 | 70322 | 76
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43850 | APP7 | QZ81 | 70322 | 64
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43848 | APP7 | QZ81 | 70322 | 64
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43846 | APP7 | QZ81 | 70322 | 74
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43845 | APP7 | QZ81 | 70322 | 68
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43844 | APP7 | QZ81 | 70322 | 71
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43857 | APP7 | QZ81 | 70322 | 65
2:10:30 p.m. | 2:10:30 p.m. |  | APP3 | CSMI | 43856 | APP7 | QZ81 | 70322 | 72
2:10:30 p.m. | 2:10:30 p.m. |  | APP5 | CSMI | 20634 | APP7 | QZ81 | 70322 | 8860
2:10:30 p.m. | 2:10:30 p.m. |  | APP7 | QZ81 | 70322 | APP3 | QZ81 | 43836 | 16043
2:10:30 p.m. | 2:10:30 p.m. | GPDCFC26 | APP3 | QZ81 | 43836 |  |  | 0 | 897
2:10:17 p.m. | 2:10:17 p.m. |  | APP3 | CSMI | 41839 | APP5 | QZ61 | 15551 | 51
2:10:17 p.m. | 2:10:17 p.m. |  | APP3 | CSMI | 41838 | APP5 | QZ61 | 15551 | 64
2:10:17 p.m. | 2:10:17 p.m. |  | APP3 | CSMI | 41837 | APP5 | QZ61 | 15551 | 79
2:10:17 p.m. | 2:10:17 p.m. |  | APP5 | QZ61 | 15551 | APP3 | QZ61 | 41835 | 5232
2:10:17 p.m. | 2:10:17 p.m. | GOTLIS12 | APP3 | QZ61 | 41835 |  |  | 0 | 778

PickleRick
SplunkTrust

Ok. So you have only the edges of your tree, as I suspected. @bowesmana 's solution will probably work (haven't checked, but that's what I'd expect from experience ;-)), but be aware that with such data you're limited to a fixed level of "nesting". Since SPL cannot do recursion, you cannot "unpack" an arbitrary depth of sub-sub-sub...tasks with a general solution. You can write a search for two levels, and you could extend it to three or four, but it will always be a fixed level.


bowesmana
SplunkTrust

As @PickleRick says, it should work on the data you gave us, but one assumption it makes is that the PHTRAN field always refers to the ROOT transaction QZ81 (for USER=GPDCFC26), even though your example shows 2 sub-levels of task, i.e. USER=GPDCFC26 calls APP7 with TRANNUM 70322, and APP7 is the parent for all of its own subtasks through APP3 and APP5.

IFF this is the case, then it will probably handle all the levels you have. But as he also states, you can only program "recursion" to a fixed level, so if all you have is the parent/child transaction numbers, it's more difficult.
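To picture what "recursion to a fixed level" means here: each pass propagates the root transaction number one level further down, so N passes cover at most N levels of nesting. A hypothetical Python sketch of that idea (made-up values; the actual SPL equivalent is repeated eventstats/eval blocks):

```python
# Each pass copies the root TRANNUM to any row whose parent already knows its root.
rows = [
    {"TRANNUM": 100, "PHTRANNO": 0},    # root
    {"TRANNUM": 110, "PHTRANNO": 100},  # level 1
    {"TRANNUM": 111, "PHTRANNO": 110},  # level 2
]

# Initialise: roots know their own root, everyone else doesn't yet.
for r in rows:
    r["root"] = r["TRANNUM"] if r["PHTRANNO"] == 0 else None

MAX_LEVELS = 5  # fixed unroll depth - trees deeper than this are missed
for _ in range(MAX_LEVELS):
    known = {r["TRANNUM"]: r["root"] for r in rows if r["root"] is not None}
    for r in rows:
        if r["root"] is None and r["PHTRANNO"] in known:
            r["root"] = known[r["PHTRANNO"]]

print([r["root"] for r in rows])  # [100, 100, 100]
```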


bowesmana
SplunkTrust

If I understand you correctly, this example should give you what you want. The first makeresults section is crafting your data, so on your own data you only need the part from the eval statement that follows the data setup.

| makeresults format=csv data="START,STOP,USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43853,APP7,QZ81,70322,76
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43850,APP7,QZ81,70322,64
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43848,APP7,QZ81,70322,64
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43846,APP7,QZ81,70322,74
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43845,APP7,QZ81,70322,68
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43844,APP7,QZ81,70322,71
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43857,APP7,QZ81,70322,65
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43856,APP7,QZ81,70322,72
2:10:30 p.m.,2:10:30 p.m., ,APP5,CSMI,20634,APP7,QZ81,70322,8860
2:10:30 p.m.,2:10:30 p.m., ,APP7,QZ81,70322,APP3,QZ81,43836,16043
2:10:30 p.m.,2:10:30 p.m.,GPDCFC26,APP3,QZ81,43836, , ,0,897
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41839,APP5,QZ61,15551,51
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41838,APP5,QZ61,15551,64
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41837,APP5,QZ61,15551,79
2:10:17 p.m.,2:10:17 p.m., ,APP5,QZ61,15551,APP3,QZ61,41835,5232
2:10:17 p.m.,2:10:17 p.m.,GOTLIS12,APP3,QZ61,41835, , ,0,778"
``` In the task case, PHTRAN is empty, so this will copy the TRAN to PHTRAN giving you correlation ```
| eval PHTRAN=coalesce(PHTRAN,TRAN)

``` This counts all occurrences of the PHTRAN and joins the USER field into the child events ```
| eventstats count as subTasks values(USER) as USER by PHTRAN 

``` Now count the executions of each USER and evaluate the timings ```
| stats count(eval(PHTRANNO=0)) as Executions sum(USRCPUT_MICROSEC) as tot_USRCPUT_MICROSEC avg(USRCPUT_MICROSEC) as avg_USRCPUT_MICROSEC max(subTasks) as subTasks by USER

``` And adjust the subtask count, as we treated the main task as a subtask and then calculate the average subtask count ```
| eval subTasks=subTasks-1, avg_subTasks=subTasks/Executions

 


kiwiglen
Observer

Thanks, that is really close, but the problem is I can't use PHTRAN to group them - that ID appears all over the data. The grouping needs to be based on TRANNUM, so 43836 is the initial transaction; it starts 70322, which then runs the first 9. This could be 4-5 layers deep.

How do I handle that and get totals at the highest level?


ITWhisperer
SplunkTrust

Please provide new sample data that better represents your actual situation so we can better assist you.


kiwiglen
Observer

Sorry, I should have included updated data. Here is a sample:

USER | JOBNAME | TRAN | TRANNUM | PHAPPLID | PHTRAN | PHTRANNO | USRCPUT_MICROSEC
 | APP3 | CSMI | 43856 | APP7 | QZ81 | 70322 | 72
 | APP5 | CSMI | 20634 | APP7 | QZ81 | 70322 | 8860
 | APP7 | QZ81 | 70322 | APP3 | QZ81 | 43836 | 16043
GPDCFC26 | APP3 | QZ81 | 43836 |  |  | 0 | 897
 | APP3 | CSMI | 41839 | APP5 | QZ61 | 15551 | 51
 | APP3 | CSMI | 41838 | APP5 | QZ61 | 15551 | 64
 | APP3 | CSMI | 41837 | APP5 | QZ61 | 15551 | 79
 | APP5 | QZ61 | 15551 | APP3 | QZ61 | 41835 | 5232
GOTLIS12 | APP3 | QZ61 | 41835 |  |  | 0 | 778
 | APP5 | QZ61 | 12 | APP3 | QZ61 | 1 | 5232
GOTLIS12 | APP3 | QZ61 | 1 |  |  | 0 | 778
 | APP5 | CSMI | 111 | APP7 | QZ81 | 110 | 8860
 | APP7 | QZ81 | 110 | APP3 | QZ81 | 100 | 16043
ABCDEF | APP3 | QZ81 | 100 |  |  | 0 | 897

bowesmana
SplunkTrust

@kiwiglen 

As @PickleRick has said, there is no recursion here, so this example handles up to 5 levels of subtask.

| makeresults format=csv data="USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
 ,APP3,CSMI,43856,APP7,QZ81,70322,72
 ,APP5,CSMI,20634,APP7,QZ81,70322,8860
 ,APP7,QZ81,70322,APP3,QZ81,43836,16043
GPDCFC26,APP3,QZ81,43836, , ,0,897
 ,APP3,CSMI,41839,APP5,QZ61,15551,51
 ,APP3,CSMI,41838,APP5,QZ61,15551,64
 ,APP3,CSMI,41837,APP5,QZ61,15551,79
 ,APP5,QZ61,15551,APP3,QZ61,41835,5232
GOTLIS12,APP3,QZ61,41835, , ,0,778
 ,APP5,QZ61,12,APP3,QZ61,1,5232
GOTLIS12,APP3,QZ61,1, , ,0,778
 ,APP5,CSMI,111,APP7,QZ81,110,8860
 ,APP7,QZ81,110,APP3,QZ81,100,16043
ABCDEF,APP3,QZ81,100, , ,0,897"
| fields USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC

``` Now initialise level 0 numbers ```
| eval level=if(PHTRANNO=0, 0, null()), root_trannum=if(PHTRANNO=0, TRANNUM, null())

``` This logic handles 5 levels of "recursion" - as discussed earlier, it's not true recursion as you have to specify the operations for the max level count you need.
    The logic works by cascading the USER and root TRANNUM down the events for the related subtasks. It performs the following actions
    
    - Create a parent id field containing TRANNUM, root TRANNUM USER and level for the specific level wanted - in this case level 0 is PHTRANNO=0 in the IF test 
    - Collect all the values of those ids across all events
    - Find the PHTRANNO value of the event in the list of parents
    - Extract the user and root TRANNUM from the result if found 
```


``` Get level 1 ids ```
| eval parent_id=if(PHTRANNO=0, TRANNUM.":".root_trannum.":".USER.":".1, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 2 ids ```
| eval parent_id=if(level=1, TRANNUM.":".root_trannum.":".USER.":".2, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 3 ids ```
| eval parent_id=if(level=2, TRANNUM.":".root_trannum.":".USER.":".3, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 4 ids ```
| eval parent_id=if(level=3, TRANNUM.":".root_trannum.":".USER.":".4, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

``` Get level 5 ids ```
| eval parent_id=if(level=4, TRANNUM.":".root_trannum.":".USER.":".5, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)

| fields - root_user parents parent_id data


``` This counts all occurrences of the PHTRAN and joins the USER field into the child events ```
| eventstats count(eval(PHTRANNO!=0)) as subTasks by USER root_trannum

``` Now count the executions of each USER and evaluate the timings ```
| stats count(eval(PHTRANNO=0)) as Executions sum(USRCPUT_MICROSEC) as tot_USRCPUT_MICROSEC avg(USRCPUT_MICROSEC) as avg_USRCPUT_MICROSEC sum(eval(if(PHTRANNO=0,subTasks, 0))) as subTasks by USER

``` And adjust the subtask count, as we treated the main task as a subtask and then calculate the average subtask count ```
| eval avg_subTasks=subTasks/Executions

Hopefully the comments help explain what's going on. However, without knowing exactly what you want from the averages and totals, you may need to tweak it. Note too that this is an EXPENSIVE search, so if you're dealing with a large data set it may be slow.

If you have questions about the implementation just ask.

Note, as with most things Splunk, there may be a way to improve this, but with no correlation information other than the TRANNUM, it's tricky.
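If the depth really can vary arbitrarily, one option is to do the roll-up outside SPL, where true recursion is available (for example in an external script fed by an exported search). A purely illustrative Python sketch over the sample rows above (same field names; not Splunk code) that computes executions, total CPU and average subtasks per USER at any depth:

```python
from collections import defaultdict

# (USER, TRANNUM, PHTRANNO, USRCPUT_MICROSEC) from the sample; PHTRANNO=0 marks a root
rows = [
    ("", 43856, 70322, 72), ("", 20634, 70322, 8860), ("", 70322, 43836, 16043),
    ("GPDCFC26", 43836, 0, 897),
    ("", 41839, 15551, 51), ("", 41838, 15551, 64), ("", 41837, 15551, 79),
    ("", 15551, 41835, 5232), ("GOTLIS12", 41835, 0, 778),
    ("", 12, 1, 5232), ("GOTLIS12", 1, 0, 778),
    ("", 111, 110, 8860), ("", 110, 100, 16043), ("ABCDEF", 100, 0, 897),
]

children = defaultdict(list)
cpu = {}
for user, tranno, parent, usrcput in rows:
    cpu[tranno] = usrcput
    if parent != 0:
        children[parent].append(tranno)

def rollup(tranno):
    """Return (total CPU, descendant count) for a transaction's whole subtree."""
    total, count = cpu[tranno], 0
    for c in children[tranno]:
        t, n = rollup(c)
        total, count = total + t, count + n + 1
    return total, count

per_user = defaultdict(lambda: [0, 0, 0])  # executions, total cpu, total subtasks
for user, tranno, parent, _ in rows:
    if parent == 0:
        t, n = rollup(tranno)
        per_user[user][0] += 1
        per_user[user][1] += t
        per_user[user][2] += n

for user, (execs, total, subs) in sorted(per_user.items()):
    print(user, execs, total, subs / execs)
# ABCDEF 1 25800 2.0
# GOTLIS12 2 12214 2.5
# GPDCFC26 1 25872 3.0
```

On the sample this matches what the fixed-level SPL produces, but it keeps working however deep the tree goes.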
