I have an index with a list of transactions, the transactions in the system start as 1 process with a transaction number (TRANNO) and that transaction can start a number of sub-tasks, each sub task will have its own transaction number (TRANNO) but will also have its parent transaction number (PHTRANNO). All the tasks (parents and children) have an amount of CPU consumed (USRCPUT_MICROSEC).
All tasks have an id field (USER) which tells us what type of task it was.
I want to be able to create a report listing all the types of tasks (USER) and the average and max CPU consumed (USRCPUT_MICROSEC).
I've managed to create the report with the sum of the parents CPU but most of the CPU is consumed by the children.
Any suggestions on how to do this? I've been searching and trying things for hours and I'm not getting anywhere.
Hi @kiwiglen
To address your requirement of aggregating CPU metrics across all tasks (parents and children) by USER type, ensure your SPL includes all events without filtering on parent/child relationships. Use stats directly on USER to calculate avg, max, and sum (I wasn't sure whether you wanted avg/max or sum):
index=your_index | stats avg(USRCPUT_MICROSEC) as avg_cpu, sum(USRCPUT_MICROSEC) as total_cpu, max(USRCPUT_MICROSEC) as max_cpu by USER
This aggregates CPU for all tasks (parents + children) by their USER type. If you need hierarchical sums (e.g., parent CPU + child CPU per root transaction), clarify your data structure, as recursive aggregation would require a different approach (e.g., transaction grouping or hierarchical data modeling).
If you do need it per transaction instead of per USER, then the following might work:
index=your_index
| eval txID = COALESCE(PHTRANNO,TRANNO)
| stats avg(USRCPUT_MICROSEC) as avg_cpu, sum(USRCPUT_MICROSEC) as total_cpu, max(USRCPUT_MICROSEC) as max_cpu by txID
Obviously this depends a lot on what the raw data actually looks like. Let me know if this is close to what you need, and if possible provide some anonymised sample data.
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will
Your requirements and the description of your data are a bit unclear.
If you have your data as a tree structure represented only as separate vertices without a full path to the root... you can't do it reasonably using SPL alone. It's similar to the anti-patterns in SQL - since you don't have recursion at your disposal, you can't reconstruct an arbitrarily-long path just from parent-child node pairs.
So be a bit more verbose about your problem.
How many levels deep can the parent/child relationship be?
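To illustrate, outside Splunk, why the depth question matters: resolving each task's root from parent/child pairs alone is a loop of unbounded length. That's trivial in a general-purpose language (a hypothetical Python sketch below, with made-up transaction numbers) but has no general SPL equivalent.

```python
# Hypothetical sketch: a map of TRANNO -> PHTRANNO parent pointers
# (0 = no parent, i.e. a root task). Finding the root means walking
# a chain of unknown length -- exactly the recursion SPL lacks.
def find_root(tranno, parent_of):
    while parent_of.get(tranno, 0) != 0:
        tranno = parent_of[tranno]
    return tranno

# toy data: 30 -> 20 -> 10 (root); 50 -> 40 (root)
parent_of = {30: 20, 20: 10, 10: 0, 50: 40, 40: 0}
print(find_root(30, parent_of))  # 10, however deep the chain gets
```

In SPL you can only unroll that loop by hand, once per level, which is why the maximum depth matters.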
Anyway, you can do this with a couple of lines of SPL. See this example, which creates some dummy data; the final two lines then compute the sum of CPU time per USER/TRANNO.
| makeresults count=4
| eval TRANNO=random()
| eval USRCPUT_MICROSEC=random() % 10000
| streamstats c as USER
| eval USER="TASK ".USER
| appendpipe [
| eval children=mvrange(1,5,1)
| mvexpand children
| rename TRANNO as PHTRANNO
| eval USRCPUT_MICROSEC=random() % 1000000
]
``` The above is just creating some example data ```
| eval TRANNO=coalesce(TRANNO, PHTRANNO)
| stats sum(USRCPUT_MICROSEC) as USRCPUT_MICROSEC by USER TRANNO
If this is not what your data looks like, please post some anonymised examples of your data.
Hi everybody, thanks for the replies. Sorry I didn't provide enough detail in the original post; I'm trying to work out the data structure as we have lots of data. I've pulled some data out that shows 2 examples.
The first example starts with USER=GPDCFC26; all the rows above that are sub-tasks or sub-sub-tasks. The second example is USER=GOTLIS12, and the 4 rows above it are again the sub- and sub-sub-tasks.
What I want, by USER (which is set at the task level only), is a couple of bits of information: the average and max total CPU time (USRCPUT_MICROSEC), the average number of sub-tasks for each task (so 10 for GPDCFC26 and 4 for GOTLIS12), and a count of the number of times each task has executed (1 for both in this case).
| START | STOP | USER | JOBNAME | TRAN | TRANNUM | PHAPPLID | PHTRAN | PHTRANNO | USRCPUT_MICROSEC |
|---|---|---|---|---|---|---|---|---|---|
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43853 | APP7 | QZ81 | 70322 | 76 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43850 | APP7 | QZ81 | 70322 | 64 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43848 | APP7 | QZ81 | 70322 | 64 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43846 | APP7 | QZ81 | 70322 | 74 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43845 | APP7 | QZ81 | 70322 | 68 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43844 | APP7 | QZ81 | 70322 | 71 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43857 | APP7 | QZ81 | 70322 | 65 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP3 | CSMI | 43856 | APP7 | QZ81 | 70322 | 72 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP5 | CSMI | 20634 | APP7 | QZ81 | 70322 | 8860 |
| 2:10:30 p.m. | 2:10:30 p.m. | | APP7 | QZ81 | 70322 | APP3 | QZ81 | 43836 | 16043 |
| 2:10:30 p.m. | 2:10:30 p.m. | GPDCFC26 | APP3 | QZ81 | 43836 | | | 0 | 897 |
| 2:10:17 p.m. | 2:10:17 p.m. | | APP3 | CSMI | 41839 | APP5 | QZ61 | 15551 | 51 |
| 2:10:17 p.m. | 2:10:17 p.m. | | APP3 | CSMI | 41838 | APP5 | QZ61 | 15551 | 64 |
| 2:10:17 p.m. | 2:10:17 p.m. | | APP3 | CSMI | 41837 | APP5 | QZ61 | 15551 | 79 |
| 2:10:17 p.m. | 2:10:17 p.m. | | APP5 | QZ61 | 15551 | APP3 | QZ61 | 41835 | 5232 |
| 2:10:17 p.m. | 2:10:17 p.m. | GOTLIS12 | APP3 | QZ61 | 41835 | | | 0 | 778 |
Ok. So you have only the edges of your tree, as I suspected. @bowesmana 's solution will probably work (haven't checked but that's what I'd expect from experience ;-)) but be aware that with such data you're limited to a fixed level of "nesting". Since SPL cannot do recursion, you cannot "unpack" an arbitrary depth of sub-sub-sub...tasks with a general solution. You can write a search for two levels, and you could extend it to three or four, but it will always be a fixed level.
As @PickleRick says, it should work on the data you gave us, but one assumption it relies on is that the PHTRAN field always refers to the ROOT transaction QZ81 (for USER=GPDCFC26), as all your example rows show, despite your example containing 2 sub-levels of task, i.e. USER=GPDCFC26 calls APP7 with TRANNUM 70322, and APP7 is then the parent for all its own subtasks through APP3 and APP5.
IFF this is the case, then it will probably handle all levels you have, but as he also states, you can only program recursion to a fixed level, so if you only have the parent/child transaction numbers, then it's more difficult.
If I understand you correctly, this example should give you what you want. The first makeresults section is crafting your data, so you only actually need the part from the eval statement following the data setup.
| makeresults format=csv data="START,STOP,USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43853,APP7,QZ81,70322,76
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43850,APP7,QZ81,70322,64
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43848,APP7,QZ81,70322,64
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43846,APP7,QZ81,70322,74
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43845,APP7,QZ81,70322,68
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43844,APP7,QZ81,70322,71
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43857,APP7,QZ81,70322,65
2:10:30 p.m.,2:10:30 p.m., ,APP3,CSMI,43856,APP7,QZ81,70322,72
2:10:30 p.m.,2:10:30 p.m., ,APP5,CSMI,20634,APP7,QZ81,70322,8860
2:10:30 p.m.,2:10:30 p.m., ,APP7,QZ81,70322,APP3,QZ81,43836,16043
2:10:30 p.m.,2:10:30 p.m.,GPDCFC26,APP3,QZ81,43836, , ,0,897
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41839,APP5,QZ61,15551,51
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41838,APP5,QZ61,15551,64
2:10:17 p.m.,2:10:17 p.m., ,APP3,CSMI,41837,APP5,QZ61,15551,79
2:10:17 p.m.,2:10:17 p.m., ,APP5,QZ61,15551,APP3,QZ61,41835,5232
2:10:17 p.m.,2:10:17 p.m.,GOTLIS12,APP3,QZ61,41835, , ,0,778"
``` In the task case, PHTRAN is empty, so this will copy the TRAN to PHTRAN giving you correlation ```
| eval PHTRAN=coalesce(PHTRAN,TRAN)
``` This counts all occurrences of the PHTRAN and joins the USER field into the child events ```
| eventstats count as subTasks values(USER) as USER by PHTRAN
``` Now count the executions of each USER and evaluate the timings ```
| stats count(eval(PHTRANNO=0)) as Executions sum(USRCPUT_MICROSEC) as tot_USRCPUT_MICROSEC avg(USRCPUT_MICROSEC) as avg_USRCPUT_MICROSEC max(subTasks) as subTasks by USER
``` And adjust the subtask count, as we treated the main task as a subtask and then calculate the average subtask count ```
| eval subTasks=subTasks-1, avg_subTasks=subTasks/Executions
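If it helps to sanity-check the SPL, here's the same grouping logic done in plain Python against your sample rows (an illustration only, not part of the search; each tuple encodes USER, coalesce(PHTRAN, TRAN), PHTRANNO, and USRCPUT_MICROSEC):

```python
# Cross-check of the stats logic in plain Python, using the sample data.
# Assumption (same as the SPL): the grouping key is coalesce(PHTRAN, TRAN)
# and the root row is the one with PHTRANNO == 0.
from collections import defaultdict

rows = [
    (None, "QZ81", 70322, 76), (None, "QZ81", 70322, 64),
    (None, "QZ81", 70322, 64), (None, "QZ81", 70322, 74),
    (None, "QZ81", 70322, 68), (None, "QZ81", 70322, 71),
    (None, "QZ81", 70322, 65), (None, "QZ81", 70322, 72),
    (None, "QZ81", 70322, 8860), (None, "QZ81", 43836, 16043),
    ("GPDCFC26", "QZ81", 0, 897),
    (None, "QZ61", 15551, 51), (None, "QZ61", 15551, 64),
    (None, "QZ61", 15551, 79), (None, "QZ61", 41835, 5232),
    ("GOTLIS12", "QZ61", 0, 778),
]

groups = defaultdict(list)
for user, key, ph, cpu in rows:
    groups[key].append((user, ph, cpu))

report = {}
for key, evts in groups.items():
    user = next(u for u, _, _ in evts if u)       # USER comes from the root row
    report[user] = {
        "Executions": sum(1 for _, ph, _ in evts if ph == 0),
        "tot_USRCPUT_MICROSEC": sum(c for _, _, c in evts),
        "subTasks": len(evts) - 1,                # exclude the root task itself
    }

print(report)
```

For your sample this gives GPDCFC26 a total of 26354 microseconds across 10 subtasks and GOTLIS12 a total of 6204 across 4, which is what the SPL above should also produce.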
Thanks, that is really close, but the problem is I can't use PHTRAN to group them; that ID appears heaps in the data. The grouping needs to be based on TRANNUM: 43836 is the initial transaction, it starts 70322, which then runs the first 9. This could be 4-5 layers deep.
How do I handle that and get totals at the highest level?
Please provide new sample data that better represents your actual situation so we might be better able to assist you.
Sorry I should have included updated data. Here is a sample
| USER | JOBNAME | TRAN | TRANNUM | PHAPPLID | PHTRAN | PHTRANNO | USRCPUT_MICROSEC |
|---|---|---|---|---|---|---|---|
| | APP3 | CSMI | 43856 | APP7 | QZ81 | 70322 | 72 |
| | APP5 | CSMI | 20634 | APP7 | QZ81 | 70322 | 8860 |
| | APP7 | QZ81 | 70322 | APP3 | QZ81 | 43836 | 16043 |
| GPDCFC26 | APP3 | QZ81 | 43836 | | | 0 | 897 |
| | APP3 | CSMI | 41839 | APP5 | QZ61 | 15551 | 51 |
| | APP3 | CSMI | 41838 | APP5 | QZ61 | 15551 | 64 |
| | APP3 | CSMI | 41837 | APP5 | QZ61 | 15551 | 79 |
| | APP5 | QZ61 | 15551 | APP3 | QZ61 | 41835 | 5232 |
| GOTLIS12 | APP3 | QZ61 | 41835 | | | 0 | 778 |
| | APP5 | QZ61 | 12 | APP3 | QZ61 | 1 | 5232 |
| GOTLIS12 | APP3 | QZ61 | 1 | | | 0 | 778 |
| | APP5 | CSMI | 111 | APP7 | QZ81 | 110 | 8860 |
| | APP7 | QZ81 | 110 | APP3 | QZ81 | 100 | 16043 |
| ABCDEF | APP3 | QZ81 | 100 | | | 0 | 897 |
As has been said by @PickleRick, there is no recursion here, so this example handles up to 5 levels of subtask.
| makeresults format=csv data="USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
,APP3,CSMI,43856,APP7,QZ81,70322,72
,APP5,CSMI,20634,APP7,QZ81,70322,8860
,APP7,QZ81,70322,APP3,QZ81,43836,16043
GPDCFC26,APP3,QZ81,43836, , ,0,897
,APP3,CSMI,41839,APP5,QZ61,15551,51
,APP3,CSMI,41838,APP5,QZ61,15551,64
,APP3,CSMI,41837,APP5,QZ61,15551,79
,APP5,QZ61,15551,APP3,QZ61,41835,5232
GOTLIS12,APP3,QZ61,41835, , ,0,778
,APP5,QZ61,12,APP3,QZ61,1,5232
GOTLIS12,APP3,QZ61,1, , ,0,778
,APP5,CSMI,111,APP7,QZ81,110,8860
,APP7,QZ81,110,APP3,QZ81,100,16043
ABCDEF,APP3,QZ81,100, , ,0,897"
| fields USER,JOBNAME,TRAN,TRANNUM,PHAPPLID,PHTRAN,PHTRANNO,USRCPUT_MICROSEC
``` Now initialise level 0 numbers ```
| eval level=if(PHTRANNO=0, 0, null()), root_trannum=if(PHTRANNO=0, TRANNUM, null())
``` This logic handles 5 levels of "recursion" - as discussed earlier, it's not true recursion as you have to specify the operations for the max level count you need.
The logic works by cascading the USER and root TRANNUM down the events for the related subtasks. It performs the following actions:
- Create a parent id field containing TRANNUM, root TRANNUM, USER, and level for the specific level wanted; in this case level 0 is PHTRANNO=0 in the IF test
- Collect all the values of those ids across all events
- Find the PHTRANNO value of the event in the list of parents
- Extract the user and root TRANNUM from the result if found
```
``` Get level 1 ids ```
| eval parent_id=if(PHTRANNO=0, TRANNUM.":".root_trannum.":".USER.":".1, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)
``` Get level 2 ids ```
| eval parent_id=if(level=1, TRANNUM.":".root_trannum.":".USER.":".2, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)
``` Get level 3 ids ```
| eval parent_id=if(level=2, TRANNUM.":".root_trannum.":".USER.":".3, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)
``` Get level 4 ids ```
| eval parent_id=if(level=3, TRANNUM.":".root_trannum.":".USER.":".4, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)
``` Get level 5 ids ```
| eval parent_id=if(level=4, TRANNUM.":".root_trannum.":".USER.":".5, null())
| eventstats values(parent_id) as parents
| eval data=split(mvindex(parents, mvfind(parents, "^".PHTRANNO.":")), ":")
| eval root_trannum=if(isnotnull(data), mvindex(data, 1), root_trannum), root_user=if(isnotnull(data), mvindex(data, 2), root_user), level=if(isnotnull(data), mvindex(data, 3), level), USER=coalesce(USER, root_user)
| fields - root_user parents parent_id data
``` This counts the subtask events (PHTRANNO!=0) for each USER/root transaction and joins the USER field into the child events ```
| eventstats count(eval(PHTRANNO!=0)) as subTasks by USER root_trannum
``` Now count the executions of each USER and evaluate the timings ```
| stats count(eval(PHTRANNO=0)) as Executions sum(USRCPUT_MICROSEC) as tot_USRCPUT_MICROSEC avg(USRCPUT_MICROSEC) as avg_USRCPUT_MICROSEC sum(eval(if(PHTRANNO=0,subTasks, 0))) as subTasks by USER
``` And finally calculate the average subtask count per execution ```
| eval avg_subTasks=subTasks/Executions
Hopefully the comments help explain what's going on. Without knowing exactly what you want from the averages and totals, you may need to tweak this. Note that this is an EXPENSIVE search, so if you're dealing with a large data set it may be slow.
If you have questions about the implementation just ask.
Note that, as with most things in Splunk, there may be a way to improve this, but with no correlation information other than the TRANNUM it's tricky.
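For what it's worth, if you can get the raw events out of Splunk (e.g. via an export or the REST API), an unlimited-depth roll-up is straightforward in a general-purpose language. A minimal Python sketch, assuming an export carrying the field names from this thread (the list of events here is just a toy excerpt):

```python
# Unlimited-depth roll-up of child CPU to the root task's USER.
# Assumption: each exported event has TRANNUM, PHTRANNO (0 = root),
# USER (set only on root rows) and USRCPUT_MICROSEC.
from collections import defaultdict

events = [
    {"USER": "GOTLIS12", "TRANNUM": 41835, "PHTRANNO": 0,     "USRCPUT_MICROSEC": 778},
    {"USER": None,       "TRANNUM": 15551, "PHTRANNO": 41835, "USRCPUT_MICROSEC": 5232},
    {"USER": None,       "TRANNUM": 41837, "PHTRANNO": 15551, "USRCPUT_MICROSEC": 79},
]

parent = {e["TRANNUM"]: e["PHTRANNO"] for e in events}
user_of_root = {e["TRANNUM"]: e["USER"] for e in events if e["PHTRANNO"] == 0}

def root_of(trannum):
    # follow parent pointers to any depth -- no fixed nesting limit
    while parent.get(trannum, 0) != 0:
        trannum = parent[trannum]
    return trannum

totals = defaultdict(int)
for e in events:
    totals[user_of_root[root_of(e["TRANNUM"])]] += e["USRCPUT_MICROSEC"]

print(dict(totals))  # total CPU rolled up to each root task's USER
```

This sidesteps the fixed-level unrolling entirely, at the cost of moving the aggregation outside SPL.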