Solved: How to group data in a table by 2 or more fields f...

marcvf1 · ‎09-24-2018

Hi Guys, I have a question regarding grouping in tables.

I have sets of data from 2 sources monitoring a transaction in 2 systems. At its start, it gets a TransactionID. The interface system takes the TransactionID and adds a SubID for the subsystems. Each step gets a Transaction time.

One Transaction can have multiple SubIDs which in turn can have several Actions.

1 -> A -> Ac1
1 -> B -> Ac2
1 -> B -> Ac3

It's no problem to do the coalesce based on the ID and do calculations. It gets tricky when I try to Group them.

serach ....| coalesce ...| stats list (Service)  list(Time1)  list(Action) by TransactionID SubID

It works fine if I only Group for TransactionID since it exists in both sources. But it doesn't work anymore for a grouping also by SubID.

Then Service and Time1 disappear. I suppose it's because there is no SubID in their source.

So how do you join from 2 sources with subsequent grouping based on 2 or more fields from different sources?

DalJeanis · ‎09-25-2018

1) You are trying to commit spreadsheet, here. There is no relationship between the columns that you are putting side by side. Also, in general, Splunk doesn't do vertical centering across multiple lines.

2) Your numbers don't add up. I'm assuming that the System1 Time column as presented is supposed to be the total of time for each service, rather than the total of times for all services for that TransId.

Here's how to accomplish something like this...

your search that gets all of the two kinds of records 
| fields TransID Time Service SubID Action

| eval rectype="Detail"
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ServiceTime by TransID Service
    | eval rectype="Service"
    ]
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ActionTime by TransID SubID Action
    | eval rectype="Action"
    ]
| where rectype!="Detail"

You can sort the records here to create any sort order you want.

| stats list(Service) as Service 
        list(ServiceTime) as ServiceTime 
        list(SubID) as SubID 
        list(Action) as Action
        list(ActionTime) as ActionTime
        by TransID

If you need each value to be on a line by itself, rather than in aligned multivalue fields, then you could also write out each record with all the values on each line, and then use streamstats to blank out the ones that are the same as the prior one.

View solution in original post

woodcock · ‎09-25-2018

Try this (be aware that you will not be able to use multireport; use your appendcols; this is to show you the stats commands)

|makeresults
| eval raw="1,A,.5|1,B,.8|1,C,.9|2,A,.5|2,C,.8|2,D,.9|3,A,.1::1,ac1,X,.4|1,ac1,Y,.5|1,ac2,Z,.7|2,ac3,U,.4|2,ac3,W,.7|3,ac5,T,.2|3,ac6,S,.1"
| makemv raw delim="::"
| mvexpand raw
| fields - _time
| streamstats count AS _serial
| makemv raw delim="|"
| mvexpand raw
| rex field=raw "^(?<TransID>[^,]+),(?<Service>[^,]+),(?<Time>[^,]+)$"
| rex field=raw "^(?<TransitID>[^,]+),(?<SubID>[^,]+),(?<Action>[^,]+),(?<Time>[^,]+)$"
| fields - raw

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| eval TransID=coalesce(TransID, TransitID)
| fields - TransitID
| multireport
[ | stats sum(Time) AS Time1 list(Service) AS Service BY TransID ]
[ | eval ActionTime = Action . "=" . Time
  | fields - Action
  | eval Time=if(isnull(ActionTime), null(), tonumber(Time))
  | stats list(*) AS * BY SubID TransID
  | nomv ActionTime
  | stats list(*) AS * BY TransID
  | rename SumTime AS Time1 ]
| table TransID Service Time1 SubID ActionTime
| selfjoin TransID

DalJeanis · ‎09-25-2018

1) You are trying to commit spreadsheet, here. There is no relationship between the columns that you are putting side by side. Also, in general, Splunk doesn't do vertical centering across multiple lines.

2) Your numbers don't add up. I'm assuming that the System1 Time column as presented is supposed to be the total of time for each service, rather than the total of times for all services for that TransId.

Here's how to accomplish something like this...

your search that gets all of the two kinds of records 
| fields TransID Time Service SubID Action

| eval rectype="Detail"
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ServiceTime by TransID Service
    | eval rectype="Service"
    ]
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ActionTime by TransID SubID Action
    | eval rectype="Action"
    ]
| where rectype!="Detail"

You can sort the records here to create any sort order you want.

| stats list(Service) as Service 
        list(ServiceTime) as ServiceTime 
        list(SubID) as SubID 
        list(Action) as Action
        list(ActionTime) as ActionTime
        by TransID

If you need each value to be on a line by itself, rather than in aligned multivalue fields, then you could also write out each record with all the values on each line, and then use streamstats to blank out the ones that are the same as the prior one.

marcvf1 · ‎10-24-2018

Thanks a lot.

You were right. I messed up some of the dummy data. It was exactly what I needed. I somehow ignored the fact that not all of the IDs are in unique logical connection with each other.

How to group data in a table by 2 or more fields from different sources joined with coalesce?

Data Management Digest – December 2025

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Join the Conversation

How to group data in a table by 2 or more fields from different sources joined with coalesce?

Data Management Digest – December 2025

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...