Hi Guys, I have a question regarding grouping in tables.
I have sets of data from 2 sources monitoring a transaction in 2 systems. At its start, it gets a TransactionID. The interface system takes the TransactionID and adds a SubID for the subsystems. Each step gets a Transaction time.
One Transaction can have multiple SubIDs which in turn can have several Actions.
1 -> A -> Ac1 1 -> B -> Ac2 1 -> B -> Ac3
It's no problem to do the coalesce based on the ID and do calculations. It gets tricky when I try to Group them.
serach ....| coalesce ...| stats list (Service) list(Time1) list(Action) by TransactionID SubID
It works fine if I only Group for TransactionID since it exists in both sources. But it doesn't work anymore for a grouping also by SubID.
Then Service and Time1 disappear. I suppose it's because there is no SubID in their source.
So how do you join from 2 sources with subsequent grouping based on 2 or more fields from different sources?
1) You are trying to commit spreadsheet, here. There is no relationship between the columns that you are putting side by side. Also, in general, Splunk doesn't do vertical centering across multiple lines.
2) Your numbers don't add up. I'm assuming that the System1 Time column as presented is supposed to be the total of time for each service, rather than the total of times for all services for that TransId.
Here's how to accomplish something like this...
your search that gets all of the two kinds of records | fields TransID Time Service SubID Action | eval rectype="Detail" | appendpipe [ | where rectype="Detail" | stats sum(Time) as ServiceTime by TransID Service | eval rectype="Service" ] | appendpipe [ | where rectype="Detail" | stats sum(Time) as ActionTime by TransID SubID Action | eval rectype="Action" ] | where rectype!="Detail"
You can sort the records here to create any sort order you want.
| stats list(Service) as Service list(ServiceTime) as ServiceTime list(SubID) as SubID list(Action) as Action list(ActionTime) as ActionTime by TransID
If you need each value to be on a line by itself, rather than in aligned multivalue fields, then you could also write out each record with all the values on each line, and then use streamstats to blank out the ones that are the same as the prior one.
Thanks a lot.
You were right. I messed up some of the dummy data. It was exactly what I needed. I somehow ignored the fact that not all of the IDs are in unique logical connection with each other.
Try this (be aware that you will not be able to use
multireport; use your
appendcols; this is to show you the
|makeresults | eval raw="1,A,.5|1,B,.8|1,C,.9|2,A,.5|2,C,.8|2,D,.9|3,A,.1::1,ac1,X,.4|1,ac1,Y,.5|1,ac2,Z,.7|2,ac3,U,.4|2,ac3,W,.7|3,ac5,T,.2|3,ac6,S,.1" | makemv raw delim="::" | mvexpand raw | fields - _time | streamstats count AS _serial | makemv raw delim="|" | mvexpand raw | rex field=raw "^(?<TransID>[^,]+),(?<Service>[^,]+),(?<Time>[^,]+)$" | rex field=raw "^(?<TransitID>[^,]+),(?<SubID>[^,]+),(?<Action>[^,]+),(?<Time>[^,]+)$" | fields - raw | rename COMMENT AS "Everything above generates sample event data; everything below is your solution" | eval TransID=coalesce(TransID, TransitID) | fields - TransitID | multireport [ | stats sum(Time) AS Time1 list(Service) AS Service BY TransID ] [ | eval ActionTime = Action . "=" . Time | fields - Action | eval Time=if(isnull(ActionTime), null(), tonumber(Time)) | stats list(*) AS * BY SubID TransID | nomv ActionTime | stats list(*) AS * BY TransID | rename SumTime AS Time1 ] | table TransID Service Time1 SubID ActionTime | selfjoin TransID