Splunk Search

How to group data in a table by 2 or more fields from different sources joined with coalesce?

marcvf1
Engager

Hi Guys, I have a question regarding grouping in tables.

I have sets of data from 2 sources monitoring a transaction in 2 systems. At its start, it gets a TransactionID. The interface system takes the TransactionID and adds a SubID for the subsystems. Each step gets a Transaction time.

One Transaction can have multiple SubIDs which in turn can have several Actions.

1 -> A -> Ac1
1 -> B -> Ac2
1 -> B -> Ac3

alt text

It's no problem to do the coalesce based on the ID and do calculations. It gets tricky when I try to Group them.

serach ....| coalesce ...| stats list (Service)  list(Time1)  list(Action) by TransactionID SubID

It works fine if I only Group for TransactionID since it exists in both sources. But it doesn't work anymore for a grouping also by SubID.

Then Service and Time1 disappear. I suppose it's because there is no SubID in their source.

So how do you join from 2 sources with subsequent grouping based on 2 or more fields from different sources?

0 Karma
1 Solution

DalJeanis
Legend

1) You are trying to commit spreadsheet, here. There is no relationship between the columns that you are putting side by side. Also, in general, Splunk doesn't do vertical centering across multiple lines.

2) Your numbers don't add up. I'm assuming that the System1 Time column as presented is supposed to be the total of time for each service, rather than the total of times for all services for that TransId.

Here's how to accomplish something like this...

your search that gets all of the two kinds of records 
| fields TransID Time Service SubID Action

| eval rectype="Detail"
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ServiceTime by TransID Service
    | eval rectype="Service"
    ]
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ActionTime by TransID SubID Action
    | eval rectype="Action"
    ]
| where rectype!="Detail" 

You can sort the records here to create any sort order you want.

| stats list(Service) as Service 
        list(ServiceTime) as ServiceTime 
        list(SubID) as SubID 
        list(Action) as Action
        list(ActionTime) as ActionTime
        by TransID

If you need each value to be on a line by itself, rather than in aligned multivalue fields, then you could also write out each record with all the values on each line, and then use streamstats to blank out the ones that are the same as the prior one.

View solution in original post

0 Karma

woodcock
Esteemed Legend

Try this (be aware that you will not be able to use multireport; use your appendcols; this is to show you the stats commands)

|makeresults
| eval raw="1,A,.5|1,B,.8|1,C,.9|2,A,.5|2,C,.8|2,D,.9|3,A,.1::1,ac1,X,.4|1,ac1,Y,.5|1,ac2,Z,.7|2,ac3,U,.4|2,ac3,W,.7|3,ac5,T,.2|3,ac6,S,.1"
| makemv raw delim="::"
| mvexpand raw
| fields - _time
| streamstats count AS _serial
| makemv raw delim="|"
| mvexpand raw
| rex field=raw "^(?<TransID>[^,]+),(?<Service>[^,]+),(?<Time>[^,]+)$"
| rex field=raw "^(?<TransitID>[^,]+),(?<SubID>[^,]+),(?<Action>[^,]+),(?<Time>[^,]+)$"
| fields - raw

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| eval TransID=coalesce(TransID, TransitID)
| fields - TransitID
| multireport
[ | stats sum(Time) AS Time1 list(Service) AS Service BY TransID ]
[ | eval ActionTime = Action . "=" . Time
  | fields - Action
  | eval Time=if(isnull(ActionTime), null(), tonumber(Time))
  | stats list(*) AS * BY SubID TransID
  | nomv ActionTime
  | stats list(*) AS * BY TransID
  | rename SumTime AS Time1 ]
| table TransID Service Time1 SubID ActionTime
| selfjoin TransID
0 Karma

DalJeanis
Legend

1) You are trying to commit spreadsheet, here. There is no relationship between the columns that you are putting side by side. Also, in general, Splunk doesn't do vertical centering across multiple lines.

2) Your numbers don't add up. I'm assuming that the System1 Time column as presented is supposed to be the total of time for each service, rather than the total of times for all services for that TransId.

Here's how to accomplish something like this...

your search that gets all of the two kinds of records 
| fields TransID Time Service SubID Action

| eval rectype="Detail"
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ServiceTime by TransID Service
    | eval rectype="Service"
    ]
| appendpipe [
    | where rectype="Detail" 
    | stats sum(Time) as ActionTime by TransID SubID Action
    | eval rectype="Action"
    ]
| where rectype!="Detail" 

You can sort the records here to create any sort order you want.

| stats list(Service) as Service 
        list(ServiceTime) as ServiceTime 
        list(SubID) as SubID 
        list(Action) as Action
        list(ActionTime) as ActionTime
        by TransID

If you need each value to be on a line by itself, rather than in aligned multivalue fields, then you could also write out each record with all the values on each line, and then use streamstats to blank out the ones that are the same as the prior one.

0 Karma

marcvf1
Engager

Thanks a lot.

You were right. I messed up some of the dummy data. It was exactly what I needed. I somehow ignored the fact that not all of the IDs are in unique logical connection with each other.

Get Updates on the Splunk Community!

Notification Email Migration Announcement

The Notification Team is migrating our email service provider from Postmark to AWS Simple Email Service (SES) ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...