Splunk Search

Joining two sets of data by common field (numeric userID)

blloyd67
Engager

We have two data sets in the same index returned by an AppMon tool that we are looking to stitch together in Splunk and report on end to end transactions. The common field in both is a numeric userID that is nested in JSON that we need to return with spath.

The two transactions are v2 and v4 (distinguished by name), there are many v2 transactions but we only want to join the v2 transaction when there is a corresponding v4 transaction within a set maxspan (starting with 5 seconds). There is also a duration field in both transactions (named the same) which we'd like to report separately and then also sum to get the total duration.

We've come up with the query below which is working except for the piece that glues the time together by span, which should hopefully take care of the orphan and duplicate rows and get one row per transaction. Any help or direction would be greatly appreciated on how to get the whole thing working as intended. We looked at transaction but were advised that it would be very hard to do the aggregation and the reporting we wanted to do that way.

index=dynatrace "name":"Web User ID V4"| spath "dimensions.Custom Measure - WebUserIDV4" | append [ search  index=dynatrace "name":"Email and Web UserID V2"| spath "dimensions.Web User ID - Whole String" ]  | rename "dimensions.Custom Measure - WebUserIDV4" AS uid | rename "dimensions.Web User ID - Whole String" AS uid | rename "dimensions.Custom Measure - PP UserName - IDocumentV2" AS email | eval duration_v4=case(name == "Web User ID V4", duration) | eval duration_v2=case(name == "Email and Web UserID V2", duration) | stats list(duration_v2) AS dv2, list(duration_v4) as dv4,  values(email) AS email by uid | foreach dv* [ eval test=dv2+dv4 ]

alt text

Tags (1)
0 Karma
1 Solution

DalJeanis
Legend

Joins and stats aren't necessarily the best in this situation. We're not certain which order your transactions are expected to come in, so for this demo we had to make assumptions.

This version assumes that you want to keep V2 event information only when there is a V4 event within the following 5 seconds. It also assumes that you do not want to roll together multiple V2s that belong to the same uid.

You can adapt the method as needed.

 index=dynatrace "name":"Web User ID V4"
| spath "dimensions.Custom Measure - WebUserIDV4" 
| append 
  [ search  index=dynatrace "name":"Email and Web UserID V2"
  | spath "dimensions.Web User ID - Whole String" 
  ]  
| rename "dimensions.Custom Measure - WebUserIDV4" AS uid 
| rename "dimensions.Web User ID - Whole String" AS uid 
| rename "dimensions.Custom Measure - PP UserName - IDocumentV2" AS email 
| eval dv4x=case(name == "Web User ID V4", duration) 
| eval dv2=case(name == "Email and Web UserID V2", duration) 

| rename COMMENT as "In Splunk, events natively are sorted most recent first so that we are copying from later V4s to earlier V2s"
| rename COMMENT as "If there is any doubt then add  | sort 0 - _time here "
| streamstats last(dv4x) as dv4 timewindow=5s by uid 

| rename COMMENT as "get rid of all the V4s and any V2s that have no V4 info"
| where (name == "Email and Web UserID V2") AND isnotnull(dv4)
| eval duration=dv2+dv4

View solution in original post

0 Karma

DalJeanis
Legend

Joins and stats aren't necessarily the best in this situation. We're not certain which order your transactions are expected to come in, so for this demo we had to make assumptions.

This version assumes that you want to keep V2 event information only when there is a V4 event within the following 5 seconds. It also assumes that you do not want to roll together multiple V2s that belong to the same uid.

You can adapt the method as needed.

 index=dynatrace "name":"Web User ID V4"
| spath "dimensions.Custom Measure - WebUserIDV4" 
| append 
  [ search  index=dynatrace "name":"Email and Web UserID V2"
  | spath "dimensions.Web User ID - Whole String" 
  ]  
| rename "dimensions.Custom Measure - WebUserIDV4" AS uid 
| rename "dimensions.Web User ID - Whole String" AS uid 
| rename "dimensions.Custom Measure - PP UserName - IDocumentV2" AS email 
| eval dv4x=case(name == "Web User ID V4", duration) 
| eval dv2=case(name == "Email and Web UserID V2", duration) 

| rename COMMENT as "In Splunk, events natively are sorted most recent first so that we are copying from later V4s to earlier V2s"
| rename COMMENT as "If there is any doubt then add  | sort 0 - _time here "
| streamstats last(dv4x) as dv4 timewindow=5s by uid 

| rename COMMENT as "get rid of all the V4s and any V2s that have no V4 info"
| where (name == "Email and Web UserID V2") AND isnotnull(dv4)
| eval duration=dv2+dv4
0 Karma

blloyd67
Engager

This was perfect - thank you so much!

Get Updates on the Splunk Community!

CX Day is Coming!

Customer Experience (CX) Day is on October 7th!! We're so excited to bring back another day full of wonderful ...

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

The Big One: Splunk 10 is Here!  The moment many of you have been waiting for has arrived! We are thrilled to ...

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console

Today, we’re excited to announce the release of a brand new AI assistant usage dashboard in Cloud Monitoring ...