Splunk Search

Joining two sets of data by common field (numeric userID)

blloyd67
Engager

We have two data sets in the same index returned by an AppMon tool that we are looking to stitch together in Splunk and report on end to end transactions. The common field in both is a numeric userID that is nested in JSON that we need to return with spath.

The two transactions are v2 and v4 (distinguished by name), there are many v2 transactions but we only want to join the v2 transaction when there is a corresponding v4 transaction within a set maxspan (starting with 5 seconds). There is also a duration field in both transactions (named the same) which we'd like to report separately and then also sum to get the total duration.

We've come up with the query below which is working except for the piece that glues the time together by span, which should hopefully take care of the orphan and duplicate rows and get one row per transaction. Any help or direction would be greatly appreciated on how to get the whole thing working as intended. We looked at transaction but were advised that it would be very hard to do the aggregation and the reporting we wanted to do that way.

index=dynatrace "name":"Web User ID V4"| spath "dimensions.Custom Measure - WebUserIDV4" | append [ search  index=dynatrace "name":"Email and Web UserID V2"| spath "dimensions.Web User ID - Whole String" ]  | rename "dimensions.Custom Measure - WebUserIDV4" AS uid | rename "dimensions.Web User ID - Whole String" AS uid | rename "dimensions.Custom Measure - PP UserName - IDocumentV2" AS email | eval duration_v4=case(name == "Web User ID V4", duration) | eval duration_v2=case(name == "Email and Web UserID V2", duration) | stats list(duration_v2) AS dv2, list(duration_v4) as dv4,  values(email) AS email by uid | foreach dv* [ eval test=dv2+dv4 ]

alt text

Tags (1)
0 Karma
1 Solution

DalJeanis
SplunkTrust
SplunkTrust

Joins and stats aren't necessarily the best in this situation. We're not certain which order your transactions are expected to come in, so for this demo we had to make assumptions.

This version assumes that you want to keep V2 event information only when there is a V4 event within the following 5 seconds. It also assumes that you do not want to roll together multiple V2s that belong to the same uid.

You can adapt the method as needed.

 index=dynatrace "name":"Web User ID V4"
| spath "dimensions.Custom Measure - WebUserIDV4" 
| append 
  [ search  index=dynatrace "name":"Email and Web UserID V2"
  | spath "dimensions.Web User ID - Whole String" 
  ]  
| rename "dimensions.Custom Measure - WebUserIDV4" AS uid 
| rename "dimensions.Web User ID - Whole String" AS uid 
| rename "dimensions.Custom Measure - PP UserName - IDocumentV2" AS email 
| eval dv4x=case(name == "Web User ID V4", duration) 
| eval dv2=case(name == "Email and Web UserID V2", duration) 

| rename COMMENT as "In Splunk, events natively are sorted most recent first so that we are copying from later V4s to earlier V2s"
| rename COMMENT as "If there is any doubt then add  | sort 0 - _time here "
| streamstats last(dv4x) as dv4 timewindow=5s by uid 

| rename COMMENT as "get rid of all the V4s and any V2s that have no V4 info"
| where (name == "Email and Web UserID V2") AND isnotnull(dv4)
| eval duration=dv2+dv4

View solution in original post

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

Joins and stats aren't necessarily the best in this situation. We're not certain which order your transactions are expected to come in, so for this demo we had to make assumptions.

This version assumes that you want to keep V2 event information only when there is a V4 event within the following 5 seconds. It also assumes that you do not want to roll together multiple V2s that belong to the same uid.

You can adapt the method as needed.

 index=dynatrace "name":"Web User ID V4"
| spath "dimensions.Custom Measure - WebUserIDV4" 
| append 
  [ search  index=dynatrace "name":"Email and Web UserID V2"
  | spath "dimensions.Web User ID - Whole String" 
  ]  
| rename "dimensions.Custom Measure - WebUserIDV4" AS uid 
| rename "dimensions.Web User ID - Whole String" AS uid 
| rename "dimensions.Custom Measure - PP UserName - IDocumentV2" AS email 
| eval dv4x=case(name == "Web User ID V4", duration) 
| eval dv2=case(name == "Email and Web UserID V2", duration) 

| rename COMMENT as "In Splunk, events natively are sorted most recent first so that we are copying from later V4s to earlier V2s"
| rename COMMENT as "If there is any doubt then add  | sort 0 - _time here "
| streamstats last(dv4x) as dv4 timewindow=5s by uid 

| rename COMMENT as "get rid of all the V4s and any V2s that have no V4 info"
| where (name == "Email and Web UserID V2") AND isnotnull(dv4)
| eval duration=dv2+dv4
0 Karma

blloyd67
Engager

This was perfect - thank you so much!

Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...