Splunk Search

How to join two events in the same index for comparison purposes?

eezewski
New Member

Hello Spelunkers,

I have a Splunk query problem that I can't seem to solve.

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| head 100
| table EmailID, EventName, DateScheduled, DateSent, DateIndexed, Time.SendGrid
| Sort -EmailID
  • Table is only included to illustrate the data in the screen shot.

Screenshot:
Screenshot

What I want to do, is join the delivered event to the processed event on thier EmailID and then take the difference between thier respective DateIndexed epoch time as a new field as follows:

| eval TimeInSendGrid=DeliveredEvent.DateIndexed-ProcessedEvent.DateIndexed

Thanks in advance!

0 Karma
1 Solution

lguinn2
Legend

I wouldn't use join if this will work

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats range(DateIndexed) as TimeInSendGrid by EmailID

But the above solution will eliminate all the other fields. And it doesn't really check to make sure that you have both a delivered event and a processed event. The following shows how you can retain other fields and also does the check:

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats count last(EventName) as DeliveredEventName first(EventName) as ProcessedEventName last(DateIndexed) as DeliveredDateIndexed first(DateIndexed) as ProcessedDateIndexed range(DateIndexed) as TimeInSendGrid
| where count=2 

Both of these solutions avoid the limitations of the join command. But if you would prefer to use the join command:

index=prod-web-apps sourcetype=csv-emailevents  EventName=processed
| rename DateSent as ProcessedDateSent, DateIndexed as ProcessedDateIndexed
| join EmailID type=outer [ search index=prod-web-apps sourcetype=csv-emailevents EventName=delivered  
    | rename DateSent asDeliveredDateSent, DateIndexed as DeliveredDateIndexed]
| eval TimeInSendGrid=DeliveredDateIndexed-ProcessedDateIndexed

This will take about approximately twice as long to run, as Splunk must pass through the data twice (one for the main search and once for the subsearch). Note that you need to rename the fields, otherwise the results from the subsearch will overwrite fields of the same name from the main search.

HTH!

View solution in original post

lguinn2
Legend

I wouldn't use join if this will work

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats range(DateIndexed) as TimeInSendGrid by EmailID

But the above solution will eliminate all the other fields. And it doesn't really check to make sure that you have both a delivered event and a processed event. The following shows how you can retain other fields and also does the check:

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats count last(EventName) as DeliveredEventName first(EventName) as ProcessedEventName last(DateIndexed) as DeliveredDateIndexed first(DateIndexed) as ProcessedDateIndexed range(DateIndexed) as TimeInSendGrid
| where count=2 

Both of these solutions avoid the limitations of the join command. But if you would prefer to use the join command:

index=prod-web-apps sourcetype=csv-emailevents  EventName=processed
| rename DateSent as ProcessedDateSent, DateIndexed as ProcessedDateIndexed
| join EmailID type=outer [ search index=prod-web-apps sourcetype=csv-emailevents EventName=delivered  
    | rename DateSent asDeliveredDateSent, DateIndexed as DeliveredDateIndexed]
| eval TimeInSendGrid=DeliveredDateIndexed-ProcessedDateIndexed

This will take about approximately twice as long to run, as Splunk must pass through the data twice (one for the main search and once for the subsearch). Note that you need to rename the fields, otherwise the results from the subsearch will overwrite fields of the same name from the main search.

HTH!

eezewski
New Member

Hey Lisa,

Thank you for the prompt reply, but there is something strange in the second query where it is not grouping the stats by EmailID. I tried to add a "by EmailID" to it but that caused all sorts of other problems.

I was able to get the join working, but it's very slow. What am I doing wrong with the stats solution? I would prefer that if it is much faster.

Thanks,

Evan

0 Karma

lguinn2
Legend

Oh yeah, that is a problem - it should say "by EmailID". but on further consideration, I would probably do this for the second query:

 index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
 | stats count latest(EventName) as DeliveredEventName earliest(EventName) as ProcessedEventName latest(DateIndexed) as DeliveredDateIndexed earliest(DateIndexed) as ProcessedDateIndexed range(DateIndexed) as TimeInSendGrid by EmailID
 | where count=2 

This should be faster and more accurate, as it simply assumes that the "processed" events will occur at an earlier time than the "delivered" events.

Get Updates on the Splunk Community!

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...