Solved: How to join two events in the same index for compa...

eezewski · ‎01-09-2015

Hello Spelunkers,

I have a Splunk query problem that I can't seem to solve.

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| head 100
| table EmailID, EventName, DateScheduled, DateSent, DateIndexed, Time.SendGrid
| Sort -EmailID

Table is only included to illustrate the data in the screen shot.

Screenshot:

What I want to do, is join the delivered event to the processed event on thier EmailID and then take the difference between thier respective DateIndexed epoch time as a new field as follows:

| eval TimeInSendGrid=DeliveredEvent.DateIndexed-ProcessedEvent.DateIndexed

Thanks in advance!

lguinn2 · ‎01-10-2015

I wouldn't use join if this will work

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats range(DateIndexed) as TimeInSendGrid by EmailID

But the above solution will eliminate all the other fields. And it doesn't really check to make sure that you have both a delivered event and a processed event. The following shows how you can retain other fields and also does the check:

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats count last(EventName) as DeliveredEventName first(EventName) as ProcessedEventName last(DateIndexed) as DeliveredDateIndexed first(DateIndexed) as ProcessedDateIndexed range(DateIndexed) as TimeInSendGrid
| where count=2

Both of these solutions avoid the limitations of the join command. But if you would prefer to use the join command:

index=prod-web-apps sourcetype=csv-emailevents  EventName=processed
| rename DateSent as ProcessedDateSent, DateIndexed as ProcessedDateIndexed
| join EmailID type=outer [ search index=prod-web-apps sourcetype=csv-emailevents EventName=delivered  
    | rename DateSent asDeliveredDateSent, DateIndexed as DeliveredDateIndexed]
| eval TimeInSendGrid=DeliveredDateIndexed-ProcessedDateIndexed

This will take about approximately twice as long to run, as Splunk must pass through the data twice (one for the main search and once for the subsearch). Note that you need to rename the fields, otherwise the results from the subsearch will overwrite fields of the same name from the main search.

HTH!

View solution in original post

lguinn2 · ‎01-10-2015

I wouldn't use join if this will work

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats range(DateIndexed) as TimeInSendGrid by EmailID

But the above solution will eliminate all the other fields. And it doesn't really check to make sure that you have both a delivered event and a processed event. The following shows how you can retain other fields and also does the check:

index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
| sort 0 EmailID  -EventName
| stats count last(EventName) as DeliveredEventName first(EventName) as ProcessedEventName last(DateIndexed) as DeliveredDateIndexed first(DateIndexed) as ProcessedDateIndexed range(DateIndexed) as TimeInSendGrid
| where count=2

Both of these solutions avoid the limitations of the join command. But if you would prefer to use the join command:

index=prod-web-apps sourcetype=csv-emailevents  EventName=processed
| rename DateSent as ProcessedDateSent, DateIndexed as ProcessedDateIndexed
| join EmailID type=outer [ search index=prod-web-apps sourcetype=csv-emailevents EventName=delivered  
    | rename DateSent asDeliveredDateSent, DateIndexed as DeliveredDateIndexed]
| eval TimeInSendGrid=DeliveredDateIndexed-ProcessedDateIndexed

This will take about approximately twice as long to run, as Splunk must pass through the data twice (one for the main search and once for the subsearch). Note that you need to rename the fields, otherwise the results from the subsearch will overwrite fields of the same name from the main search.

HTH!

eezewski · ‎01-12-2015

Hey Lisa,

Thank you for the prompt reply, but there is something strange in the second query where it is not grouping the stats by EmailID. I tried to add a "by EmailID" to it but that caused all sorts of other problems.

I was able to get the join working, but it's very slow. What am I doing wrong with the stats solution? I would prefer that if it is much faster.

Thanks,

Evan

lguinn2 · ‎01-12-2015

Oh yeah, that is a problem - it should say "by EmailID". but on further consideration, I would probably do this for the second query:

 index=prod-web-apps sourcetype=csv-emailevents (EventName=delivered OR EventName=processed)
 | stats count latest(EventName) as DeliveredEventName earliest(EventName) as ProcessedEventName latest(DateIndexed) as DeliveredDateIndexed earliest(DateIndexed) as ProcessedDateIndexed range(DateIndexed) as TimeInSendGrid by EmailID
 | where count=2

This should be faster and more accurate, as it simply assumes that the "processed" events will occur at an earlier time than the "delivered" events.

How to join two events in the same index for comparison purposes?

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

Are you a member of the Splunk Community?

How to join two events in the same index for comparison purposes?

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...