Splunk Search

Proper use of join vs. append - correlating events by a common field

ft_kd02
Path Finder

Hi all,

I'm working to correlate a series of events. These events are all part of a logging process of a separate application. What seems to be common is a UUID. The following searches produce what I'd like individually:

for the first timestamp associated with the start of the process (there are multiple processes running for each UUID, but the ask is to extract the first / last timestamp)

index=###### sourcetype=### "process() - start"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats earliest(startTime) as startingTimeStamp by UUID

index=###### sourcetype=### "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval endTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats latest(endTime) as endingTimeStamp by UUID

I'm trying to learn how to use join to connect these events by UUID. This SPL returns a table that has the earliest and latest of startTime, rather than the earliest(startTime) and latest(endTime). 

index=###### sourcetype=### "process() - start"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats earliest(startTime) as startingTimeStamp by UUID
| join UUID type=left
[ search index=###### sourcetype=### "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval endTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats latest(endTime) as endingTimeStamp by UUID
]

Is join the appropriate function to use here? I'm reading in coalesce and append as well, but from my understanding append does not fit. Another piece is that the UUID is not a field extraction, but rather a regex, so I'm unsure how the join would be able to function if the subsearch has no knowledge of UUID until it runs and performs the rex. 

Labels (1)
0 Karma
1 Solution

somesoni2
Revered Legend

You can avoid join altogether for this use case:

index=###### sourcetype=### "process() - start" OR "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = if(searchmatch("process() - start"),_time,null())
| eval endTime = if(searchmatch("process() - end"),_time,null())
| stats min(startTime) as startingTimeStamp max(endTime) as endingTimestamp by UUID | eval startingTimeStamp=strftime(startingTimeStamp, "%Y-%d-%m %H:%M:%S")|  eval endingTimestamp=strftime(endingTimestamp, "%Y-%d-%m %H:%M:%S")

View solution in original post

somesoni2
Revered Legend

You can avoid join altogether for this use case:

index=###### sourcetype=### "process() - start" OR "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = if(searchmatch("process() - start"),_time,null())
| eval endTime = if(searchmatch("process() - end"),_time,null())
| stats min(startTime) as startingTimeStamp max(endTime) as endingTimestamp by UUID | eval startingTimeStamp=strftime(startingTimeStamp, "%Y-%d-%m %H:%M:%S")|  eval endingTimestamp=strftime(endingTimestamp, "%Y-%d-%m %H:%M:%S")

ft_kd02
Path Finder

Marking this as the solution because it is the correct answer to the question I asked. I misunderstood the problem and what I need to do is group the 4 events together for each process()-start and end. The 4 events, together, constitute a single api call. The issue I'm seeing is that there are no unique identifiers outside of _time() that can be used to group them together. Currently looking into the | transaction command as an alternative. 

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...