Splunk Search

Proper use of join vs. append - correlating events by a common field

ft_kd02
Path Finder

Hi all,

I'm working to correlate a series of events. These events are all part of a logging process of a separate application. What seems to be common is a UUID. The following searches produce what I'd like individually:

for the first timestamp associated with the start of the process (there are multiple processes running for each UUID, but the ask is to extract the first / last timestamp)

index=###### sourcetype=### "process() - start"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats earliest(startTime) as startingTimeStamp by UUID

index=###### sourcetype=### "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval endTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats latest(endTime) as endingTimeStamp by UUID

I'm trying to learn how to use join to connect these events by UUID. This SPL returns a table that has the earliest and latest of startTime, rather than the earliest(startTime) and latest(endTime). 

index=###### sourcetype=### "process() - start"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats earliest(startTime) as startingTimeStamp by UUID
| join UUID type=left
[ search index=###### sourcetype=### "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval endTime = strftime(_time, "%Y-%d-%m %H:%M:%S")
| stats latest(endTime) as endingTimeStamp by UUID
]

Is join the appropriate function to use here? I'm reading in coalesce and append as well, but from my understanding append does not fit. Another piece is that the UUID is not a field extraction, but rather a regex, so I'm unsure how the join would be able to function if the subsearch has no knowledge of UUID until it runs and performs the rex. 

Labels (1)
0 Karma
1 Solution

somesoni2
Revered Legend

You can avoid join altogether for this use case:

index=###### sourcetype=### "process() - start" OR "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = if(searchmatch("process() - start"),_time,null())
| eval endTime = if(searchmatch("process() - end"),_time,null())
| stats min(startTime) as startingTimeStamp max(endTime) as endingTimestamp by UUID | eval startingTimeStamp=strftime(startingTimeStamp, "%Y-%d-%m %H:%M:%S")|  eval endingTimestamp=strftime(endingTimestamp, "%Y-%d-%m %H:%M:%S")

View solution in original post

somesoni2
Revered Legend

You can avoid join altogether for this use case:

index=###### sourcetype=### "process() - start" OR "process() - end"
| rex "^(?:[^ \n]* ){2}(?P<UUID>[^,]+)"
| eval startTime = if(searchmatch("process() - start"),_time,null())
| eval endTime = if(searchmatch("process() - end"),_time,null())
| stats min(startTime) as startingTimeStamp max(endTime) as endingTimestamp by UUID | eval startingTimeStamp=strftime(startingTimeStamp, "%Y-%d-%m %H:%M:%S")|  eval endingTimestamp=strftime(endingTimestamp, "%Y-%d-%m %H:%M:%S")

ft_kd02
Path Finder

Marking this as the solution because it is the correct answer to the question I asked. I misunderstood the problem and what I need to do is group the 4 events together for each process()-start and end. The 4 events, together, constitute a single api call. The issue I'm seeing is that there are no unique identifiers outside of _time() that can be used to group them together. Currently looking into the | transaction command as an alternative. 

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...