Splunk Search

Join Two Searches on Shared Field Value

lhillscu
Engager

I have an index that contains all the hits for our WAF and an index that contains the subsequent API call details for any of those hits that are an application calling one our APIs behind the WAF. There is a shared identifier that the WAF passes to the API call so we can link them together and see what IP, user agent string, etc. made that API call. I am trying to pull data from both indexes together into a nice table so that our devs and our security folks can see what API calls are being made, who/what is calling them, and the payloads. 

API search:

index=api source=api_call
| rename id as sessionID
| fields apiName, payload, sessionID


WAF search:

index=waf
| fields src_ip, requestHost, requestPath, requestUserAgent, sessionID

My attempt to join them on the sessionID which is not working. It returns no results.

index=api source=api_call
| rename message.id as sessionID
| fields apiName, message.payload, sessionID
| join sessionID
          [search index=waf
            | fields src_ip, requestHost, requestPath, requestUserAgent, sessionID]
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent


I know joins are not very performative, so I'm open to alternatives that don't use it, but I'm not sure what those would be.

Labels (2)
0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

I think this will work without using join.

(index=api source=api_call) OR index=waf
| eval sessionID=coalesce(sessionID, message.id)
| fields apiName, message.payload, sessionID,src_ip, requestHost, requestPath, requestUserAgent
| stats values(*) as * by sessionID
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent

 

---
If this reply helps you, Karma would be appreciated.

View solution in original post

richgalloway
SplunkTrust
SplunkTrust

I think this will work without using join.

(index=api source=api_call) OR index=waf
| eval sessionID=coalesce(sessionID, message.id)
| fields apiName, message.payload, sessionID,src_ip, requestHost, requestPath, requestUserAgent
| stats values(*) as * by sessionID
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent

 

---
If this reply helps you, Karma would be appreciated.

lhillscu
Engager

I figured out the issue. The API fields needed to be double quoted or the reference broke. I assume it has something to do with the message being a JSON object. Outside of that minor syntax issue, your solution worked. Thank you!

0 Karma

yuanliu
SplunkTrust
SplunkTrust

I figured out the issue. The API fields needed to be double quoted or the reference broke. I assume it has something to do with the message being a JSON object. Outside of that minor syntax issue, your solution worked. Thank you!

Actually, the need for quote is because field names contain a major breaker dot (".").  I should have spotted this given that I just wrote Single Quotes, Double Quotes, or No Quotes (SPL) for people who want to confront the wonkiness of SPL's quote rules.  Here, you need to use single quote, not double quote.  And you only need it inside coalesce.

(index=api source=api_call) OR index=waf
| eval sessionID=coalesce(sessionID, 'message.id')
| fields apiName, message.payload, sessionID,src_ip, requestHost, requestPath, requestUserAgent
| stats values(*) as * by sessionID
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent

Though technically you can use double quotes around message.id in rename, fields, stats, and table commands, they are not necessary.  But if you use "message.id" in coalesce, sessionID will get the literal string "message.id" as value for events from index api.

0 Karma

lhillscu
Engager

I now get almost 2 million events, which is about all the events in the WAF log for yesterday, but no table of results. I know that yesterday there was 1 connection through the WAF which produced 6 API calls (one primary and then several downstream). So the number of lines in my table should be 6.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

@richgalloway's solution should give the correct results and is more efficient.  But you need to clarify @danielcj's question thoroughly.  In your response, you reprinted | rename id as sessionID as in the first part of your original post, which contradicts the second part of your original post where | rename message.id as sessionID is printed.  Does index api give

  1. id, or
  2. message.id, or
  3. both but only one of them should be used as sessionID?

@richgalloway's solution should work in case 2.  If index api gives id (or if it gives both but only id should be used in match) - which the first part of your original post and your reply to @danielcj imply, the solution can easily be adapted to

(index=api source=api_call) OR index=waf
| fields id apiName, message.payload, src_ip, requestHost, requestPath, requestUserAgent, sessionID
| eval sessionID = coalesce(sessionID, id)
| stats values(*) as * by sessionID

Hope this helps.

0 Karma

lhillscu
Engager

It's message.id. There is a JSON object called message that has a number of fields in it. message.id gets the WAF session id value.

0 Karma

danielcj
Communicator

Hello @lhillscu ,

Your query seems correct. Could you please confirm that these 2 fields really exists in the last query? Since these 2 fields were not referenced in the "API Search" example.

message.id
message.payload

 

Thanks.

 

 

0 Karma

lhillscu
Engager

Apologies I took out all the extra renames to try to simplify the search since those aren't really critical to the data I'm trying to get. The fields are actually as they are named in the full search with the join. First search thus should be:

index=api source=api_call
| rename id as sessionID
| fields apiName, message.payload, sessionID

0 Karma
Get Updates on the Splunk Community!

Splunk Enterprise Security 8.0.2 Availability: On cloud and On-premise!

A few months ago, we released Splunk Enterprise Security 8.0 for our cloud customers. Today, we are excited to ...

Logs to Metrics

Logs and Metrics Logs are generally unstructured text or structured events emitted by applications and written ...

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...