I have an index that contains all the hits for our WAF and an index that contains the subsequent API call details for any of those hits that are an application calling one our APIs behind the WAF. There is a shared identifier that the WAF passes to the API call so we can link them together and see what IP, user agent string, etc. made that API call. I am trying to pull data from both indexes together into a nice table so that our devs and our security folks can see what API calls are being made, who/what is calling them, and the payloads.
API search:
index=api source=api_call
| rename id as sessionID
| fields apiName, payload, sessionID
WAF search:
index=waf
| fields src_ip, requestHost, requestPath, requestUserAgent, sessionID
My attempt to join them on the sessionID which is not working. It returns no results.
index=api source=api_call
| rename message.id as sessionID
| fields apiName, message.payload, sessionID
| join sessionID
[search index=waf
| fields src_ip, requestHost, requestPath, requestUserAgent, sessionID]
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent
I know joins are not very performative, so I'm open to alternatives that don't use it, but I'm not sure what those would be.
I think this will work without using join.
(index=api source=api_call) OR index=waf
| eval sessionID=coalesce(sessionID, message.id)
| fields apiName, message.payload, sessionID,src_ip, requestHost, requestPath, requestUserAgent
| stats values(*) as * by sessionID
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent
I think this will work without using join.
(index=api source=api_call) OR index=waf
| eval sessionID=coalesce(sessionID, message.id)
| fields apiName, message.payload, sessionID,src_ip, requestHost, requestPath, requestUserAgent
| stats values(*) as * by sessionID
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent
I figured out the issue. The API fields needed to be double quoted or the reference broke. I assume it has something to do with the message being a JSON object. Outside of that minor syntax issue, your solution worked. Thank you!
I figured out the issue. The API fields needed to be double quoted or the reference broke. I assume it has something to do with the message being a JSON object. Outside of that minor syntax issue, your solution worked. Thank you!
Actually, the need for quote is because field names contain a major breaker dot ("."). I should have spotted this given that I just wrote Single Quotes, Double Quotes, or No Quotes (SPL) for people who want to confront the wonkiness of SPL's quote rules. Here, you need to use single quote, not double quote. And you only need it inside coalesce.
(index=api source=api_call) OR index=waf
| eval sessionID=coalesce(sessionID, 'message.id')
| fields apiName, message.payload, sessionID,src_ip, requestHost, requestPath, requestUserAgent
| stats values(*) as * by sessionID
| table apiName, message.payload, sessionID, src_ip, requestHost, requestPath, requestUserAgent
Though technically you can use double quotes around message.id in rename, fields, stats, and table commands, they are not necessary. But if you use "message.id" in coalesce, sessionID will get the literal string "message.id" as value for events from index api.
I now get almost 2 million events, which is about all the events in the WAF log for yesterday, but no table of results. I know that yesterday there was 1 connection through the WAF which produced 6 API calls (one primary and then several downstream). So the number of lines in my table should be 6.
@richgalloway's solution should give the correct results and is more efficient. But you need to clarify @danielcj's question thoroughly. In your response, you reprinted | rename id as sessionID as in the first part of your original post, which contradicts the second part of your original post where | rename message.id as sessionID is printed. Does index api give
@richgalloway's solution should work in case 2. If index api gives id (or if it gives both but only id should be used in match) - which the first part of your original post and your reply to @danielcj imply, the solution can easily be adapted to
(index=api source=api_call) OR index=waf
| fields id apiName, message.payload, src_ip, requestHost, requestPath, requestUserAgent, sessionID
| eval sessionID = coalesce(sessionID, id)
| stats values(*) as * by sessionID
Hope this helps.
It's message.id. There is a JSON object called message that has a number of fields in it. message.id gets the WAF session id value.
Hello @lhillscu ,
Your query seems correct. Could you please confirm that these 2 fields really exists in the last query? Since these 2 fields were not referenced in the "API Search" example.
message.id
message.payload
Thanks.
Apologies I took out all the extra renames to try to simplify the search since those aren't really critical to the data I'm trying to get. The fields are actually as they are named in the full search with the join. First search thus should be:
index=api source=api_call
| rename id as sessionID
| fields apiName, message.payload, sessionID