I am running into the subsearch row limit when using the join command. My organization's Splunk implementation enforces a 50k max on subsearch results. I was unable to think up an alternative that did not use joins/appends/subsearches, so I thought I would ask the community.
I am trying to join two sources on the requestID. Source A is webserver access logs, which contain a unique requestID and a user_agent field that I am using to identify and isolate consumers. Source B is application trace logs, which contain a whole lot more detail on the request made, including the backend request type they initiated.
My goal is to identify the count of hits to specific backend systems from specified user_agents. To do this, I joined on the requestID, as it is the only common identifying value shared between these two log sources.
My search looks something like this:
index=foo source=bar/access.log user_agent="Mozilla*"
| table requestID
| join requestID
    [ search index=foo source=bar/trace.log "Request Complete"
    | *regex to extract backend system detail*
    | table requestID backend ]
| stats dc(requestID) by backend
So for every incoming request from a web browser, give me a table of requestIDs, join those IDs to entries within the trace log, and count the total hits by backend request type. The time range I hope to search over far exceeds 50k unique transactions.
Any suggestions to avoid the join statement and subsearch limitation?