I have a web application where each incoming request is given a unique requestID so we can see all the logs for that particular 'request'. The requestID isn't currently an extracted field, but I could/probably should make it one.
I am looking for particular events where we log a problem. I want to pull the requestID from all of these events and show the entire 'request' for each of them. From looking around so far, it seems the 'map' command is the way to do this. What I haven't figured out is exactly how to do this for multiple requestIDs at once.
index=foo REQUEST_TIME>2000 | rex field=_raw "^\[\((?<REQID>[^\)]*)" | map search="index=foo $REQID$"
That's the best I've come up with. The regex works: if I pass the result to something else like stats count, I can see each value with a count of 1.
index=foo REQUEST_TIME>2000 | rex field=_raw "^\[\((?<REQID>[^\)]*)" | stats count by REQID
So I am close I think. I'd like to make it work before I change the log output to make reqid a field.
The map command is useful, but it will, I believe, run a sequential search for each event passed to it, with a default limit of 10 searches (maxsearches).
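If you do stick with map, that limit can be raised with the maxsearches option. A rough sketch of your original approach with that option, deduplicating the IDs first so each one is only searched once (the maxsearches value of 100 is just an illustrative assumption):

index=foo REQUEST_TIME>2000
| rex field=_raw "^\[\((?<REQID>[^\)]*)"
| dedup REQID
| map maxsearches=100 search="search index=foo $REQID$"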
I assume that each request ID will generate more than one event in Splunk, so you're looking to pull together all the events for that ID where one or more of those events took more than 2000ms.
The transaction command is one alternative, but I would generally advise against it, for a number of resource-related reasons.
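For completeness, a minimal sketch of the transaction approach if you do want to try it (the maxspan value is an assumption you would tune to your typical request duration, and the final search keeps only transactions containing at least one slow event):

index=foo
| rex field=_raw "^\[\((?<REQID>[^\)]*)"
| transaction REQID maxspan=5m
| search REQUEST_TIME>2000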
An alternative is to do the outer search and then join a subsearch on the IDs, but again you may run into resource issues, as join has its own limitations. By putting the >2000ms search as the subsearch you can minimise the join size.
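A sketch of that join variant, with the >2000ms search as the subsearch so only the problem IDs are joined (standard subsearch result limits still apply):

index=foo
| rex field=_raw "^\[\((?<REQID>[^\)]*)"
| join REQID
    [ search index=foo REQUEST_TIME>2000
      | rex field=_raw "^\[\((?<REQID>[^\)]*)"
      | dedup REQID
      | fields REQID ]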
The other option is to use
index=foo
| rex field=_raw "^\[\((?<REQID>[^\)]*)"
| stats values(_raw) as events max(REQUEST_TIME) as mrt by REQID
| where mrt>2000
| mvexpand events
....
Then you will have all the events for the problematic request IDs. A lot will depend on data volume, performance and what data you then need from those events, but you can explore this.
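If you would rather keep the results as ordinary events (preserving _time and the other fields) instead of packing them into a multivalue field, an eventstats variant of the same idea is worth exploring; note that eventstats has its own memory limits on large result sets, so the same data-volume caveat applies:

index=foo
| rex field=_raw "^\[\((?<REQID>[^\)]*)"
| eventstats max(REQUEST_TIME) as mrt by REQID
| where mrt>2000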
Thank you, I will give it a try. These will be run manually on an ad-hoc basis so hopefully the performance won't be too big of an issue.