When I run the following query:
"com.server" | table id uri statusCode _time | join type=inner saga_id [search "SecondServer" path="/myPath/*" | tablepath, id | where statusCode >= 400 | stats count by uri,statusCode,path | sort -count
Over the last 15 minutes, it returns results. When I run it over a longer time range, like the last 60 minutes or the last 24 hours, it does not.
I am puzzled by this and I am not sure what I am doing wrong. Could you please help?
Your subsearch in the join takes too long and/or returns too many results. In Splunk you should try to avoid join almost every time. There are a lot of .conf presentations on how to do this. Here is one: https://conf.splunk.com/files/2022/slides/PLA1528B.pdf
("com.server" statusCode >= 400) OR ("SecondServer" path="/myPath/*") ```| table id uri statusCode path``` | stats values(uri) as uri values(statusCode) as statusCode values(path) as path by id | stats count by uri,statusCode,path | sort - count
This assumes that id is unique among all events in com.server, and unique among all events in SecondServer. If it is not, you can use the list function instead of values, but I'm not sure if the result will make semantic sense.
Let me point out several other things in your description. First, the code snippet is imprecise. I have to speculate that the closing bracket for the join comes immediately after the second table command. If this is not the case, the above would be totally wrong. Even within that command, I have to speculate that "tablepath" is a table command followed by the field name path. If this is not the case, the whole semantics changes again. Also, the first search outputs no saga_id to join on, and the subsearch doesn't output any saga_id either, so you should expect zero output no matter what. Problems like these may look mundane to people with intimate knowledge of the specific use case and dataset, like yourself, but they tend to discourage volunteers who want to help.
Second, why use an inner join when you end up performing a stats that doesn't concern id? In addition, applying the statusCode >= 400 constraint after the inner join only exacerbates Splunk's memory pressure, which @isoutamo already pointed out. Not that I would encourage this, but the following join might have worked.
"com.server" statusCode >= 400 ```| table id uri statusCode``` | join id [search "SecondServer" path="/myPath/*" ```| table path, id```] | stats count by uri,statusCode,path | sort - count
Yes, _time is not used so it doesn't matter in the table. But ultimately, using table early in the search also affects efficiency. When the end result is stats, you don't need to table anything except to help troubleshoot. If you want to limit information, use fields instead.
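As a sketch of that last point, the stats-based search above could limit fields with fields rather than table. This assumes the same field names as in your search and is untested against your data:
("com.server" statusCode >= 400) OR ("SecondServer" path="/myPath/*") | fields id uri statusCode path | stats values(uri) as uri values(statusCode) as statusCode values(path) as path by id | stats count by uri,statusCode,path | sort - count
Because the second stats groups by uri, statusCode and path, any id that appears on only one side ends up with a null in at least one of those fields and drops out, which is what gives you the inner-join behavior without the join.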