I'm trying to find logs whose requestId value equals the requestId value in another log.
I want to find logs like this:
"This is log1 with requestId={}"
where requestId is equal to the requestId in log2:
"This is log2 with requestId={}, reason={}"
I tried this and it's not working:
index=index env=env "This is log 1" | rex "requestId=(?<request_id>[^,]+)" | search requestId="request_id" AND reason=failed
Is there some way to join these? I want log 1, not log 2.
Like @ITWhisperer says, join is rarely the right answer, and you need to share more info about your use case. I would add: share more info about your dataset.
In particular, if your mock data is like your real data, why do you need to use in-line rex to extract field requestId? By default (without any search-time field extraction), Splunk extracts from string constructs like name=abcd, name="abcd efg", name='abc defg', and so on. Why is requestId not available to you without rex?
This is relevant because if Splunk already gives you requestId as a field, @caschmid 's idea of subsearch should work without rex.
index=index env=env "This is log 1"
[ search index=index env=env "This is log 2" reason=failed
| fields requestId
| format ]
Similarly, @ITWhisperer 's stats should also work without rex
index=index env=env "This is log 1" OR ("This is log 2" AND reason=failed)
| stats values(reason) as reason values(other_field) as other_field etc by requestId
| where isnotnull(reason)
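A quick way to check whether both fields are already extracted automatically is to table them next to the raw event. This is only a sketch that reuses the placeholder index, env, and search string from this thread; if the requestId and reason columns come back populated, no rex is needed.

```
index=index env=env "This is log 2"
| head 20
| table _time requestId reason _raw
```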
If for some reason Splunk's automatic search time extraction cannot give you requestId, you can still use those ideas but you need to explain whether the field reason is automatically extracted.
If yes, ITWhisperer's method can be used without modification, as long as the rex capture group has the same name as the field used in stats:
index=index env=env "This is log 1" OR ("This is log 2" AND reason=failed)
| rex "requestId=(?<requestId>[^,]+)"
| stats values(reason) as reason values(other_field) as other_field etc by requestId
| where isnotnull(reason)
(But why would reason be extracted when requestId is not?) caschmid's subsearch idea will require modification:
index=index env=env "This is log 1"
| search [ search index=index env=env "This is log 2" reason=failed
| rex field=_raw "requestId=(?<requestId>[^,}]+)"
| fields requestId
| format ]
If neither requestId nor reason is available at search time, you can still employ these strategies, but you need to share more details about your real data.
In general, if requestId is not available automatically, I strongly recommend that you set up a field extraction for it. This will make your search so much easier and code easier to maintain. (Otherwise why use Splunk?) An exception to this recommendation is if your raw events are in JSON or those fields are from a JSON structure.
Try subsearch to isolate the failed requestIds from log2, and then filter log1 entries.
index=index env=env "This is log 1"
[ search index=index env=env "This is log 2" reason=failed
| rex field=_raw "requestId=(?<requestId>[^,}]+)"
| fields requestId
| format ]
Regards,
Prewin
No. Just as with join, a raw subsearch is often the "intuitive" but wrong solution. Not only is it usually highly suboptimal performance-wise compared to other techniques, but a subsearch also has limits which, when hit, cause it to be finalized silently, leaving you with incomplete or wrong results.
Yes, but no! join is rarely the right way to go!
Start with something like this and tailor it to your needs, e.g. depending on what information you want from log 1 and log 2:
index=index env=env "This is log 1" OR ("This is log 2" AND reason=failed)
| rex "requestId=(?<requestId>[^,]+)"
| stats values(reason) as reason values(other_field) as other_field etc by requestId
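Since the original question asks for the log 1 events themselves rather than one summary row per requestId, one possible variation is to swap stats for eventstats, which copies the aggregated reason onto every event that shares a requestId. This is a sketch and not something posted in the thread: it keeps the placeholder index, env, and search strings, and assumes reason is extracted automatically on the log 2 events.

```
index=index env=env "This is log 1" OR ("This is log 2" AND reason=failed)
| rex "requestId=(?<requestId>[^,]+)"
| eventstats values(reason) as reason by requestId
| search "This is log 1" reason=failed
```

The final search keeps only log 1 events whose requestId also appeared in a failed log 2 event. eventstats can be more expensive than stats on large result sets, so it is worth testing over a narrow time range first.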