Hi,
I need to search in multiple indexes but the field values won't match exactly so a straight join will not produce results.
index=proxy Url="" | join [search index=watchlist "".domain."*"]
This is the search I am using. The syntax is accepted, but I don't know whether it is doing what I want. The proxy index has the full URL, while the watchlist only has the top level of the domain, e.g. www.splunk.com
Any help appreciated.
If you want to join events per domain, you need to extract the domain into a field for both types of events, for example with a rex command. Then join the two sets of results on this new field.
index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]
Please adapt this to your actual field formats.
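Before joining, it can help to check that the extraction returns what you expect on each side. A minimal sketch, assuming the proxy field is named url as in the search above and the capture group is named shortdomain (swap in your real field names):

```
index=proxy
| rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| stats count by shortdomain
| sort - count
```

If this returns no rows, the regex is not matching the field contents (for instance, values with no http:// or https:// prefix), and the join on shortdomain will find nothing either. The same check can be run against index=watchlist with field=domain.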
Thank you for this, I got it working as I wanted.
P.S. Sorry for the delay in replying; I haven't had a chance to look at this for a while.
The rex command was reformatted by the website: the capture group should have a name in angle brackets after the question mark. I added it back in the example below with underscores around the name. Please remove the underscores, changing _shortdomain_ to shortdomain, before running it.
index=proxy | rex field=url "http(s|)://(?<_shortdomain_>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<_shortdomain_>[-_\w\d\.]*)" ]
Thank you for these answers. @yannK, I tried your code but got the following error:
Error in 'rex' command: Encountered the following error while compiling the regex 'http(s|)://(?[-_\w\d.]*)': Regex: unrecognized character after (? or (?-
Any idea why? It all looks OK to me so I am not sure what I did wrong.
I think you may want to use eval and rex on the proxy Url field to pull out the top-level domain. I believe the join command only matches on exact field values, so I am trying to picture the cases where your Url and the subsearch value should match but won't.
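One way this suggestion could look, as a sketch only. It assumes the proxy field is named Url as in the question, that the watchlist field is named domain and already holds a bare host name such as www.splunk.com (so no rex is needed on that side), and that lowercasing both sides is acceptable for your data:

```
index=proxy
| rex field=Url "^https?://(?<shortdomain>[^/:?#]+)"
| eval shortdomain=lower(shortdomain)
| join type=inner shortdomain
    [ search index=watchlist
      | eval shortdomain=lower(domain)
      | fields shortdomain ]
```

The inner join keeps only proxy events whose shortdomain is exactly equal to a watchlist value. Host names that differ by a subdomain (for example www.splunk.com against docs.splunk.com) still will not match, so the watchlist entries need to be at the same level as what the rex extracts.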