Splunk Search

How to search multiple indexes and join field values that don't exactly match?

New Member

Hi,

I need to search in multiple indexes but the field values won't match exactly so a straight join will not produce results.

index=proxy Url="" | join [search index=watchlist "".domain."*"]

This is the search I am using. The syntax is OK, but I don't know whether it is doing what I want. The proxy index has the full URL, while the watchlist only has the domain part, e.g. www.splunk.com

Any help appreciated.

0 Karma
1 Solution

Splunk Employee

If you want to join events per domain, you need to extract the domain into a field for both types of events, for example with a rex command. Then join the two sets of results on this new field.

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

Please adapt this to your actual field formats.

0 Karma

New Member

Thank you for this, I got it working as I wanted.

P.S. Sorry for the delay in replying; I haven't had a chance to look at this for a while.

0 Karma

Splunk Employee

The website reformatted the rex command when I posted it and stripped the name of the capture group, which is what causes the "unrecognized character after (?" error.

The group needs a name in angle brackets right after the question mark. With the name restored, the search is:

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)" | join shortdomain [ search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]
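
If you want to sanity-check the pattern outside Splunk, here is a minimal Python sketch. It is only an approximation of what rex does: Python's re module spells a named group (?P<name>...), whereas rex (PCRE) also accepts (?<name>...), and the helper names and sample URL are made up for illustration.

```python
import re

# Same pattern as in the search, with Python's spelling of the named group.
# rex (PCRE) accepts (?<shortdomain>...); Python's re needs (?P<shortdomain>...).
PATTERN = re.compile(r"http(s|)://(?P<shortdomain>[-_\w\d\.]*)")


def extract_shortdomain(url):
    """Return the host captured as 'shortdomain', or None if the URL has no scheme."""
    match = PATTERN.search(url)
    return match.group("shortdomain") if match else None


def compiles(pattern):
    """Report whether a regex compiles instead of raising re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    # Hypothetical sample URL, not taken from any real index.
    print(extract_shortdomain("https://www.splunk.com/en_us/products.html"))
    # The pattern with the group name stripped is rejected at compile time.
    print(compiles(r"http(s|)://(?[-_\w\d\.]*)"))
```

Note that a value with no http:// or https:// prefix does not match this pattern at all, so a field that holds only a bare domain would need a different expression.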

New Member

Thank you for these answers. @yannK, I tried your code but got the following error:

Error in 'rex' command: Encountered the following error while compiling the regex 'http(s|)://(?[-_\w\d.]*)': Regex: unrecognized character after (? or (?-

Any idea why? It all looks OK to me so I am not sure what I did wrong.

0 Karma

New Member

I think you may want to use an eval and a rex command on the proxy Url to pull out the domain. I believe the join command only matches on exact field values, so I am trying to imagine scenarios where your Url and the subsearch in the join won't match but should.
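
To make the exact-match point concrete, here is a rough Python sketch of the idea, not of how Splunk implements join. It assumes the proxy events carry a full URL and the watchlist holds bare domains; the function names and sample data are hypothetical.

```python
import re

# Optional scheme, then the host: works for full URLs and for bare domains.
HOST = re.compile(r"^(?:https?://)?(?P<shortdomain>[-\w.]+)")


def short_domain(value):
    """Normalize a full URL or a bare domain to a lowercase host name."""
    match = HOST.match(value.strip())
    return match.group("shortdomain").lower() if match else None


def join_on_domain(proxy_events, watchlist_domains):
    """Inner join: keep proxy events whose host is exactly on the watchlist."""
    watched = {short_domain(d) for d in watchlist_domains}
    watched.discard(None)
    return [e for e in proxy_events if short_domain(e["url"]) in watched]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two indexes.
    proxy = [
        {"url": "https://www.splunk.com/en_us/products.html"},
        {"url": "http://example.org/index.html"},
    ]
    print(join_on_domain(proxy, ["www.splunk.com"]))
```

Because the comparison is exact, a watchlist entry of splunk.com would not match a proxy host of www.splunk.com unless both sides are normalized the same way.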

0 Karma