Splunk Search

How to search multiple indexes and join field values that don't exactly match?

StormTrooper
New Member

Hi,

I need to search across multiple indexes, but the field values don't match exactly, so a straight join will not produce results.

index=proxy Url="" | join [search index=watchlist "".domain."*"]

This is the code I am using, and while the syntax is OK, I don't know if it is doing what I want. The proxy index has the full URL, while the watchlist only has the top level of the domain, e.g. www.splunk.com

Any help appreciated.

yannK
Splunk Employee

If you want to join events by domain, you need to extract the domain into a field for both types of events, for example with the rex command. Then join the two sets of results on this new field.

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

Please adapt this to your actual field names and formats.
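
As a quick way to check the extraction before running the join, the sketch below feeds two made-up sample values through rex using makeresults, so it does not touch either index. The sample values, the field name sample_value, and the optional-scheme pattern are assumptions for illustration only. The pattern makes the http(s):// prefix optional because the question says the watchlist holds a bare domain such as www.splunk.com, which a pattern that requires the prefix would not match.

```spl
| makeresults count=2
| streamstats count AS n
| eval sample_value=if(n=1, "https://www.splunk.com/en_us/download.html", "www.splunk.com")
| rex field=sample_value "^(?:https?://)?(?<shortdomain>[-\w.]+)"
| table sample_value shortdomain
```

Both rows should come back with shortdomain set to www.splunk.com. If that holds for your real values too, the same pattern can be used in both rex commands of the join.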

StormTrooper
New Member

Thank you for this, I got it working as I wanted.

P.S. Sorry for the delay in replying; I haven't had a chance to look at this for a while.

yannK
Splunk Employee

The website reformatted the rex command and stripped the capture group name. There should be a named group after the question mark, here shortdomain. I added it back in the example below:

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)" | join shortdomain [ search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

StormTrooper
New Member

Thank you for these answers. @yannK, I tried your code but got the following error:

Error in 'rex' command: Encountered the following error while compiling the regex 'http(s|)://(?[-_\w\d.]*)': Regex: unrecognized character after (? or (?-

Any idea why? It all looks OK to me so I am not sure what I did wrong.

carpga
New Member

I think you may want to use eval and rex on the proxy Url field to pull out the domain. I believe the join command only matches on exact values, and I am trying to picture the cases where your Url and the subsearch values should match but won't.
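
One way to sketch this idea without join is to search both indexes at once, extract the domain the same way for every event, and keep only the domains that appear in both indexes. This is a different technique from the join in yannK's answer, not a correction of it. The field names url and domain are taken from that answer and may need adjusting (the question uses Url, and Splunk field names are case-sensitive). The helper field match_source and the optional-scheme pattern are assumptions for illustration.

```spl
(index=proxy) OR (index=watchlist)
| eval match_source=coalesce(url, domain)
| rex field=match_source "^(?:https?://)?(?<shortdomain>[-\w.]+)"
| stats dc(index) AS index_count, values(index) AS indexes, count BY shortdomain
| where index_count=2
```

Because this form has no subsearch, it is not subject to the subsearch limits that apply to join, which can matter when the watchlist is large.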
