Splunk Search

How to search multiple indexes and join field values that don't exactly match?

StormTrooper
New Member

Hi,

I need to search across multiple indexes, but the field values don't match exactly, so a straight join produces no results.

index=proxy Url="" | join [search index=watchlist "".domain."*"]

This is the search I am using. The syntax is OK, but I don't know if it is doing what I want. The proxy index has the full URL, while the watchlist only has the top level of the domain, e.g. www.splunk.com

Any help appreciated.


yannK
Splunk Employee

If you want to join events per domain, you need to extract the domain into a field for both types of events, for example with a rex command. Then join the two result sets on this new field.

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

Please adapt this to your actual field formats.
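
For anyone who wants to check the idea outside Splunk, here is a minimal Python sketch of the same two steps: extract a common key from both sides, then do an inner join on it. The sample events and the `short_domain` helper are hypothetical, and Python writes named groups as `(?P<name>...)` rather than the `(?<name>...)` form that rex uses.

```python
import re

# Same idea as the rex extraction; Python spells named groups (?P<name>...).
DOMAIN_RE = re.compile(r"https?://(?P<shortdomain>[-\w.]*)")


def short_domain(value):
    """Return the host part of a URL, or the value unchanged if it has no scheme."""
    match = DOMAIN_RE.match(value)
    return match.group("shortdomain") if match else value


# Hypothetical sample events standing in for the two indexes.
proxy_events = [
    {"url": "https://www.splunk.com/en_us/products.html"},
    {"url": "http://example.org/index.html"},
]
watchlist_events = [{"domain": "www.splunk.com", "reason": "demo entry"}]

# Key the watchlist on the extracted field, like the subsearch side of the join.
watchlist_by_domain = {short_domain(w["domain"]): w for w in watchlist_events}

# Inner join: keep only proxy events whose extracted domain is on the watchlist.
joined = []
for event in proxy_events:
    key = short_domain(event["url"])
    if key in watchlist_by_domain:
        joined.append({**event, **watchlist_by_domain[key], "shortdomain": key})

print(joined)
```

Only the first proxy event survives, because its extracted domain is the only one on the watchlist. The helper falls back to the raw value because the watchlist here is assumed to hold bare host names without a scheme.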

StormTrooper
New Member

Thank you for this. I got it working as I wanted.

P.S. Sorry for the delay in replying; I haven't had a chance to look at this for a while.


yannK
Splunk Employee

The website reformatted the regex when it was first posted and stripped the capture-group name. A group name in angle brackets belongs right after the question mark; without it, rex fails with "unrecognized character after (?". Here is the search with the name restored:

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)" | join shortdomain [ search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]
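
A quick way to see the difference outside Splunk is to compile both versions of the pattern. This is only a sketch using Python's `re` module, which is not the regex engine rex uses, and which writes named groups as `(?P<shortdomain>...)`; the `compiles` helper and the sample URL are made up for illustration.

```python
import re

# The pattern as the website rendered it: "(?" is followed by "[" instead of a name.
stripped = r"http(s|)://(?[-_\w\d\.]*)"
# The intended pattern, with the group name restored in Python's syntax.
named = r"http(s|)://(?P<shortdomain>[-_\w\d\.]*)"


def compiles(pattern):
    """Return True if the pattern is a valid regex, False if compiling it fails."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


stripped_ok = compiles(stripped)
named_ok = compiles(named)

# With the name in place, the domain can be read back from the named group.
extracted = re.match(named, "https://www.splunk.com/en_us/").group("shortdomain")

print(stripped_ok, named_ok, extracted)
```

The stripped pattern is rejected at compile time for the same reason rex rejects it: nothing valid follows the `(?`.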

StormTrooper
New Member

Thank you for these answers. @yannK, I tried your code but got the following error:

Error in 'rex' command: Encountered the following error while compiling the regex 'http(s|)://(?[-_\w\d.]*)': Regex: unrecognized character after (? or (?-

Any idea why? It all looks OK to me so I am not sure what I did wrong.


carpga
New Member

I think you may want to use eval and rex on the proxy Url to pull out the top level domain. I believe the join command looks for an exact match, and I am trying to picture the cases where your Url and the join subsearch should match but won't.
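
To illustrate the exact-match point, here is a small Python sketch with hypothetical sample values. Comparing the raw URL against the watchlist finds nothing, while comparing the extracted host name does. It uses `urllib.parse.urlsplit` for the extraction instead of a regex, which is a swapped-in technique and not something Splunk does.

```python
from urllib.parse import urlsplit

# Hypothetical sample values standing in for the two indexes.
proxy_urls = ["https://www.splunk.com/en_us/products.html"]
watchlist_domains = {"www.splunk.com"}

# Exact match on the raw value: the full URL never equals the bare domain.
exact_matches = [u for u in proxy_urls if u in watchlist_domains]

# Match on the extracted host name instead.
host_matches = [u for u in proxy_urls if urlsplit(u).hostname in watchlist_domains]

print(exact_matches)
print(host_matches)
```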
