Splunk Search

How to search multiple indexes and join field values that don't exactly match?

StormTrooper
New Member

Hi,

I need to search in multiple indexes but the field values won't match exactly so a straight join will not produce results.

index=proxy Url="" | join [search index=watchlist "".domain."*"]

This is the search I am using, and while the syntax is OK, I don't know if it is doing what I want. The proxy index has a full URL, while the watchlist only has the top level of the domain, e.g. www.splunk.com

Any help appreciated.

yannK
Splunk Employee

If you want to join events per domain, you need to extract the domain into a field for both types of events,
for example with a rex command. Then join the two sets of results on this new field.

index=proxy | rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| join shortdomain [
search index=watchlist | rex field=domain "http(s|)://(?<shortdomain>[-_\w\d\.]*)" ]

Please adapt this to your actual field formats.
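
As a rough sketch of the same idea for the case where the watchlist field holds a bare hostname (such as www.splunk.com) rather than a full URL: the scheme can be made optional in the regex, and both indexes can be searched together and grouped with stats instead of join. The field names url and domain are taken from the example above. The helper field raw_host, the output field matched_indexes, and the lowercase normalization are assumptions for illustration only, so adjust them to your data.

```spl
(index=proxy) OR (index=watchlist)
| eval raw_host=coalesce(url, domain)
| rex field=raw_host "^(?:https?://)?(?<shortdomain>[-\w.]+)"
| eval shortdomain=lower(shortdomain)
| stats values(index) as matched_indexes count by shortdomain
| where mvcount(matched_indexes) > 1
```

The final where clause keeps only the domains that appear in both indexes. Grouping with stats avoids the result and runtime limits that apply to the subsearch of a join, which may matter if the watchlist is large.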

StormTrooper
New Member

Thank you for this, I got it working as I wanted.

P.S. Sorry for the delay in replying; I haven't had a chance to look at this for a while.

yannK
Splunk Employee

The website reformatted the rex command in my answer and stripped the name of the capture group.

The group needs a name in angle brackets right after the question mark. I added it back in the example below with underscores around the name;
please remove the underscores so that _shortdomain_ becomes shortdomain.

index=proxy | rex field=url "http(s|)://(?<_shortdomain_>[-_\w\d\.]*)" | join shortdomain [ search index=watchlist | rex field=domain "http(s|)://(?<_shortdomain_>[-_\w\d\.]*)" ]
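
One way to check the regex with the group name in place, before running it against real data, is to generate a throwaway event with makeresults. This is only a sketch, and the sample URL is made up for illustration.

```spl
| makeresults
| eval url="https://www.splunk.com/en_us/download.html"
| rex field=url "http(s|)://(?<shortdomain>[-_\w\d\.]*)"
| table url shortdomain
```

If the named group is intact, the shortdomain column should contain www.splunk.com. If the name is missing, so that the pattern contains (?[ instead of (?<shortdomain>[, rex fails to compile the regex and reports an unrecognized character after (?.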

StormTrooper
New Member

Thank you for these answers. @yannK, I tried your code but got the following error:

Error in 'rex' command: Encountered the following error while compiling the regex 'http(s|)://(?[-_\w\d.]*)': Regex: unrecognized character after (? or (?-

Any idea why? It all looks OK to me so I am not sure what I did wrong.

carpga
New Member

I think you may want to use eval and rex on the proxy Url field to pull out the top level of the domain. I believe the join command looks for an exact match, and I am trying to imagine scenarios where your Url and the subsearch on the join won't match but should.
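
A minimal sketch of that eval-based variant, assuming the proxy field is named Url and the watchlist field is named domain, as in the question. The shortdomain field name and the lowercase normalization are illustrative choices, not something the original search requires.

```spl
index=proxy
| eval shortdomain=lower(replace(Url, "^(?:https?://)?([^/:?#]+).*$", "\1"))
| join shortdomain
    [ search index=watchlist
      | eval shortdomain=lower(domain)
      | fields shortdomain ]
```

Because join compares the values exactly, www.splunk.com in the proxy data will not match a watchlist entry of splunk.com. Both sides need to be normalized to the same form for the join to return results.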
