Splunk Search

Alternative method to using Join command

zekiramhi
Path Finder

Hello,

I am a big fan of using Join for combining results of different sourcetypes and indexes (especially with a type=left parameter) but I do see alot of hate in the community towards the usage of Join considering performance and resource usage at the backend.

Can anyone provide me with a valuable resource that shows methods for different scenarios as to when and which alternative method should be used instead of the infamous Join command, and at which scenario should Join be used as a last resort.

Regards,

Tags (3)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @zekiramhi,

try something like this:

(index=paloalto src=* dest=* policyid IN (1,2,3)) OR (sourcetype=dns index=linux)
| rename Source AS src Destination AS dest 
| stats values(url) AS url values(policyid) AS policyid values(action) AS action values(query) as query BY src dest 
| sort limit=0 src 

N.B.: beware that "query" is a special word in Splunk, I'm not sure that you can use it as a field name, probably you have to rename it.

Ciao.

Giuseppe

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @zekiramhi,

usually the people that loves join are people that comes from SQL, but Splunk isn't a DB, it's a search engine, so you should try to think in a different way.

I arrived as you from SQL and I did this work at the beginning of my Splunk activity: I resetted my approach to data correlation.

The reasons to avoid join are essentially two.

  • at first it's very slow,
  • then there's the limit of 50,000 results in subsearches.

How to have the same result of join?

You have to use the stats command, using BY clause for the join key and then use the options of stats to have the values you need.

e.g. if you have something like this:

index=indexA
| join key [ search index=indexB ]
| table _time key fieldA fieldB

you can have the same result with

index=indexA OR index=indexB
| stats earliest(_time) AS _time values(fieldA) AS fieldA values(fieldB) AS fieldB BY key

if you can have more values for fieldA or fieldB and you want only one, you can use last or first instead values clause.

Ciao.

Giuseppe

zekiramhi
Path Finder

Hello Giuseppe,

Thank you for your response,

However, my main problem with the method you described is that I actually need specific field values from Index A, for example I only want to Join the DNS queries from index=B, only to specific source and destinations based on a couple of Policy IDs from index=A

Some of those policy IDs might not even have any query logs, so I also want them to return null values from index=B,

Hopefully you catch my drift,

Best Regards,

0 Karma

gcusello
SplunkTrust
SplunkTrust

@zekiramhi,

As I said, you have to create two separate searches (as if you're preparing a join), identifying the key fields and if you have to elaborate one of them (e.g. changing name).

then you can run a search with both the searches, correlating values by the key.

If you want more help, you could share your two searches and tell me the fields to use as key and the ones to display.

Ciao.

Giuseppe

0 Karma

zekiramhi
Path Finder

Hello,

Absolutely, the key fields to join here would be the src and dest fields, an example query would be the following:

index=paloalto src=* dest=* policyid IN (1,2,3) 
| stats values(url) by policyid src dest action 
| sort limit=0 src 
| join type=left src dest 
    [ search sourcetype=dns index=linux
    | stats values(query) as query by Source Destination
    | rename Source as src Destination as dest] 

 Best Regards,

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @zekiramhi,

try something like this:

(index=paloalto src=* dest=* policyid IN (1,2,3)) OR (sourcetype=dns index=linux)
| rename Source AS src Destination AS dest 
| stats values(url) AS url values(policyid) AS policyid values(action) AS action values(query) as query BY src dest 
| sort limit=0 src 

N.B.: beware that "query" is a special word in Splunk, I'm not sure that you can use it as a field name, probably you have to rename it.

Ciao.

Giuseppe

zekiramhi
Path Finder

Hello Giuseppe,

Yes I am very aware of the "query" and "search" being special field names that behave differently, I now understand the technique abit more but I think I am having problems with the duplicate or key field names IE: src and dest.

src seems to be bringing me results for both indexes and working fine but dest only brings me for one of them, and that changes based on what I rename first interestingly enough, Any ideas as to what might be the cause?

On a side note, on data models and tstats we have the ability to specifiy which field from which index to pull results from (example: index1.field1 or index2.field1), is there a way to do that for normal Splunk searches?

Apologies for the amount of questions I have bothered, just curious 😁

Best Regards,

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @zekiramhi,

As I said it's important to normalize fields using the same name in both the searches and eventually the same case (upper or lower).

If you haven't all the fields in both the searches, use only the one surely present in both and take the others as values or first or last.

about the fields choice, you have the fields command: list after the field command all the fields you need from both the searches.

About your curiosity, is the driver of your knowledge!

Continue to try and to ask questins: me and many others will surely answer you.

Ciao and happy splunking.

Giuseppe

P.S. Karma Points are appreciated 😉

Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...