Splunk Search

How to join or corelate 3 different index/sourcetypes in single query

selvam_sekar
Path Finder

Hi,

I have requirement as below, please could you review and suggest ?

Need to pick up all client ids from application log called "Cos" (index=a sourcetype=Cos ) where distinct client ids are in 6Millions. And, I want to compare whether these clients ids are present in  another application log called "Ma" (index=a sourcetype=Ma).

And, I also want to compare the same in another application  called "Ph" (index=a sourcetype=Ph)

Basically trying to get the count/volume based on the client id, which is common among the 3 application (Cos, Ma,Ph). The total events are in Millions and when i use join, the search job is getting auto-cancelled or getting terminated.

(index=a sourcetype=Cos) OR (index=a sourcetype=Ma) OR (index=a sourcetype=Ph) stats count by clientid, sourcetype

 

Thanks,

Selvam.

Labels (4)
Tags (1)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @selvam_sekar ,

to identify the common clientids between the threee sourcetypes, you should run something like this:

index=a sourcetype IN ("Cos","Ma","Ph") 
| stats 
   count 
   dc(sourcetype) AS sourcetype_count 
   BY clientid
| where sourcetype_count=3
| fields - sourcetype_count

Ciao.

Giuseppe

 

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @selvam_sekar ,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @selvam_sekar ,

to identify the common clientids between the threee sourcetypes, you should run something like this:

index=a sourcetype IN ("Cos","Ma","Ph") 
| stats 
   count 
   dc(sourcetype) AS sourcetype_count 
   BY clientid
| where sourcetype_count=3
| fields - sourcetype_count

Ciao.

Giuseppe

 

yuanliu
SplunkTrust
SplunkTrust

As you already experience, Splunk strongly disfavors join.  This is just natural as most noSQL do.

So, you explained how many events these sources can give, and how many different client ID's.  What you forget to tell us is what you mean by "to get the count/volume based on the client id".  If you only want to count events from each sourcetype by clientid, all you need to do is

(index=a sourcetype=Cos) OR (index=a sourcetype=Ma) OR (index=a sourcetype=Ph)
``` you can also use
index=a sourcetype IN (Cos, Ma, Ph)
```
| stats count by clientid, sourcetype

(which is a copy of the SPL snippet but added a pipe (|) in front of stats to make syntax correct.) There is no join.  The above will not timeout even with millions of event.

In other words, what does "compare" mean in "to compare the same in another application", and what does the word mean in "to compare whether these clients ids are present in  another application"?

If you want to know which and how many sourcetypes (apps) each clientid appear in, all you need is to add the following:

| stats sum(count) as total values(sourcetype) as apps dc(sourcetype) as app_count by clientid

Put together,

(index=a sourcetype=Cos) OR (index=a sourcetype=Ma) OR (index=a sourcetype=Ph)
``` you can also use
index=a sourcetype IN (Cos, Ma, Ph)
```
| stats count by clientid, sourcetype
| stats sum(count) as total values(sourcetype) as apps dc(sourcetype) as app_count by clientid

Still no join.  Where do you get join to time out?

Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Get the T-shirt to Prove You Survived Splunk University Bootcamp

As if Splunk University, in Las Vegas, in-person, with three days of bootcamps and labs weren’t enough, now ...

Wondering How to Build Resiliency in the Cloud?

IT leaders are choosing Splunk Cloud as an ideal cloud transformation platform to drive business resilience,  ...