Hi,
I have requirement as below, please could you review and suggest ?
Need to pick up all client ids from application log called "Cos" (index=a sourcetype=Cos ) where distinct client ids are in 6Millions. And, I want to compare whether these clients ids are present in another application log called "Ma" (index=a sourcetype=Ma).
And, I also want to compare the same in another application called "Ph" (index=a sourcetype=Ph)
Basically trying to get the count/volume based on the client id, which is common among the 3 application (Cos, Ma,Ph). The total events are in Millions and when i use join, the search job is getting auto-cancelled or getting terminated.
(index=a sourcetype=Cos) OR (index=a sourcetype=Ma) OR (index=a sourcetype=Ph) stats count by clientid, sourcetype
Thanks,
Selvam.
Hi @selvam_sekar ,
to identify the common clientids between the threee sourcetypes, you should run something like this:
index=a sourcetype IN ("Cos","Ma","Ph")
| stats
count
dc(sourcetype) AS sourcetype_count
BY clientid
| where sourcetype_count=3
| fields - sourcetype_count
Ciao.
Giuseppe
Hi @selvam_sekar ,
good for you, see next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉
Hi @selvam_sekar ,
to identify the common clientids between the threee sourcetypes, you should run something like this:
index=a sourcetype IN ("Cos","Ma","Ph")
| stats
count
dc(sourcetype) AS sourcetype_count
BY clientid
| where sourcetype_count=3
| fields - sourcetype_count
Ciao.
Giuseppe
As you already experience, Splunk strongly disfavors join. This is just natural as most noSQL do.
So, you explained how many events these sources can give, and how many different client ID's. What you forget to tell us is what you mean by "to get the count/volume based on the client id". If you only want to count events from each sourcetype by clientid, all you need to do is
(index=a sourcetype=Cos) OR (index=a sourcetype=Ma) OR (index=a sourcetype=Ph)
``` you can also use
index=a sourcetype IN (Cos, Ma, Ph)
```
| stats count by clientid, sourcetype
(which is a copy of the SPL snippet but added a pipe (|) in front of stats to make syntax correct.) There is no join. The above will not timeout even with millions of event.
In other words, what does "compare" mean in "to compare the same in another application", and what does the word mean in "to compare whether these clients ids are present in another application"?
If you want to know which and how many sourcetypes (apps) each clientid appear in, all you need is to add the following:
| stats sum(count) as total values(sourcetype) as apps dc(sourcetype) as app_count by clientid
Put together,
(index=a sourcetype=Cos) OR (index=a sourcetype=Ma) OR (index=a sourcetype=Ph)
``` you can also use
index=a sourcetype IN (Cos, Ma, Ph)
```
| stats count by clientid, sourcetype
| stats sum(count) as total values(sourcetype) as apps dc(sourcetype) as app_count by clientid
Still no join. Where do you get join to time out?