Splunk Search

Why is multi column join not working whereas single column join is working fine?

pmittal
Engager

single column join is working

 

 

index=* source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17* host IN (*.test.com)

| rename columns.pid AS pid, columns.pid_ts as pid_ts, columns.path as path,

| dedup host path pid

| join pid type=left max=1 [search index=* source=process host IN (*.test.com) earliest=-25h latest=now

  | rename columns.pid AS pid, columns.cmdline as cmd, columns.username as user, columns.uid as uid, columns.groupname as group, columns.gid as gid

  | dedup host pid]

| table host, path, pid, user, uid, group, gid, cmd

 

 

but multi column join is not working

 

 

index=* source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17* host IN (*.test.com)

| rename columns.pid AS pid, columns.pid_ts as pid_ts, columns.path as path,

| dedup host path pid

| join host,pid type=left max=1 [search index=* source=process host IN (*.test.com) earliest=-25h latest=now

  | rename columns.pid AS pid, columns.cmdline as cmd, columns.username as user, columns.uid as uid, columns.groupname as group, columns.gid as gid

  | dedup host pid]

| table host, path, pid, user, uid, group, gid, cmd

 

 

 

Splunk Enterprise

Version:8.2.6Build:a6fe1ee8894b

Labels (1)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @pmittal,

as you surely read in many posts on this community, Splunk isn't a DB and the usual approach thet people that worked on a DB (like me eleven years ago!) are usually using is wrong in Splunk:

I'm sure that your search is very slow, because the join command can be used only if there in't any other solution because it's very slow and it has the limit of 50,000 results in the subsearch.

For this reason, I hint to see a different approach based on stats, something like this:

index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename 
   columns.pid AS pid
   columns.pid_ts as pid_ts
   columns.path as path
   columns.cmdline as cmd
   columns.username as user
   olumns.uid as uid
   columns.groupname as group
   columns.gid as gid
| stats 
   values(host) AS host 
   values(path) AS path 
   values(user) AS user
   values(uid) AS uid
   values(group) AS group
   values(gid) AS gid
   values(cmd) AS cmd
   BY pid
| table host path pid user uid group gid cmd

In few words, you have to put in the main search both the searches and make a stats command by the common keys, adding as values the other fields you want.

About the multi field join, maybe in your search the problem is the comma between the join fields.

I suppose that you already checked that there are values matching the join.

Anyway, also in this case the solution could be:

index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename 
   columns.pid AS pid
   columns.pid_ts as pid_ts
   columns.path as path
   columns.cmdline as cmd
   columns.username as user
   olumns.uid as uid
   columns.groupname as group
   columns.gid as gid
| stats 
   values(path) AS path 
   values(user) AS user
   values(uid) AS uid
   values(group) AS group
   values(gid) AS gid
   values(cmd) AS cmd
   BY host pid
| table host path pid user uid group gid cmd

Ciao.

Giuseppe

View solution in original post

0 Karma

pmittal
Engager

Hi @gcusello ,

The problem seems to be due to no. of records greater than 50k as you pointed out. However, comma between the join fields is not an issue.  There is one problem with your solution, it's not providing left join functionality. It just combine records from both the indexes. E.g. final output should have only 4 records but what I am getting is like 70k records. After stats, I tried filtering data using where but no success. What I want is only path containing vulnerable log4j i.e. the path filter used in first query. Sorry, I am new to Splunk. So, this might be a very basic ask

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @pmittal,

please try this:

index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename 
   columns.pid AS pid
   columns.pid_ts as pid_ts
   columns.path as path
   columns.cmdline as cmd
   columns.username as user
   olumns.uid as uid
   columns.groupname as group
   columns.gid as gid
| stats 
   values(host) AS host 
   values(path) AS path 
   values(user) AS user
   values(uid) AS uid
   values(group) AS group
   values(gid) AS gid
   values(cmd) AS cmd
   dc(source) AS source_count
   BY pid
| where source_count=2
| table host path pid user uid group gid cmd

Ciao.

Giuseppe

0 Karma

pmittal
Engager

Since, I required regex; I used 

| search path IN (*/log4j-core*) NOT path=*/log4j*2.17*

 insetad of where command.  Thank you for your support!!

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @pmittal,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated 😉

0 Karma

pmittal
Engager

got it working with search command instead of where. However, still not getting expected output but on right track. Will update in sometime

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @pmittal,

as you surely read in many posts on this community, Splunk isn't a DB and the usual approach thet people that worked on a DB (like me eleven years ago!) are usually using is wrong in Splunk:

I'm sure that your search is very slow, because the join command can be used only if there in't any other solution because it's very slow and it has the limit of 50,000 results in the subsearch.

For this reason, I hint to see a different approach based on stats, something like this:

index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename 
   columns.pid AS pid
   columns.pid_ts as pid_ts
   columns.path as path
   columns.cmdline as cmd
   columns.username as user
   olumns.uid as uid
   columns.groupname as group
   columns.gid as gid
| stats 
   values(host) AS host 
   values(path) AS path 
   values(user) AS user
   values(uid) AS uid
   values(group) AS group
   values(gid) AS gid
   values(cmd) AS cmd
   BY pid
| table host path pid user uid group gid cmd

In few words, you have to put in the main search both the searches and make a stats command by the common keys, adding as values the other fields you want.

About the multi field join, maybe in your search the problem is the comma between the join fields.

I suppose that you already checked that there are values matching the join.

Anyway, also in this case the solution could be:

index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename 
   columns.pid AS pid
   columns.pid_ts as pid_ts
   columns.path as path
   columns.cmdline as cmd
   columns.username as user
   olumns.uid as uid
   columns.groupname as group
   columns.gid as gid
| stats 
   values(path) AS path 
   values(user) AS user
   values(uid) AS uid
   values(group) AS group
   values(gid) AS gid
   values(cmd) AS cmd
   BY host pid
| table host path pid user uid group gid cmd

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

🌟 From Audit Chaos to Clarity: Welcoming Audit Trail v2

🗣 You Spoke, We Listened Audit Trail v2 wasn’t written in isolation—it was shaped by your voices. In ...

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

 Prepare to elevate your security operations with the powerful upgrade to Splunk Enterprise Security 8.x! This ...

Get Early Access to AI Playbook Authoring: Apply for the Alpha Private Preview ...

Passionate about security automation? Apply now to our AI Playbook Authoring Alpha private preview ...