single column join is working
index=* source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17* host IN (*.test.com)
| rename columns.pid AS pid, columns.pid_ts as pid_ts, columns.path as path,
| dedup host path pid
| join pid type=left max=1 [search index=* source=process host IN (*.test.com) earliest=-25h latest=now
| rename columns.pid AS pid, columns.cmdline as cmd, columns.username as user, columns.uid as uid, columns.groupname as group, columns.gid as gid
| dedup host pid]
| table host, path, pid, user, uid, group, gid, cmd
but multi column join is not working
index=* source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17* host IN (*.test.com)
| rename columns.pid AS pid, columns.pid_ts as pid_ts, columns.path as path,
| dedup host path pid
| join host,pid type=left max=1 [search index=* source=process host IN (*.test.com) earliest=-25h latest=now
| rename columns.pid AS pid, columns.cmdline as cmd, columns.username as user, columns.uid as uid, columns.groupname as group, columns.gid as gid
| dedup host pid]
| table host, path, pid, user, uid, group, gid, cmd
Version:8.2.6Build:a6fe1ee8894b
Hi @pmittal,
as you surely read in many posts on this community, Splunk isn't a DB and the usual approach thet people that worked on a DB (like me eleven years ago!) are usually using is wrong in Splunk:
I'm sure that your search is very slow, because the join command can be used only if there in't any other solution because it's very slow and it has the limit of 50,000 results in the subsearch.
For this reason, I hint to see a different approach based on stats, something like this:
index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename
columns.pid AS pid
columns.pid_ts as pid_ts
columns.path as path
columns.cmdline as cmd
columns.username as user
olumns.uid as uid
columns.groupname as group
columns.gid as gid
| stats
values(host) AS host
values(path) AS path
values(user) AS user
values(uid) AS uid
values(group) AS group
values(gid) AS gid
values(cmd) AS cmd
BY pid
| table host path pid user uid group gid cmdIn few words, you have to put in the main search both the searches and make a stats command by the common keys, adding as values the other fields you want.
About the multi field join, maybe in your search the problem is the comma between the join fields.
I suppose that you already checked that there are values matching the join.
Anyway, also in this case the solution could be:
index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename
columns.pid AS pid
columns.pid_ts as pid_ts
columns.path as path
columns.cmdline as cmd
columns.username as user
olumns.uid as uid
columns.groupname as group
columns.gid as gid
| stats
values(path) AS path
values(user) AS user
values(uid) AS uid
values(group) AS group
values(gid) AS gid
values(cmd) AS cmd
BY host pid
| table host path pid user uid group gid cmdCiao.
Giuseppe
Hi @gcusello ,
The problem seems to be due to no. of records greater than 50k as you pointed out. However, comma between the join fields is not an issue. There is one problem with your solution, it's not providing left join functionality. It just combine records from both the indexes. E.g. final output should have only 4 records but what I am getting is like 70k records. After stats, I tried filtering data using where but no success. What I want is only path containing vulnerable log4j i.e. the path filter used in first query. Sorry, I am new to Splunk. So, this might be a very basic ask
Hi @pmittal,
please try this:
index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename
columns.pid AS pid
columns.pid_ts as pid_ts
columns.path as path
columns.cmdline as cmd
columns.username as user
olumns.uid as uid
columns.groupname as group
columns.gid as gid
| stats
values(host) AS host
values(path) AS path
values(user) AS user
values(uid) AS uid
values(group) AS group
values(gid) AS gid
values(cmd) AS cmd
dc(source) AS source_count
BY pid
| where source_count=2
| table host path pid user uid group gid cmdCiao.
Giuseppe
Since, I required regex; I used
| search path IN (*/log4j-core*) NOT path=*/log4j*2.17*insetad of where command. Thank you for your support!!
Hi @pmittal,
good for you, see next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated 😉
got it working with search command instead of where. However, still not getting expected output but on right track. Will update in sometime
Hi @pmittal,
as you surely read in many posts on this community, Splunk isn't a DB and the usual approach thet people that worked on a DB (like me eleven years ago!) are usually using is wrong in Splunk:
I'm sure that your search is very slow, because the join command can be used only if there in't any other solution because it's very slow and it has the limit of 50,000 results in the subsearch.
For this reason, I hint to see a different approach based on stats, something like this:
index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename
columns.pid AS pid
columns.pid_ts as pid_ts
columns.path as path
columns.cmdline as cmd
columns.username as user
olumns.uid as uid
columns.groupname as group
columns.gid as gid
| stats
values(host) AS host
values(path) AS path
values(user) AS user
values(uid) AS uid
values(group) AS group
values(gid) AS gid
values(cmd) AS cmd
BY pid
| table host path pid user uid group gid cmdIn few words, you have to put in the main search both the searches and make a stats command by the common keys, adding as values the other fields you want.
About the multi field join, maybe in your search the problem is the comma between the join fields.
I suppose that you already checked that there are values matching the join.
Anyway, also in this case the solution could be:
index=* ((source=jar columns.path="*/log4j-core*" NOT columns.path=*/log4j*2.17*) OR (source=process earliest=-25h latest=now)) host IN (*.test.com)
| rename
columns.pid AS pid
columns.pid_ts as pid_ts
columns.path as path
columns.cmdline as cmd
columns.username as user
olumns.uid as uid
columns.groupname as group
columns.gid as gid
| stats
values(path) AS path
values(user) AS user
values(uid) AS uid
values(group) AS group
values(gid) AS gid
values(cmd) AS cmd
BY host pid
| table host path pid user uid group gid cmdCiao.
Giuseppe