Hello Splunkers, I'm new to Splunk and I'm stuck; I'm getting more data than I'm supposed to. Users are showing up when they shouldn't, and vice versa.
The purpose of the query is to determine which users are accessing the bastion with the tag=1 from the "index2" index. However, there's no information on
the users. That's why I'm fetching user data from the "index1" index by performing a join on the IP address.
The ultimate goal is to display the results in the following format: Users - IP - _time. It's important to note that IP addresses are dynamic.
When I run this command, it returns 1000 lines: `index="index2" tag=1 | table srcip, _time`
However, when I run this command, I get a lot more (11000), even though I'm supposed to have the same number since I'm just fetching users from the other
index, but I'm not supposed to have any additional lines:
index="index1" | search Users =* AND IP=*
| fields Users, IP, _time
| where NOT match(Users, "^AAA-[0-9]{5}\$")
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=IP
| table Users, ip, _time
| join type=inner ip
[ search index="index2" tag=1 | fields srcip, _time | eval ip=srcip | table ip, _time]
| table Users, ip, _time
Does anyone have a solution?
Would anyone know how to do it?
Hi @m92,
Spunk isn't a database, so avoid to use the join command because it's a very slow command,
in addition you divided your search in three levels adding more slowness,
so try to correlate events using stats BY the correlation key, something like this (to adapt to your use case):
(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| regex Users!="^AAA-[0-9]{5}\$"
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
values(Users) AS Users
earliest(_time) AS earliest
latest(_time) AS latest
ip
| eval
earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table Users, ip, earliest latest
Ciao.
Giuseppe
I have results, but the problem is that it displays users who don't have an IP address (so it shows users from index1 even if no match was found in index2). What I would like is for it to fetch and display only users if the IP addresses match correctly at the right time. Furthermore, I always have more lines (3000 versus 1000).
Hi @m92 ,
if you want only IPs present in both indexes, you could use this search:
(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| regex Users!="^AAA-[0-9]{5}\$"
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
dc(index) AS index_count
values(Users) AS Users
earliest(_time) AS earliest
latest(_time) AS latest
BY ip
| where index_count>1
| eval
earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table Users, ip, earliest latest
Ciao.
Giuseppe
It only displays users with their IP addresses, but the problem is that I still have a lot more lines than with this command:
index="index2" tag=1 | table srcip, _time
(8000 lines versus 1000)
So, I think either it's not filtering enough or it's adding users who aren't supposed to be there.
How can I handle this?
Hi @m92,
with the table command you have many events with the same srcip and dfferent _time.
Do you want different lines if you have different _time?
if yes, you can add _time to the BY clause
(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| regex Users!="^AAA-[0-9]{5}\$"
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
dc(index) AS index_count
values(Users) AS Users
BY ip _time
| where index_count>1
| table Users, ip, _time
even if, in this way you could have different _time in the two indexes so it will be difficoult to group by _time.
Ciao.
Giuseppe
I tried the command you gave me, but nothing is displayed when adding _time in the BY.
Additionally, I added other data, but I would like to display one user per line rather than grouping multiple users together because they share the same IP address. For instance, on a certain IP address, multiple services were used, but I don't know which service was used.
So, if we display one user per line, I think it will be unnecessary to use earliest and latest and just display the correct _time, right?
(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| where NOT match(Users, "^AAA-[0-9]{5}\$")
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
dc(index) AS index_count
values(Users) AS Users
values(destip) AS destip
values(service) AS service
earliest(_time) AS earliest
latest(_time) AS latest
BY ip
| where index_count>1
| eval
earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table Users, ip, dest_ip, service, earliest, latest
Hi @m92 ,
I added eariest and latest because you have _time in your searches, but you can ignore them.
Ciao.
Giuseppe