Correlation Help Log

m92

Hello Splunkers, I'm new to Splunk and I'm stuck; I'm getting more data than I'm supposed to. Users are showing up when they shouldn't, and vice versa.
The purpose of the query is to determine which users are accessing the bastion with the tag=1 from the "index2" index. However, there's no information on
the users. That's why I'm fetching user data from the "index1" index by performing a join on the IP address.
The ultimate goal is to display the results in the following format: Users - IP - _time. It's important to note that IP addresses are dynamic.

When I run this command, it returns 1000 lines: `index="index2" tag=1 | table srcip, _time`

However, when I run this command, I get a lot more (11000), even though I'm supposed to have the same number since I'm just fetching users from the other
index, but I'm not supposed to have any additional lines:

index="index1" | search Users =* AND IP=*
| fields Users, IP, _time
| where NOT match(Users, "^AAA-[0-9]{5}\$")
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=IP
| table Users, ip, _time
| join type=inner ip
[ search index="index2" tag=1 | fields srcip, _time | eval ip=srcip | table ip, _time]
| table Users, ip, _time

Does anyone have a solution?

m92

Would anyone know how to do it?

gcusello

Hi @m92,

Spunk isn't a database, so avoid to use the join command because it's a very slow command,

in addition you divided your search in three levels adding more slowness,

so try to correlate events using stats BY the correlation key, something like this (to adapt to your use case):

(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| regex Users!="^AAA-[0-9]{5}\$"
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
     values(Users) AS Users
     earliest(_time) AS earliest
     latest(_time) AS latest
     ip
| eval 
     earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
     latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table Users, ip, earliest latest

Ciao.

Giuseppe

m92

I have results, but the problem is that it displays users who don't have an IP address (so it shows users from index1 even if no match was found in index2). What I would like is for it to fetch and display only users if the IP addresses match correctly at the right time. Furthermore, I always have more lines (3000 versus 1000).

gcusello

Hi @m92 ,

if you want only IPs present in both indexes, you could use this search:

(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| regex Users!="^AAA-[0-9]{5}\$"
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
     dc(index) AS index_count
     values(Users) AS Users
     earliest(_time) AS earliest
     latest(_time) AS latest
     BY ip
| where index_count>1
| eval 
     earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
     latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table Users, ip, earliest latest

Ciao.

Giuseppe

m92

It only displays users with their IP addresses, but the problem is that I still have a lot more lines than with this command:

index="index2" tag=1 | table srcip, _time

(8000 lines versus 1000)
So, I think either it's not filtering enough or it's adding users who aren't supposed to be there.
How can I handle this?

gcusello

Hi @m92,

with the table command you have many events with the same srcip and dfferent _time.

Do you want different lines if you have different _time?

if yes, you can add _time to the BY clause

(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| regex Users!="^AAA-[0-9]{5}\$"
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
     dc(index) AS index_count
     values(Users) AS Users
     BY ip _time
| where index_count>1
| table Users, ip, _time

even if, in this way you could have different _time in the two indexes so it will be difficoult to group by _time.

Ciao.

Giuseppe

m92

I tried the command you gave me, but nothing is displayed when adding _time in the BY.
Additionally, I added other data, but I would like to display one user per line rather than grouping multiple users together because they share the same IP address. For instance, on a certain IP address, multiple services were used, but I don't know which service was used.
So, if we display one user per line, I think it will be unnecessary to use earliest and latest and just display the correct _time, right?

(index="index1" Users =* IP=*) OR (index="index2" tag=1 )
| where NOT match(Users, "^AAA-[0-9]{5}\$")
| eval IP=if(match(IP, "^::ffff:"), replace(IP, "^::ffff:(\d+\.\d+\.\d+\.\d+)$", "\1"), IP)
| eval ip=coalesce(IP,srcip)
| stats
dc(index) AS index_count
values(Users) AS Users
values(destip) AS destip
values(service) AS service
earliest(_time) AS earliest
latest(_time) AS latest
BY ip
| where index_count>1
| eval
earliest=strftime(earliest,"%Y-%m-%d %H:%M:%S"),
latest=strftime(latest,"%Y-%m-%d %H:%M:%S")
| table Users, ip, dest_ip, service, earliest, latest

gcusello

Hi @m92 ,

I added eariest and latest because you have _time in your searches, but you can ignore them.

Ciao.

Giuseppe

Correlation Help Log

Linux

Windows

Combine Multiline Logs into a Single Event with SOCK - a Guide for Advanced Users

Everything Community at .conf24!

Index This | I’m short for "configuration file.” What am I?