Hi, how can i correlate events from different indexes when both( field names and values) are different ?
For example:
I have a some app logs in an index=id1 . There is field called user in this index which has values like:
SmithJ
JohnK
Now i want to find out what is the IP address of these users from our firewall index. But In the Firewall index, the user names are in following format:
Field Name: PanOSSourceUserName
Value: <Domain>\SmithJ
Field Name: PanOSSourceUserName
Value: <Domain>\JohnK
As you can see, the firewall index has names appended by our <domain name>\ while the app index doesn't have domain name in the user field. There are other fields called src_ip and Country in firewall events.
How can i craft a search that takes the user field from app index and compares/ correlates that with the PanOSSourceUserName field from Firewall index and accordingly displays the src_ip of the user . Hope i am clear.
End result: Table or Stats whatever works with following columns
_time | user | PanOSSourceUserName | src_ip | Country | |
Thanks in advance
Hi @neerajs_81,
to correlate different searches (it isn't relevant that they are in the same or different indexes) you need to have the same field names and the same value format.
In your case you can solve the first issue using the rename command and the second using the rex command.
Something like this:
index=idx1 OR index=firewall
| rex field=PanOSSourceUserName "\\(?<user>.*)"
| stats earliest(_time) AS _time values(src_ip) AS src_ip values(country) AS country values(PanOSSourceUserName) AS PanOSSourceUserName BY user
| table _time user PanOSSourceUserName src_ip country
Ciao.
Giuseppe
Thanks for responding, Your rex command is throwing the following error
Error in 'rex' command: Encountered the following error while compiling the regex '\(?<user>.*)': Regex: unmatched closing parenthesis.
Hi @neerajs_81,
it's the usual problem with regexes in Splunk, please try this:
index=idx1 OR index=firewall
| rex field=PanOSSourceUserName "\\\(?<user>.*)"
| stats earliest(_time) AS _time values(src_ip) AS src_ip values(country) AS country values(PanOSSourceUserName) AS PanOSSourceUserName BY user
| table _time user PanOSSourceUserName src_ip country
Ciao.
Giuseppe
Giuseppe, I see a minor issue. Because we are doing
index=idx1 OR index=firewall
the search results are only showing me events for users which are in Firewall index. I want it other way round. Basically
1. Check user in index=idx1.
2. Only for those users found above in idx1, get their corresponding src_ip from index=Firewall by matching against PanOSSourceName.
The rex command you gave is a good suggestion for comparing the user values between the two indexes. But i only want to see the results of folks found in idx1
Hope i am clear.
Hi @neerajs_81,
ok, alittle adjustement to the search: you have to filter eìresults taking only results in index idx1:
index=idx1 OR index=firewall
| rex field=PanOSSourceUserName "\\\(?<user>.*)"
| stats earliest(_time) AS _time values(src_ip) AS src_ip values(country) AS country values(PanOSSourceUserName) AS PanOSSourceUserName dc(index) AS dc_index BY user
| where dc_index=2
| table _time user PanOSSourceUserName src_ip country
Ciao.
Giuseppe
Still some adjustment required. This new search shows me results from both indexes meaning even those user records which are not in index= idx1 but there in the other index=firewall.
I have validated they are not in idx1 by running a "index=idx1" for that time range and it shows about 5 users.
But now when we run your query, the results shows more than 5 users. So somehow It is fetching some users from index=firewall which are not part of idx1 .
Will keep poking around. Awarding you karma points for your responses so far.
HI @neerajs_81,
the behavior you're reporting it's strange because the check on the indexes number (the where condition) assures that the user value is present in both indexes.
Only for debugging, run this search that displays the indexes in which the user value is present:
index=idx1 OR index=firewall
| rex field=PanOSSourceUserName "\\\(?<user2>.*)"
| eval user=coalesce(user,user2)
| stats earliest(_time) AS _time values(src_ip) AS src_ip values(country) AS country values(PanOSSourceUserName) AS PanOSSourceUserName dc(index) AS dc_index values(index) AS index BY user
| where dc_index=2
| table _time user PanOSSourceUserName src_ip country index
Ciao.
Giuseppe
Ok found the culprit, but not the solution. Problem is with the regex.
rex field=PanOSSourceUserName "\\\(?<user2>.*)"
The value of user2 field shows as \johnk . There is still the backslash \ that is showing in the value.
While the value of user1 field from idx1 is johnk
The PanOSSourceUserName value is domain\\johnk
Even if i put four backslashes in the rex
rex field=PanOSSourceUserName "\\\\(?<user2>.*)" , the value of user2 still shows as \johnk .
That is odd because on regex101.com it shows \\\\ is a 100% match for user2.
Any thoughts
Hi @neerajs_81,
please try this regex
^[^\\]+\\(?<user>.*)
or
^[^\\]+\\\(?<user>.*)
if it continues to not run, you can add, after the rex command and before the stats command:
| eval user=if(index="firewall",substr(user,2,50),user)
But the regex should be correct.
Ciao.
Giuseppe
Tried your updated search and it shows "No results found". I saw some posts where folks have used | join=INNER . Maybe a join would for this use case.
Checking.
Hi @neerajs_81,
it's alway more strange!
I use join only when there isn't any additional solution to an issue because it's a very slow and resource expensive command
anyway debug the data in this way:
run the following search
(index=idx1 user=*) OR (index=firewall PanOSSourceUserName =*)
| rex field=PanOSSourceUserName "\\\(?<user2>.*)"
| eval user=coalesce(user,user2)
| table _time user user2 PanOSSourceUserName index
the field "user" should have the 100% of occurrences.
in addition check if user2 is correctly extracted and the values correspond to the ones in user
If not check the regex, if yes, run the following search:
(index=idx1 user=*) OR (index=firewall PanOSSourceUserName=*)
| rex field=PanOSSourceUserName "\\\(?<user2>.*)"
| eval user=coalesce(user,user2)
| stats earliest(_time) AS _time values(src_ip) AS src_ip values(country) AS country values(PanOSSourceUserName) AS PanOSSourceUserName dc(index) AS dc_index values(index) AS index BY user
| where dc_index=2
| table _time user PanOSSourceUserName src_ip country index
in the index column, you should have always two values.
Ciao.
Giuseppe