Solved: Show events if not seen in lookup

jacvbtaylor · ‎07-18-2024

I wrote this query to help look for multiple Autonomous System Number (ASN) values and multiple user agent values in a user’s Okta session as this is an indication of a session hijack.

I created this search, which works as intended:

index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| eval "ASN"='securityContext.asNumber'
| eval "Session ID"='authenticationContext.externalSessionId'
| eval "User"='actor.alternateId' | eval "Risk"='debugContext.debugData.risk'
| stats dc("user_agent") as "Agent Count" values(user_agent) AS "User Agent" dc(ASN) as "ASN Count" values(ASN) as ASN dc(Risk) as "Risk Count" values(Risk) as Risk by User "Session ID"
| table "Session ID", ASN, "ASN Count", "User Agent", "Agent Count", User, Risk
| search "ASN Count" > 1 AND "Agent Count" > 1

Session ID ASN ASN Count User Agent Agent Count User Risk

idxxxxxxxxxxxx	12345 321	2	UserAgent1 UserAgent2	2	user@company.com	{reasons=Anomalous Device, level=MEDIUM}
idxxxxxxxxxxxx	6789 321	2	UserAgent1 UserAgent2	2	user@company.com	{reasons=Anomalous Device, level=MEDIUM}

The issue is that the results are not limited to anomalous activity. I get many false positives, because most sessions legitimately have more than one ASN attached.

My thought was to create a lookup (asn_user.csv) that collects the User and ASN pairs that have had a successful transaction. It will eventually be updated by a scheduled search (run less frequently than the main search) that appends new data, using this search:

index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| eval "ASN"='securityContext.asNumber'
| eval "User"='actor.alternateId'| table ASN User
| dedup ASN User

ASN User

12345	user@company.com
321	user@company.com
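
A minimal sketch of how such a scheduled search could append to the lookup rather than overwrite it, assuming asn_user.csv already exists with ASN and User columns (the first run would need to create it without the inputlookup step). It adds no filter for successful transactions, since the field that would identify them is not shown here:

```spl
index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| eval "ASN"='securityContext.asNumber'
| eval "User"='actor.alternateId'
| table ASN User
| inputlookup append=true asn_user.csv
| dedup ASN User
| outputlookup asn_user.csv
```

Here inputlookup append=true adds the existing lookup rows to the new results, dedup drops the pairs that are already known, and outputlookup writes the merged set back to the file.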

My issue right now is trying to use the lookup against the main search.

The goal: IF an ASN from the main Okta search is new to the user (meaning the ASN is not in the lookup file, asn_user.csv), then return the results of | table "Session ID", ASN, "ASN Count", "User Agent", "Agent Count", User, Risk | search "ASN Count" > 1 AND "Agent Count" > 1, including the anomalous ASN, while still meeting the "ASN Count" > 1 AND "Agent Count" > 1 requirement.

Does anyone have some ideas to accomplish this?

yuanliu · ‎07-18-2024

Assuming you have lookup file asn_user.csv and you set up a lookup called asn_user.csv (my preference is to not use .csv in lookup name, but many others do not make the distinction), you can do

index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| eval "ASN"='securityContext.asNumber'
| eval "Session ID"='authenticationContext.externalSessionId'
| eval "User"='actor.alternateId' | eval "Risk"='debugContext.debugData.risk'
| stats dc("user_agent") as "Agent Count" values(user_agent) AS "User Agent" dc(ASN) as "ASN Count" values(ASN) as ASN dc(Risk) as "Risk Count" values(Risk) as Risk by User "Session ID"
| table "Session ID", ASN, "ASN Count", "User Agent", "Agent Count", User, Risk
| lookup asn_user.csv User output ASN as ASNfound
| where "ASN Count" > 1 AND "Agent Count" > 1 AND NOT mvfind(ASNfound, ASN)

bowesmana · ‎07-18-2024

@yuanliu mvfind() will not work with two potentially MV fields.

yuanliu · ‎07-18-2024

@bowesmana is correct: mvfind won't accept two variables. Also, as he says, single quotes should be used to reference a field's value in the where command. Here is an alternative solution:

index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| eval "ASN"='securityContext.asNumber'
| eval "Session ID"='authenticationContext.externalSessionId'
| eval "User"='actor.alternateId' | eval "Risk"='debugContext.debugData.risk'
| stats dc("user_agent") as "Agent Count" values(user_agent) AS "User Agent" dc(ASN) as "ASN Count" values(ASN) as ASN dc(Risk) as "Risk Count" values(Risk) as Risk by User "Session ID"
| table "Session ID", ASN, "ASN Count", "User Agent", "Agent Count", User, Risk
| lookup asn_user.csv User output ASN as ASNfound
| where 'ASN Count' > 1 AND 'Agent Count' > 1 AND mvmap(ASN, if(ASN == ASNfound, "yes", "no")) == "no"

Here is an emulation of your illustrated data after lookup:

| makeresults format=csv data="Session ID, ASN, ASN Count, User Agent, Agent Count, User, Risk, ASNfound
idxxxxxxxxxxxx	,\"12345
321\",2	,\"UserAgent1
UserAgent2\",2,	user@company.com,	\"{reasons=Anomalous Device, level=MEDIUM}\", \"12345
321\"
idxxxxxxxxxxxx,	\"6789
321\",2,	\"UserAgent1
UserAgent2\",2,	user@company.com,	\"{reasons=Anomalous Device, level=MEDIUM}\", \"12345
321\""
``` the above emulates
index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| eval "ASN"='securityContext.asNumber'
| eval "Session ID"='authenticationContext.externalSessionId'
| eval "User"='actor.alternateId' | eval "Risk"='debugContext.debugData.risk'
| stats dc("user_agent") as "Agent Count" values(user_agent) AS "User Agent" dc(ASN) as "ASN Count" values(ASN) as ASN dc(Risk) as "Risk Count" values(Risk) as Risk by User "Session ID"
| table "Session ID", ASN, "ASN Count", "User Agent", "Agent Count", User, Risk
| lookup asn_user.csv User output ASN as ASNfound
```
| where 'ASN Count' > 1 AND 'Agent Count' > 1 AND mvmap(ASN, if(ASN == ASNfound, "yes", "no")) == "no"

(The above uses a side effect of SPL's equality operator.) It gives

ASN	ASN Count	ASNfound	Agent Count	Risk	Session ID	User	User Agent
6789 321	2	12345 321	2	{reasons=Anomalous Device, level=MEDIUM}	idxxxxxxxxxxxx	user@company.com	UserAgent1 UserAgent2

Play with it and compare with real data.
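
To see what the mvmap step produces before the final comparison, the intermediate value can be kept in its own field. This is only a sketch on the values from the second sample row; perASN and hasNewASN are made-up field names for illustration:

```spl
| makeresults
| eval ASN=split("6789,321", ","), ASNfound=split("12345,321", ",")
| eval perASN=mvmap(ASN, if(ASN == ASNfound, "yes", "no"))
| eval hasNewASN=if(perASN == "no", 1, 0)
| table ASN ASNfound perASN hasNewASN
```

If the comparison behaves as described above, perASN should hold one yes/no value per ASN in the session, and any "no" marks an ASN that the lookup did not return for that user.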

bowesmana · ‎07-18-2024

I refactored your original search slightly using rename rather than eval, as there's no need to duplicate the fields.

I also removed the dc() calls from the stats, as they're not necessary. Because you are collecting the values anyway, the counts can be calculated afterwards with mvcount(), which is probably more efficient.

index="okta" actor.alternateId=*@* authenticationContext.externalSessionId!="unknown"
| rename "securityContext.asNumber" as ASN,
         "authenticationContext.externalSessionId" as "Session ID",
         "actor.alternateId" as User,
         "debugContext.debugData.risk" as Risk
| stats values(user_agent) AS "User Agent" values(ASN) as ASN values(Risk) as Risk by User "Session ID"
| lookup asn_user.csv ASN User OUTPUT ASN as found_ASNs
``` Count the ASN's and Agents here and count the number of ASNs found
    in the lookup and set a new field if there is a new ASN found not
    seen before ```
| eval "ASN Count"=mvcount(ASN), "Agent Count"=mvcount('User Agent'), 
hasNewASN=if('ASN Count'>mvcount(found_ASNs), 1, 0)
``` Do the where before the table ```
| where 'ASN Count' > 1 AND 'Agent Count' > 1 AND hasNewASN=1
| table "Session ID", ASN, "ASN Count", "User Agent", "Agent Count", User, Risk

The lookup will look up ALL the ASNs collected and return all those that are found, so to see whether there is a new one, you can just compare the counts.
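
One edge case that may be worth testing with the count comparison: if none of a session's ASNs are in the lookup, found_ASNs is null, mvcount() returns null, and the comparison would then likely not evaluate to true, leaving hasNewASN at 0. A null-safe variant of the eval, as a sketch (foundCount is a made-up helper field, not part of the original search):

```spl
| eval "ASN Count"=mvcount(ASN), "Agent Count"=mvcount('User Agent'),
       foundCount=coalesce(mvcount(found_ASNs), 0),
       hasNewASN=if('ASN Count' > foundCount, 1, 0)
```

The coalesce() call turns a missing count into 0, so a session with no known ASNs at all is still flagged.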

Note that if there are more than 100 matching ASNs, a CSV lookup will only return 100. You can create a lookup definition that allows up to 1000, but if that is still going to be an issue, you will need to put the lookup BEFORE the stats, test for the found ASN there, and set a flag accordingly, e.g.

...
| lookup asn_user.csv ASN User OUTPUT ASN as found_ASN
| eval newASN=if(isnull(found_ASN),1,0)
| stats sum(newASN) as newASNs values(user_agent) AS "User Agent" values(ASN) as ASN values(Risk) as Risk by User "Session ID"
...

That way you can count the new ASNs for each user and session. If you can do the lookup after the stats, though, it will be more performant.

Note field quoting rules.

Field names need to be DOUBLE quoted when containing spaces or other odd characters when on left hand side of eval
Field names need to be SINGLE quoted when containing spaces or other odd characters when on right hand side of eval
Field names need to be DOUBLE quoted when containing spaces or other odd characters when in a stats aggregation
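
The three rules can be seen together in a small standalone search. This is just an illustration with made-up values; doubled and "Max ASN Count" are hypothetical field names:

```spl
| makeresults
| eval "ASN Count"=2
| eval doubled='ASN Count' * 2
| stats max("ASN Count") as "Max ASN Count" max(doubled) as doubled
| where 'Max ASN Count' > 1
```

The where command follows the same convention as the right hand side of eval, which is why the final filter uses single quotes.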

jacvbtaylor · ‎07-19-2024

Thank you so much! This worked beautifully. I have been trying to wrap my head around this for such a long time, it's so nice to see an outcome.

Really appreciate your help
