In ES I am reviewing results from the "Concurrent Login Attempts Detected" correlation search which is as follows:
| datamodel "Identity_Management" High_Critical_Identities search
| rename All_Identities.identity as "user"
| fields user
| eval cs_key='user'
| join type=inner cs_key
[| tstats `summariesonly` count from datamodel=Authentication by _time,Authentication.app,Authentication.src,Authentication.user span=1s
| `drop_dm_object_name("Authentication")`
| eventstats dc(src) as src_count by app,user
| search src_count>1
| sort 0 + _time
| streamstats current=t window=2 earliest(_time) as previous_time,earliest(src) as previous_src by app,user
| where (src!=previous_src)
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
| eval cs_key='user']
The issue is that I am seeing false positives for users whose previous src is, say, "abc-xyz-01" and current src is "abc-xyz-02", basically systems with similar names (servers in clusters/pairs).
Would it be possible to use a regex for the wordlist in the Fuzzy Search for Splunk app and then filter out similar matches with a lower ratio?
If the naming convention or rules around identifying similar servers or server clusters can be defined, then you should normalize the server name using those rules. This will give you much better performance and accuracy.
Using the example you provided, "abc-xyz-01" and "abc-xyz-02", the server name can be normalized to "abc-xyz". The added rename and rex lines in the search below show how you might do this.
| datamodel "Identity_Management" High_Critical_Identities search
| rename All_Identities.identity as "user"
| fields user
| eval cs_key='user'
| join type=inner cs_key
[| tstats `summariesonly` count from datamodel=Authentication by _time,Authentication.app,Authentication.src,Authentication.user span=1s
| `drop_dm_object_name("Authentication")`
| rename src AS src_host
| rex field=src_host "(?<src>.*?)(\-\d+)?$"
| eventstats dc(src) as src_count by app,user
| search src_count>1
| sort 0 + _time
| streamstats current=t window=2 earliest(_time) as previous_time,earliest(src) as previous_src by app,user
| where (src!=previous_src)
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
| eval cs_key='user']
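If you want to sanity-check the rex before touching the correlation search, here is a small run-anywhere sketch (the host values are made up for illustration) that shows the normalization stripping a trailing numeric suffix while leaving hosts without one unchanged:
| makeresults count=3
| streamstats count as n
| eval src_host=case(n=1,"abc-xyz-01", n=2,"abc-xyz-02", n=3,"def-app")
| rex field=src_host "(?<src>.*?)(\-\d+)?$"
| table src_host src
Both "abc-xyz-01" and "abc-xyz-02" normalize to "abc-xyz", so the where src!=previous_src test no longer fires for that pair, while "def-app" passes through untouched.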
TYVM for the reply.
On a different tack, I found this article - https://www.deductiv.net/blog/gettin-fuzzy-with-it - which mentions this app - https://splunkbase.splunk.com/app/3626/ (JellyFisher, a Splunk custom search command that leverages the excellent jellyfish Python library to do approximate and phonetic string matching). I then added the line
| jellyfisher levenshtein_distance(src,previous_src)
to the SPL, which leads me to believe I can filter on levenshtein_distance > 1 to eliminate similar src values as another way to tackle the issue of similar src names.
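So the end of the subsearch would look something like this, assuming the jellyfisher invocation above is valid and that the command really does write its result to a levenshtein_distance field (I still need to verify the exact output field name against the app's docs):
| where (src!=previous_src)
| jellyfisher levenshtein_distance(src,previous_src)
| where levenshtein_distance>1
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
With that in place, "abc-xyz-01" vs "abc-xyz-02" (distance 1) would be dropped, while genuinely different sources would still pass through to the time_diff check.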