
Would it be possible to use a regex for the wordlist in the Fuzzy Search for Splunk app?

jwalzerpitt
Influencer

In ES I am reviewing results from the "Concurrent Login Attempts Detected" correlation search which is as follows:

| datamodel "Identity_Management" High_Critical_Identities search 
| rename All_Identities.identity as "user" 
| fields user 
| eval cs_key='user' 
| join type=inner cs_key 
    [| tstats `summariesonly` count from datamodel=Authentication by _time,Authentication.app,Authentication.src,Authentication.user span=1s 
    | `drop_dm_object_name("Authentication")` 
    | eventstats dc(src) as src_count by app,user 
    | search src_count>1 
    | sort 0 + _time 
    | streamstats current=t window=2 earliest(_time) as previous_time,earliest(src) as previous_src by app,user 
    | where (src!=previous_src) 
    | eval time_diff=abs(_time-previous_time) 
    | where time_diff<300 
    | eval cs_key='user']

The issue is that I am seeing false positives for users whose previous src is, say, "abc-xyz-01" and current src is "abc-xyz-02", i.e., systems with similar names (servers in clusters/pairs).

Would it be possible to use a regex for the wordlist in the Fuzzy Search for Splunk app and then filter out similar matches with a lower ratio?


johnhuang
Motivator

If the naming convention or rules for identifying similar servers or server clusters can be defined, then you should normalize the server name using those rules. This will give you much better performance and accuracy.

Using the example you provided, "abc-xyz-01" and "abc-xyz-02", the server name can be normalized to "abc-xyz". The two added lines below (the rename and the rex) show how you might do this.


| datamodel "Identity_Management" High_Critical_Identities search
| rename All_Identities.identity as "user"
| fields user
| eval cs_key='user'
| join type=inner cs_key
[| tstats `summariesonly` count from datamodel=Authentication by _time,Authentication.app,Authentication.src,Authentication.user span=1s
| `drop_dm_object_name("Authentication")`
| rename src AS src_host
| rex field=src_host "(?<src>.*?)(\-\d+)?$"
| eventstats dc(src) as src_count by app,user
| search src_count>1
| sort 0 + _time
| streamstats current=t window=2 earliest(_time) as previous_time,earliest(src) as previous_src by app,user
| where (src!=previous_src)
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
| eval cs_key='user']
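
As a quick sanity check (a standalone sketch with made-up sample hostnames, run separately from the correlation search), you can confirm what the rex extracts by applying it to a few test values with makeresults:

| makeresults
| eval src_host=split("abc-xyz-01,abc-xyz-02,db-prod-7,standalonehost", ",")
| mvexpand src_host
| rex field=src_host "(?<src>.*?)(\-\d+)?$"
| table src_host src

Both "abc-xyz-01" and "abc-xyz-02" should normalize to "abc-xyz", while names without a trailing numeric suffix pass through unchanged.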

jwalzerpitt
Influencer

TYVM for the reply.

On a different tack, I found this article - https://www.deductiv.net/blog/gettin-fuzzy-with-it - which mentioned this app - https://splunkbase.splunk.com/app/3626/ (JellyFisher is a Splunk custom search command that leverages the excellent jellyfish Python library to do approximate and phonetic string matching), so I added the line

| jellyfisher levenshtein_distance(src,previous_src)

to the SPL, which leads me to believe I can filter on levenshtein_distance > 1 to eliminate similar src values, as another way to tackle the issue of similar src names.
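
For reference, here is a sketch of how the tail of the subsearch would look with that approach, assuming the command writes its result to a field named levenshtein_distance (please verify the actual output field name in the JellyFisher documentation):

| where (src!=previous_src)
| jellyfisher levenshtein_distance(src,previous_src)
| where levenshtein_distance > 1
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
| eval cs_key='user']

One caveat: a threshold of 1 only drops src pairs that differ by a single character, so "abc-xyz-01" vs "abc-xyz-12" (edit distance 2) would still alert; the hostname normalization above may be more robust for cluster naming schemes.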
