
Would it be possible to use a regex for the wordlist in the Fuzzy Search for Splunk app?

jwalzerpitt
Influencer

In ES I am reviewing results from the "Concurrent Login Attempts Detected" correlation search which is as follows:

| datamodel "Identity_Management" High_Critical_Identities search 
| rename All_Identities.identity as "user" 
| fields user 
| eval cs_key='user' 
| join type=inner cs_key 
    [| tstats `summariesonly` count from datamodel=Authentication by _time,Authentication.app,Authentication.src,Authentication.user span=1s 
    | `drop_dm_object_name("Authentication")` 
    | eventstats dc(src) as src_count by app,user 
    | search src_count>1 
    | sort 0 + _time 
    | streamstats current=t window=2 earliest(_time) as previous_time,earliest(src) as previous_src by app,user 
    | where (src!=previous_src) 
    | eval time_diff=abs(_time-previous_time) 
    | where time_diff<300 
    | eval cs_key='user']

The issue is that I am seeing false positives for users whose previous src is, say, "abc-xyz-01" and current src is "abc-xyz-02", i.e., systems with similar names (servers in clusters/pairs).

Would it be possible to use a regex for the wordlist in the Fuzzy Search for Splunk app and then filter out similar matches with a lower ratio?


johnhuang
Motivator

If the naming convention or rules for identifying similar servers or server clusters can be defined, then you should normalize the server name using those rules. This will give you much better performance and accuracy.

Using the example you provided, "abc-xyz-01" and "abc-xyz-02", the server name can be normalized to "abc-xyz". The two added lines below (the rename and the rex) show how you might do this.


| datamodel "Identity_Management" High_Critical_Identities search
| rename All_Identities.identity as "user"
| fields user
| eval cs_key='user'
| join type=inner cs_key
[| tstats `summariesonly` count from datamodel=Authentication by _time,Authentication.app,Authentication.src,Authentication.user span=1s
| `drop_dm_object_name("Authentication")`
| rename src AS src_host
| rex field=src_host "(?<src>.*?)(\-\d+)?$"
| eventstats dc(src) as src_count by app,user
| search src_count>1
| sort 0 + _time
| streamstats current=t window=2 earliest(_time) as previous_time,earliest(src) as previous_src by app,user
| where (src!=previous_src)
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
| eval cs_key='user']
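
As a quick sanity check (a standalone sketch with made-up sample hostnames, run separately from the correlation search), you can confirm what the rex extracts by applying it to a few test values with makeresults:

| makeresults
| eval src_host=split("abc-xyz-01,abc-xyz-02,db-prod-7,standalonehost", ",")
| mvexpand src_host
| rex field=src_host "(?<src>.*?)(\-\d+)?$"
| table src_host src

Both "abc-xyz-01" and "abc-xyz-02" should normalize to "abc-xyz", while names without a trailing numeric suffix pass through unchanged.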

jwalzerpitt
Influencer

TYVM for the reply.

On a different tack, I found this article - https://www.deductiv.net/blog/gettin-fuzzy-with-it - which mentioned this app - https://splunkbase.splunk.com/app/3626/ (JellyFisher is a Splunk custom search command that leverages the excellent jellyfish Python library to do approximate and phonetic string matching), so I added the line

| jellyfisher levenshtein_distance(src,previous_src)

to the SPL, which leads me to believe I can filter on levenshtein_distance > 1 to eliminate similar src values, as another way to tackle the issue of similar src names.
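
For reference, here is a sketch of how the tail of the subsearch would look with that approach, assuming the command writes its result to a field named levenshtein_distance (please verify the actual output field name in the JellyFisher documentation):

| where (src!=previous_src)
| jellyfisher levenshtein_distance(src,previous_src)
| where levenshtein_distance > 1
| eval time_diff=abs(_time-previous_time)
| where time_diff<300
| eval cs_key='user']

One caveat: a threshold of 1 only drops src pairs that differ by a single character, so "abc-xyz-01" vs "abc-xyz-12" (edit distance 2) would still alert; the hostname normalization above may be more robust for cluster naming schemes.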
