Solved: How can I do a search on two fields in Fuzzy Search for Splunk?

faguilar · 03-22-2018

Hi,

I'm using the Fuzzy Search app for Splunk (https://splunkbase.splunk.com/app/3109/) on my Splunk 6.3.1 instance, and I need to do a similarity match between names. I have this table:

name      blacklist_name
-------------------------
name1     blacklist_name1
name2     blacklist_name2
name3     blacklist_name3
name4     blacklist_name4
name5     blacklist_name5
...       ...

And I'm using this fuzzy search SPL:

...
| table name blacklist_name
| fuzzy wordlist=name type=simple compare_field=blacklist_name output_prefix="fuzz" delims="(\\\\)"
| table name target_name fuzz*

The problem is that, with that query, the fuzzy command compares against the literal string "name" instead of each of the values of that field (name1, name2 ... nameN).

Is there a way to tell the command that the wordlist is all the values of the "name" field?

Do you know another way of doing this match? For every value of the name field, I'm looking for similar values in my blacklist.

Thank you very much!

Regards

jlanders · 03-22-2018

Currently, this isn't supported. The 'wordlist' option is a comma-separated list of static values provided in the search command. Example:
| fuzzy wordlist="name1,name2,name3" compare_field=blacklist_name

This search would compare the provided list of values against each event's blacklist_name field. Reading the wordlist from a field is something I can add in the next release. Meanwhile, the URL Toolbox app (https://splunkbase.splunk.com/app/2734/) offers a ut_levenshtein() function that may get you where you need to be.
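To make the static-wordlist behavior concrete, here is a small Python sketch (Python only because SPL can't run outside Splunk). A comma-separated wordlist is split once, and every word is scored against one field of every event. The score is the classic Levenshtein edit distance, the metric ut_levenshtein() is named after. The function names are hypothetical, and this is an illustration, not the code of either app.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def compare_wordlist(wordlist: str, events: list[dict], compare_field: str) -> list[dict]:
    """Hypothetical helper: split a static comma-separated wordlist and
    score every word against compare_field of every event."""
    words = [w.strip() for w in wordlist.split(",") if w.strip()]
    results = []
    for event in events:
        value = event.get(compare_field, "")
        for word in words:
            results.append({"word": word, compare_field: value,
                            "distance": levenshtein(word, value)})
    return results


if __name__ == "__main__":
    events = [{"blacklist_name": "name1"}, {"blacklist_name": "nam3"}]
    for row in compare_wordlist("name1,name2,name3", events, "blacklist_name"):
        print(row)
```

The wordlist is fixed before any event is read. That is why wordlist=name in the original question is treated as the single literal word "name".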

jlanders · 03-22-2018

The requested functionality has now been checked in to the git repo here: https://gitlab.com/johnfromthefuture/TA-fuzzy/.

I have a few more things I'd like to test before I publish it on Splunkbase. If you are running on-prem, you can clone the git repo and see whether it meets your requirements.

faguilar · 03-23-2018

By the way, John, is there an efficient way to check

name1 -> blacklist_name1,blacklist_name2...blacklist_nameN,
name2 -> blacklist_name1,blacklist_name2...blacklist_nameN,
...
nameN -> blacklist_name1,blacklist_name2...blacklist_nameN,

and so on? That's what I really need: to compare each value in the name field against all the names in the blacklist_name field, one at a time.
Thanks!

jlanders · 03-23-2018

This app was primarily designed for passing in a wordlist such as "iexplore.exe,svchost.exe" and comparing it against events (say, process audit security logs) to find results that score below 100 but above some threshold. That highlights processes like "svch0st.exe" or "scvhost.exe", a common malware hiding technique.
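The "below 100 but above some threshold" idea can be sketched in Python as follows. The thread doesn't say how the fuzzy app computes its score. This sketch assumes a 0-100 similarity where 100 means identical, uses difflib's SequenceMatcher.ratio() as a stand-in scoring function, and picks a hypothetical threshold of 80.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical 0-100 score (100 = identical), case-insensitive, via difflib."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def near_misses(wordlist, values, threshold=80):
    """Return (word, value, score) for values that resemble a wordlist entry
    without matching it exactly, i.e. threshold <= score < 100."""
    hits = []
    for word in (w.strip() for w in wordlist.split(",")):
        if not word:
            continue  # tolerate empty entries such as a trailing comma
        for value in values:
            s = similarity(word, value)
            if threshold <= s < 100:
                hits.append((word, value, s))
    return hits


if __name__ == "__main__":
    processes = ["svchost.exe", "svch0st.exe", "scvhost.exe", "notepad.exe"]
    for hit in near_misses("iexplore.exe,svchost.exe", processes):
        print(hit)
```

With this sample list, svchost.exe itself scores 100 and is skipped as an exact match. svch0st.exe and scvhost.exe both fall in the suspicious band.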

That said, it's probably not the most efficient approach for your use case, but you could accomplish a comparison like this:
your search | fields name,blacklist_name | makemv delim="," blacklist_name | mvexpand blacklist_name | fuzzy wordlist=name compare_field=blacklist_name

With the update I pushed to the git repo, wordlist=name should read the value out of each event's name field and compare it to the blacklist_name field. The makemv/mvexpand combination splits blacklist_name into one event per value, so you can compare every entry and get a corresponding score to find similar entries.
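Here is a rough Python sketch of what the makemv/mvexpand pipeline does. makemv splits blacklist_name on commas into a multivalue field. mvexpand turns each value into its own event. The per-event comparison then scores each event's name against that event's single blacklist_name. The helper names, the sample data, and the difflib-based 0-100 score are assumptions for illustration, not the app's implementation.

```python
from difflib import SequenceMatcher


def makemv(events, field, delim=","):
    """Rough analogue of SPL makemv: split a delimited string into a list."""
    out = []
    for event in events:
        copy = dict(event)  # leave the caller's events untouched
        copy[field] = [v.strip() for v in copy[field].split(delim) if v.strip()]
        out.append(copy)
    return out


def mvexpand(events, field):
    """Rough analogue of SPL mvexpand: one event per value of a list field."""
    return [{**event, field: value} for event in events for value in event[field]]


def score(a, b):
    """Hypothetical 0-100 similarity (100 = identical), via difflib."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def fuzzy_per_event(events, wordlist_field, compare_field):
    """Score each event's own wordlist_field value against its compare_field."""
    return [{**e, "score": score(e[wordlist_field], e[compare_field])} for e in events]


if __name__ == "__main__":
    events = [
        {"name": "jon smith", "blacklist_name": "john smith,jane doe"},
        {"name": "acme corp", "blacklist_name": "acme corp.,globex"},
    ]
    expanded = mvexpand(makemv(events, "blacklist_name"), "blacklist_name")
    for row in fuzzy_per_event(expanded, "name", "blacklist_name"):
        print(row["name"], "->", row["blacklist_name"], row["score"])
```

Every name is paired with every value in its list, so the number of rows grows as names × blacklist entries. That growth is why this kind of search gets heavy on large inputs.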

If you're looking for exact matches, then something like this might be better:
your search | fields name,blacklist_name | makemv delim="," blacklist_name | mvexpand blacklist_name | where match(name,blacklist_name)

I used the fields command in those sample searches, but it's not strictly required here. Hope that helps.
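One caveat about the exact-match search: SPL's match(X, Y) treats Y as a regular expression and returns true if it matches anywhere in X, so it is not a strict equality test. For example, "name10" matches the pattern "name1", and the "." in "svchost.exe" matches any character. If exact equality is the goal, a comparison such as where name==blacklist_name should be closer. The Python sketch below imitates the difference with re.search. Python's re and Splunk's PCRE differ in some details, but not for patterns this simple. The function names are hypothetical.

```python
import re


def spl_match_like(subject, pattern):
    """Approximates SPL match(subject, pattern): unanchored regex search."""
    return re.search(pattern, subject) is not None


def exact(subject, other):
    """Strict string equality, what an exact-match comparison is meant to do."""
    return subject == other


if __name__ == "__main__":
    pairs = [("name1", "name1"), ("name10", "name1"), ("svchostXexe", "svchost.exe")]
    for name, blacklisted in pairs:
        print(name, blacklisted, spl_match_like(name, blacklisted), exact(name, blacklisted))
```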

faguilar · 03-26-2018

Thank you very much, John. This first approach, using multivalue fields with the wordlist:

| fields name,blacklist_name | makemv delim="," blacklist_name | mvexpand blacklist_name | fuzzy wordlist=name compare_field=blacklist_name

was the one I was already using, but I have a lot of fields, so the query is quite heavy; that's why I asked about a more efficient way of doing this. I understand better now how the fuzzy command works, so thank you again for your answer. I'll check whether I can run that query as a scheduled saved search during the idle hours of my Splunk instance, so that the impact on overall performance is smaller.

faguilar · 03-23-2018

Thank you very much John! I'll check it out on my Splunk instance and I'll tell you if it works as expected! 🙂
