Hi,
I'm using the fuzzy search app for Splunk (https://splunkbase.splunk.com/app/3109/) on my Splunk 6.3.1 instance and I need to find similar matches between names. I have this table:
name blacklist_name
-------------------------
name1 blacklist_name1
name2 blacklist_name2
name3 blacklist_name3
name4 blacklist_name4
name5 blacklist_name5
... ...
And I'm using this fuzzy search SPL:
...
| table name blacklist_name
| fuzzy wordlist=name type=simple compare_field=blacklist_name output_prefix="fuzz" delims="(\\\\)"
| table name target_name fuzz*
The problem is that, with this query, the fuzzy search compares against the literal string "name" instead of each of the values of that field (name1, name2 ... nameN).
Is there a way to tell the search that the "wordlist" should be all the values of the "name" field?
Do you know another way of doing this match? For every value in the name field, I'm looking for similar values in my blacklist.
Thank you very much!
Regards
Currently, this isn't supported. The 'wordlist' option is a comma-separated list of static values provided in the search command. Example:
| fuzzy wordlist="name1,name2,name3" compare_field=blacklist_name
This search would compare the provided list of values against each event's blacklist_name field. It's something I can add into the next release. Meanwhile, the URL Toolbox app (https://splunkbase.splunk.com/app/2734/) offers a ut_levenshtein() function which may get you where you need to be.
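For readers unfamiliar with the metric behind a function like ut_levenshtein(), here is a minimal Python sketch of the plain Levenshtein edit distance. It only illustrates the metric; it is not the URL Toolbox implementation, and the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] = distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # A low distance relative to the string length suggests a lookalike name.
    print(levenshtein("svchost.exe", "svch0st.exe"))
```

A distance of 0 means the strings are identical; small non-zero distances are the interesting ones when hunting for near-matches.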
As of now, the requested functionality has been added/checked in to the git repo here: https://gitlab.com/johnfromthefuture/TA-fuzzy/.
I have a few more things I'd like to test/look at and then I'll get it on Splunkbase. If you are running on-prem, you can clone out the git repo and see if that meets your requirements.
Btw John, is there an efficient way to check
name1 -> blacklist_name1,blacklist_name2...blacklist_nameN,
name2 -> blacklist_name1,blacklist_name2...blacklist_nameN,
...
nameN -> blacklist_name1,blacklist_name2...blacklist_nameN,
and so on? That's what I really need: to check each value in the name field against all the values in the blacklist_name field, one at a time.
Thanks!
This app was primarily designed for passing in a wordlist like "iexplore.exe,svchost.exe" and comparing that against events (say, process audit security logs) to find results that score below 100 but above some threshold, highlighting processes like "svch0st.exe" or "scvhost.exe", which are common malware hiding techniques.
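To make the "below 100 but above some threshold" idea concrete, here is a rough Python sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in similarity score on a 0-100 scale; this is an assumption for illustration and not necessarily how the fuzzy app computes its score. The names `similarity_score`, `suspicious_lookalikes` and the threshold of 80 are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    """Stand-in 0-100 similarity score (100 means identical strings)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def suspicious_lookalikes(wordlist, observed, threshold=80):
    """Return observed values that resemble a wordlist entry without equalling it."""
    flagged = []
    for value in observed:
        # Best score of this value against any known-good name.
        best = max(similarity_score(word, value) for word in wordlist)
        if threshold <= best < 100:
            flagged.append(value)
    return flagged


if __name__ == "__main__":
    known_good = ["iexplore.exe", "svchost.exe"]
    seen = ["svchost.exe", "svch0st.exe", "scvhost.exe", "notepad.exe"]
    print(suspicious_lookalikes(known_good, seen))
```

Exact matches score 100 and are skipped on purpose, since the legitimate process name is not the interesting result.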
That said, it's probably not the most efficient for your use case but you could accomplish a comparison like this:
your search | fields name,blacklist_name | makemv delim="," blacklist_name | mvexpand blacklist_name | fuzzy wordlist=name compare_field=blacklist_name
With the update I tossed in the git repo, that should read the values out of your name column and compare them to the blacklist_name column. With the makemv/mvexpand combo, you can compare every entry and get a corresponding score to find similar entries.
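Outside of Splunk, the same every-name-against-every-blacklist-entry comparison can be sketched in Python with the standard library's difflib.get_close_matches. This is only an illustration of the pairwise idea under assumed inputs (two plain lists of strings); the function name `similar_blacklist_entries` and the 0.8 cutoff are hypothetical, and the similarity measure is difflib's, not the fuzzy app's.

```python
from difflib import get_close_matches


def similar_blacklist_entries(names, blacklist, cutoff=0.8):
    """Map each name to the blacklist entries whose similarity is >= cutoff."""
    hits = {}
    for name in names:
        # n must be positive; ask for every possible match, best first.
        matches = get_close_matches(
            name, blacklist, n=max(1, len(blacklist)), cutoff=cutoff
        )
        if matches:
            hits[name] = matches
    return hits


if __name__ == "__main__":
    names = ["john smith", "alice jones"]
    blacklist = ["jon smith", "bob brown", "alice jonse"]
    for name, matches in similar_blacklist_entries(names, blacklist).items():
        print(name, "->", matches)
```

The work grows with len(names) * len(blacklist), which is the same growth you get from expanding every name against every blacklist entry in a search, so large inputs will be heavy either way.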
If you're looking for exact matches, then something like this might be better:
your search | fields name,blacklist_name | makemv delim="," blacklist_name | mvexpand blacklist_name | where match(name,blacklist_name)
I used the fields operator in those sample searches but that's not strictly required here. Hope that helps.
Thank you very much John. This first approach using multivalue fields on the wordlist:
| fields name,blacklist_name | makemv delim="," blacklist_name | mvexpand blacklist_name | fuzzy wordlist=name compare_field=blacklist_name
That was the one I was using, but I have a lot of fields, so the query is quite heavy; that's why I asked about a more efficient way of doing this. I understand better now how the fuzzy search works, and thank you again for your answer. I'll check whether I can run that query as a scheduled saved search during the idle hours of my Splunk instance, so that the impact on overall performance is smaller.
Thank you very much John! I'll check it out on my Splunk instance and I'll tell you if it works as expected! 🙂