Does anyone know of a way to detect deviations in the spelling of a value? For example, for the value domain="google.com", if a value of "go0gle.com" or "g00ogl3.com" is returned, I'd like to output results, raise an alert, etc.
Another use case would be sender="user@domain.com", where a value of "user@d0maine.com" or "user@domaiin3.com" is returned.
Bonus question: If anyone knows of a way to detect cAsE ObFusCaTiOn that would be great too.
There is an app on Splunkbase called JellyFisher that supports Levenshtein distance, Damerau-Levenshtein distance, Jaro distance, Jaro-Winkler, match rating comparison, and Hamming distance comparisons, plus a number of phonetic algorithms, including Soundex. Here is a sample Levenshtein distance evaluation using this app:
... | jellyfisher levenshtein_distance(sourcetype,source)
This returns an integer (the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other), according to this description of Levenshtein distance.
Each of the JellyFisher functions returns its result in a field named after the function (e.g., levenshtein_distance, damerau_levenshtein_distance, soundex).
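To make that integer concrete, here is a minimal Python sketch of the standard dynamic-programming Levenshtein computation, run against the sender examples from the question. This is only an illustration outside Splunk; it is not the app's code, and the function name is simply chosen to mirror the field name.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


expected = "user@domain.com"
for candidate in ("user@d0maine.com", "user@domaiin3.com"):
    # Both candidates are 2 edits away from the expected sender.
    print(candidate, levenshtein_distance(expected, candidate))
```

A small non-zero distance (say 1 to 3) against a known-good value is the kind of result you would alert on; the right threshold depends on how long your values are.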
Here is a link to the JellyFisher app.
I've mocked up an example of the app's Levenshtein distance function using your three sender examples. This won't run in Splunk unless the app is installed (it is quick to install and does not require a restart).
| makeresults
| eval sender1="user@domain.com", sender2="user@d0maine.com", sender3="user@domaiin3.com"
| jellyfisher levenshtein_distance(sender1, sender2)
| rename levenshtein_distance AS sender1sender2diff
| jellyfisher levenshtein_distance(sender1, sender3)
| rename levenshtein_distance AS sender1sender3diff
| jellyfisher levenshtein_distance(sender2, sender3)
| rename levenshtein_distance AS sender2sender3diff
| table sender1 sender2 sender3 sender1sender2diff sender1sender3diff sender2sender3diff
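On the bonus question about case obfuscation: edit distance is case-sensitive, so one possible approach is to compare values after normalizing case and flag anything that matches only once case is ignored. A rough Python sketch of that idea follows; the function name is hypothetical. In SPL the same comparison could presumably be built with eval's lower() function.

```python
def is_case_obfuscated(value: str, expected: str) -> bool:
    """True when value differs from expected only by letter case."""
    # An exact match is not obfuscation; a match after case folding is.
    return value != expected and value.casefold() == expected.casefold()


for candidate in ("GoOgLe.CoM", "google.com", "go0gle.com"):
    print(candidate, is_case_obfuscated(candidate, "google.com"))
```

Only the first candidate is flagged here. The third differs by a character rather than by case, so it would fall to the distance check instead.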
Thanks. I'll certainly give it a try.
Look into Levenshtein distance.