
Can Splunk compare two strings and return a % likeness/similarity between the two?

Splunk Employee

For example, if I have a username of bsmith843 in a field returned by one search, and bsmiths845 in a field from another search, is there any way to gauge the similarity between the two strings? I know I can use wildcards/regex to try to match the strings, but if I can't match everyone, I would like to know how similar they are.


And from even further in the future...

There is an app on Splunkbase that supports Levenshtein distance, Damerau-Levenshtein distance, Jaro distance, Jaro-Winkler, match rating comparison, and Hamming distance comparisons, plus a number of phonetic algorithms, including Soundex. It is called JellyFisher. Here is a sample Levenshtein distance evaluation using this app:

... | jellyfisher levenshtein_distance(sourcetype,source)

What would be returned here is an integer, according to this description of Levenshtein distance.

Each of the JellyFisher functions returns the result in a field named after the function (e.g., levenshtein_distance, damerau_levenshtein_distance, soundex).

Here is a link to the JellyFisher app.

Here is a mocked-up use of it:

| makeresults
| eval foo="kitten", bar="smitten" 
| jellyfisher levenshtein_distance(foo, bar) 
| table foo bar levenshtein_distance 
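To make the integer result concrete and to tie it back to the "% similarity" in the original question, here is a minimal pure-Python sketch of Levenshtein distance. It is not JellyFisher's code. The function names are hypothetical, and normalizing by the length of the longer string is an assumption (one common convention, not the only one).

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def percent_similarity(a: str, b: str) -> float:
    """Assumed normalization: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return round(100.0 * (1 - levenshtein_distance(a, b) / longest), 2)


print(levenshtein_distance("kitten", "smitten"))       # 2
print(percent_similarity("kitten", "smitten"))         # 71.43
print(percent_similarity("bsmith843", "bsmiths845"))   # 80.0
```

With the foo and bar values from the mock-up above, the distance is 2 (one substitution and one insertion).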


Super Champion

There is a Python function that does something very close to this. It returns a number between 0 and 1 based on the similarity of two terms. You can find it in the difflib module (SequenceMatcher.ratio()).

Here is a really quick example of an app named "fieldcompare" which contains a single Python search command. The app is made up of the following files:
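As a quick illustration outside Splunk, the sketch below calls difflib.SequenceMatcher directly on the two usernames from the question. The helper name similarity_ratio is hypothetical. SequenceMatcher and ratio() are part of the standard library.

```python
import difflib


def similarity_ratio(a: str, b: str) -> float:
    """Return difflib's similarity score: 0.0 (nothing in common) to 1.0 (identical)."""
    # The first argument is an optional "junk" filter; None disables it.
    return difflib.SequenceMatcher(None, a, b).ratio()


ratio = similarity_ratio("bsmith843", "bsmiths845")
print(round(ratio, 4))        # 0.8421
print(round(100 * ratio, 2))  # 84.21, expressed as a percentage
```

The ratio is 2 * M / T, where M is the number of matching characters and T is the combined length of both strings. Here "bsmith" and "84" match, so the score is 2 * 8 / 19.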


import sys
import difflib
import splunk.Intersplunk

(isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)
args, kwargs = splunk.Intersplunk.getKeywordsAndOptions()

if isgetinfo:
    # streaming, generating, retevs, reqsop, preop
    splunk.Intersplunk.outputInfo(True, False, False, False, None)

(results, dummyresults, settings) = splunk.Intersplunk.getOrganizedResults()

# Field names come from the search command's keyword arguments
field1_name = kwargs.get("field1", "field1")
field2_name = kwargs.get("field2", "field2")
output_field = kwargs.get("result", "ratio")

try:
    for result in results:
        try:
            f1 = result[field1_name]
            f2 = result[field2_name]
        except KeyError:
            # If either field is missing, simply ignore
            continue

        # Similarity between 0 and 1
        sm = difflib.SequenceMatcher(None, f1, f2)
        result[output_field] = sm.ratio()

    splunk.Intersplunk.outputResults(results)
except Exception as e:
    splunk.Intersplunk.generateErrorResults("Unhandled exception:  %s" % (e,))


filename = fieldcompare.py
supports_getinfo = true


access = read : [ * ], write : [ admin ]
export = system

access = read : [ * ], write : [ admin ]
export = system

In the example shown above, the search command and app are both called "fieldcompare", but you can use any name you want.

Here is a usage example:

 ... | fieldcompare field1=first_field field2=compare_field result=output | eval percent=round(100*output,2) | sort - percent
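To sanity-check the scoring logic before wiring it into Splunk, the same pipeline (compare, convert to a percentage, sort descending) can be sketched against plain dictionaries. This is only an offline approximation. The function name score_rows and the sample rows are hypothetical, and it assumes each result behaves like a dict keyed by field name.

```python
import difflib


def score_rows(rows, field1, field2):
    """Add a 'percent' similarity to each row that has both fields, best match first."""
    scored = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if field1 in row and field2 in row:
            ratio = difflib.SequenceMatcher(None, row[field1], row[field2]).ratio()
            row["percent"] = round(100 * ratio, 2)
        scored.append(row)
    # Rows without a score (a field was missing) sort to the end.
    scored.sort(key=lambda r: r.get("percent", -1.0), reverse=True)
    return scored


sample_rows = [
    {"first_field": "bsmith843", "compare_field": "bsmiths845"},
    {"first_field": "jdoe", "compare_field": "jdoe"},
    {"first_field": "orphan"},  # no compare_field, so no percent is added
]

for scored_row in score_rows(sample_rows, "first_field", "compare_field"):
    print(scored_row)
```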

Be sure to look over the Custom search commands docs page for additional details about how to set this up within your Splunk environment.


I used this script, but it's throwing an "Error in 'script': Getinfo probe failed for external search command 'fieldcompare'" error. Any suggestions?


Splunk Employee

Yes, this can be done using a custom search script and one of the many Python modules that can compare strings. You can take a look at http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison which discusses using the Levenshtein distance as a measure. With more detail about your use case, I could suggest how to structure a search and custom command, but this should be enough to start with.


I bring to you a message from the future! Nimsh wrote a Levenshtein custom command at some point: https://splunkbase.splunk.com/app/1898/
