Hi @BradOH,

Are you using JellyFisher from Splunkbase? We can use JellyFisher to build a prototype natural language data model based on various codecs, e.g., Metaphone. Newer versions of the Jellyfish module, or codecs with expanded language coverage, e.g., Beider-Morse as referenced by @bowesmana, can replace the external lookup described below. We could also blend in matching by similarity, e.g., Levenshtein distance, and stemming.

Start by downloading and installing JellyFisher. Verify JellyFisher with a simple search. I'm using variations on my given name:

| makeresults format=csv data="
src_user,src_user_domain
travis@example.com,example.com
trefor,
trev,
trever@example.com,example.com
trevon@example.com,example.com
tr3v0r,
trev0r,
trevor,
"
| rex field=src_user "(?<src_user_local_part>[^@]+)"
| jellyfisher metaphone(src_user_local_part)

src_user            src_user_domain  src_user_local_part  metaphone
travis@example.com  example.com      travis               TRFS
trefor                               trefor               TRFR
trev                                 trev                 TRF
trever@example.com  example.com      trever               TRFR
trevon@example.com  example.com      trevon               TRFN
tr3v0r                               tr3v0r               TRFR
trev0r                               trev0r               TRFR
trevor                               trevor               TRFR

trefor, trever, tr3v0r, trev0r, and trevor are all encoded as TRFR. I'll use this example throughout.

Let's create an external lookup script so that Splunk can automatically add JellyFisher output to events at search time. In $SPLUNK_HOME/etc/apps/jellyfisher/bin, create jellyfisher_lookup_metaphone.py:

#!/usr/bin/env python
import csv
import os
import sys

# Load the jellyfish module from the JellyFisher app's lib directory
splunkhome = os.environ["SPLUNK_HOME"]
sys.path.append(os.path.join(splunkhome, "etc", "apps", "jellyfisher", "lib"))

import jellyfish


def main():
    if len(sys.argv) != 2:
        print("Usage: python jellyfisher_lookup_metaphone.py field", file=sys.stderr)
        sys.exit(1)

    field = sys.argv[1]

    # Lookup rows arrive as CSV on stdin and are returned as CSV on stdout
    r = csv.DictReader(sys.stdin)
    fieldnames = list(r.fieldnames)
    # Avoid a duplicate column if the header already includes metaphone
    if "metaphone" not in fieldnames:
        fieldnames.append("metaphone")

    w = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
    w.writeheader()

    for result in r:
        # Encode only rows that have a value in the input field
        if result[field]:
            result["metaphone"] = jellyfish.metaphone(result[field]).strip()
        w.writerow(result)


main()

We can test the script locally using the Splunk CLI:

$ echo -e 'foo\ntrevor' | /opt/splunk/bin/splunk cmd python jellyfisher_lookup_metaphone.py foo
foo,metaphone
trevor,TRFR

Edit $SPLUNK_HOME/etc/apps/jellyfisher/local/transforms.conf:

[jellyfisher_metaphone]
external_cmd = jellyfisher_lookup_metaphone.py field
fields_list = field, metaphone
python.version = python3

Restart Splunk. Verify the jellyfisher_metaphone lookup:

| makeresults format=csv data="
src_user,src_user_domain
travis@example.com,example.com
trefor,
trev,
trever@example.com,example.com
trevon@example.com,example.com
tr3v0r,
trev0r,
trevor,
"
| rex field=src_user "(?<src_user_local_part>[^@]+)"
| lookup jellyfisher_metaphone field as src_user_local_part output metaphone as src_user_phonetic

src_user            src_user_domain  src_user_local_part  src_user_phonetic
travis@example.com  example.com      travis               TRFS
trefor                               trefor               TRFR
trev                                 trev                 TRF
trever@example.com  example.com      trever               TRFR
trevon@example.com  example.com      trevon               TRFN
tr3v0r                               tr3v0r               TRFR
trev0r                               trev0r               TRFR
trevor                               trevor               TRFR

To glue your mail server's add-on to the lookup, create a field extraction, field alias, or calculated field that produces a new field named src_user_local_part from your source events. For example, use Splunk Web to add a calculated field to Splunk Add-on for Microsoft Office 365 (splunk_ta_o365):

Destination app: splunk_ta_o365
Apply to: sourcetype named o365:reporting:messagetrace
Name: src_user_local_part
Eval expression: mvindex(split(src_user, "@"), 0)

Edit permissions and export the object globally:

Object should appear in: All apps (system)

Set permissions by role per your security schema. The default is typically read for everyone and write for admin and power.

Alternatively, edit $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/props.conf:

[o365:reporting:messagetrace]
EVAL-src_user_local_part = mvindex(split(src_user, "@"), 0)

and $SPLUNK_HOME/etc/apps/splunk_ta_o365/metadata/local.meta:

[props/o365%3Areporting%3Amessagetrace/EVAL-src_user_local_part]
access = read : [ * ], write : [ admin, power ]
export = system

Create a new automatic lookup:

Destination app: splunk_ta_o365
Name: jellyfisher_metaphone
Lookup table: jellyfisher_metaphone
Apply to: sourcetype named o365:reporting:messagetrace
Lookup input fields: field = src_user_local_part
Lookup output fields: metaphone = src_user_phonetic

Edit permissions and export the object globally. Alternatively, edit $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/props.conf:

[o365:reporting:messagetrace]
LOOKUP-jellyfisher_metaphone = jellyfisher_metaphone field AS src_user_local_part OUTPUTNEW metaphone AS src_user_phonetic

and $SPLUNK_HOME/etc/apps/splunk_ta_o365/metadata/local.meta:

[props/o365%3Areporting%3Amessagetrace/LOOKUP-jellyfisher_metaphone]
access = read : [ * ], write : [ admin, power ]
export = system

Create an event type and tag to identify events that have a src_user_phonetic value:

Destination app: splunk_ta_o365
Name: o365_reporting_messagetrace_phonetic
Search string: sourcetype=o365:reporting:messagetrace src_user_phonetic=*
Tag(s): email, phonetic
Color: none (or a color of your choice)
Priority: 1 (Highest)

Edit permissions and export the objects globally. Alternatively, edit $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/eventtypes.conf:

[o365_reporting_messagetrace_phonetic]
search = sourcetype=o365:reporting:messagetrace src_user_phonetic=*

and $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/tags.conf:

[eventtype=o365_reporting_messagetrace_phonetic]
email = enabled
phonetic = enabled

and $SPLUNK_HOME/etc/apps/splunk_ta_o365/metadata/local.meta:

[tags/eventtype%3Do365_reporting_messagetrace_phonetic]
access = read : [ * ], write : [ admin, power ]
export = system

To make the src_user_phonetic field searchable, edit $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/fields.conf:

[src_user_phonetic]
INDEXED = false
INDEXED_VALUE = false

Restart Splunk. Verify the configuration:

index=main sourcetype=o365:reporting:messagetrace src_user_phonetic=TRFR
| table src_user src_user_local_part src_user_phonetic tag

src_user            src_user_local_part  src_user_phonetic  tag
trefer@example.com  trefer               TRFR               email phonetic
trevor@example.com  trevor               TRFR               email phonetic
trever@example.com  trever               TRFR               email phonetic

To summarize the src_user_phonetic field for accelerated searches, we can create or edit a data model. Let's clone and edit the Splunk CIM Email data model for this example. In production, I recommend a custom data model.

Data Model: Email
New Title: Email Phonetic
New ID: Email_Phonetic
App: Search & Reporting (do not clone to Splunk Common Information Model)
New Description: Email Phonetic Data Model
Permissions: Clone

Note that cloning the data model also clones the tags_whitelist setting, which does not include our new phonetic tag. To resolve this, edit $SPLUNK_HOME/etc/apps/search/local/datamodels.conf:

[Email_Phonetic]
tags_whitelist = cloud,content,delivery,filter,pci,phonetic

Splunk Web does not allow us to add extracted fields to child datasets. For simplicity here, we'll add the src_user_phonetic field to the root dataset. With the All Email dataset selected, click Add Field > Auto-Extracted. Select the src_user_phonetic field with type String and flag Optional, and then click Save. If the field is not visible, increase the sample size or verify that the field extractions, lookups, etc. are implemented correctly.

With the All Email dataset selected, click Add Dataset > Child.

Dataset Name: Email Phonetic
Dataset ID: Phonetic
Inherit From: All Email
Additional Constraints: tag=phonetic

Verify the data model:

| datamodel Email_Phonetic Phonetic flat
| search src_user_phonetic=TRFR
| table src_user src_user_phonetic

src_user            src_user_phonetic
trefer@example.com  TRFR
trevor@example.com  TRFR
trever@example.com  TRFR

Ensure the cim_Email_indexes macro references the correct indexes, accelerate the data model, wait for the summaries to build, and then verify the data model again:

| datamodel Email_Phonetic Phonetic flat summariesonly=true
| search src_user_phonetic=TRFR
| table src_user src_user_phonetic

src_user            src_user_phonetic
trefer@example.com  TRFR
trevor@example.com  TRFR
trever@example.com  TRFR

Finally, tie it together with a subsearch to dynamically generate the phonetic encoding:

| datamodel Email_Phonetic Phonetic flat summariesonly=true
| search
[| makeresults
| eval user="trevor"
| lookup jellyfisher_metaphone field as user output metaphone as src_user_phonetic
| table src_user_phonetic ]
| table src_user src_user_phonetic

src_user            src_user_phonetic
trefer@example.com  TRFR
trevor@example.com  TRFR
trever@example.com  TRFR

We can turn this into a macro for quick access:

Destination app: search
Name: email_by_src_user_phonetic(1)
Definition: datamodel Email_Phonetic Phonetic flat summariesonly=true | search [| makeresults | eval user="$src_user$" | lookup jellyfisher_metaphone field as user output metaphone as src_user_phonetic | table src_user_phonetic ] | table src_user src_user_phonetic
Arguments: src_user

Edit permissions and export the macro globally. Verify the macro:

| `email_by_src_user_phonetic(trevor)`

src_user            src_user_phonetic
trefer@example.com  TRFR
trevor@example.com  TRFR
trever@example.com  TRFR

If the datamodel command is slow in your environment, use the tstats command:

| tstats summariesonly=true count from datamodel=Email_Phonetic.All_Email where nodename=All_Email.Phonetic
[| makeresults
| eval user="trevor"
| lookup jellyfisher_metaphone field as user output metaphone as All_Email.src_user_phonetic
| table All_Email.src_user_phonetic ] by _time span=1s All_Email.src_user All_Email.src_user_phonetic

_time                All_Email.src_user  All_Email.src_user_phonetic  count
2026-01-19 12:25:30  trever@example.com  TRFR                         1
2026-01-19 12:25:36  trefer@example.com  TRFR                         1
2026-01-19 12:25:36  trevor@example.com  TRFR                         1
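
As mentioned at the top, similarity matching could be blended in once the phonetic code has narrowed the candidates. The following is a minimal sketch of that idea, not part of JellyFisher: it ranks local parts that share the TRFR code by Levenshtein distance to the search term. The function names levenshtein and rank_by_distance are hypothetical, and the implementation is plain Python so it runs without any extra modules; the jellyfish module also provides a levenshtein_distance function that could be used instead.

```python
def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    # Keep the shorter string in the inner loop to limit row size
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def rank_by_distance(query, candidates):
    """Sort candidates by edit distance to query, closest first."""
    return sorted(candidates, key=lambda c: (levenshtein(query, c), c))


# Local parts from the verification output above; all encode as TRFR
same_code = ["trefer", "trevor", "trever"]
for name in rank_by_distance("trevor", same_code):
    print(name, levenshtein("trevor", name))
```

A distance threshold (say, 1 or 2) applied after the phonetic match would be one way to trim loosely related names; the right cutoff depends on your data.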
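
If the lookup misbehaves, it can help to exercise the CSV read/annotate/write loop from the external lookup script without Splunk or jellyfish in the picture. This is a rough sketch under two assumptions: that Splunk sends a CSV header containing every field in fields_list with the unknown columns left empty, and that a stand-in encoder is acceptable for the test. The names annotate and fake_encode are hypothetical and exist only for this illustration.

```python
import csv
import io


def annotate(infile, outfile, field, encode, out_field="metaphone"):
    """Read lookup rows as CSV, fill out_field from field, and write CSV back."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    # Add the output column only if the request header does not already have it
    if out_field not in fieldnames:
        fieldnames.append(out_field)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row.get(field):
            row[out_field] = encode(row[field]).strip()
        writer.writerow(row)


def fake_encode(value):
    # Stand-in for jellyfish.metaphone so the sketch runs without the module
    return value.upper()


# Simulated request: both columns present, metaphone left empty
request = io.StringIO("field,metaphone\ntrevor,\ntrev,\n")
response = io.StringIO()
annotate(request, response, "field", fake_encode)
print(response.getvalue())
```

Swapping fake_encode for jellyfish.metaphone inside the Splunk Python environment should reproduce the real encodings.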