I would like to compare two field values and return a new field with a percent match between the two.
index=dlp severity="1:High" sender!="N/A" | table _time, sender, recipients, Filename, Count, severity, incident_id, policy | sort -_time
For example, if part of my search returns
sender: John.Smith@Coolcompany.com Recipients: JohnSmith546@mail.com
I would like a new field named PercentMatch to return
PercentMatch: 80% (or whatever the actual calculation may be)
The goal is to help determine when users are sending emails to their own personal accounts. Thank you
Have you thought about a workaround using the cluster command?
"The cluster command groups events together based on how similar they are to each other"
Assuming _time is a unique identifier per mail, I could think of something like:
| makeresults | eval sender="John.Smith@Coolcompany.com" | eval recipientes="JohnSmith546@mail.com" | eval combined = sender + "," + recipientes | makemv delim="," combined | stats values(combined) as combined BY _time | stats count BY combined, _time | cluster labelonly=true t=0.1 match=ngramset field=combined | stats values(combined), dc(cluster_label) BY _time
This compares both addresses and gives them the same cluster_label if they are similar. A final dc(cluster_label)=1 means that it might be the same person.
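To get a feel for what a trigram-based comparison such as match=ngramset is doing, here is a rough Python sketch that scores two strings by the overlap of their 3-character substrings (Jaccard similarity). This is only an illustration of the idea, not Splunk's actual implementation, and the function names are made up for this example.

```python
def trigrams(text):
    """Return the set of 3-character substrings of a lowercased string."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0.

    Assumption: Jaccard overlap is used here for illustration only;
    Splunk's cluster command may score similarity differently.
    """
    set_a, set_b = trigrams(a), trigrams(b)
    if not set_a and not set_b:
        return 0.0  # nothing to compare (both strings shorter than 3 chars)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    score = trigram_similarity("John.Smith@Coolcompany.com", "JohnSmith546@mail.com")
    print(f"trigram similarity: {score:.2f}")
```

A low threshold like t=0.1 in the search above makes sense in this light: the dot, the digits and the different domains all reduce the overlap, so two addresses of the same person can still score fairly low.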
I don't think something like this (comparing two strings for similarity) is natively available. You might have to create a custom search command to achieve it. Have a look at the following post
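If you go the custom command route, the core of it could be as small as the sketch below. It uses difflib from the Python standard library and compares only the part before the @, since the domains will differ by design. The names local_part and percent_match are hypothetical, and the wiring into Splunk (reading the sender and recipients fields, writing PercentMatch back to each event) is left out here.

```python
from difflib import SequenceMatcher


def local_part(address):
    """Return the lowercased part of an email address before the '@'."""
    return address.split("@", 1)[0].lower()


def percent_match(sender, recipient):
    """Return the similarity of the two local parts as a whole percentage.

    Assumption: SequenceMatcher.ratio() stands in for the similarity
    measure; an edit-distance metric would work as well.
    """
    ratio = SequenceMatcher(None, local_part(sender), local_part(recipient)).ratio()
    return round(ratio * 100)


if __name__ == "__main__":
    score = percent_match("John.Smith@Coolcompany.com", "JohnSmith546@mail.com")
    print(f"PercentMatch: {score}%")
```

Keep in mind that recipients can hold several addresses, so a real command would probably need to score each recipient separately and keep the highest value.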