I have an SPL search that detects lookalike domains, both short and long. My goal is to add a CSV lookup that lets me define exceptions to these results. For example, if "compeni" is a false positive, it will keep flooding the results because it is close to "company". So I would like a CSV lookup that can be edited over time to reduce the false positives.
The column in the CSV could be named something like reel_domain and would hold the list of false-positive domains.
My question is: how can I implement this idea in the search below?
index=* src_user!="" src_user!="*company" AND src_user!="*comp.com" AND src_user!="*compan.com" AND src_user!="*compe.com" AND src_user!="*compani.com"
| dedup src_user
| rex field=src_user "(?:@)(?<detected_domain>[^>]*)"
| eval domain_list=split(detected_domain, ".")
| eval domain_list=mvfilter(len(domain_list)>3)
| eval domain_list=mvfilter(domain_list!="filter_example")
| eval domain_list=if(mvcount(domain_list)>3, mvindex(domain_list, -3), domain_list)
| rename domain_list as word2
| makemv word1
| eval word1 = mvappend(word1, "company")
| lookup local=t ut_levenshtein_lookup word1 word2
| eval ut_levenshtein=mvfilter(ut_levenshtein!=0)
| eval ut_levenshtein=min(ut_levenshtein)
| rename ut_levenshtein as ct_long
| rename word1 as lg_domain
| eval word1=mvappend(word1, "comp"), word1=mvappend(word1, "compan"), word1=mvappend(word1, "compe"), word1=mvappend(word1, "compani")
| lookup local=t ut_levenshtein_lookup word1 word2
| eval ut_levenshtein=mvfilter(ut_levenshtein!=0)
| eval ut_levenshtein=min(ut_levenshtein)
| rename ut_levenshtein as ct_short
| rename word1 as st_domain
| rename word2 as input_dom
| search ct_short<=1 OR ct_long<=3
| table src_user input_dom st_domain ct_short lg_domain ct_long
Any help would be appreciated.
Thanks in advance
Hi @Woodpecker,
you could create a lookup (called e.g. false_positives.csv, with one column called e.g. "pattern") in which you store all the false positives you have, and then add to your search a row like this:
| search NOT [ | inputlookup false_positives.csv | rename pattern AS query | fields query ]
In this way you perform a full-text search on your events, excluding the events where one word matches one of the patterns in your lookup.
You have to put this command in the main search.
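As a rough sketch of the placement (assuming the lookup file is named false_positives.csv with a single "pattern" column, as in the example above, and has already been uploaded as a lookup table file), the start of your search could look something like this, with the rest of the pipeline left unchanged:

```
index=* src_user!="" src_user!="*company" AND src_user!="*comp.com" AND src_user!="*compan.com" AND src_user!="*compe.com" AND src_user!="*compani.com" NOT [ | inputlookup false_positives.csv | rename pattern AS query | fields query ]
| dedup src_user
| rex field=src_user "(?:@)(?<detected_domain>[^>]*)"
```

If you prefer to match only the extracted domain word rather than the whole raw event, an untested variant is to filter on the field near the end of your search, right after word2 is renamed to input_dom:

```
| search NOT [ | inputlookup false_positives.csv | rename pattern AS input_dom | fields input_dom ]
```

In this variant the subsearch returns input_dom="..." conditions instead of free-text terms, so only results whose input_dom equals a value in the lookup are dropped.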
Ciao.
Giuseppe