I have two fields se_split
and re_split
which are lined up like so
re_split se_split
a g
g h
h p
e q
c a
t z
o t
p w
Is there a way for me to compare the two fields character by character and add up how many characters match?
much thanks.
See my answers here for background:
https://answers.splunk.com/answers/567851/how-can-i-compare-mvfields-and-get-a-diff.html
https://answers.splunk.com/answers/734599/how-to-compare-the-same-search-from-the-previous-d.html
If your stuff is already in a mv-field then you can skip this part, but if not, do this:
index=YouShouldAlwaysSpecifyAnIndex AND sourcetype=AndSourcetypeToo
| stats values(re_split) AS re_split values(se_split) AS se_split BY whatever
For run anywhere, try this:
| makeresults
| eval re_split="a g h e c t o p", se_split="g h p q a z t w"
| makemv re_split
| makemv se_split
Then you can EITHER do this:
| streamstats count AS _serial
| multireport
[| mvexpand se_split
| where re_split!=se_split
| rename se_split AS se_only]
[| mvexpand re_split
| where re_split!=se_split
| rename re_split AS re_only]
| stats values(*) AS * BY _serial
OR this:
| nomv re_split
| nomv se_split
| rex field=re_split mode=sed "s/[\r\n\s]+/;/g"
| rex field=se_split mode=sed "s/[\r\n\s]+/;/g"
| eval setdiff = split(replace(replace(replace(replace(mvjoin(mvsort(mvappend(split(replace(re_split, "(;|$)", "#1;"), ";"), split(replace(se_split, "(;|$)", "#0;"), ";"))), ";"), ";(\w+)#0\;\1#1", ""), ";\w+#1", ""), "#0", ""), ";(?!\w)|^;", ""), ";")
So you need a set diff
that runs against 2 multi-valued
fields, essentially, mvdiff
.
See my answers here for background:
https://answers.splunk.com/answers/567851/how-can-i-compare-mvfields-and-get-a-diff.html
https://answers.splunk.com/answers/734599/how-to-compare-the-same-search-from-the-previous-d.html
If your stuff is already in a mv-field then you can skip this part, but if not, do this:
index=YouShouldAlwaysSpecifyAnIndex AND sourcetype=AndSourcetypeToo
| stats values(re_split) AS re_split values(se_split) AS se_split BY whatever
For run anywhere, try this:
| makeresults
| eval re_split="a g h e c t o p", se_split="g h p q a z t w"
| makemv re_split
| makemv se_split
Then you can EITHER do this:
| streamstats count AS _serial
| multireport
[| mvexpand se_split
| where re_split!=se_split
| rename se_split AS se_only]
[| mvexpand re_split
| where re_split!=se_split
| rename re_split AS re_only]
| stats values(*) AS * BY _serial
OR this:
| nomv re_split
| nomv se_split
| rex field=re_split mode=sed "s/[\r\n\s]+/;/g"
| rex field=se_split mode=sed "s/[\r\n\s]+/;/g"
| eval setdiff = split(replace(replace(replace(replace(mvjoin(mvsort(mvappend(split(replace(re_split, "(;|$)", "#1;"), ";"), split(replace(se_split, "(;|$)", "#0;"), ";"))), ";"), ";(\w+)#0\;\1#1", ""), ";\w+#1", ""), "#0", ""), ";(?!\w)|^;", ""), ";")
Thank you. This works great for what im trying to do.
I am curious: which one did you use?
All those characters appear in different row OR it's just one multivalued field in single row?
All character are in a single field. I've taken usernames and strung them out and am now trying to compare the two so that i can see if they match a certain amount.
What would be the expected output based on your example? Do they have to match in the order they appear in the field OR just character match? (e.g. in your sample if you want to compare by order,, nothing matches, but if you just want to check existence of the character, you get aghtp
I just want to match if re_split is in se_split. if it returns the letters that are in that field that is fine because I can just have it count how many letters there are in comparison to se_split and come up with a final number that way. in the end i just want a number that tells me how many matching characters there are and then im going to subtract the number of matching characters from the number of characters in se_split and return a percentage value as my final number. so yes i just want to check existence.
It'll be easier to give solution if you can provide your current query. You basically have to create a new field which is copy of re_split, expand it (using mvexpand), then compare the character if it's present in se_split (using mvfind) then run some stats to count and combine rows back to original count.
Here is my query. I commented in the place that I'm trying to run that mvfind
command. I have two emails, i cut anything after @
giving me usernames
of sorts then i split them and now am trying to search by sender
to see if the sender
is trying to send something to an email that closely resembles his own.
index=msexchange size>2000000 directionality="Originating" AND action="delivered" AND recipient!="*iccu.com" AND NOT
(message_subject="RE*" OR message_subject="FW*" OR message_subject="*EXTERNAL*") AND
(recipient="*gmail.com" OR recipient="*.edu" OR recipient="*hotmail.com" OR recipient "*yahoo.com" OR recipient="*msn.com" OR
recipient="*outlook.com" OR recipient="*aol.com" OR recipient="*zoho.com" OR recipient="*icloud.com" OR recipient="*inbox.com" OR recipient="*mail.com" OR recipient="*yandex.com" OR recipient="*protonmail.com")
|rex field=recipient "^(?<re_name>.*)@(?<trash1>.*)$" `comment("START OF FIRST 3 COMPARASON")`
|rex field=sender "^(?<se_name>.*)@(?<trash1>.*)$"
|rex field=se_name "^(?P<se_first3>...)"
|rex field=re_name "^(?P<re_first3>...)"
|eval first_value=if(like(se_first3, re_first3), 1, 0) `comment("START OF LAST 3 COMPARASON")`
|rex field=re_name "(?<re_last3>\w{3})$"
|rex field=se_name "(?<se_last3>\w{3})$"
|eval last_value=if(like(se_last3, re_last3), 1, 0)
|eval subject_value=if(isnull(message_subject), 1, 0)
`comment("THIS IS WHERE IM PULLING OUT CHARACTERS THAT I DONT WANT")`
|replace "*.*" WITH "" in re_name
|replace "*_*" WITH "" in re_name
|eval se_cut=replace(se_name,"^.","")
|eval re_cut=replace(re_name,"^.","")
|eval se_mvmake=se_name
|eval re_mvmake=re_name
| rex field=re_mvmake "(?<re_my1>.*?)\d\d\d+(?<re_my2>.*)" | eval re_mvmake=if(isnull(re_my1), re_mvmake, re_my1+re_my2)
| rex field=se_mvmake "(?<se_my1>.*?)\d\d\d+(?<se_my2>.*)" | eval se_mvmake=if(isnull(se_my1), se_mvmake, se_my1+se_my2)
`comment("IGNORE THIS")`
|makemv re_mvmake delim=‘“\”’
|makemv se_mvmake delim=‘“\”’
|mvexpand re_mvmake
|mvexpand se_mvmake
|rex field=re_name max_match=10 "\"(?<re_mv>.*?)\""
|rex field=se_name max_match=10 "\"(?<se_mv>.*?)\""
`comment("THIS IS WHAT I AM WORKING ON NOW BASED ON YOUR COMMENT")`
|eval re_split=split(re_name,"")
|eval se_split=split(se_name,"")
|mvexpand se_split
|mvexpand re_split
|eval split_total=mvfind(se_split, re_split)
|eval threat_num=4
|eval is_threat=if(like(se_name, re_name), 2, 0)
|eval appraise=if(like(message_subject, "%ppraisal%"), 2, 0) `comment("START OF MESSAGE SUBJECT COMPARASON")`
|eval payment=if(like(message_subject, "%ayment%"), 2, 0)
|eval loan=if(like(message_subject, "%oan%"), 2, 0)
|eval estimate=if(like(message_subject, "%stimate%"), 2, 0)
|eval size_count=if(size>5000000, 1, 0)
|eval diff=(threat_num - appraise - payment - loan - estimate + size_count + first_value + last_value + is_threat + subject_value)
|eval Threat_Level=case(diff=1 OR diff=2, "Low", diff=3 OR diff=4, "Medium", diff=5, "High", diff=6 OR diff=7 OR diff=8, "Urgent")
| eval Time=strftime(_time,"%Y-%m-%d %H:%M:%S")
`comment("|where Threat_Level="High" OR Threat_Level="Urgent"")`
|stats values(split_total) values(se_split) values(re_split) values(Threat_Level) as "Threat Level" values(size) as "Message Size" values(recipient) as Recipient values(Time) as Time by sender subject | sort -"Message Size" "Threat Level"