I'm having an issue with Extreme Search. I have a DD context generated for users sending emails, keyed on their identity_id, and it populates fine. I checked it with xsdisplaycontext on one ID and get results, for instance:
|xsdisplaycontext FROM email_count_per_1h_by_user by edf5a5e02d47234647599dd1e76c61ee23a09127
This displays a range from a minimal of 0 to an extreme of 232, so I built a correlation search around it. This user had a count of 2972 in a given hour, and I checked it with:
| xsWhere identity_id FROM email_count_per_1h_by_user IN email_count_per_1h_by_user by identity_id is above high
The issue is that it doesn't show as above high. I'm not sure why, as the count is way above the extreme in the context. Other users show up properly, but this one does not.
If I check which term it falls under, it lands in minimal, which is the lowest. That term covers counts from 0 to 57.31765.
I'm not sure what I'm missing here. Is there a way to debug this search and find out why xswhere is putting it under minimal?
Try swapping xswhere with xsGetWhereCIX and see what CIX (aka correlation value) you get on your statement. The command xsFindBestConcept can also be useful. Remember also that if a value is not in a context, Extreme Search uses the weighted average of all values to decide how to rate the unknown.
Yeah, xsFindBestConcept puts it squarely in minimal, with a CIX of 1.0000 for that term.
It's just very odd. It makes me wonder how many DDoS attempts, brute-force attacks, and other events fall so far outside the numbers that they are being missed.
Right now I'm just using a lookup with the stats for each user and running the numbers that way instead of trusting XS to figure it out.
I did some more testing to see if I could figure it out. I created the stats in my own lookup table and added perc99 to it so that I could test high values. To make sure it was working, I ran the following part of a search:
| stats sum(count) as sumtotal by identity_id |search identity_id=edf5a5e02d47234647599dd1e76c61ee23a09127 |lookup 90_day_email_stats.csv identity_id OUTPUT 99_perc | rename "99_perc" as extremenumber |search sumtotal > extremenumber
This returned nothing, even though I knew that sumtotal was greater than extremenumber. I then changed the operator to less than, which should not have matched:
| stats sum(count) as sumtotal by identity_id |search identity_id=edf5a5e02d47234647599dd1e76c61ee23a09127 |lookup 90_day_email_stats.csv identity_id OUTPUT 99_perc | rename "99_perc" as extremenumber |search sumtotal < extremenumber
this got me the output
identity_id, sumtotal, extremenumber
edf5a5e02d47234647599dd1e76c61ee23a09127, 2972, 32
So the only thing I could think of was that maybe the 2972 is not actually a number but a string. I'm not sure why it would be, since it is only calculated via sum in stats, but what the heck, let's check. I did a | convert num(sumtotal), tested again, and got the same results. Then I figured I'd check via an eval:
| eval test=if(isnum(sumtotal),"is number","not number")
test came out as "is number", so obviously it's a number and Splunk knows that. Next I wanted to see whether arithmetic on it would work where the comparison did not, so I tried:
| eval test=if((sumtotal-extremenumber)>=0,1,0)
This gave me the greater-than and less-than logic I needed. The solution I had to come up with to get the comparison to work was:
| stats sum(count) as sumtotal by identity_id |lookup 90_day_email_stats.csv identity_id OUTPUT 99_perc | rename "99_perc" as extremenumber | eval test=if((sumtotal-extremenumber)>=0,1,0) | search test=1 | where sumtotal>=[| xsdisplaycontext FROM email_count_per_1h_by_user |sort extreme desc | head 1| rename "email_count_per_1h_by_user/" as search | table search]
This doesn't really explain why the comparison operator did not work, though, and it makes me wonder what else is failing. When you're in cyber security and integrity is one of the core principles, that makes it hard to trust that Splunk is doing it right.
BTW, this instance is in a Splunk Cloud environment.
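A likely explanation for the lookup comparison (a sketch, not a confirmed diagnosis, and it does not address the xswhere result): the search command treats the right-hand side of a comparison as a literal value, not a field name. So | search sumtotal > extremenumber compares sumtotal with the string "extremenumber" rather than with the field's value of 32, which would also explain why the less-than version matched. The where command evaluates both sides as expressions, so it can compare two fields directly. The query below uses makeresults only to fabricate a single test row; the values are copied from the output above.

```spl
| makeresults
| eval identity_id="edf5a5e02d47234647599dd1e76c61ee23a09127", sumtotal=2972, extremenumber=32
| where sumtotal > extremenumber
| table identity_id, sumtotal, extremenumber
```

If that returns the row, replacing the final | search sumtotal > extremenumber in the original search with | where sumtotal > extremenumber should behave the same way, without needing the intermediate eval test field.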