I have the following search, which I'd like to rewrite without using the map command if possible.
The search tracks passwords entered in clear text (typed into the username field) and identifies which user they belong to, so that those users can be notified that their password must be changed because it has been exposed in clear text.
Let me first explain the search:
| tstats `summariesonly` earliest(_time) AS starttime, latest(_time) AS endtime, latest(sourcetype) AS sourcetype, values(Authentication.src) AS src, values(Authentication.dest) AS dest, count from datamodel=Authentication.Authentication where Authentication.tag="failure" by Authentication.user
| `drop_dm_object_name("Authentication")`
| search user!="*EXAMPLE.COM" user!="HOSTNAME-*"
| lookup ut_shannon_lookup word AS user
| where ut_shannon<4 AND ut_shannon>3 AND mvcount(src) == 1
| sort count, - ut_shannon
| eval incorrect_password=user
| eval endtime=endtime+1000
| map maxsearches=70 search="| tstats `summariesonly` earliest(_time) AS starttime, latest(_time) AS endtime, latest(sourcetype) AS sourcetype, values(Authentication.src) AS src, values(Authentication.dest) AS dest, count from datamodel=Authentication.Authentication where Authentication.tag=success Authentication.src=\"$src$\" Authentication.dest=\"$dest$\" sourcetype=\"$sourcetype$\" earliest=\"$starttime$\" latest=\"$endtime$\" by Authentication.user | `drop_dm_object_name(\"Authentication\")` | search user!=\"*EXAMPLE.COM\" user!=\"HOSTNAME-*\" | eval incorrect_password=\"$incorrect_password$\" | eval ut_shannon=\"$ut_shannon$\" | sort count"
| where user!=incorrect_password
| outlier action=RM count
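The first stage of the search keeps failed-login "usernames" that have a Shannon entropy between 3 and 4 and come from a single src, after excluding machine and domain accounts. The Python sketch below models that filter outside Splunk. It assumes that the URL Toolbox `ut_shannon` value is the standard Shannon entropy in bits per character, which I have not checked against the app's source. `shannon_entropy`, `looks_like_password` and `excluded` are hypothetical helper names.

```python
import math
from collections import Counter


def shannon_entropy(word: str) -> float:
    """Shannon entropy in bits per character (assumed to match ut_shannon)."""
    if not word:
        return 0.0
    n = len(word)
    counts = Counter(word)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def excluded(user: str) -> bool:
    """Models: search user!="*EXAMPLE.COM" user!="HOSTNAME-*".

    Splunk's search command matches field values case-insensitively,
    so the comparison is done in upper case.
    """
    u = user.upper()
    return u.endswith("EXAMPLE.COM") or u.startswith("HOSTNAME-")


def looks_like_password(user: str, srcs: set[str]) -> bool:
    """Models: where ut_shannon<4 AND ut_shannon>3 AND mvcount(src) == 1."""
    h = shannon_entropy(user)
    return 3 < h < 4 and len(srcs) == 1
```

An ordinary username such as `jsmith` scores about 2.6 bits and is dropped. A password-like string such as `Summer2023!x` scores about 3.25 bits and is kept. A string of 16 distinct characters scores exactly 4.0, so it falls outside the strict upper bound.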
The search runs fast because it uses accelerated (summarized) data models, but I'd like to get rid of the map command because of its restrictions. I guess I could rewrite the search using streamstats, but that would be much slower because it would no longer use the accelerated Authentication data model.
How could I rewrite this search using tstats but without the map command?
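Here, map runs one tstats search per candidate row, and each of those searches is restricted to the candidate's src, dest, sourcetype and time window. Conceptually this is a join. Successful logins are pulled once, matched to the candidates on (src, dest, sourcetype), and then filtered by time and by `user != incorrect_password`. The Python sketch below models only that logic, outside Splunk, to show the shape a map-free search would need. It makes these assumptions:

- The `Candidate` and `Success` records are hypothetical.
- dest is single-valued here, although the real field can be multivalued.
- The window bounds are inclusive.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical row from the first tstats: a failed 'user' that looks like a password."""
    incorrect_password: str
    src: str
    dest: str
    sourcetype: str
    starttime: int
    endtime: int


@dataclass(frozen=True)
class Success:
    """Hypothetical flat record for one successful authentication."""
    user: str
    src: str
    dest: str
    sourcetype: str
    time: int


def correlate(candidates, successes, slack=1000):
    """Hash join on (src, dest, sourcetype), then a time-window filter.

    One pass over the successes builds the index, instead of running a
    separate subsearch per candidate as map does.
    """
    by_key = defaultdict(list)
    for s in successes:
        by_key[(s.src, s.dest, s.sourcetype)].append(s)

    results = []
    for c in candidates:
        counts = defaultdict(int)  # like "count ... by user" in the inner tstats
        for s in by_key.get((c.src, c.dest, c.sourcetype), []):
            in_window = c.starttime <= s.time <= c.endtime + slack  # inclusive: an assumption
            if in_window and s.user != c.incorrect_password:
                counts[s.user] += 1
        # Mirrors "sort count" (ascending) in the inner search.
        for user, count in sorted(counts.items(), key=lambda kv: kv[1]):
            results.append((c.incorrect_password, user, count))
    return results
```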
Very interesting. I assume you are looking for the next user, on the expectation that it is the same person entering his/her user ID into the correct field after typing his/her password into the username field the first time? If so, I'm slightly surprised by the 1000-second end point. Roughly fifteen minutes (1000 seconds is about 16.7 minutes) seems like a long time between successive password attempts, but you seem to know your data.
Looking at your use case, I'm skeptical that there is an efficient way to pull it off. Your successful logins (plus ordinary failed logins that don't expose a password) are probably about two orders of magnitude more numerous than your clear-text password events. Pulling all of them in a single search will therefore be correspondingly slower, scaling at roughly n log n.
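The n log n estimate can be illustrated with a sort-then-search sketch. Sorting all n successful logins once costs O(n log n), and after that each candidate needs only a binary search instead of its own subsearch. This is a minimal Python sketch that assumes successful logins arrive as hypothetical (user, src, dest, sourcetype, time) tuples. `closest_success` returns the earliest success at or after starttime. That is one possible reading of "closest match" and does not describe how map itself orders results.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(successes):
    """Group (time, user) pairs by (src, dest, sourcetype) and sort each group by time.

    Sorting all n successful logins is the O(n log n) part.
    """
    index = defaultdict(list)
    for user, src, dest, sourcetype, t in successes:
        index[(src, dest, sourcetype)].append((t, user))
    for events in index.values():
        events.sort()
    return index


def closest_success(index, src, dest, sourcetype, starttime, endtime,
                    incorrect_password, slack=1000):
    """Earliest (time, user) at or after starttime within the window, or None.

    Locating the window start is a binary search, O(log n).
    """
    events = index.get((src, dest, sourcetype), [])
    # (starttime,) sorts before any (starttime, user), so this is the first t >= starttime.
    i = bisect_left(events, (starttime,))
    for j in range(i, len(events)):
        t, user = events[j]
        if t > endtime + slack:
            break  # sorted by time, so nothing later can fall in the window
        if user != incorrect_password:
            return t, user
    return None
```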
Which of the limitations on map are you trying to get around?
Glad to hear you find the use case interesting. Yes, I expect that the next user with the same src, dest and sourcetype is the one who mistakenly entered his/her password as the username at first. It's far from perfect, but every day I get around 15-30 hits where the username is clearly a password.
The extra 1000 seconds could definitely be decreased, but so far I've had quite good results with this value. Perhaps map gives me the closest match from starttime anyway?
Most of the data is Windows Security event log authentications, but the search is data-type agnostic because it's based on the CIM (Common Information Model).
The maxsearches limit is what I'd like to get around, but perhaps this is just one of those cases where I'll have to live with map. As mentioned, the search is quite fast over the last day of data, but if I search 30 days back I have to tune maxsearches. For now, searching the last 24 hours is fine for my use case.
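Here is a rough estimate of why a 30-day window runs into the limit. It assumes each daily hit corresponds to about one row fed into map. The real number of candidate rows is probably higher, since not every candidate produces a hit.

```latex
\begin{aligned}
N_{\text{1 day}}   &\approx 15 \text{ to } 30 \;\le\; 70 = \texttt{maxsearches} \\
N_{\text{30 days}} &\approx 30 \times (15 \text{ to } 30) = 450 \text{ to } 900 \;>\; 70
\end{aligned}
```

With `maxsearches=70`, a 24-hour window stays under the limit, which is consistent with the last 24 hours working well.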
map is going to give you results in the same order as any other Splunk search, latest first. However, you're unlikely to get a second hit with the same sourcetype in that kind of time window unless it's the same guy a third time, so you're getting the right result either way.
The funny thing is, about a week after I read and commented on your use case, I was asked to identify similar clear-text password entries in our system and mask them. Nontrivial, I must say.