Rewrite a search using the map command

mikaelbje · 04-03-2017

Hi,

I have the following search, which I'd like to rewrite, if possible, without using the map command.
The search is used to find passwords logged in clear text and the users they belong to, so that we can notify those users that their password must be changed since it has been logged in the clear.

Let me first explain the search:

The first part looks up failed authentications; a filter search then removes a few unwanted values (false positives).
ut_shannon_lookup looks up the Shannon entropy of the user field to check whether it contains a password rather than a username. We then keep events with ut_shannon<4 AND ut_shannon>3 (high entropy) and copy the user field into a new field called incorrect_password.
endtime is extended by 1000 seconds.
map runs the same tstats search for successful logins, passing in the sourcetype, dest and src plus the same starttime and the extended endtime (endtime+1000), in order to find the next user logging into the same system from the same source with the same sourcetype.
The final where clause excludes events where the user name equals the incorrect_password field (possible duplicates).
outlier then removes odd entries, such as service accounts logging in thousands of times.
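The ut_shannon filter described above can be sketched locally in Python to see which strings it keeps. This assumes ut_shannon_lookup (which appears to come from the URL Toolbox app) returns the per-character Shannon entropy of the string in bits; check the app's documentation before relying on exact values. `shannon_entropy` and `looks_like_password` are hypothetical helper names for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(word: str) -> float:
    """Per-character Shannon entropy of `word`, in bits (assumed to match ut_shannon)."""
    if not word:
        return 0.0
    n = len(word)
    counts = Counter(word)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct character
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_password(user: str, low: float = 3.0, high: float = 4.0) -> bool:
    """Hypothetical helper mirroring `where ut_shannon<4 AND ut_shannon>3` (strict bounds)."""
    return low < shannon_entropy(user) < high


for candidate in ["jsmith", "Tr0ub4dor&3"]:
    print(candidate, round(shannon_entropy(candidate), 3), looks_like_password(candidate))
```

Because per-character entropy can never exceed log2 of the number of distinct characters, the 3 < H < 4 band only admits strings with at least 9 distinct characters, so short usernames such as jsmith are dropped whatever characters they contain.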

The search:

| tstats `summariesonly` earliest(_time) AS starttime, latest(_time) AS endtime, latest(sourcetype) AS sourcetype, values(Authentication.src) AS src, values(Authentication.dest) AS dest, count from datamodel=Authentication.Authentication where Authentication.tag="failure" by Authentication.user
| `drop_dm_object_name("Authentication")`
| search user!="*EXAMPLE.COM" user!="HOSTNAME-*"
| lookup ut_shannon_lookup word AS user
| where ut_shannon<4 AND ut_shannon>3 AND mvcount(src) == 1
| sort count, - ut_shannon
| eval incorrect_password=user
| eval endtime=endtime+1000
| map maxsearches=70 search="| tstats `summariesonly` earliest(_time) AS starttime, latest(_time) AS endtime, latest(sourcetype) AS sourcetype, values(Authentication.src) AS src, values(Authentication.dest) AS dest,  count from datamodel=Authentication.Authentication where Authentication.tag=success Authentication.src=\"$src$\" Authentication.dest=\"$dest$\" sourcetype=\"$sourcetype$\" earliest=\"$starttime$\" latest=\"$endtime$\" by Authentication.user  | `drop_dm_object_name(\"Authentication\")` |  search user!=\"*EXAMPLE.COM\" user!=\"HOSTNAME-*\" | eval incorrect_password=\"$incorrect_password$\" | eval ut_shannon=\"$ut_shannon$\" | sort count"
| where user!=incorrect_password
| outlier action=RM count
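To make the correlation explicit, here is a Python sketch of what the map step does for each failed-login row: look for successful logins with the same src, dest and sourcetype between starttime and endtime + 1000 seconds, then drop rows whose user equals incorrect_password. The `Failure` and `Success` records and `map_style_correlate` are hypothetical; each row is simplified to a single src and dest, and every matching success event is emitted rather than one aggregated row per user as tstats would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    user: str          # the typed "username", i.e. the candidate clear-text password
    src: str
    dest: str
    sourcetype: str
    starttime: float   # earliest failure, epoch seconds
    endtime: float     # latest failure, epoch seconds


@dataclass(frozen=True)
class Success:
    user: str
    src: str
    dest: str
    sourcetype: str
    time: float        # epoch seconds


def map_style_correlate(failures, successes, slack=1000.0):
    """Nested loop: one scan of `successes` per failure row, like one map subsearch per row."""
    pairs = []
    for f in failures:
        window_end = f.endtime + slack  # eval endtime=endtime+1000
        for s in successes:
            same_key = (s.src, s.dest, s.sourcetype) == (f.src, f.dest, f.sourcetype)
            in_window = f.starttime <= s.time <= window_end
            if same_key and in_window and s.user != f.user:  # where user!=incorrect_password
                pairs.append((f.user, s.user))
    return pairs
```

The per-row structure is what makes maxsearches matter: the work, and the number of subsearches, grows with the number of failure rows that pass the entropy filter.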

The search runs fast because it's based on accelerated data model summaries, but I'd like to get rid of the map command because of its limitations. I guess I could rewrite the search using streamstats, but that would make it a lot slower because it wouldn't be based on the accelerated Authentication data model.

How could I rewrite this search using tstats but without the map command?
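One map-free alternative is a sort-and-sweep: pull failures and successes together, order them by (src, dest, sourcetype, time), and carry the most recent candidate password forward to the next successful login, which is roughly the logic a streamstats-based rewrite would follow. The Python below is a local sketch of that idea, not SPL; the `Event` record, its `action` field and the `is_candidate` predicate are hypothetical stand-ins for the data model fields and the ut_shannon filter. It also simplifies the window to "within 1000 seconds of the latest candidate failure on that key" instead of the per-user starttime to endtime+1000 used in the question.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    time: float      # epoch seconds
    action: str      # "failure" or "success" (hypothetical field for this sketch)
    user: str
    src: str
    dest: str
    sourcetype: str


def sweep_correlate(
    events: Iterable[Event],
    is_candidate: Callable[[str], bool],
    slack: float = 1000.0,
) -> list[tuple[str, str]]:
    """Pair candidate-password failures with later successes in one sorted pass."""
    pairs = []
    pending = None  # (key, failure_time, candidate_password) of the latest candidate failure
    ordered = sorted(events, key=lambda ev: (ev.src, ev.dest, ev.sourcetype, ev.time))
    for ev in ordered:
        key = (ev.src, ev.dest, ev.sourcetype)
        if ev.action == "failure":
            if is_candidate(ev.user):
                pending = (key, ev.time, ev.user)
            continue
        if pending is None:
            continue
        p_key, p_time, password = pending
        # Same src/dest/sourcetype, inside the slack window, and not the "password" itself
        if p_key == key and p_time <= ev.time <= p_time + slack and ev.user != password:
            pairs.append((password, ev.user))
    return pairs
```

The sort makes this O(n log n) over all events in the time range, with no per-row subsearch limit, but every successful login in that range has to be pulled, not just the ones near a suspicious failure.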

DalJeanis · 04-03-2017

Very interesting. I assume you are getting the next user in anticipation that it is the same person entering his/her user ID into the correct field, after having typed his/her password into that field the first time? If so, I'm slightly surprised by the 1000-second end point. One would think that 1000 seconds (about 17 minutes) is a long time between successive login attempts, but you seem to know your data.

Looking at your use case, I'm skeptical that there's going to be an efficient way to pull it off. Your successful logins (plus failed logins that don't contain a clear-text password) are likely to be roughly two orders of magnitude more data than your clear-text password events, so pulling all of them into a single search is going to be correspondingly slower, at roughly n log n.

Which of the limitations on map are you trying to get around?

mikaelbje · 04-04-2017

Thanks for your insights! I'll leave this unanswered to attract a few other views on this question.

mikaelbje · 04-03-2017

Glad to hear you find the use case interesting. Yes, I anticipate that the next user with the same src, dest and sourcetype is the one who mistakenly entered his/her password as the username initially. It's far from perfect, but every day I get around 15-30 hits where the username is clearly a password.

The extra 1000 seconds could definitely be decreased, but so far I've had quite good results with this value. Perhaps map gives me the closest match from starttime anyway?
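In the search as written, the map subsearch groups by user and then runs `sort count`, so its rows are ordered by count rather than by time, and every user who logged in within the window is returned. If only the closest match is wanted, one option is to keep the earliest success at or after starttime. A minimal Python sketch, assuming the successes have already been narrowed to the same src, dest and sourcetype; `closest_success` is a hypothetical helper.

```python
from typing import Optional


def closest_success(
    fail_start: float,
    fail_end: float,
    successes: list[tuple[float, str]],
    slack: float = 1000.0,
    exclude_user: Optional[str] = None,
) -> Optional[tuple[float, str]]:
    """Return the earliest (time, user) success in [fail_start, fail_end + slack], or None.

    `successes` is assumed to be pre-filtered to the same src, dest and sourcetype.
    """
    in_window = [
        (t, user)
        for t, user in successes
        if fail_start <= t <= fail_end + slack and user != exclude_user
    ]
    # Tuples compare by time first, so min() picks the success closest after the failure.
    return min(in_window, default=None)
```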

Most of the data is Windows Event Log Security authentications, but this is data type agnostic since it's based on CIM.

The maxsearches limit is the one I'd like to get rid of, but perhaps this is just one of those cases where I'll have to live with map. As mentioned, the search is quite fast over the last day, but if I search 30 days back I have to raise maxsearches. Currently, searching the last 24 hours is OK for my use case.

DalJeanis · 04-19-2017

I believe map is going to give you results in the same order as any other Splunk search, latest first, but you're unlikely to get a second hit with the same src, dest and sourcetype in that time window unless it's the same person a third time, so you're getting the right result either way.

The funny thing is, about a week after I read and commented on your use case, I was requested to identify similar cleartext password entries in our system, and mask them. Nontrivial, I must say.
