Hello guys,
I have a sample log that looks like this:
DATE, TIME, LOGIN, IP_ADDRESS, USERID, EMPLOYEE_ID, WORKSTATION
2010-04-15 22:08:27 LOGON 10.1.2.221 X555555 55555555 pc30200.company.com
2010-04-15 23:07:05 LOGON 10.1.2.243 X111111 11111111 pc34460.company.com
2010-04-16 23:04:51 LOGON 10.1.2.196 X333333 33333333 pc35555.company.com
2010-04-16 22:49:21 LOGON 10.1.2.243 X222222 22222222 pc34460.company.com
2010-04-17 23:22:13 LOGON 10.1.1.243 X111111 11111111 pc34460.company.com
2010-04-17 23:08:03 LOGON 10.1.1.243 X222222 22222222 pc34460.company.com
2010-04-17 23:35:51 LOGON 10.1.2.160 X662420 33333333 pc31180.company.com
2010-04-17 23:36:14 LOGON 10.1.2.159 X662420 33333333 pc33333.company.com
2010-04-17 23:43:52 LOGON 10.1.2.159 X662420 33333333 pc33333.company.com
2010-04-18 23:35:51 LOGON 10.1.2.160 X662420 33333333 pc31180.company.com
2010-04-18 23:46:14 LOGON 10.1.2.160 X111111 11111111 pc31180.company.com
2010-04-18 23:58:52 LOGON 10.1.2.160 X662420 33333333 pc31180.company.com
2010-04-19 08:43:52 LOGON 10.1.2.160 X662420 88888888 pc31180.company.com
I am trying to detect a person who is doing something suspicious on a workstation: they log into a workstation, then log in as someone else within an hour, and then log back in as themselves.
I can get the first part of the problem with the following command:
sourcetype="this_log_file" | transaction maxspan=1h workstation_ip | search linecount >= 3
2010-04-18 23:35:51 LOGON 10.1.2.160 X662420 33333333 pc31180.company.com
2010-04-18 23:46:14 LOGON 10.1.2.160 X111111 11111111 pc31180.company.com
2010-04-18 23:58:52 LOGON 10.1.2.160 X662420 33333333 pc31180.company.com
Here is my problem: I need to check that the first person who logs into the workstation is also the last.
I need to be able to grab the first and last values and substitute them into the transaction query with startswith and endswith, as below (unless there is a better way).
sourcetype="file_withdata" | transaction maxspan=1h workstation_ip | transaction startswith=(employee_id=33333333) endswith=(employee_id=33333333) employee_id | search linecount >= 3
I am having problems substituting the 33333333 with the value returned from the first line of the transaction.
Any thoughts? I think this is a standard problem when detecting workstation sharing.
Thanks in advance
PS: A further question: I'd also like to be able to return this transaction only when the second person has not performed any other logon for the whole day.
Here's one strategy...
| makeresults
| eval mydata="1,user1,WS1 5,user2,WS1 7,user1,WS1 3,user3,WS2 5,user4,WS2 6,user4,WS2 3,user5,WS3 5,user6,WS3 6,user6,WS3 8,user5,WS3"
| makemv mydata | mvexpand mydata
| rex field=mydata "(?<T>\d+),(?<userid>[^,]+),(?<workstation>.+)"
| eval _time = relative_time(_time,"@d") +3600*8 + 180*T
| table _time workstation userid
| rename COMMENT as "The above generates test data. Replace it with your search that returns _time, workstation and userid."
| rename COMMENT as "Bin the events by Day and roll together to get the first and last logon for each user on each workstation for that day."
| bin _time as Day span=1d
| stats count as logCount, min(_time) as minTime, max(_time) as maxTime, range(_time) as rangeTime by Day workstation userid
| table Day workstation userid logCount minTime maxTime rangeTime
| rename COMMENT as "Now we get rid of workstations with only one user and save to a temp file."
| eventstats dc(userid) as usercount by Day workstation
| where (usercount>1)
| table Day workstation usercount userid logCount minTime maxTime rangeTime
| appendpipe [| where true() | outputcsv temptimes.csv | where false()]
| rename COMMENT as "The above is a paranoid splunk trick to make sure the subsearch runs and outputs the csv first so that it is present for the subsearch below."
| rename COMMENT as "We connect each record to every user record on that workstation for that day who logged on more than once."
| join Day workstation max=0
[| inputcsv temptimes.csv | rename userid as userid2, logCount as logCount2, minTime as minTime2, maxTime as maxTime2, rangeTime as rangeTime2 | where rangeTime2>0]
| rename COMMENT as "We kill records joining a user to themselves, or where the time does not overlap."
| where (userid!=userid2) AND (minTime <= maxTime2) AND (maxTime >= minTime2)
| rename COMMENT as "Now we format the results for presentation."
| fieldformat Day=strftime(Day, "%Y-%m-%d %H:%M:%S")
| fieldformat minTime=strftime(minTime, "%H:%M:%S") | fieldformat minTime2=strftime(minTime2, "%H:%M:%S")
| fieldformat maxTime=strftime(maxTime, "%H:%M:%S") | fieldformat maxTime2=strftime(maxTime2, "%H:%M:%S")
| fieldformat rangeTime=strftime(rangeTime, "%H:%M:%S") | fieldformat rangeTime2=strftime(rangeTime2, "%H:%M:%S")
User5 and user6 demonstrate what happens if the "bad guy" logs on twice under the false id - both ids show up in both places, and the receiver of the report can sort them out.
Hi, I have the same issue. Did you ever get a simple solution?
@robertlynch2020 - please post a new question with your exact situation, preferably with some non-confidential sample data, and post a link to this question in your question. There is an extensive and useful answer here by southeringtonp outlining three different approaches, and we can't really narrow it down for you any more than that without knowing what makes your situation unique. It would not be appropriate to add that much detail in this old post, and putting it here would not get as much attention as a well-written NEW post, so a new one is the way to go.
This is a surprisingly messy problem.
Unfortunately, I don't think that mvindex will work, because the multivalue field isn't going to contain duplicates -- each value of employee_id would only show up once.
Try this:
sourcetype=file_withdata | transaction maxspan=1h workstation_ip
| eval distinctusers=mvcount(userid)
| search distinctusers>1
| regex _raw="(?s).*?LOGON\s+\S+\s+(?<a>\S+)\s.*LOGON\s+\S+\s+(?<b>\1)"
This does a few things:
- groups the logon events for each workstation into transactions spanning at most an hour
- keeps only transactions that involve more than one distinct user
- uses a regex backreference to require that the userid from the first LOGON in the transaction shows up again in a later LOGON
Unfortunately, it's not generalized to other log types, but it's a start.
Instead of the backreference, you could also use rex instead of regex to pull the first and last match into separate fields, then use eval to check that they're the same.
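A minimal sketch of that rex variant, assuming the raw events look like the sample above (first_user and last_user are made-up field names; the lazy .*? finds the first LOGON and the greedy .* finds the last):

sourcetype=file_withdata | transaction maxspan=1h workstation_ip
| eval distinctusers=mvcount(userid)
| search distinctusers>1
| rex field=_raw "(?s)^.*?LOGON\s+\S+\s+(?<first_user>\S+)"
| rex field=_raw "(?s)^.*LOGON\s+\S+\s+(?<last_user>\S+)"
| eval first_last_same=if(first_user==last_user,1,0)
| search first_last_same=1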
This sort of tracking would lend itself pretty well to a custom search script. Essentially, you'd be rolling your own version of the transaction command, but adding the checks to make sure you have at least one different employee id in the middle.
Overall, this may end up being the best option. It also has the fringe benefit of being more generic (you can base it on field values rather than raw text) and producing much shorter/simpler search strings.
Take a look here for more detail on that option.
If all you really care about is whether more than one user logged in to a given machine in a short period of time, you may be better off using stats:
sourcetype=file_withdata
| stats dc(employee_id) as dc first(employee_id) as a last(employee_id) as b list(employee_id) by workstation_ip
| eval is_same=if(a==b,1,0)
| search is_same=1 dc>1
This should work with smaller samples, but won't scale well as you increase the time span.
The map command iterates over multiple search results. In theory you ought to be able to use an initial search and dedup to get a list of unique employee_id values, then use map to iterate over that list and look for your transactions, setting startswith=(employee_id=$employee_id$) and the same for endswith. In practice, I've never been able to get this to work, but I'm not sure why.
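For reference, the map attempt described above would look roughly like this (untested, per the caveat; maxsearches is set explicitly so a long user list doesn't fan out into too many subsearches):

sourcetype=file_withdata | dedup employee_id | table employee_id
| map maxsearches=100 search="search sourcetype=file_withdata | transaction maxspan=1h workstation_ip startswith=(employee_id=$employee_id$) endswith=(employee_id=$employee_id$) | search linecount>=3"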
Thanks, guys, for all the tips! I've been cracking my head over this; I will try it out and see how it goes.
Perhaps you can use the eval functions mvindex(employee_id,0) and mvindex(employee_id,mvcount(employee_id)-1) to get the first and last value of employee_id rather than the entire line, which is (after the transaction command) no longer a separate event.
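As a sketch, that idea would look like the following. Note that, as pointed out elsewhere in this thread, transaction deduplicates multivalue field values by default, so the mvlist option is needed to preserve repeated employee_id values; first_id and last_id are made-up field names, and mvcount>2 requires at least three logons (first, impostor, last):

sourcetype=file_withdata | transaction maxspan=1h workstation_ip mvlist=t
| eval first_id=mvindex(employee_id,0)
| eval last_id=mvindex(employee_id,mvcount(employee_id)-1)
| where first_id==last_id AND mvcount(employee_id)>2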
However, it's probably even better to skip transaction entirely if you can, and just use stats first(employee_id),last(employee_id) by workstation_ip. This doesn't have a maxspan of 1h, but you could perhaps use bucket _time span=1d | stats first(employee_id),last(employee_id) by _time,workstation_ip to get daily breakdowns.
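Putting the bucket and stats ideas together, a minimal sketch (field names as in the sample data; dc>1 keeps only days where a second user appeared on the workstation):

sourcetype=file_withdata
| bucket _time span=1d
| stats first(employee_id) as first_id last(employee_id) as last_id dc(employee_id) as dc by _time, workstation_ip
| where first_id==last_id AND dc>1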