Extract out the first line of results from Transac...

silvermail · ‎09-10-2010

Hello guys,

I have a sample log that looks like this:

DATE, TIME, LOGIN, IP_ADDRESS, USERID, EMPLOYEE_ID, WORKSTATION
2010-04-15 22:08:27 LOGON                10.1.2.221  X555555    55555555  pc30200.company.com
2010-04-15 23:07:05 LOGON                10.1.2.243  X111111    11111111  pc34460.company.com
2010-04-16 23:04:51 LOGON                10.1.2.196  X333333    33333333  pc35555.company.com 
2010-04-16 22:49:21 LOGON                10.1.2.243  X222222    22222222  pc34460.company.com 
2010-04-17 23:22:13 LOGON                10.1.1.243  X111111    11111111  pc34460.company.com 
2010-04-17 23:08:03 LOGON                10.1.1.243  X222222    22222222  pc34460.company.com 
2010-04-17 23:35:51 LOGON                10.1.2.160  X662420    33333333  pc31180.company.com 
2010-04-17 23:36:14 LOGON                10.1.2.159  X662420    33333333  pc33333.company.com 
2010-04-17 23:43:52 LOGON                10.1.2.159  X662420    33333333  pc33333.company.com
2010-04-18 23:35:51 LOGON                10.1.2.160  X662420    33333333  pc31180.company.com 
2010-04-18 23:46:14 LOGON                10.1.2.160  X111111    11111111  pc31180.company.com 
2010-04-18 23:58:52 LOGON                10.1.2.160  X662420    33333333  pc31180.company.com 
2010-04-19 08:43:52 LOGON                10.1.2.160  X662420    88888888  pc31180.company.com

I am trying to detect the person who is doing some suspicious behavior on a workstation. They log into a workstation, then log in as someone else within an hour and log back as them self.

I can get the first part of the problem with the following command:

sourcetype="this_log_file" | transaction maxspan=1h workstation_ip |  search linecount >= 3

2010-04-18 23:35:51 LOGON                10.1.2.160  X662420    33333333  pc31180.company.com  
2010-04-18 23:46:14 LOGON                10.1.2.160  X111111    11111111  pc31180.company.com 
2010-04-18 23:58:52 LOGON                10.1.2.160  X662420    33333333  pc31180.company.com

Here is my problem I need to check that the first person logs into the workstation is also the last.

I need to be able to grab the first value and last value and substitute this in to the transaction query with startswith and endswith, see below. (unless there is a better way)

sourcetype="file_withdata" | transaction maxspan=1h workstation_ip | transaction startswith=(employee_id=33333333) endswith=(employee_id=33333333) employee_id | search linecount >= 3

I am having problems substituting the 33333333 with the values returned from first line of the transaction.

Any thoughts? I think this problem is a standard problem when detecting workstation sharing

Thanks in advance

PS Another question, as well to be able to return this transaction if the second person has not performed a logon for the whole day as well.

DalJeanis · ‎04-25-2017

Here's one strategy...

| makeresults 
| eval mydata="1,user1,WS1 5,user2,WS1 7,user1,WS1 3,user3,WS2 5,user4,WS2 6,user4,WS2 3,user5,WS3 5,user6,WS3 6,user6,WS3 8,user5,WS3" 
| makemv mydata | mvexpand mydata 
| rex field=mydata "(?<T>\d+),(?<userid>[^,]+),(?<workstation>.+)"    
| eval _time = relative_time(_time,"@d") +3600*8 + 180*T
| table _time workstation userid 
 | rename COMMENT as "The above generates test data.  Replace it with your search that returns _time, workstation and userid."

| rename COMMENT as "Bin the events by Day and roll together to get the first and last logon for each user on each workstation for that day."
| bin _time as Day span=1d
| stats count as logCount, min(_time) as minTime, max(_time) as maxTime, range(_time) as rangeTime by Day workstation userid
| table Day workstation userid logCount minTime maxTime rangeTime 

| rename COMMENT as "Now we get rid of workstations with only one user and save to a temp file."
| eventstats dc(userid) as usercount by Day workstation
| where (usercount>1) 
| table Day workstation usercount userid logCount minTime maxTime rangeTime 
| appendpipe [| where true() | outputcsv temptimes.csv | where false()]
| rename COMMENT as "The above is a paranoid splunk trick to make sure the subsearch runs and outputs the csv first so that it is present for the subsearch below."

| rename COMMENT as "We connect each record to every user record on that workstation for that day who logged on more than once."
| join Day workstation max=0 
    [| inputcsv temptimes.csv | rename userid as userid2, logCount as logCount2, minTime as minTime2, maxTime as maxTime2, rangeTime as rangeTime2 | where rangeTime2>0]

| rename COMMENT as "We kill records joining a user to themselves, or where the time does not overlap."
| where (userid!=userid2) AND (minTime <= maxTime2) AND (maxTime >= minTime2)

| rename COMMENT as "Now we format the results for presentation."
| fieldformat Day=strftime(Day, "%Y-%m-%d %H:%M:%S") 
| fieldformat minTime=strftime(minTime, "%H:%M:%S") | fieldformat minTime2=strftime(minTime2, "%H:%M:%S")
| fieldformat maxTime=strftime(maxTime, "%H:%M:%S") | fieldformat maxTime2=strftime(maxTime2, "%H:%M:%S")
| fieldformat rangeTime=strftime(rangeTime, "%H:%M:%S") | fieldformat rangeTime2=strftime(rangeTime2, "%H:%M:%S")

DalJeanis · ‎04-25-2017

User5 and user6 demonstrate what happens if the "bad guy" logs on twice under the false id - both ids show up in both places, and the receiver of the report can sort them out.

robertlynch2020 · ‎04-25-2017

hi
I have the same issues, did you get a simple soution?

DalJeanis · ‎04-25-2017

@robertlynch2020 - please post a new question with your exact situation, preferably with some non-confidential sample data, and post a link to this question in your question. There is an extensive and useful answer here by southeringtonp outlining three different approaches, and we can't really narrow it down for you any more than that without knowing what makes your situation unique. It would not be appropriate to add that much detail in this old post, and putting it here would not get as much attention as a well-written NEW post, so a new one is the way to go.

southeringtonp · ‎09-10-2010

This is a surprisingly messy problem.

Unfortunately, I don't think that mvindex will work, because the multivalue field isn't going to contain duplicates -- each value for employee_id would only show up once.

Option 1: Regex Search and Backreference

Try this:

sourcetype=file_withdata | transaction maxspan=1h workstation_ip
| eval distinctusers=mvcount(userid)
| search distinctusers>1
| regex _raw="(?s).*?LOGON\s+\S+\s+(?<a>\S+)\s.*LOGON\s+\S+\s+(?<b>\1)"

This does a few things:

Runs your original transaction search to get the users logging on to each machine

Removes entries that don't have at least two different user IDs logging on

Pares the results down further, to just those whose first and last login lines have the same user ID

Unfortunately, it's not generalized to other log types, but it's a start.

Instead of the backreference, you could also use rex instead of regex to pull the first and last match into separate fields, then use eval to check that they're the same.

Option 2: Write a Custom Search Command

This sort of tracking would lend itself pretty well to a custom search script. Essentially, you'd be rolling your own version of the transaction command, but adding the checks to make sure you have at least one different employee id in the middle.

Overall, this may end up being the best option. It also has the fringe benefit of being more generic (you can base it on field values rather than raw text) and producing much shorter/simpler search strings.

Take a look here for more detail on that option.

Option 3: Simplify Your Requirements

If all you really care about is whether more than one user logged in to a given machine in a short period of time, you may be better off using stats:

sourcetype=file_withdata
| stats dc(employee_id) as dc first(employee_id) as a
  last(employee_id) as b list(employee_id)
| eval is_same=if(a==b,1,0)
| search is_same=1 dc>0

This should work with smaller samples, but won't scale well as you increase the time span.

Option 4: Map command

The map command iterates over multiple search results. In theory you ought to be able to use an initial search and dedup to get a list of unique employee_id. Then use map to iterate over that list and look for your transactions, setting startswith=(employee_id=$employee_id$) and the same for endswith. In practice, I've never been able to get this to work, but not sure why.

silvermail · ‎09-10-2010

Thanks guys for all the tips! Been cracking the head over this, and will try it out and see how it goes.

gkanapathy · ‎09-10-2010

Perhaps you can use the eval function mvindex(employee_id,0) and mvindex(employee_id,mvcount(employee_id)-1) to get the first and last value of employee_id rather than the entire line, which is (after the transaction command) no longer a separate event.

However, it's probably even better to skip using transaction entirely if you can, and just use stats first(employee_id),last(employee_id) by workstation_ip. This doesn't have a maxspan of 1h, but you could perhaps use bucket _time span=1d | stats first(employee_id),last(employee_id) by _time,workstation_ip to get daily breakdowns.