Splunk Search

How can I search abnormal user behaviour?

Jackiifilwhh
Path Finder

Background information

In our system, every visit consists of one or more actions. Every action has its own name, stored in Splunk in a field named "transId". Every time an action is triggered it gets a unique sequence, stored in a field named "gsn". Every customer has a unique id, stored in a field named "uid". While a customer is visiting our system, the visit has a unique session id, stored in a field named "sessionId". To locate a complete operation of a user, we need to use uid and sessionId together. Like many other systems, the order of actions in our system is fixed under normal circumstances.

What we want

We want to create an alert that monitors for an abnormal order of actions. For example, an important action named "D" sits at the end of an action chain. Under normal circumstances, you must access our system in the order "A B C D". But some hackers may skip action B, which may be the action that verifies their identity. The problem is that I don't know which command will return the abnormal results. We can accept having to specify the order of actions for every action chain, but it would be better to read the order from a configuration file.

What I've tried

 

| stats count by sessionId uid transId gsn _time
| sort 0 sessionId uid _time

 

I can get every user's order of actions with this command.

Can you give me some advice? If you want to get more information, you can ask me here.

Best wishes!



PickleRick
SplunkTrust

There are many approaches to such a case.

Firstly, there is a whole product dedicated to Users' Behaviour Analytics. But that costs a noticeable amount of money, and I suppose you're not interested in that.

You can use the list and multivalued fields approach, as @bowesmana already showed.

You can also simply sort your data, then, for example, use streamstats to count consecutive operations of the same type, and only after that filter it and combine it into a single list. This way you could collapse multiple steps of the same type, like adding many different items to a basket before checkout.
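
A minimal sketch of the streamstats idea, assuming the field names from the question (sessionId, uid, transId); prevTransId is a name made up for this example. It sorts each session chronologically, uses streamstats to look up the previous transId within the same session, drops consecutive repeats, and only then builds the list:

```spl
| sort 0 sessionId uid _time
| streamstats current=f window=1 global=f last(transId) as prevTransId by sessionId uid
| where isnull(prevTransId) OR transId!=prevTransId
| stats list(transId) as transIds by sessionId uid
```

Here current=f with window=1 makes streamstats look only at the event just before the current one, and global=f keeps a separate window for each sessionId/uid pair. Whether collapsing repeats is appropriate depends on your data, so treat this as a starting point rather than a finished search.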

 


Jackiifilwhh
Path Finder

Thank you so much! Could you tell me the name of the product you mentioned above, so I can take a brief look at it? I will also try streamstats, as you suggested.


PickleRick
SplunkTrust

bowesmana
SplunkTrust

You asked this question already, but I'll reply to this one.

This base stats command can give you a list of transIds, times and gsn values for each session and uid:

| stats list(transId) as transIds list(_time) as times list(gsn) as gsn by sessionId uid

Note that list() returns the values it finds only up to a maximum of 100, so if a session can exceed that, list() will not work.

However, at that point you will have all actions by a user within a session.

Then you can start to determine 'abnormal'. If your sequence of actions is fixed and well known, then it is fairly easy to look at each of the values in the transIds field, so for example if they skip B (second action), then you would have A,C,D and this test would highlight that

| where mvcount(transIds)>1 AND mvindex(transIds, 1)!="ActionB"

or if your last action should be D and there should be 4 actions, then

| where mvcount(transIds)=4 AND mvindex(transIds, 3)!="ActionD"

These logic tests are things you will need to define based on your data.
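
Put together, a minimal sketch of the fixed-sequence check might look like the following. "ActionB" is only a placeholder for your real transId value. The sort is there because list() keeps values in the order the events reach stats, and search results normally arrive newest first:

```spl
| sort 0 sessionId uid _time
| stats list(transId) as transIds by sessionId uid
| where mvcount(transIds)>1 AND mvindex(transIds, 1)!="ActionB"
```

mvindex() is zero-based, so index 1 is the second action in the session.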

 

 


Jackiifilwhh
Path Finder

Thanks a lot! I will try these commands and think along these lines. But the length of the action sequence is not fixed. We want to flag the result when action D appears: A B C must come before D, but the whole action chain can be Z X C A B C D G J L I. If there is a command that can convert all the transId values into a string or some other value by globalsessionId and uid, maybe it would work!


bowesmana
SplunkTrust

There are various ways to collate parts of a transaction - Splunk has a 'transaction' command, but please try to avoid it - it has a number of limitations - you can almost always achieve the same thing using the 

| stats values(*) as * by X Y Z

syntax and then do your field processing after that.

You should look at the multi-value eval operations. Yes you can join MV values together (mvjoin), so after the list(transId) as transIds, do

| eval transIdsAsString=mvjoin(transIds,"")

You can also 'find' certain values (mvfind). mvfind takes the multivalue field and a regular expression and returns the index of the first matching value, so by getting an index back from mvfind(transIds, "^D$") and mvfind(transIds, "^A$") you can compare the index values to make sure D_index is > A_index.

Also there is the mvmap() eval function, which allows you to iterate through the values in an MV field (Splunk 8 and later).

Note that stats list() will add ALL values, including duplicates, whereas stats values() will filter duplicates and SORT the values, so be aware of that.

A lot will depend on what your data looks like.
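
To make the mvjoin idea concrete, here is a rough sketch, assuming the field names from the question and the single-letter transIds A, B, C, D used as examples in this thread; chain is a name made up for this example. It treats "A B C before D" as meaning that D must be immediately preceded by A, B, C, and it assumes transIds never contain a comma. Events are sorted first because list() keeps values in the order the events arrive:

```spl
| sort 0 sessionId uid _time
| stats list(transId) as transIds by sessionId uid
| eval chain="," . mvjoin(transIds, ",") . ","
| where like(chain, "%,D,%") AND NOT like(chain, "%,A,B,C,D,%")
```

The leading and trailing commas keep a transId such as "AD" from matching ",D,". A session that contains both a valid and an invalid D would not be flagged by this test, and list() still stops at 100 values, so treat it as a starting point.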

 


Jackiifilwhh
Path Finder

Thank you so much! Using list() gets very close to the result we want. It works well when the number of values is less than 100, but we have many scenarios where the number of results is greater than 100. Maybe we could fall back on values() to judge whether a session is abnormal when we reach the limit of list(). But I would still like to know whether there is a solution that fixes the problem completely.


bowesmana
SplunkTrust

The challenge with values() is that it removes duplicate values and also sorts the results, so it does not preserve order in the multi-value field, and you cannot use the MV position as the order of operation. You will then need to consider time.

Without knowing your data better, it's hard to propose a solution. 

If you are really only interested in a subset of the actions performed under the same session Id, you could collect those actions only with stats list(), and avoid the 100 limit.

For example

| eval important_action=if(action=1 OR action=2 OR action=3, 1, 0)
| stats list(eval(if(important_action=1, transId, null()))) as transIds
        list(eval(if(important_action=1, gsn, null()))) as gsns
        list(eval(if(important_action=1, _time, null()))) as times ...

which would collect only the attributes of events with action=1, 2, or 3 in the lists.

 


Jackiifilwhh
Path Finder

Thanks a lot! I finally fixed it! The core command is

| eval lux=_time+"|"+transId
| stats values(lux) as transChain by ...

The key is to make each value unique by combining _time and transId.
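
For anyone reading later, a fuller sketch of this approach follows. The grouping fields sessionId and uid are an assumption taken from the question (the by clause above is elided), and transIds is a name made up here. values() sorts its results lexicographically, so prefixing each transId with _time gives chronological order only as long as every timestamp has the same number of digits and the same format. The prefix is then stripped with mvmap (Splunk 8 or later). The sketch uses the eval concatenation operator ".", which accepts both numbers and strings:

```spl
| eval lux=_time . "|" . transId
| stats values(lux) as transChain by sessionId uid
| eval transIds=mvmap(transChain, mvindex(split(transChain, "|"), 1))
```

This assumes transId never contains "|". Two events with the same _time and the same transId would still collapse into one value.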
