Solved: How can I search abnormal user behaviour?

Jackiifilwhh · 04-13-2022

Background information

In our system, every visit consists of one or more actions. Every action has its own name, which in Splunk is a field named "transId". Every time an action is triggered, it gets a unique sequence, which in Splunk is a field named "gsn". Each customer has a unique id, which in Splunk is a field named "uid". While a customer is visiting our system, he has a unique session id, which in Splunk is a field named "sessionId". To locate a complete operation of a user, we need to use uid and sessionId together. As in many other systems, the order of actions in our system is fixed under normal circumstances.

What we want

We want to create an alert that monitors for an abnormal order of actions. For example, an important action named "D" sits at the end of an action chain. Under normal circumstances, you must access our system in the order of actions "A B C D". But some hackers may skip action B, which may be the action that verifies their identity. The problem is that I don't know the command to get the abnormal results. We can accept having to input the order of actions for every action chain. It would be better to read the order from a configuration file.

What I've tried

| stats count by sessionId uid transId gsn _time
| sort 0 sessionId uid _time

I can get every user's order of actions with this command.

Can you give me some advice? If you want to get more information, you can ask me here.

Best wishes!

PickleRick · 04-14-2022

There are many approaches to such a case.

Firstly, there is a whole product dedicated to User Behaviour Analytics. But that costs a noticeable amount of money and I suppose you're not interested in that.

You can use the list and multivalued fields approach as @bowesmana already showed.

You can also simply sort your data, then, for example, use streamstats to count consecutive operations of the same type, and only after that filter it and combine it into a single list. This way you could collapse, for example, multiple steps of the same type, like adding many different items to a basket before checkout.
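A minimal sketch of the streamstats idea, assuming the field names from the question (sessionId, uid, transId). The name prevTransId is made up for this example. It drops an event when its transId equals that of the previous event in the same session, so repeated steps collapse into one before the list is built:

```
| sort 0 sessionId uid _time
| streamstats current=f global=f window=1 last(transId) as prevTransId by sessionId uid
| where isnull(prevTransId) OR transId!=prevTransId
| stats list(transId) as transIds by sessionId uid
```

The isnull() check keeps the first event of each session, which has no predecessor. Whether collapsing repeats is appropriate depends on the data; it only helps if repeated steps are not themselves a sign of abnormal behaviour.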

Jackiifilwhh · 04-14-2022

Thank you so much! Could you tell me the name of the product you mentioned above, so I can take a brief look at it? I will also try streamstats, as you suggested.

PickleRick · 04-14-2022

Splunk UBA

https://www.splunk.com/en_us/software/user-behavior-analytics.html

bowesmana · 04-13-2022

You asked this question already, but I'll reply to this one.

This base stats command can give you a list of transIds, times and gsn for each session and uid

| stats list(transId) as transIds list(_time) as times list(gsn) as gsn by sessionId uid

Note that using list will give all results from the ones it finds up to a maximum of 100 results - if you can exceed that, list will not work.

However, at that point you will have all actions by a user within a session.

Then you can start to determine what counts as 'abnormal'. If your sequence of actions is fixed and well known, it is fairly easy to check individual values in the transIds field. For example, if they skip B (the second action), you would have A,C,D, and this test would highlight that:

| where mvcount(transIds)>1 AND mvindex(transIds, 1)!="ActionB"

or if your last action should be D and there should be 4 actions, then

| where mvcount(transIds)=4 AND mvindex(transIds, 3)!="ActionD"

These logic tests are things you will need to define based on your data

Jackiifilwhh · 04-13-2022

Thanks a lot! I will try these commands and think along these lines. But the length of the sequence of actions is not fixed. What we want is this: when action D appears, A B C must come before it, but the whole action chain can be Z X C A B C D G J L I. If there is a command that can turn all the transId values into a string or other value by globalsessionId and uid, maybe that would work!

bowesmana · 04-13-2022

There are various ways to collate parts of a transaction - Splunk has a 'transaction' command, but please try to avoid it - it has a number of limitations - you can almost always achieve the same thing using the

| stats values(*) as * by X Y Z

syntax and then do your field processing after that.

You should look at the multi-value eval operations. Yes you can join MV values together (mvjoin), so after the list(transId) as transIds, do

| eval transIdsAsString=mvjoin(transIds,"")
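A minimal sketch of how the joined string could be tested, assuming the action names are literally A, B, C and D, that "A B C D" must appear as a contiguous run, and that no transId contains a comma. It joins with "," rather than "" so that multi-character names cannot run into each other, and it sorts by _time first because list() keeps values in the order the events arrive:

```
| sort 0 _time
| stats list(transId) as transIds by sessionId uid
| eval chain=mvjoin(transIds, ",")
| where match(chain, "(^|,)D(,|$)") AND NOT match(chain, "(^|,)A,B,C,D(,|$)")
```

This keeps sessions that contain D but never the full run A,B,C,D. A session with one valid run and a second, unprotected D would not be flagged, so treat it as a starting point only.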

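You can also 'find' certain values (mvfind). mvfind takes the MV field and a regular expression and returns the index of the first match, so by getting an index number back from mvfind(transIds, "^D$") and mvfind(transIds, "^A$") you can then compare the index values to make sure D_index is > A_index.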
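A minimal sketch of that index comparison, assuming transIds was built with list() as above and the action names are literally A and D. The names a_idx and d_idx are made up for this example:

```
| eval a_idx=mvfind(transIds, "^A$"), d_idx=mvfind(transIds, "^D$")
| where isnotnull(d_idx) AND (isnull(a_idx) OR a_idx > d_idx)
```

mvfind returns null when nothing matches, so the test keeps sessions where D occurs but A is missing or only shows up after it. Because only the first match of each value is returned, repeated actions in one session need a different check.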

Also there is the mvmap() eval function, which allows you to iterate through values in an MV field (>=Splunk 8).

Note that the stats list() will add ALL values, including duplicates and stats values() will filter duplicates and SORT the values, so be aware of that.

A lot will depend on what your data looks like

Jackiifilwhh · 04-14-2022

Thank you so much! It's so close to the result that we want when I use list(). It can work well when the size is less than 100. But we have many scenarios where the number of results is greater than 100. Maybe we can use values() further to judge if it is abnormal when we reach the limit of list() . But I still want to know if there is a solution that can fix the problem perfectly?

bowesmana · 04-18-2022

The challenge with values() is that it removes duplicate values and also sorts the results, so it does not preserve order in the multi-value field, and you cannot use the MV position as the order of operation. You would then need to take time into account.

Without knowing your data better, it's hard to propose a solution.

If you are really only interested in a subset of the actions performed under the same session Id, you could collect those actions only with stats list(), and avoid the 100 limit.

For example

| eval important_action=if(action=1 OR action=2 OR action=3, 1, 0)
| stats list(eval(if(important_action=1, transId, null()))) as transIds
        list(eval(if(important_action=1, gsn, null()))) as gsns
        list(eval(if(important_action=1, _time, null()))) as times ...

which would collect only the attributes for action=1, 2, or 3 in the list.
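The same idea can also be sketched by filtering before the stats, assuming the interesting actions are identified by transId and are literally named A, B, C and D:

```
| search transId IN ("A", "B", "C", "D")
| sort 0 _time
| stats list(transId) as transIds list(gsn) as gsns list(_time) as times by sessionId uid
```

This only postpones the problem: list() is still capped at 100 values per session, so it helps only when the filtered actions stay under that number.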

Jackiifilwhh · 04-25-2022

Thanks a lot! I finally fixed it! The core command is

| eval lux=_time+"|"+transId
| stats values(lux) as transChain by ...

The key is to make each value unique by combining _time and transId together.
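For anyone reusing this, a minimal sketch of one way to continue from there, assuming the grouping fields are sessionId and uid as in the question and that no transId contains "|" or ",". It uses the "." operator for the concatenation and then strips the time prefix again so the chain can be tested as a plain string; the name chain is made up for this example:

```
| eval lux=_time."|".transId
| stats values(lux) as transChain by sessionId uid
| eval chain=mvjoin(transChain, ",")
| eval chain=replace(chain, "(^|,)[^|,]*[|]", "\1")
```

values() sorts lexicographically, so the result is only chronological while every _time value renders with the same number of digits and the same format. Two events with an identical _time and transId would still collapse into one value.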
