Anomaly detection in Splunk

axm1295 — Fri, 19 Nov 2021 01:47:39 GMT

Hi all,

I am new to Splunk and have been trying to work on a use case to detect anomalous switches from one type of account to another.

Index A: Has the list of switches i.e. has two columns: 'Old account', 'New account'.
Index B: Has the *type* of accounts. It has two columns: 'Accounts', 'Account_types'.

So far, using commands like join (after renaming certain columns), I have been able to get to a table of 4 columns: 'Old account', 'Old_account_type', 'New account', 'New_account_type'.

Aim:
I need to implement logic to detect when an old account switches to an 'unusual' new account.

Idea so far:
I would like to build a dictionary of some sort that records, for each old account, the new accounts and new_account_type(s) it has switched to. Then, if the old account switches to an account not in this dictionary, I want to flag it. Does this sound like a logical idea?

For example, looking at the past 4 switches, if an old account named A of type 'admin' switches to new accounts named 1, 2, 3, 4 of types admin, user, admin, admin, then the dictionary should look like
A_switches = {
    "Old Account": "A",
    "old_account_type": "admin",
    "New Account": [1, 2, 3, 4],
    "type": ["admin", "user"]
}

This query needs to be run each hour to flag up unusual switches. Can someone suggest how I can implement the above logic i.e. create a dictionary and spot unusual activity?

Apologies for the long question and if something isn't clear.

Re: Anomaly detection in Splunk

ITWhisperer — Fri, 19 Nov 2021 08:53:22 GMT

Would this gather the information you need?

| stats values(new_account) as new_accounts values(new_account_type) as new_account_types by old_account old_account_type
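
To make the logic concrete outside of SPL, here is a minimal Python sketch of the same idea: build a per-old-account baseline (roughly what the stats command above collects) and then check newer switches against it. The field names follow the stats line; the function names, the sample rows beyond the A/admin example in the question, and the "reason" strings are assumptions made up for illustration, not anything Splunk provides.

```python
from collections import defaultdict

def build_baseline(history):
    """Collect, per (old_account, old_account_type), the new accounts and types seen so far."""
    baseline = defaultdict(lambda: {"new_accounts": set(), "new_account_types": set()})
    for row in history:
        key = (row["old_account"], row["old_account_type"])
        baseline[key]["new_accounts"].add(row["new_account"])
        baseline[key]["new_account_types"].add(row["new_account_type"])
    return baseline

def flag_unusual(baseline, recent):
    """Return recent switches whose target account or type is absent from the baseline."""
    flagged = []
    for row in recent:
        key = (row["old_account"], row["old_account_type"])
        seen = baseline.get(key)  # .get() does not create a default entry
        if seen is None:
            reason = "old account has no history"
        elif row["new_account_type"] not in seen["new_account_types"]:
            reason = "new account type not seen before"
        elif row["new_account"] not in seen["new_accounts"]:
            reason = "new account not seen before"
        else:
            continue  # switch matches the baseline, nothing to flag
        flagged.append({**row, "reason": reason})
    return flagged

def switch(old, old_type, new, new_type):
    """Helper to keep the sample rows short (hypothetical, for this sketch only)."""
    return {"old_account": old, "old_account_type": old_type,
            "new_account": new, "new_account_type": new_type}

# The example from the question: A (admin) switched to 1, 2, 3, 4.
history = [switch("A", "admin", "1", "admin"), switch("A", "admin", "2", "user"),
           switch("A", "admin", "3", "admin"), switch("A", "admin", "4", "admin")]

# Hypothetical switches from the latest hour.
recent = [switch("A", "admin", "2", "user"),    # seen before
          switch("A", "admin", "5", "admin"),   # new account, known type
          switch("A", "admin", "6", "robot"),   # unknown type
          switch("B", "user", "7", "user")]     # old account without history

baseline = build_baseline(history)
for hit in flag_unusual(baseline, recent):
    print(hit["old_account"], "->", hit["new_account"], ":", hit["reason"])
```

In Splunk terms, the baseline would come from a search over a longer time window and the recent rows from the last hour; how to treat an old account with no history at all is a design choice the sketch simply flags.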

Re: Anomaly detection in Splunk

yuanliu — Fri, 19 Nov 2021 09:32:13 GMT

Before tackling the question of anomalies, I made the following simulations to clarify the premises. Please let me know if my understanding of the problem is correct: indexA is a regular audit log that contains an ID that cannot change, plus the account names associated with each ID, which can change over time; each account name has an associated type that is stored in indexB; and only the latest association is valid. Hence,

_time	account	id	index
2021-11-19 00:27:02	john	userX	indexA
2021-11-19 01:27:02	jeff	userX	indexA
2021-11-19 02:27:02	joe	userX	indexA
2021-11-19 03:27:02	jack	userX	indexA

Note the above depicts progression of a single ID userX over a period of 4 hours.

account	type	_time	index
jack	user	2021-11-19 00:32:27	indexB
jeff	user	2021-11-19 00:32:27	indexB
jim	admin	2021-11-19 00:32:27	indexB
joe	robot	2021-11-19 00:32:27	indexB
john	admin	2021-11-19 00:32:27	indexB

The above depicts the latest snapshot of the account table, therefore a single timestamp.

If the two assumptions look correct, I can think of two ways to combine the data, based on the labels you put in the question. First is KV lookup: you can dump the latest account table from indexB into a lookup table, then use lookup to associate accounts in indexA with types.

Then, use this lookup in indexA

This method requires the lookup table to be maintained up to date. This can be a disadvantage.
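
As an illustration of what the lookup approach does (this is not the SPL itself, just the idea expressed in Python), the sketch below turns the simulated indexB snapshot into an account-to-type mapping and uses it to enrich the simulated indexA events. The function names build_lookup and enrich are hypothetical; the data is copied from the two tables above.

```python
# Simulated indexB snapshot: account -> type.
index_b = [
    {"account": "jack", "type": "user"},
    {"account": "jeff", "type": "user"},
    {"account": "jim", "type": "admin"},
    {"account": "joe", "type": "robot"},
    {"account": "john", "type": "admin"},
]

# Simulated indexA audit events for a single ID.
index_a = [
    {"_time": "2021-11-19 00:27:02", "account": "john", "id": "userX"},
    {"_time": "2021-11-19 01:27:02", "account": "jeff", "id": "userX"},
    {"_time": "2021-11-19 02:27:02", "account": "joe", "id": "userX"},
    {"_time": "2021-11-19 03:27:02", "account": "jack", "id": "userX"},
]

def build_lookup(account_rows):
    """Dump the account table into a mapping, like saving it as a lookup table."""
    return {row["account"]: row["type"] for row in account_rows}

def enrich(events, lookup):
    """Attach the account type to each event; unknown accounts get None."""
    return [{**event, "type": lookup.get(event["account"])} for event in events]

lookup = build_lookup(index_b)
for event in enrich(index_a, lookup):
    print(event["_time"], event["id"], event["account"], event["type"])
```

The sketch also shows the maintenance concern: the mapping is only as fresh as the last time it was rebuilt from indexB.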

Meanwhile, since you are also considering join, let me illustrate a different method, append, which is less expensive. (Many thanks to bowesmana who recently reminded me of this trick.)

This gives you

account	id	type	_time
john	userX	admin	2021-11-19 01:07:50
jeff	userX	user	2021-11-19 02:07:50
joe	userX	robot	2021-11-19 03:07:50
jack	userX	user	2021-11-19 04:07:50

With the simulated data, userX's earliest account:type is john:admin, then it becomes jeff:user, then joe:robot, and most recently jack:user; in account type alone, the progression is admin->user->robot->user.

Note I added a filter "| where isnotnull(id)" because account jim in indexB is not associated with userX, which makes for a funny display in the end result. In the real world, an account is always associated with an id, so the funny display should not happen.

Also, wildcard "values(*) as *" does not work on _time, hence "values(_time) as _time" is necessary.
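
For readers who want to see what the append-then-stats pattern does to the rows, here is a rough Python sketch of the same merge: rows from both sources are grouped by account, their fields are combined, and groups without an id are dropped (the role of the isnotnull(id) filter). It is an approximation under stated assumptions, not a translation of the query: the function name merge_by_account is made up, and only the indexA timestamp is kept, whereas a stats over both sources would see both timestamps.

```python
from collections import defaultdict

def merge_by_account(events, account_rows):
    """Group rows from both sources by account and merge their fields."""
    merged = defaultdict(dict)
    for row in account_rows:            # contributes the 'type' field
        merged[row["account"]]["type"] = row["type"]
    for event in events:                # contributes 'id' and '_time'
        merged[event["account"]].update(event)
    # Keep only accounts that actually appeared in the audit log.
    rows = [
        {"account": account, **fields}
        for account, fields in merged.items()
        if fields.get("id") is not None
    ]
    return sorted(rows, key=lambda r: r["_time"])

events = [
    {"_time": "2021-11-19 00:27:02", "account": "john", "id": "userX"},
    {"_time": "2021-11-19 01:27:02", "account": "jeff", "id": "userX"},
    {"_time": "2021-11-19 02:27:02", "account": "joe", "id": "userX"},
    {"_time": "2021-11-19 03:27:02", "account": "jack", "id": "userX"},
]
account_rows = [
    {"account": "jack", "type": "user"},
    {"account": "jeff", "type": "user"},
    {"account": "jim", "type": "admin"},   # never used by userX, so it is dropped
    {"account": "joe", "type": "robot"},
    {"account": "john", "type": "admin"},
]

for row in merge_by_account(events, account_rows):
    print(row["account"], row["id"], row["type"], row["_time"])
```

Sorting by time at the end makes the type progression for the ID easy to read off.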
