<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Anomaly detection in Splunk in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575577#M200563</link>
    <description>&lt;P class=""&gt;Hi all,&lt;/P&gt;&lt;P class=""&gt;I am new to Splunk and have been trying to work on a use case to detect anomalous switches from one type of account to another.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Index A&lt;/STRONG&gt;: Has the list of switches i.e. has two columns: 'Old account', 'New account'.&lt;BR /&gt;&lt;STRONG&gt;Index B&lt;/STRONG&gt;: Has the *type* of accounts. It has two columns: 'Accounts', 'Account_types'.&lt;/P&gt;&lt;P class=""&gt;Till now, using commands like join (after renaming certain columns), I have been able to get to a point where I have a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;table of 4 columns, 'Old account', 'Old_account_type', New account', 'New_account_type'.&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Aim&lt;/STRONG&gt;:&lt;BR /&gt;I need to implement logic to detect if old accounts switch to 'unusual' new accounts**.**&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Idea so far&lt;/STRONG&gt;:&lt;BR /&gt;I wish to create a dictionary of some sort where there is a list of new accounts and new_account_type(s) an old account has switched to. And then, if the old account switches to an account not in this dictionary, I wish to flag it up. Does this sound like a logical idea?&lt;/P&gt;&lt;P class=""&gt;For example, if looking at past 4 switches, if an old account named A of the type 'admin', switches to new accounts named 1, 2, 3, 4 of type admin, user, admin, admin, then the dictionary should look like&lt;BR /&gt;A_switches = {&lt;BR /&gt;"Old Account": "A",&lt;BR /&gt;"old_account_type":"admin",&lt;BR /&gt;"New Account": [1 , 2 , 3, 4],&lt;BR /&gt;"type": [admin, user]&lt;BR /&gt;}&lt;/P&gt;&lt;P class=""&gt;This query needs to be run each hour to flag up unusual switches. Can someone suggest how I can implement the above logic i.e. 
create a dictionary and spot unusual activity?&lt;/P&gt;&lt;P class=""&gt;Apologies for the long question and if something isn't clear.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Nov 2021 01:47:39 GMT</pubDate>
    <dc:creator>axm1295</dc:creator>
    <dc:date>2021-11-19T01:47:39Z</dc:date>
    <item>
      <title>Anomaly detection in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575577#M200563</link>
      <description>&lt;P class=""&gt;Hi all,&lt;/P&gt;&lt;P class=""&gt;I am new to Splunk and have been trying to work on a use case to detect anomalous switches from one type of account to another.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Index A&lt;/STRONG&gt;: Has the list of switches i.e. has two columns: 'Old account', 'New account'.&lt;BR /&gt;&lt;STRONG&gt;Index B&lt;/STRONG&gt;: Has the *type* of accounts. It has two columns: 'Accounts', 'Account_types'.&lt;/P&gt;&lt;P class=""&gt;Till now, using commands like join (after renaming certain columns), I have been able to get to a point where I have a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;table of 4 columns, 'Old account', 'Old_account_type', New account', 'New_account_type'.&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Aim&lt;/STRONG&gt;:&lt;BR /&gt;I need to implement logic to detect if old accounts switch to 'unusual' new accounts**.**&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Idea so far&lt;/STRONG&gt;:&lt;BR /&gt;I wish to create a dictionary of some sort where there is a list of new accounts and new_account_type(s) an old account has switched to. And then, if the old account switches to an account not in this dictionary, I wish to flag it up. Does this sound like a logical idea?&lt;/P&gt;&lt;P class=""&gt;For example, if looking at past 4 switches, if an old account named A of the type 'admin', switches to new accounts named 1, 2, 3, 4 of type admin, user, admin, admin, then the dictionary should look like&lt;BR /&gt;A_switches = {&lt;BR /&gt;"Old Account": "A",&lt;BR /&gt;"old_account_type":"admin",&lt;BR /&gt;"New Account": [1 , 2 , 3, 4],&lt;BR /&gt;"type": [admin, user]&lt;BR /&gt;}&lt;/P&gt;&lt;P class=""&gt;This query needs to be run each hour to flag up unusual switches. Can someone suggest how I can implement the above logic i.e. 
create a dictionary and spot unusual activity?&lt;/P&gt;&lt;P class=""&gt;Apologies for the long question and if something isn't clear.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 01:47:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575577#M200563</guid>
      <dc:creator>axm1295</dc:creator>
      <dc:date>2021-11-19T01:47:39Z</dc:date>
    </item>
    <item>
      <title>Re: Anomaly detection in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575600#M200578</link>
      <description>&lt;P&gt;Would this gather the information you need?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| stats values(new_account) as new_accounts values(new_account_type) as new_account_types by old_account old_account_type&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 19 Nov 2021 08:53:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575600#M200578</guid>
      <dc:creator>ITWhisperer</dc:creator>
      <dc:date>2021-11-19T08:53:22Z</dc:date>
    </item>
    <item>
      <title>Re: Anomaly detection in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575607#M200581</link>
      <description>&lt;P&gt;Before tackling the question of anomalies, I made the following simulations to clarify the premises. Please let me know if my understanding of the problem is correct: indexA is a regular audit log that contains an ID that cannot be changed, together with the account name associated with that ID, which can change over time; each account name has an associated type that is stored in indexB, and only the latest association is valid. Hence:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;``` simulation of indexA ```
| makeresults
| eval inc = mvrange(0, 4)
| mvexpand inc
| eval _time=_time + inc*3600
| eval index = "indexA", id = "userX", account = case(inc==0, "john", inc==1, "jeff", inc==2, "joe", inc==3, "jack")
| fields - inc&lt;/LI-CODE&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;_time&lt;/TD&gt;&lt;TD&gt;account&lt;/TD&gt;&lt;TD&gt;id&lt;/TD&gt;&lt;TD&gt;index&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-11-19 00:27:02&lt;/TD&gt;&lt;TD&gt;john&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;indexA&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-11-19 01:27:02&lt;/TD&gt;&lt;TD&gt;jeff&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;indexA&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-11-19 02:27:02&lt;/TD&gt;&lt;TD&gt;joe&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;indexA&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2021-11-19 03:27:02&lt;/TD&gt;&lt;TD&gt;jack&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;indexA&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Note: the above depicts the progression of a single ID, userX, over a period of 4 hours.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;``` simulation of indexB ```
| makeresults
| eval john="admin", jack="user", jeff="user", joe="robot", jim="admin"
| stats values(j*) as j*
| transpose
| eval _time = now(), index = "indexB"
| rename column as account, "row 1" as type&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;account&lt;/TD&gt;&lt;TD&gt;type&lt;/TD&gt;&lt;TD&gt;_time&lt;/TD&gt;&lt;TD&gt;index&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jack&lt;/TD&gt;&lt;TD&gt;user&lt;/TD&gt;&lt;TD&gt;2021-11-19 00:32:27&lt;/TD&gt;&lt;TD&gt;indexB&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jeff&lt;/TD&gt;&lt;TD&gt;user&lt;/TD&gt;&lt;TD&gt;2021-11-19 00:32:27&lt;/TD&gt;&lt;TD&gt;indexB&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jim&lt;/TD&gt;&lt;TD&gt;admin&lt;/TD&gt;&lt;TD&gt;2021-11-19 00:32:27&lt;/TD&gt;&lt;TD&gt;indexB&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;joe&lt;/TD&gt;&lt;TD&gt;robot&lt;/TD&gt;&lt;TD&gt;2021-11-19 00:32:27&lt;/TD&gt;&lt;TD&gt;indexB&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;john&lt;/TD&gt;&lt;TD&gt;admin&lt;/TD&gt;&lt;TD&gt;2021-11-19 00:32:27&lt;/TD&gt;&lt;TD&gt;indexB&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;The above depicts the latest snapshot of the account table, therefore a single timestamp.&lt;/P&gt;&lt;P&gt;If the two assumptions look correct, I can think of two ways to combine the data, based on the labels you put in the question. &amp;nbsp;First is KV lookup: you can dump the latest account table from indexB into a lookup table, then use lookup to associate accounts in indexA with types.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults
| eval john="admin", jack="user", jeff="user", joe="robot", jim="admin"
| stats values(j*) as j*
| transpose
| eval _time = now(), index = "indexB"
| rename column as account, "row 1" as type

``` dump latest snapshot of indexB into lookup ```
| dedup account
| fields - _time
| outputlookup accounttype.csv&lt;/LI-CODE&gt;&lt;P&gt;Then use this lookup with indexA:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults
| eval inc = mvrange(0, 4)
| mvexpand inc
| eval _time=_time + inc*3600
| eval index = "indexA", id = "userX", account = case(inc==0, "john", inc==1, "jeff", inc==2, "joe", inc==3, "jack")
| fields - inc

``` lookup accounttype.csv ```
| lookup accounttype.csv account OUTPUT type&lt;/LI-CODE&gt;&lt;P&gt;This method requires the lookup table to be kept up to date, which can be a disadvantage.&lt;/P&gt;&lt;P&gt;Meanwhile, since you are also considering &lt;U&gt;join&lt;/U&gt;, let me illustrate a different method, &lt;U&gt;append&lt;/U&gt; followed by &lt;U&gt;stats&lt;/U&gt;, which is less expensive than join. (Many thanks to bowesmana, who recently reminded me of this trick.)&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults
``` simulate indexA ```
| eval inc = mvrange(0, 4)
| mvexpand inc
| eval _time=_time + inc*3600
| eval index = "indexA", id = "userX", account = case(inc==0, "john", inc==1, "jeff", inc==2, "joe", inc==3, "jack")
| fields - inc

``` less expensive than join ```
| append
    [
    ``` simulate indexB ```
    | makeresults
    | eval john="admin", jack="user", jeff="user", joe="robot", jim="admin"
    | stats values(j*) as j*
    | transpose
    | eval _time = now(), index = "indexB"
    | rename column as account, "row 1" as type

    ``` only use latest, not interested in _time in indexB ```
    | dedup account
    | fields - _time]

``` associate account type with account ```
| fields - index
| stats values(*) as * values(_time) as _time by account
| where isnotnull(id) ``` only necessary in this simulation ```
| sort _time&lt;/LI-CODE&gt;&lt;P&gt;This gives you&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;account&lt;/TD&gt;&lt;TD&gt;id&lt;/TD&gt;&lt;TD&gt;type&lt;/TD&gt;&lt;TD&gt;_time&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;john&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;admin&lt;/TD&gt;&lt;TD&gt;2021-11-19 01:07:50&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jeff&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;user&lt;/TD&gt;&lt;TD&gt;2021-11-19 02:07:50&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;joe&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;robot&lt;/TD&gt;&lt;TD&gt;2021-11-19 03:07:50&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;jack&lt;/TD&gt;&lt;TD&gt;userX&lt;/TD&gt;&lt;TD&gt;user&lt;/TD&gt;&lt;TD&gt;2021-11-19 04:07:50&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;With the simulated data, userX's earliest account:type is john:admin, then it becomes jeff:user, then joe:robot, and most recently jack:user; in account type alone, the progression is admin-&amp;gt;user-&amp;gt;robot-&amp;gt;user.&lt;/P&gt;&lt;P&gt;Note that I added the filter "| where isnotnull(id)" because account jim in indexB is not associated with userX and would otherwise show up as a stray row with no id in the end result. In real data, where every account is associated with an ID, this should not normally happen, unless indexB lists accounts that have no indexA events in the search window, in which case the filter is still useful.&lt;/P&gt;&lt;P&gt;Also, the wildcard in "values(*) as *" does not apply to _time, hence the explicit "values(_time) as _time".&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 09:32:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Anomaly-detection-in-Splunk/m-p/575607#M200581</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2021-11-19T09:32:13Z</dc:date>
    </item>
  </channel>
</rss>

