I am attempting to write a search that can alert if a user deviates from their normal data-viewing pattern. The event log in question records every time a user views a piece of information, identified by its cID. Sometimes they view the same cID multiple times per day, but I only care about the number of distinct cIDs they view in a given time period. Ultimately, I would like to determine the average number of unique cIDs each user views per time period (maybe daily, maybe weekly) so that I can look for exceptions and trigger an alert automatically.
So if userA views 150 unique cIDs on average each day (over a 30-day span), and one day they view 400 unique cIDs, I would like an alert to be triggered. I have looked at the "anomalies", "delta", and "outlier" commands, but can't seem to get a working search. I am working on a search that takes avg(dc(cID)) by username, but that seems to be a dead end because stats won't accept one aggregation function nested inside another. I'm not set on using avg() as the deciding measure; I just need something that can detect anomalous behavior.
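To make the goal concrete, here is a minimal Python sketch of the logic outside Splunk. It assumes events arrive as (day, user, cID) tuples, a hypothetical shape rather than the real event_log format, and the sample data is made up. It counts distinct cIDs per user per day first and only then averages those counts, which is the two-step aggregation that a single nested avg(dc(cID)) tries to do in one go.

```python
from collections import defaultdict
from statistics import mean


def daily_distinct_counts(events):
    """Map (user, day) -> number of distinct cIDs viewed that day."""
    seen = defaultdict(set)
    for day, user, cid in events:
        seen[(user, day)].add(cid)  # repeated views of one cID collapse into one entry
    return {key: len(cids) for key, cids in seen.items()}


def average_daily_distinct(events):
    """Map user -> mean of that user's daily distinct counts."""
    per_user = defaultdict(list)
    for (user, _day), count in daily_distinct_counts(events).items():
        per_user[user].append(count)
    return {user: mean(counts) for user, counts in per_user.items()}


# Hypothetical sample events: (day, user, cID)
events = [
    ("2024-01-01", "userA", "c1"),
    ("2024-01-01", "userA", "c1"),  # duplicate view, counted once
    ("2024-01-01", "userA", "c2"),
    ("2024-01-02", "userA", "c3"),
    ("2024-01-02", "userB", "c1"),
]
print(average_daily_distinct(events))  # {'userA': 1.5, 'userB': 1}
```

In SPL the same two steps are usually written as two chained stats commands, for example bin _time span=1d | stats dc(cID) as daily_dc by user, _time | stats avg(daily_dc) by user; treat that as a starting point to test against your own data.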
Anyone have a better approach?
I would summary-index the distinct count of cID values and make sure the user field is also indexed. From there, you should be able to run a "stats range" search against the returned counts. range gives the difference between the highest and lowest daily count in the search window, which is the day-over-day change when the window covers two nightly summaries. Finally, filter the "stats range" output for values greater than the level you want to trigger on. So in search language, maybe this:
Save this search to a summary index every night (and also save count_cID as a field):
sourcetype=event_log | sistats dc(cID) as count_cID by user
Run this search every 24 hours or more to check the change (flagging a difference of more than 100 in either direction):
index=summary search_name=<above_saved_search> | stats range(count_cID) as cID_change by user | search cID_change > 100
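The range check can be sketched in Python as well, assuming the summary data has already been reduced to one (user, count_cID) row per night; the rows below are hypothetical in-memory values, not the real summary index schema. Because range is max minus min over the whole window, it is always non-negative, so a drop of more than 100 is flagged just like a jump.

```python
from collections import defaultdict


def flag_range_changes(rows, threshold=100):
    """Return {user: range} for users whose count range exceeds threshold."""
    by_user = defaultdict(list)
    for user, count in rows:
        by_user[user].append(count)
    # Equivalent idea to: stats range(count_cID) as cID_change by user
    changes = {user: max(counts) - min(counts) for user, counts in by_user.items()}
    # Equivalent idea to: search cID_change > 100
    return {user: r for user, r in changes.items() if r > threshold}


# Hypothetical nightly summary rows: (user, count_cID)
rows = [("userA", 150), ("userA", 400), ("userB", 90), ("userB", 120)]
print(flag_range_changes(rows))  # {'userA': 250}
```

If the search window spans more than two nightly rows, the result is the spread across all of them rather than the most recent day-over-day change.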
I first tried the search you suggested, and have now tried index=summary search_name="Summary - cID by username" | stats max(psrsvd_ct_sec_cardID) by username. This returns 378 results over the last 7 days, but the resulting table lists only the username, with no values in the max() column.
Simeon, the field settings appear to store a static value rather than letting me name the field that dc(cID) is stored in, and I can't seem to report on the data. I can see that a field "psrsvd_ct_cID" is populated with the relevant data, but I can't chart or report on it. Any thoughts, or should I open a Splunk support ticket at this point? I appreciate your help.
Yes, you are correct... you may have to modify your sistats to be something like:
| sistats dc(cID) as count_cid
AND you will need to save count_cid in the field settings for the summary index.
I recommend you read the summary indexing documentation before doing the above. To answer your question directly, you need to enable summary indexing via the checkbox in the saved search, and you should set its time range to completed days (earliest=-1d@d latest=@d covers the previous whole day).
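The -1d@d modifier means "go back one day, then snap to midnight", and pairing it with latest=@d ("snap now to midnight") bounds exactly the previous whole day. A rough Python equivalent is below; it assumes naive local datetimes and ignores DST and the timezone settings Splunk applies when snapping, so it is an approximation, not Splunk's implementation.

```python
from datetime import datetime, timedelta


def snap_to_day(ts):
    """Approximate the @d snap: truncate a datetime to midnight."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def previous_whole_day(now):
    """Approximate earliest=-1d@d latest=@d for a naive datetime."""
    earliest = snap_to_day(now - timedelta(days=1))  # -1d@d: one day back, then snap
    latest = snap_to_day(now)                         # @d: snap now to midnight
    return earliest, latest


print(previous_whole_day(datetime(2024, 3, 15, 14, 30)))
```

Scheduling the summary search shortly after midnight with this window means each run summarizes one complete day, with no partial-day counts.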
Yes. A search with sistats (just like a search with plain stats) needs to be set up to enable summary indexing. (The "si"-prefixed commands don't magically feed any data into the summary index; they are just intended to be more summary-index-friendly versions of the commands.)
That works to get the current average for the timeframe, but I need to compare it to the most recent day's count to know whether to generate an alert. So if I take the average over the baseline window (earliest=-8d@d latest=-2d@d, which is six whole days), I need to compare that average to the distinct count from earliest=-1d@d so that I can determine the difference from normal.
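The baseline-versus-latest comparison can be sketched in Python, assuming you already have each user's daily distinct counts for the baseline window plus the most recent day's count (the numbers below are made up). The rule used here, flag the latest day if it exceeds mean + k * stdev of the baseline, is a swapped-in choice that is common for this kind of check, not something Splunk or the thread prescribes; a fixed multiple of the mean would also work.

```python
from statistics import mean, stdev


def is_anomalous(baseline, latest, k=3.0):
    """Flag latest if it exceeds mean(baseline) + k * stdev(baseline).

    baseline: daily distinct counts for the comparison window (needs 2+ days).
    latest: distinct count for the most recent whole day.
    """
    avg = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline days
    return latest > avg + k * spread


# Hypothetical six-day baseline, e.g. earliest=-8d@d latest=-2d@d
baseline = [148, 152, 150, 149, 151, 150]
print(is_anomalous(baseline, 400))  # True
print(is_anomalous(baseline, 153))  # False
```

This only flags increases, matching the 150-to-400 example; with a perfectly flat baseline (stdev of 0) any count above the average is flagged, so a minimum spread may be worth adding. In SPL, eventstats avg() and stdev() by user followed by a where clause is one way to express the same comparison.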
Thanks, I'll look into those. The problem I hit with most of these commands is that I am trying to apply them to distinct_count(cID), rather than taking the average or trendline of the cID values themselves. cIDs are unique identifiers, so they have no numeric meaning.
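The point about cIDs can be illustrated with a short Python sketch on hypothetical data: averaging or trending the identifiers themselves is meaningless, but once each day's views are reduced to a distinct count, the result is an ordinary numeric series that a moving average can smooth. The 3-day simple moving average here only illustrates what a trendline-style calculation would produce over that series; it is not a claim about how any Splunk command is implemented.

```python
def distinct_per_day(views_by_day):
    """Reduce each day's list of viewed cIDs to a distinct count."""
    return [len(set(cids)) for cids in views_by_day]


def simple_moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i - window:i]) / window for i in range(window, len(values) + 1)]


# Hypothetical views for one user, one list per day
views_by_day = [
    ["c1", "c2", "c2"],
    ["c1", "c3"],
    ["c4", "c4", "c4", "c5", "c6"],
    ["c1"],
]
counts = distinct_per_day(views_by_day)
print(counts)  # [2, 2, 3, 1]
print(simple_moving_average(counts, 3))
```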