Hi,
We want to track our Top N users of license by index, and then compare it to yesterday (and possibly alert on major changes). Not sure how to do that... any suggestions?
See my comment under your question. With that understanding in mind:
While you can use other sources within the _internal index, the real challenge comes in when the search history never contained a reference to the index. For example, how do you discern the index searched if it wasn’t part of the search query? For example, tags, eventtypes, datamodels, and macros would show search strings in the audit that don’t expand the definitions of those knowledge objects.
Sorry to be a buzzkill. I’ve seen this question come up a number of times with varying answers depending on the desired goal.
You might be able to use joins and lookups to correlate tags or macros (etc) back to the their definitions BUT those definitions change over time and there might be circular references (tags that call macros that call tags - even if different ones) so there's some challenges in how to write such a search. Perhaps the community has another approach?
To clarify, I think @212830 is looking for determining if data indexed by Splunk is actually getting "used" where "used" is defined as searched by a variety of users or by quantity of searches (but maybe less users). This goes to determining the value the indexed data is bringing.
This might be a little off topic but instead of using an index by user to see how valuable the data is, I use a dashboard by user query.
index="_internal" source=/opt/splunk/var/log/splunk/splunkd_ui_access.log "en-US/app" | rex "(?i)^[^\-]*\-\s+(?P<user>[^ ]+)" | rex "en-US/app/(?<app>[^/]+)/(?<dashboard>[^?/\"\s]+)"| search app="search" dashboard!="job_management" dashboard!="dbinfo" dashboard!="*en-US" dashboard!="search" dashboard!="home" dashboard!="alert*" dashboard!="dashboards" dashboard!="reports" dashboard!="report" dashboard!="@go" dashboard!="%40go" dashboard!="field_extractor" dashboard!="pivot" dashboard!="test2" dashboard!="test3" dashboard!="test5" user!="-"|bucket _time span=1d | stats dc(dashboard) as c by dashboard user _time | chart useother=f limit=0 count by dashboard user|sort -count|rename dashboard as "Dashboard Name"
If you just need by index this query works
index=_internal source="*license_usage.log*"|where idx!="NULL"| timechart limit=0 span=1d sum(eval(round(b/1024/1024/1024,5))) by idx
If you can track indexing by hosts instead of user, then you can use the built-in usage data with the following:
index=_internal source=*license_usage.log* type="Usage" idx=<YOUR_INDEX>
| timechart span=1d limit=0 eval(round(sum(b)/1024/1024,3)) as MB by h
If you need to track indexing by user, then you would have to look at the raw data lengths.
index=<YOUR_INDEX>
| timechart span=1d limit=0 sum(eval(len(_raw)/1024/1024)) by <USER_FIELD>
Without getting into worrying about days of the week and patterns of behavior, you can then compare between days with the following, where you will see where there is a 50% change in volume
| untable _time user mb
| streamstats window=1 current=f last(mb) as prev_mb by user
| eval perc_diff = (abs(prev_mb - mb)/mb)*100
| search perc > 50 prev_mb > 0
Sorry, user has nothing to do with, I want by index. We'll associate the index to a bu internally via dbx.
To see how much each source is putting through, try this search- change the sourcetype as needed.
sourcetype=WinEventLog:*
| fields _raw, _time, host
| eval evt_bytes = len(_raw)
| timechart span=1d sum(eval(evt_bytes/1024/1024)) AS TotalMB by host
To see how big your indexes are, try this search:
| rest /services/data/indexes | eval perc=(currentDBSizeMB * 100 / maxTotalDataSizeMB ) | table title currentDBSizeMB maxTotalDataSizeMB perc
I used to do it this way but recently learned that this won't be 100% accurate because
len
and the license counter measure the same (they don't, len
measures characters while the license counter measures bytes)_indextime
is not the same as _time
. Sometimes forwarders get backed up and an item may be indexed some time after what it's _time
value is.Can you please clarify what you mean in your original post by "Top N users by index"? What do you mean "users"?
Just topn index by volume. We'll associate the index name internally with a bu.
Ok then, how about his? Your "N" can be set by changing the value "limit=N" in the timechart
command.
index=_internal source=*license_usage.log* type="Usage"
| timechart span=1d limit=10 eval(round(sum(b)/1024/1024,3)) as mb by idx
| untable _time idx mb
| streamstats window=1 current=f last(mb) as prev_mb by idx
|eval perc_diff = (abs(prev_mb - mb)/mb)*100
| eval marker = if(perc > 50 AND prev_mb > 0, 1, 0)
That query should give you the field marker the day when a given index changed by 50% percent or more as compared to the previous day. You could overlay that information with a chart show the daily totals, or you could filter down to just that field and setup alerts.
Hi A212830,
This query worked for me. It doesn't compare to yesterday but you could easily do that with the timewrap app.
index=_internal source="*license_usage.log*"|where idx!="NULL"| timechart limit=0 span=1d sum(eval(round(b/1024/1024/1024,5))) by idx
Whups sorry that doesn't do the user part. Lemme dig on that