Archive

Identify which data is useful

dominiquevocat
Builder

Is there an app or some documented searches that can help identify things like

  • Indexes that are frequently searched
  • Users who frequently run searches in Splunk
  • a score for how "useful" data is, based on volume of data and amount/frequency of searches against it (a bubble chart would probably be useful to show this)

The reason is that our management wants to know how useful a given source/index/sourcetype is when matched against the cost it causes.

Is there an app that classifies data along these dimensions?

Tags (1)
0 Karma

acharlieh
Influencer

The second one you can get from the _audit index. Offhand something like:

index=_audit action=search info=granted search_id=*

(I could be slightly off in my parameters here without a Splunk instance in front of me, but you get the idea.) These events can also tell you whether searches were scheduled or ad hoc. But take the raw counts with a grain of salt: if a user is kicking off more simultaneous searches than their role allows (especially with bad cron scheduling), the number of searches returned this way can go through the roof, so check for uniqueness of queries too.
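As a hedged sketch of turning that into a per-user tally (I'm assuming the usual _audit fields user, search, and search_id; verify them on your instance):

index=_audit action=search info=granted search_id=*
| stats dc(search_id) as total_searches dc(search) as unique_searches by user
| sort - total_searches

Comparing total against unique counts is one way to spot the runaway scheduled searches described above.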

For the first one, I've been asking for an index access log by search for a while now. Since you have the searches and the users who kicked them off, you could do a bunch of manual work to figure out which indexes were involved in each search.
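Until such a log exists, one rough workaround is to scrape index= clauses out of the recorded search strings. This is only a sketch: the regex is an assumption and will miss indexes implied by roles, macros, or eventtypes.

index=_audit action=search info=granted search_id=*
| rex field=search max_match=0 "index\s*=\s*\"?(?P<idx>[\w*-]+)\"?"
| mvexpand idx
| stats count as searches dc(user) as users by idx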

As for how useful particular data is in comparison to its cost, you should think in terms of what having this data available in Splunk saved the business, not just how often it was searched. For that, you need to talk to the various teams using Splunk and collect their war stories. If they know the searches or dashboards that helped them, you can trace those back to the data involved. Talk to your Sales Rep or Sales Engineer too; I've seen a business value toolkit that they may be able to share.

Another thing that's worked for us: if your teams are split by index, use the license master to track volumes per index, then engage the largest teams to see what value they get out of their logs. We even have an alert that triggers when 1/24th of the license (or some other arbitrary limit) is consumed in the previous hour and reports which indexes contributed the most. It helped us engage teams to ask what was happening; some spikes were the result of ongoing incidents, while others went "oh wait... why are we logging that?"
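For the per-index volume side of that, a sketch against the license master's usage log (the source path and field names are the usual defaults, but may differ in your environment):

index=_internal source=*license_usage.log type=Usage
| stats sum(b) as bytes by idx
| eval GB=round(bytes/1024/1024/1024,2)
| sort - GB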

0 Karma

dominiquevocat
Builder

I will look into it. Currently I take the SoS report "Recent Usage by User (Non-Scheduled Only)", enrich the user ID with data from our directory, and tally it up by business unit/country to identify heavy users. The other thing would be to identify the index in the search string... I can also gather the amount of data indexed daily, so with these two metrics I can probably derive some kind of "search density". Not sure if I'm being clear here.

0 Karma

dominiquevocat
Builder

OK, so far I did a field extraction that kind of works:
audittrail : EXTRACT-idx Inline index=(?P<idx>\w+)
and I can do stats on that.

0 Karma

dominiquevocat
Builder

Like so (note the audit data lives in the _audit index):

index=_audit action=search info=granted search_id=* idx=* idx!="" | stats count by idx
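To get toward the bubble chart from the original question (search count against indexed volume per index), a hedged sketch that joins this with license usage; the license_usage.log source path and its b/idx fields are assumptions to verify locally:

index=_audit action=search info=granted search_id=* idx=* idx!=""
| stats count as searches by idx
| join type=left idx
    [ search index=_internal source=*license_usage.log type=Usage
      | stats sum(b) as bytes by idx ]
| eval GB=round(bytes/1024/1024/1024,2)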

0 Karma