
How to identify unused data in order to dynamically disable inputs that are not being used?


I need a way to find unused data and disable the corresponding input, scheduled search, or summary index in order to control the cost of the Splunk environment. I figured out how to pull the sourcetypes from the access logs as well as from the saved searches. Now I just need to figure out a way to dynamically disable the inputs.

Try this bash script

START:
#!/bin/sh
# author Kyle_Jackson
# This pulls from splunk btool savedsearches.conf and writes the results to /opt/isv/splunkshared/var/run/splunk
# VERSION 2

echo "name, search" > /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv && /opt/isv/splunk/bin/splunk btool savedsearches list | grep -E '\[|search =|disabled =' | grep -v autosummarize.command | sed -e 's/"//g' -e "s/'//g" | sed 's/^\[/BOOGER/' | tr -d '\r\n' | sed 's/BOOGER/\n/g' | sed -e 's/^/"/' -e 's/]/", "/' -e 's/$/"/' >> /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv
END:
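To make it easier to see what the filter chain in that script produces, here is a minimal sketch that runs the same grep/sed/tr steps on made-up btool-style text instead of calling splunk. The function name `flatten_stanzas` and the sample stanzas are hypothetical, the bracket patterns are written with their escapes, and GNU sed is assumed (for `\n` in the replacement).

```shell
#!/bin/sh
# Same filter chain as the script above, reading btool-style text from stdin.
# Assumes GNU sed, which turns \n in the replacement into a newline.
flatten_stanzas() {
    grep -E '\[|search =|disabled =' |
        grep -v autosummarize.command |
        sed -e 's/"//g' -e "s/'//g" |
        sed 's/^\[/BOOGER/' |
        tr -d '\r\n' |
        sed 's/BOOGER/\n/g' |
        sed -e 's/^/"/' -e 's/]/", "/' -e 's/$/"/'
}

# Made-up sample input, not real btool output.
printf '%s\n' \
    '[Errors Last Hour]' \
    'disabled = 0' \
    'search = index=main sourcetype=syslog error' \
    '[Web Hits]' \
    'disabled = 1' \
    'search = index=web sourcetype=access_combined' |
    flatten_stanzas
```

Each stanza ends up on one row, with the stanza name in the first column and the `disabled = N` and `search = ...` text run together in the second, which is why the search further down matches on `disabled = (\d)search`. Because the flattened text starts with the marker, the first row of the output is an empty `""` entry.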
I will combine this lookup with the data sets we are using in the scheduled searches below.

Scheduled Search Audit Lookup

Search that recursively pulls the search or saved search from the bash script output, extracts the sourcetype, and joins it to the web_access logs.

index=_internal source=*license_usage.log type=Usage pool=defaultpool earliest=-1d@d latest=@d | eval GB=b/1024/1024/1024 | stats sum(GB) as GB by st | stats avg(GB) as avgGB by st | rename st as "sourcetype" | rename avgGB as "Avg GB/Day" | eval sourcetype=lower(sourcetype)
| join type=outer sourcetype [
| inputcsv savedsearches_list
| rename search as search1
| rex field=search1 mode=sed "s/\"//g" | rex field=search1 "index=(?<index1>\S+)" | rex field=search1 "sourcetype=(?<sourcetype1>\S+)" | rex field=search1 "source=(?<source1>\S+)"
| rex field=search1 "savedsearch (?<name1>\S+)" | rex field=search1 "disabled = (?<disabled1>\d)search"
| join type=outer name1 [
| inputcsv savedsearches_list
| rename name as name1
| rename search as search2
| rex field=search2 mode=sed "s/\"//g" | rex field=search2 "index=(?<index2>\S+)" | rex field=search2 "sourcetype=(?<sourcetype2>\S+)" | rex field=search2 "source=(?<source2>\S+)"
| rex field=search2 "disabled = (?<disabled2>\d)search"
| table name1 search2 index* source* disable*
]
| table name* index* source* search* disable*
| eval index="" | eval disabled="" | eval source="" | eval sourcetype="" | eval name=""
| eval index=case(isnull(index1), index2, isnull(index2), index1)
| eval disabled=case(isnull(disabled1), disabled2, isnull(disabled2), disabled1)
| eval source=case(isnull(source1), source2, isnull(source2), source1)
| eval sourcetype=case(isnull(sourcetype1), sourcetype2, isnull(sourcetype2), sourcetype1)
| table index source sourcetype disabled
| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| eval sourcetype=lower(sourcetype)
]

| join type=outer sourcetype [search
index=_internal sourcetype=* tag::host=webserver sourcetype earliest=-30d@d latest=now referer=* | eval decode=urldecode(referer) | rex field=decode mode=sed "s/\"//g" | rex field=decode mode=sed "s/&/ /g" | rex field=decode mode=sed "s/\%/ /g"
| stats latest(_time) as time by user decode referer
| rex field=decode "index=(?<index>\S+)" | rex field=decode "sourcetype=(?<sourcetype>\S+)" | rex field=decode "source=(?<source>\S+)"
| fillnull value="n/a"
| eval sourcetype=lower(sourcetype)

| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| stats latest(user) as latest_user latest(time) as latest_time by sourcetype
| eval web_accessed="latest_user=".latest_user." latest_time=".latest_time." sourcetype=".sourcetype | fields - latest_time latest_user
]

| rename disabled as is_scheduled

| eval value=case((isnull(web_accessed) AND isnull(is_scheduled)), "not being searched", (isnull(web_accessed) AND isnotnull(is_scheduled)), "saved search only", (isnotnull(web_accessed) AND isnull(is_scheduled)), "web search only", (isnotnull(web_accessed) AND isnotnull(is_scheduled)), "both saved and web searched")
| fields - index source is_scheduled
| sort - value, "Avg GB/Day"
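For anyone who wants to check the sourcetype extraction outside of Splunk, the rex steps above (strip quotes, capture the token after `sourcetype=`, drop `*`, `(` and `)`, lowercase) can be approximated in shell. This is only a sketch: `extract_sourcetypes` is a hypothetical helper, the sample search string is made up, and unlike the rex calls, which keep only the first match, `grep -o` lists every match.

```shell
#!/bin/sh
# Hypothetical helper: list the sourcetypes named in a search string,
# normalised the same way as in the search above.
extract_sourcetypes() {
    printf '%s\n' "$1" |
        tr -d '"' |
        grep -oE 'sourcetype=[^[:space:]]+' |
        sed -e 's/^sourcetype=//' -e 's/[*()]//g' |
        tr '[:upper:]' '[:lower:]'
}

# Made-up search string for illustration.
extract_sourcetypes 'index=web (sourcetype="Access_Combined" OR sourcetype=iis*) | stats count'
```

For that sample string it prints `access_combined` and `iis`, one per line.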

Please let me know if there is a better way to do this with a **possible solution** to the problem at hand. Also if you happen to try this, let me know if you have any ideas to make this better.
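One possible direction for the disabling step, offered as a rough sketch rather than a tested solution: take the sourcetypes the search reports as "not being searched", look them up in the output of `splunk btool inputs list`, and write override stanzas containing `disabled = 1` into a local inputs.conf. The function name `disable_stanzas_for_sourcetype` and the sample stanzas below are hypothetical, and the sketch only covers inputs that set `sourcetype` explicitly in their stanza.

```shell
#!/bin/sh
# Hypothetical helper: read btool-style inputs text on stdin and print an
# override stanza with "disabled = 1" for every input whose sourcetype
# matches $1 (case-insensitive).
disable_stanzas_for_sourcetype() {
    awk -v want="$1" '
        /^\[/ { stanza = $0; next }          # remember the current stanza header
        /^sourcetype[ \t]*=/ {
            value = $0
            sub(/^sourcetype[ \t]*=[ \t]*/, "", value)
            if (tolower(value) == tolower(want)) {
                print stanza
                print "disabled = 1"
                print ""
            }
        }
    '
}

# Made-up sample input, not real btool output.
printf '%s\n' \
    '[monitor:///var/log/app/old.log]' \
    'index = main' \
    'sourcetype = old_app' \
    '[monitor:///var/log/syslog]' \
    'index = os' \
    'sourcetype = syslog' |
    disable_stanzas_for_sourcetype old_app
```

The printed stanzas could be reviewed and then appended to a local inputs.conf in the relevant app. I would keep a manual review step in between, and the change would typically need a reload or restart before it takes effect.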

Splunk Employee

I see a possible limitation in your approach.
The license usage log only contains metadata such as source, sourcetype, host, and index.
For a precise search (like "index=A sourcetype=B"), you can match the search against the license usage.
But for searches with broad conditions (like "index=*" or "keyword"), you will not be able to know what the scope of the data was,