How to identify unused data in order to dynamically disable inputs that are not being used?

Kyle_Jackson · 01-06-2015

I need a way to find unused data and disable the corresponding input, scheduled search, or summary index in order to control the cost of the Splunk environment. I figured out how to pull the sourcetypes from the access logs as well as from the saved searches. Now I just need to figure out a way to dynamically disable the inputs.

Try this bash script

START:
#!/bin/sh
# author Kyle_Jackson
# This pulls from splunk btool savedsearches.conf and writes the results to /opt/isv/splunkshared/var/run/splunk
# VERSION 2

# Write the CSV header, then append one row per saved-search stanza.
# BOOGER is a temporary marker: stanza headers are tagged with it, all newlines
# are removed, and each marker is then turned back into a newline.
echo "name, search" > /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv &&
/opt/isv/splunk/bin/splunk btool savedsearches list |
  grep '\[\|search =\|disabled =' | grep -v auto_summarize.command |
  sed -e 's/"//g' -e "s/'//g" | sed 's/^\[/BOOGER/' |
  tr -d '\r\n' | sed 's/BOOGER/\n/g' |
  sed -e 's/^/"/' -e 's/]/", "/' -e 's/$/"/' >> /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv
END:
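For comparison, the same stanza-to-CSV step can be sketched with a single awk pass instead of the marker trick. This is only an illustration under some assumptions: `btool_to_csv` is a hypothetical helper name, it reads btool-style text from stdin, it writes `disabled` as its own column rather than leaving it inside the search text, and it only handles searches that fit on one line.

```shell
#!/bin/sh
# Hypothetical helper: convert btool-style stanza text on stdin into CSV rows.
# Assumes each "search = ..." value sits on a single line.
btool_to_csv() {
    awk '
        function emit() {
            # Print the stanza collected so far, if any.
            if (name != "") printf "\"%s\",\"%s\",\"%s\"\n", name, search, disabled
        }
        BEGIN { print "name,search,disabled" }
        { sub(/\r$/, "") }                        # tolerate CRLF line endings
        /^\[.*\]$/ {                              # stanza header such as [My Search]
            emit()
            name = substr($0, 2, length($0) - 2)  # strip the surrounding brackets
            search = ""; disabled = ""
            next
        }
        /^search = / {
            search = substr($0, 10)               # text after "search = "
            gsub(/"/, "", search)                 # drop double quotes, as the original does
            gsub("\047", "", search)              # drop single quotes (octal 047)
            next
        }
        /^disabled = / { disabled = substr($0, 12); next }
        END { emit() }
    '
}
```

It could be fed the same way as the script above, for example `/opt/isv/splunk/bin/splunk btool savedsearches list | btool_to_csv`. The search below would need its `disabled` extraction adjusted to read the separate column.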
I then combine this lookup with the data sets used by the scheduled searches below.

Scheduled Search Audit Lookup

*Search that pulls each search from the bash script output, resolves any saved search it references, extracts the sourcetype, and joins it to the web access logs.*

index=_internal source=license_usage.log type=Usage pool=default_pool earliest=-1d@d latest=@d | eval GB=b/1024/1024/1024 | stats sum(GB) as GB by st | stats avg(GB) as avg_GB by st | rename st as "sourcetype" | rename avg_GB as "Avg GB/Day" | eval sourcetype=lower(sourcetype)
| join type=outer sourcetype [
| inputcsv savedsearches_list
| rename search as search1
| rex field=search1 mode=sed "s/\"//g" | rex field=search1 "index=(?<index1>\S+)" | rex field=search1 "sourcetype=(?<sourcetype1>\S+)" | rex field=search1 "source=(?<source1>\S+)"
| rex field=search1 "savedsearch (?<name1>\S+)" | rex field=search1 "disabled = (?<disabled1>\d)search"
| join type=outer name1 [
| inputcsv savedsearches_list
| rename name as name1
| rename search as search2
| rex field=search2 mode=sed "s/\"//g" | rex field=search2 "index=(?<index2>\S+)" | rex field=search2 "sourcetype=(?<sourcetype2>\S+)" | rex field=search2 "source=(?<source2>\S+)"
| rex field=search2 "disabled = (?<disabled2>\d)search"
| table name1 search2 index* source* disable*
]
| table name* index* source* search* disable*
| eval index="" | eval disabled="" | eval source="" | eval sourcetype="" | eval name=""
| eval index=case(isnull(index1), index2, isnull(index2), index1)
| eval disabled=case(isnull(disabled1), disabled2, isnull(disabled2), disabled1)
| eval source=case(isnull(source1), source2, isnull(source2), source1)
| eval sourcetype=case(isnull(sourcetype1), sourcetype2, isnull(sourcetype2), sourcetype1)
| table index source sourcetype disabled
| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| eval sourcetype=lower(sourcetype)
]

| join type=outer sourcetype [search
index=_internal sourcetype=* tag::host=webserver sourcetype earliest=-30d@d latest=now referer=* | eval decode=urldecode(referer) | rex field=decode mode=sed "s/\"//g" | rex field=decode mode=sed "s/&/ /g" | rex field=decode mode=sed "s/\%/ /g"
| stats latest(_time) as _time by user decode referer
| rex field=decode "index=(?<index>\S+)" | rex field=decode "sourcetype=(?<sourcetype>\S+)" | rex field=decode "source=(?<source>\S+)"
| fillnull value="n/a"
| eval sourcetype=lower(sourcetype)

| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| stats latest(user) as latest_user latest(_time) as latest_time by sourcetype
| eval web_accessed="latest_user=".latest_user." latest_time=".latest_time." sourcetype=".sourcetype | fields - latest_time latest_user
]

| rename disabled as is_scheduled

| eval value=case((isnull(web_accessed) AND isnull(is_scheduled)), "not being searched", (isnull(web_accessed) AND isnotnull(is_scheduled)), "saved search only", (isnotnull(web_accessed) AND isnull(is_scheduled)), "web search only", (isnotnull(web_accessed) AND isnotnull(is_scheduled)), "both saved and web searched")
| fields - index source is_scheduled
| sort - value, "Avg GB/Day"
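Once the search above flags a sourcetype as "not being searched", one possible way to act on it is to generate `disabled = 1` overrides for the matching input stanzas. The sketch below is an assumption-laden illustration, not a tested procedure: `disable_overrides` is a hypothetical helper name, it expects `splunk btool inputs list` style text on stdin, and it only prints the override stanzas. Where that output goes (for example a `local/inputs.conf` on the instance that owns the input) and how the change is reloaded are left open.

```shell
#!/bin/sh
# Hypothetical helper: print a "disabled = 1" override for every input stanza
# whose sourcetype equals $1 (compared case-insensitively).
# Reads btool-style inputs.conf text on stdin; nothing is written to disk.
disable_overrides() {
    awk -v target="$1" '
        { sub(/\r$/, "") }                          # tolerate CRLF line endings
        /^\[.*\]$/ { stanza = $0; next }            # remember the current stanza header
        /^sourcetype = / {
            # substr($0, 14) is the value after "sourcetype = "
            if (stanza != "" && tolower(substr($0, 14)) == tolower(target)) {
                printf "%s\ndisabled = 1\n\n", stanza
            }
        }
    '
}
```

A possible invocation would be `/opt/isv/splunk/bin/splunk btool inputs list | disable_overrides some_unused_sourcetype`, with the output reviewed by hand before it is applied anywhere.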

Please let me know if there is a better way to do this, or a possible solution to the problem at hand. If you happen to try this, let me know if you have any ideas to make it better.

yannK · 01-06-2015

I see a possible limitation in your approach.
The license usage only contains metadata like source/sourcetype/host/index.
You may be able to work out that a precise search (like "index=A sourcetype=B") can be matched against the license usage.
But for searches with broad conditions (like "index=*" or "keyword"), you will not be able to tell what the scope of the data was.
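To make the distinction concrete, here is a rough sketch that labels a search string as precise or broad. `search_scope` is a hypothetical helper, and the rule it applies (a literal `index=` or `sourcetype=` value with no wildcard counts as precise, anything else as broad) is a simplifying assumption rather than anything Splunk defines.

```shell
#!/bin/sh
# Hypothetical helper: print "precise" or "broad" for the search string in $1.
search_scope() {
    # Any wildcard in an index= or sourcetype= value makes the scope unknowable.
    if printf '%s\n' "$1" | grep -Eq '(index|sourcetype)=[^[:space:]]*\*'; then
        echo broad
    # A literal index= or sourcetype= value can be matched to license usage.
    elif printf '%s\n' "$1" | grep -Eq '(index|sourcetype)=[^[:space:]]+'; then
        echo precise
    # Bare keywords and other terms say nothing about which data was read.
    else
        echo broad
    fi
}
```

Searches labelled broad could be excluded from the audit or reviewed by hand, since they cannot show that a given sourcetype is unused.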
