Is there a way to determine which logs are not being used anymore, and therefore can be deleted? For example, maybe a team started logging something a year ago, but the team no longer uses that log for any reports/dashboards/etc... Is there a way to find these unused logs?
In addition to the links mentioned by adonio, this search might get you some of the way there. However, things like macros can hide indexes/sourcetypes so it's not 100% but does also include data models/nodenames being used.
The search filters out all "*" and "_*" references as those aren't very useful. It prefixes data models with "DM-" and nodenames with "ND-" and treats those as an index/sourcetype combo. Macros are prefixed with "MC-" to easily identify and look at manually.
You could compare this against a REST call to the indexes or indexes-extended endpoint to get a starting point. BUT, you will want to confirm with data owners the indexes aren't actually being used since, again, this search is not 100%.
index=_internal sourcetype=splunkd_remote_searches
|dedup search
|eval
search=replace(search, "(datamodel\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1DM-\2\3"),
search=replace(search, "(eval\s+datamodel\s*=[\s\"]*)DM-", "\1"),
search=replace(search, "(\|\s*pivot\s+)(.*?)(\s)", "\1DM-\2\3"),
search=replace(search, "(nodename\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1ND-\2\3"),
search=replace(search, "(eval\s+nodename\s*=[\s\"]*)ND-", "\1"),
search=replace(search, "(search\s*`)(.*?)([`\(])", "\1MC-\2\3")
|rex field=search max_match=0 "index\s*=[\s\"]*(?<idx>.*?)[\|\s\"\)]"
|rex field=search max_match=0 "sourcetype\s*=[\s\"]*(?<st>.*?)[\|\s\"\)]"
|rex field=search max_match=0 "search\s*`(?<macro_index>MC-.*?)[`\(]"
|rex field=search max_match=0 "datamodel\s*=[\s\"]*(?<dm>DM-.*?)[\|\s\"\)]"
|rex field=search max_match=0 "nodename\s*=[\s\"]*(?<node>ND-.*?)[\|\s\"\)]"
|rex field=search max_match=0 "\|\s*pivot\s+(?<pv>.*?)\s"
|eval
idx=mvdedup(mvappend(idx, macro_index, dm, pv)),
idx=mvfilter(idx!="*" AND idx!="_*" AND NOT match(idx, "^_") AND NOT match(idx, "^\d+[\*_]")),
st=mvdedup(mvappend(st, node))
|where isnotnull(idx) AND isnotnull(st)
|stats c by idx, st
|table idx, st
One comment - I missed it when posting - in the mvfilter(), remove the last condition as that was specific to my use case when I made this - AND NOT match(idx, "^\d+[*_]"). We have indexes that start with a numeric ID for each customer and I wanted to ignore those.
hello JimSchlaker,
there are answers here around which indexes are used for reports / saved searches / dashboards and more that you can relay on. for example:
https://answers.splunk.com/answers/273176/how-can-i-determine-how-much-an-index-is-being-sea.html
https://answers.splunk.com/answers/186268/how-to-search-for-and-remove-indexes-in-splunk-tha.html
considering you mention also time span, meaning they might look at that particular index / source /sourcetype but not utilizing the old data, i would suggest an opposite way of approaching that challenge.
will suggest to either check the timerange on searches using | rest or the _audit index to determine. or verify with teams, how far back they need their data and set a hard time limit in indexes.conf on the index contains that data
hope it helps