Splunk Search

How to determine if logs are not being used?

New Member

Is there a way to determine which logs are not being used anymore, and therefore can be deleted? For example, maybe a team started logging something a year ago, but the team no longer uses that log for any reports/dashboards/etc... Is there a way to find these unused logs?



Path Finder

In addition to the links mentioned by adonio, this search might get you some of the way there. However, things like macros can hide indexes and sourcetypes, so it's not 100% accurate, but it does also catch data models and nodenames being used.

The search filters out all "*" and "_*" references as those aren't very useful. It prefixes data models with "DM-" and nodenames with "ND-" and treats those as an index/sourcetype combo. Macros are prefixed with "MC-" to easily identify and look at manually.
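To make the prefixing scheme concrete, here is a rough Python equivalent of those replace() calls (not part of the original answer; the sample search string below is invented for illustration):

```python
import re

# Invented sample of a captured search string, for illustration only.
search = '| datamodel="Network_Traffic" | search `my_macro(1)` nodename=All_Traffic '

# Mirror the SPL replace() calls: tag data models with "DM-",
# nodenames with "ND-", and macro names with "MC-".
search = re.sub(r'(datamodel\s*=[\s"]*)(.*?)([|\s"\)])', r'\1DM-\2\3', search)
search = re.sub(r'(nodename\s*=[\s"]*)(.*?)([|\s"\)])', r'\1ND-\2\3', search)
search = re.sub(r'(search\s*`)(.*?)([`\(])', r'\1MC-\2\3', search)

print(search)
```

After tagging, the later rex extractions can pull the DM-/ND-/MC- values out alongside plain index= and sourcetype= references.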

You could compare this against a REST call to the indexes or indexes-extended endpoint to get a starting point. But you will want to confirm with the data owners that the indexes aren't actually being used since, again, this search is not 100% accurate.

index=_internal sourcetype=splunkd_remote_searches
| dedup search
| eval
  search=replace(search, "(datamodel\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1DM-\2\3"),
  search=replace(search, "(eval\s+datamodel\s*=[\s\"]*)DM-", "\1"),
  search=replace(search, "(\|\s*pivot\s+)(.*?)(\s)", "\1DM-\2\3"),
  search=replace(search, "(nodename\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1ND-\2\3"),
  search=replace(search, "(eval\s+nodename\s*=[\s\"]*)ND-", "\1"),
  search=replace(search, "(search\s*`)(.*?)([`\(])", "\1MC-\2\3")
| rex field=search max_match=0 "index\s*=[\s\"]*(?<idx>.*?)[\|\s\"\)]"
| rex field=search max_match=0 "sourcetype\s*=[\s\"]*(?<st>.*?)[\|\s\"\)]"
| rex field=search max_match=0 "search\s*`(?<macro_index>MC-.*?)[`\(]"
| rex field=search max_match=0 "datamodel\s*=[\s\"]*(?<dm>DM-.*?)[\|\s\"\)]"
| rex field=search max_match=0 "nodename\s*=[\s\"]*(?<node>ND-.*?)[\|\s\"\)]"
| rex field=search max_match=0 "\|\s*pivot\s+(?<pv>.*?)\s"
| eval
  idx=mvdedup(mvappend(idx, macro_index, dm, pv)),
  idx=mvfilter(idx!="*" AND idx!="_*" AND NOT match(idx, "^_") AND NOT match(idx, "^\d+[\*_]")),
  st=mvdedup(mvappend(st, node))
| where isnotnull(idx) AND isnotnull(st)
| stats count by idx, st
| table idx, st
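The comparison against the full index list mentioned above boils down to a set difference. A minimal Python sketch, with invented index names (the full list would come from, e.g., the /services/data/indexes REST endpoint, the used list from the search above):

```python
# Invented example data: indexes defined on the system vs. indexes
# the detection search actually found being queried.
all_indexes = {"web", "firewall", "app_logs", "legacy_audit"}
used_indexes = {"web", "firewall", "app_logs"}

# Deletion candidates -- confirm with the data owners first, since
# the detection search is not 100% accurate.
candidates = sorted(all_indexes - used_indexes)
print(candidates)  # -> ['legacy_audit']
```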

Path Finder

One comment, which I missed when posting: in the mvfilter(), remove the last condition, AND NOT match(idx, "^\d+[\*_]"), as that was specific to my use case when I made this. We have indexes that start with a numeric ID for each customer and I wanted to ignore those.


Ultra Champion

Hello JimSchlaker,
There are answers here around which indexes are used for reports / saved searches / dashboards and more that you can rely on, for example:
Considering you also mention time span, meaning teams might still look at that particular index / source / sourcetype but not be utilizing the old data, I would suggest an opposite way of approaching that challenge:
Either check the time range on searches using | rest or the _audit index, or verify with the teams how far back they need their data and set a hard time limit in indexes.conf on the indexes containing that data.
Hope it helps
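For reference, the hard time limit mentioned above is set per index in indexes.conf via frozenTimePeriodInSecs; a sketch with an invented stanza name (7776000 seconds is 90 days):

```ini
[customer_app_logs]
# Events older than 90 days roll to frozen (deleted by default).
frozenTimePeriodInSecs = 7776000
```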
