Splunk Search

How to determine if logs are not being used?

JimSchlaker
New Member

Is there a way to determine which logs are not being used anymore, and therefore can be deleted? For example, maybe a team started logging something a year ago, but the team no longer uses that log for any reports/dashboards/etc... Is there a way to find these unused logs?

0 Karma

woodcock
Esteemed Legend
0 Karma

kellewic
Path Finder

In addition to the links mentioned by adonio, this search might get you some of the way there. However, things like macros can hide indexes/sourcetypes so it's not 100% but does also include data models/nodenames being used.

The search filters out all "*" and "_*" references as those aren't very useful. It prefixes data models with "DM-" and nodenames with "ND-" and treats those as an index/sourcetype combo. Macros are prefixed with "MC-" to easily identify and look at manually.

You could compare this against a REST call to the indexes or indexes-extended endpoint to get a starting point. BUT, you will want to confirm with data owners the indexes aren't actually being used since, again, this search is not 100%.

index=_internal sourcetype=splunkd_remote_searches
|dedup search

|eval
  search=replace(search, "(datamodel\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1DM-\2\3"),
  search=replace(search, "(eval\s+datamodel\s*=[\s\"]*)DM-", "\1"),
  search=replace(search, "(\|\s*pivot\s+)(.*?)(\s)", "\1DM-\2\3"),
  search=replace(search, "(nodename\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1ND-\2\3"),
  search=replace(search, "(eval\s+nodename\s*=[\s\"]*)ND-", "\1"),
  search=replace(search, "(search\s*`)(.*?)([`\(])", "\1MC-\2\3")

|rex field=search max_match=0 "index\s*=[\s\"]*(?<idx>.*?)[\|\s\"\)]"
|rex field=search max_match=0 "sourcetype\s*=[\s\"]*(?<st>.*?)[\|\s\"\)]"
|rex field=search max_match=0 "search\s*`(?<macro_index>MC-.*?)[`\(]"
|rex field=search max_match=0 "datamodel\s*=[\s\"]*(?<dm>DM-.*?)[\|\s\"\)]"
|rex field=search max_match=0 "nodename\s*=[\s\"]*(?<node>ND-.*?)[\|\s\"\)]"
|rex field=search max_match=0 "\|\s*pivot\s+(?<pv>.*?)\s"

|eval
  idx=mvdedup(mvappend(idx, macro_index, dm, pv)),
  idx=mvfilter(idx!="*" AND idx!="_*" AND NOT match(idx, "^_") AND NOT match(idx, "^\d+[\*_]")),
  st=mvdedup(mvappend(st, node))

|where isnotnull(idx) AND isnotnull(st)
|stats c by idx, st

|table idx, st

kellewic
Path Finder

One comment - I missed it when posting - in the mvfilter(), remove the last condition as that was specific to my use case when I made this - AND NOT match(idx, "^\d+[*_]"). We have indexes that start with a numeric ID for each customer and I wanted to ignore those.

0 Karma

adonio
Ultra Champion

hello JimSchlaker,
there are answers here around which indexes are used for reports / saved searches / dashboards and more that you can relay on. for example:
https://answers.splunk.com/answers/273176/how-can-i-determine-how-much-an-index-is-being-sea.html
https://answers.splunk.com/answers/186268/how-to-search-for-and-remove-indexes-in-splunk-tha.html
considering you mention also time span, meaning they might look at that particular index / source /sourcetype but not utilizing the old data, i would suggest an opposite way of approaching that challenge.
will suggest to either check the timerange on searches using | rest or the _audit index to determine. or verify with teams, how far back they need their data and set a hard time limit in indexes.conf on the index contains that data
hope it helps

Get Updates on the Splunk Community!

Understanding Generative AI Techniques and Their Application in Cybersecurity

Watch On-Demand Artificial intelligence is the talk of the town nowadays, with industries of all kinds ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Using the Splunk Threat Research Team’s Latest Security Content

REGISTER HERE Tech Talk | Security Edition Did you know the Splunk Threat Research Team regularly releases ...