Is there a query to identify underused fields?
We are optimizing the size of our large indexes. We identified duplicates and noisy logs, but next we want to find fields that aren't commonly used and get rid of them (or, if you have any additional advice on cleaning out a large index, that would help too).
Is there a query for this?
Understood. Would you happen to have any advice on cleaning a big index?
Really the only way to "clean" an index is for the data to age out. Running "| delete" on an index will stop it appearing in searches; however, the data will still be present on disk, just with markers that stop it being returned, so it won't actually give you any space back if that is what you are looking for.
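For completeness, a delete looks like the below (the index and sourcetype names here are placeholders, and it must be run by a role with the can_delete capability). Again, it only masks events from search results; it does not reclaim disk space:

    index=my_index sourcetype=noisy_logs earliest=-90d
    | delete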
The best thing you can do is control the data arriving in the platform and reduce it as necessary; hopefully, over time, the older/larger/wasteful data will age out and free up space.
What is your retention on this index (or indexes)? If it's something like 90 days then you won't have too long to wait, but if it's 6 years then you might be stuck with that old data for some time!
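If you want to check, a quick btool query on an indexer (the index name is a placeholder) shows the effective retention in seconds:

    $SPLUNK_HOME/bin/splunk btool indexes list my_index | grep frozenTimePeriodInSecs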
It should be stated up-front that indexes cannot be reduced in size. You must wait for buckets to be frozen for data to go away. The best you can do is reduce how much is stored in new buckets.
You've already taken a good first step by eliminating duplicate events.
Next, look at indexed fields. Fields are best extracted at search-time rather than at index-time. Doing so helps indexer performance, saves space in the indexes, and offers more flexibility with fields.
Look at the INDEXED_EXTRACTIONS settings in your props.conf files. Each of them creates index-time fields. JSON data is especially verbose, so KV_MODE=json should be used instead.
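As a sketch (the sourcetype name is a placeholder), the change in props.conf would look something like this:

    [my_json_sourcetype]
    # Remove/disable the index-time extraction:
    # INDEXED_EXTRACTIONS = json
    # Extract the JSON fields at search time instead:
    KV_MODE = json

Keep in mind that INDEXED_EXTRACTIONS is applied where the data is first parsed (for structured data this is often the universal forwarder), while KV_MODE takes effect on the search head, so the two settings may live on different instances.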
Yeah, we make adjustments with new indexes; however, the large indexes were created before I got hired, so I'm actively trying to reduce ingest on what's already flowing. Great advice, btw.
The best option is to define your use cases and, based on those, remove unused values before indexing events to disk. The trade-off is that when you later realize a new use case, you must update your ingest definitions to get the new values into Splunk.
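One way to do this (a sketch only; all names here are placeholders) is an ingest-time eval in transforms.conf that strips an unused key-value pair out of _raw before it is written to disk:

    # transforms.conf
    [trim_unused_values]
    INGEST_EVAL = _raw := replace(_raw, "verbose_field=\S+\s*", "")

    # props.conf
    [my_sourcetype]
    TRANSFORMS-trim = trim_unused_values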
One thing you could check is whether those events contain the same information twice or even more times. This can happen when an event carries a code and the same information is then added again as clear text. A good example is Windows event logs, where this happens.
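For the Windows case specifically, a common community approach (a sketch only; the exact sourcetype and regex depend on your data) is a SEDCMD in props.conf that strips the repeated descriptive text before it is indexed:

    [WinEventLog:Security]
    # Drop the boilerplate explanation that repeats what the event code already says
    SEDCMD-strip_winlog_text = s/This event is generated[\s\S]+$//g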
There are also some other things along those lines that you could do.
Unfortunately, this is not something that is really possible.
I have seen some attempts at this previously; however, it is very easy to miss things, as specific fields are not always referenced by name but could still be used, for example through wildcards (such as | table *), macros, saved searches, or dashboards where the field name never appears literally in the search string.
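If you still want a rough starting point, one sketch (the index name and threshold are placeholders) is fieldsummary, which shows how often each field actually appears in events. Bear in mind this surfaces fields that rarely occur, which is not the same thing as fields that are rarely searched:

    index=my_index earliest=-7d
    | fieldsummary
    | where count < 100
    | table field count distinct_count
    | sort count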