Splunk Enterprise

Is there a query to identify underused fields?

Kenny_splunk
Path Finder

Is there a query to identify underused fields? 
We are optimizing the size of our large indexes. We identified duplicates and noisy logs, but next we want to find fields that aren't commonly used and get rid of them. (Or if you have any additional advice on cleaning out a large index.)

Is there a query for this?


Kenny_splunk
Path Finder

Understood. Would you happen to have any advice on cleaning a big index?


livehybrid
SplunkTrust

Hi @Kenny_splunk 

Really the only way to "clean" an index is for the data to be aged out. Running "| delete" on an index will stop it appearing in searches; however, the data will still be present on disk, just with markers that stop it being returned, so it won't actually give you any space back if that is what you are looking for.

The best thing you can do is control the data arriving in the platform and reduce it as necessary; hopefully, over time, the older/larger/wasted data will age out and free up space.

What is your retention on this index (or indexes)? If it's something like 90 days then you won't have too long to wait, but if it's 6 years then you might be stuck with that old data for some time!
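As a hedged sketch of how to check that, the dbinspect command can report bucket ages for an index (my_index is a placeholder for your index name):

```
| dbinspect index=my_index
| stats min(startEpoch) AS oldest_epoch max(endEpoch) AS newest_epoch
| eval oldest=strftime(oldest_epoch, "%Y-%m-%d"), newest=strftime(newest_epoch, "%Y-%m-%d")
| table oldest newest
```

Comparing the oldest bucket date against your frozenTimePeriodInSecs gives a rough idea of how long until the old data ages out.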

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


richgalloway
SplunkTrust

It should be stated up-front that indexes cannot be reduced in size.  You must wait for buckets to be frozen for data to go away.  The best you can do is reduce how much is stored in new buckets.

You've already taken a good first step by eliminating duplicate events.

Next, look at indexed fields.  Fields are best extracted at search-time rather than at index-time.  Doing so helps indexer performance, saves space in the indexes, and offers more flexibility with fields.

Look at the INDEXED_EXTRACTIONS settings in your props.conf files. Each of them creates index-time fields.  JSON data is especially verbose, so KV_MODE=json should be used instead.
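As a hedged sketch, switching a JSON sourcetype from index-time to search-time extraction might look like this (the sourcetype name is a placeholder; note that INDEXED_EXTRACTIONS applies where parsing happens, typically the forwarder, while KV_MODE is read on the search head):

```
# props.conf
[my_json_sourcetype]
# Remove the index-time setting:
# INDEXED_EXTRACTIONS = json
# ...and extract fields at search time instead:
KV_MODE = json
```

This only affects newly indexed data; existing buckets keep their index-time fields until they age out.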

---
If this reply helps you, Karma would be appreciated.

Kenny_splunk
Path Finder

Yeah, we make adjustments with new indexes; however, the large indexes were created before I got hired, so I'm actively trying to reduce ingest with what's already flowing. Great advice, btw.


isoutamo
SplunkTrust

The best option is to define your use cases and, based on those, remove unused values before indexing events to disk. But this leads to a situation where, when you realize a new use case, you must update your indexing definitions to get the new values into Splunk.

One thing you could check is whether those events contain the same information twice or even more times. This can happen when your data contains some code and the same information has also been added as clear text. A good example is Windows event logs, where this happens.

There are also some other things you could do, such as:

  • remove additional formatting, such as extra spaces in JSON objects
  • remove unnecessary line breaks
  • check whether you could use metrics indexes for some data instead of putting everything in event indexes
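The first two points can often be handled with SEDCMD in props.conf at index time; a minimal sketch, assuming a hypothetical sourcetype name:

```
# props.conf on the parsing tier (indexer or heavy forwarder)
[my_verbose_sourcetype]
# Collapse runs of whitespace, including line breaks, into a single space
SEDCMD-strip_whitespace = s/\s+/ /g
```

Test this on a sample first: SEDCMD rewrites _raw before indexing, so aggressive patterns can break field extractions that rely on the original formatting.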

livehybrid
SplunkTrust
SplunkTrust

Hi @Kenny_splunk 

Unfortunately, this is not something that is reliably possible.

I have seen some attempts at this previously; however, it is very easy to miss things, as specific fields are not always referenced explicitly but could still be in use, as in the following examples:

  • A _raw event could be presented in a dashboard; a viewer may use it to determine something.
  • A raw event may be emailed as an alert to a user, who takes action based on something inside the event.
  • Use of wildcards such as | table my_* or stats values(*) as *
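If you still want a rough starting point despite those caveats, a hedged sketch: the fieldsummary command can show sparsely populated fields over a sample of an index (the index name, time range, and threshold are placeholders), but it says nothing about whether a field is actually referenced by searches, dashboards, or alerts:

```
index=my_index earliest=-24h
| fieldsummary
| where count < 100
| table field count distinct_count
```

Treat the output as candidates for investigation, not as a list of fields that are safe to drop.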

