How to get a list of the oldest or newest $n$ events to delete and save disk space?

trenin
Explorer

Perhaps I am going about this the wrong way, so I am open to suggestions on how I can do this.

Basically, I've got a set of files on disk and some metadata about those files ingested by Splunk. As the number of these files grows, I would like to delete the oldest of these to keep the disk usage low. Ideally, I would like to query Splunk for the oldest events so that I can delete them on disk and then delete them from Splunk. Searching for the most recent events seems quick, but searching for the oldest seems very time-consuming. Is there a better way to do this?

Here are some ways I've thought of so far:

search index=myindex | reverse | head 100
search index=myindex | tail 100

Both are very slow, while the following (the opposite of what I want) is fast:

search index=myindex | head 100

Alternatively, I could look for the oldest 1%, but I doubt that will make anything easier. I want to avoid re-ingesting the metadata into something like SQL where I can make quick time-based queries, but I might have to do so if I can't find a way to use Splunk as the metadata master.

Thanks!

richgalloway
SplunkTrust

Splunk processes events in reverse chronological order. That means it will always be faster to find the most recent events and slower to find the oldest ones.

This is not the way to manage storage on Splunk, however. You can't delete events from a search: the delete command only hides events from search results and does not free disk space. The only way events get deleted is when they age out or the index has reached its maximum size. You should configure the index that stores your events so disk space is kept at the desired level.
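As a rough illustration of that kind of index configuration, the fragment below sets an age limit and a size cap in indexes.conf. The stanza name `myindex` comes from the question; the 90-day and 50 GB values are assumptions chosen only for the example, not recommendations.

```ini
# indexes.conf (sketch; the values below are illustrative assumptions)
[myindex]
# Freeze a bucket once its newest event is older than 90 days (90 * 86400 s).
# Frozen buckets are deleted unless an archive destination is configured.
frozenTimePeriodInSecs = 7776000
# Cap the total index size at about 50 GB; when the cap is exceeded,
# the oldest buckets are frozen first.
maxTotalDataSizeMB = 51200
```

Retention works on whole buckets, not individual events, so the limits are approximate.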

---
If this reply helps you, Karma would be appreciated.

trenin
Explorer

Thanks for the reply. I am not interested in the disk space used by Splunk. The metadata ingested by Splunk is minimal; the actual files on disk are orders of magnitude larger than the metadata. So, when the disk approaches full, I want to know which files are the oldest and then delete them. I was hoping to use Splunk to tell me which are the oldest, since I already have the metadata ingested and the event time is included in the metadata. However, it looks like I will have to stand up a MySQL DB to ingest the timestamps of the files so that I can get a timely response for which are the oldest. Thanks!
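A minimal sketch of that side-database idea, assuming SQLite (from the Python standard library) stands in for MySQL. The table name `file_meta`, the column names, and the sample paths are all hypothetical. An index on the timestamp column is what keeps the "oldest n" lookup fast.

```python
import sqlite3


def oldest_files(conn, n):
    """Return the n oldest (path, event_time) rows, oldest first."""
    cur = conn.execute(
        "SELECT path, event_time FROM file_meta "
        "ORDER BY event_time ASC LIMIT ?",
        (n,),
    )
    return cur.fetchall()


# In-memory database for the example; a real setup would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_meta (path TEXT PRIMARY KEY, event_time INTEGER NOT NULL)"
)
# The index lets the ORDER BY ... LIMIT query avoid sorting the whole table.
conn.execute("CREATE INDEX idx_file_meta_time ON file_meta (event_time)")

# Hypothetical sample metadata: (file path, event time as epoch seconds).
rows = [
    ("/data/a.bin", 1700000300),
    ("/data/b.bin", 1700000100),
    ("/data/c.bin", 1700000200),
]
conn.executemany("INSERT INTO file_meta VALUES (?, ?)", rows)
conn.commit()

oldest = oldest_files(conn, 2)
print(oldest)
```

The returned paths could then be removed from disk and their rows deleted from the table in the same pass.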
