
How to get a list of the oldest or newest $n$ events to delete and save disk space?

Explorer

Perhaps I am going about this the wrong way, so I am open to suggestions on how I can do this.

Basically, I've got a set of files on disk and some metadata about those files ingested by Splunk. As the number of these files grows, I would like to delete the oldest of these to keep the disk usage low. Ideally, I would like to query Splunk for the oldest events so that I can delete them on disk and then delete them from Splunk. Searching for the most recent events seems quick, but searching for the oldest seems very time-consuming. Is there a better way to do this?

Here are some ways I've thought of so far:

search index=myindex | reverse | head 100
search index=myindex | tail 100

Both are very slow, while the following (the opposite of what I want) is fast:

search index=myindex | head 100

Alternatively, I could look for the oldest 1%, but I doubt that will make anything easier. I want to avoid re-ingesting the metadata into something like SQL where I can make quick time-based queries, but I might have to do so if I can't find a way to use Splunk as the metadata master.

Thanks!


Re: How to get a list of the oldest $n$ events

SplunkTrust

Splunk processes events in reverse chronological order. This means it will always be faster to find the most recent events and slower to find the oldest ones, because a search for the oldest events has to work back through everything newer first.

This is not the way to manage storage on Splunk, however. You can't delete individual events: the delete command only hides them from searches and does not free any disk space. Events are actually removed only when they age out or when the index reaches its maximum size (in indexes.conf, the frozenTimePeriodInSecs and maxTotalDataSizeMB settings respectively). You should configure the index that stores your events so disk space is kept at the desired level.

---
If this reply helps you, an upvote would be appreciated.


Re: How to get a list of the oldest $n$ events

Explorer

Thanks for the reply. I am not interested in the disk space used by Splunk. The metadata ingested by Splunk is minimal; the actual files on disk are orders of magnitude larger than the metadata. So, when the disk approaches full, I want to know which files are the oldest and then delete them. I was hoping to use Splunk to tell me which are the oldest, since I already have the metadata ingested and the event time is included in the metadata. However, it looks like I will have to stand up a MySQL database to ingest the timestamps of the files so that I can get a timely response for which are the oldest. Thanks!
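
For anyone landing here with the same problem, a side table of timestamps along these lines could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL, and the table name files, the columns path and event_time, and the helper names make_store, add_file, oldest, and forget are all hypothetical, not anything that comes from Splunk. The index on event_time is what keeps the "oldest n" lookup quick.

```python
import sqlite3


def make_store(db_path=":memory:"):
    """Open (or create) a small metadata store with an index on event_time."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, event_time REAL NOT NULL)"
    )
    # The index lets ORDER BY event_time ... LIMIT n avoid a full sort.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_files_event_time ON files (event_time)"
    )
    return conn


def add_file(conn, path, event_time):
    """Record one file and its event time (hypothetical: epoch seconds)."""
    conn.execute(
        "INSERT OR REPLACE INTO files (path, event_time) VALUES (?, ?)",
        (path, event_time),
    )


def oldest(conn, n):
    """Return the paths of the n oldest files, oldest first."""
    rows = conn.execute(
        "SELECT path FROM files ORDER BY event_time ASC, path ASC LIMIT ?",
        (n,),
    )
    return [row[0] for row in rows]


def forget(conn, paths):
    """Drop records for files that have already been deleted from disk."""
    conn.executemany("DELETE FROM files WHERE path = ?", [(p,) for p in paths])
```

A cleanup job would call oldest(conn, n), delete those files on disk, and then call forget(conn, ...) with the same paths so the table stays in step with the disk. The matching metadata in Splunk would simply age out under the index's own retention settings.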
