My index size limit has been reached. Why is the oldest indexed data not being deleted "in order"?

FanaticWorks
Explorer

I have an index with a size limit of 80GB and, based on the data we index, this should give about 7-10 days of retention.

However, I have been indexing into that index for 3 months. It's full, and old data is being deleted as expected.

But it's messy! Rather than having a clean 7-10 days of data with a clear line where the old data is being deleted, I have some entries spanning the whole 3 months.

1-10 days old = 50m entries per day
11-25 days old = 100k entries per day
26-70 days old = 10k entries per day
70-90 days old = 0-100 entries per day

No other configuration has changed. Data comes in in chronological order.

Is there any way to make this clearer, so that analysts don't have incomplete data spanning a huge time range? I just want a complete data set for 7-10 days.

I'm sure this is an obvious one, but I couldn't find anything so far in the manuals/forum.

Cheers...

martin_mueller
SplunkTrust

Old data is removed bucket by bucket: the bucket whose most recent event is the oldest gets removed first.
As a result, you will always have a fuzzy edge where data from, say, ten days ago may have been spread across two buckets, with one bucket gone and the other still there.
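
The removal rule above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Bucket` class, the bucket names, sizes and day numbers are all made up for illustration, and it models nothing beyond "remove the bucket whose newest event is oldest until the index fits its size limit".

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    earliest_day: int  # day number of the oldest event in the bucket
    latest_day: int    # day number of the newest event in the bucket
    size_gb: float


def evict_until_fits(buckets, limit_gb):
    """Remove buckets, oldest 'latest event' first, until the total fits the limit."""
    kept = sorted(buckets, key=lambda b: b.latest_day)
    removed = []
    while kept and sum(b.size_gb for b in kept) > limit_gb:
        removed.append(kept.pop(0))
    return kept, removed


# Hypothetical index: four narrow buckets plus one wide bucket that
# holds a few events from day 1 alongside events from day 9.
buckets = [
    Bucket("A", 1, 2, 30),
    Bucket("B", 2, 3, 30),
    Bucket("wide", 1, 9, 10),
    Bucket("C", 3, 4, 30),
    Bucket("D", 4, 5, 30),
]

kept, removed = evict_until_fits(buckets, limit_gb=80)
print("removed:", [b.name for b in removed])
print("oldest day still searchable:", min(b.earliest_day for b in kept))
```

In this toy run buckets A and B are removed, yet events from day 1 remain searchable because the wide bucket's newest event is recent. That is the fuzzy edge.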

To reduce the fuzziness you can take these steps in the indexes.conf for that index:

  • set a retention time rather than deleting by space
  • reduce the maximum span per bucket

Most important though: Teach your users to filter by time range. That's the only way to not have fuzzy edges.
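
As a rough illustration of the two steps above, the sketch below converts a retention period and a bucket span into seconds and prints an indexes.conf stanza. The setting names `frozenTimePeriodInSecs` and `maxHotSpanSecs` are real indexes.conf settings. The helper `index_stanza`, the index name `my_index` and the chosen values (10 days, 24 hours) are assumptions for illustration only, not recommendations from this thread.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def index_stanza(index_name, retention_days, hot_span_hours):
    """Render an indexes.conf stanza with time-based retention and a bucket span cap."""
    settings = {
        # A bucket is frozen once its newest event is older than this.
        "frozenTimePeriodInSecs": retention_days * SECONDS_PER_DAY,
        # Upper bound on the time span a hot bucket may cover.
        "maxHotSpanSecs": hot_span_hours * SECONDS_PER_HOUR,
    }
    lines = [f"[{index_name}]"]
    lines += [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines)


stanza = index_stanza("my_index", retention_days=10, hot_span_hours=24)
print(stanza)
```

Note that a size limit on the index still applies alongside the time limit, so the size limit has to leave enough room for the full retention period if time is meant to be the deciding factor.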

FanaticWorks
Explorer

Thanks for your answer. Which value are you suggesting changing in indexes.conf?

Something like maxHotSpanSecs=3600 so each bucket is only 1 hour?

ddrillic
Ultra Champion

maxHotSpanSecs is a tricky one: if you set it to an hour or a day, you might end up with a proliferation of buckets. So we really need to match the value of maxHotSpanSecs to the volume and rate of data flowing into each index. By the way, the best practice is to have no more than 50K buckets per index.
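
To get a feel for how a span setting affects the bucket count, here is a back-of-the-envelope sketch. It assumes a simplified model in which a bucket rolls either when it reaches a typical size or when it reaches the span limit, with a single hot bucket at a time. The 870 MB bucket size and the 10-day retention are rough figures taken from this thread, and `estimate_bucket_count` is a hypothetical helper. Real counts will differ.

```python
import math


def estimate_bucket_count(index_limit_gb, bucket_size_mb, retention_days, max_hot_span_secs):
    """Rough number of buckets in a full index under a simplified rolling model."""
    # Buckets needed if every bucket rolls only because it is full.
    by_size = math.ceil(index_limit_gb * 1024 / bucket_size_mb)
    # Buckets needed if every bucket rolls only because of the span limit.
    by_span = math.ceil(retention_days * 86_400 / max_hot_span_secs)
    # Whichever limit is hit more often dominates.
    return max(by_size, by_span)


hourly = estimate_bucket_count(80, 870, 10, 3_600)
daily = estimate_bucket_count(80, 870, 10, 86_400)
print("1-hour span:", hourly, "buckets")
print("1-day span:", daily, "buckets")
```

Under these assumptions both estimates stay in the hundreds, far below the 50K figure mentioned above.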

FanaticWorks
Explorer

Thanks for the help so far... I have been using the dbinspect command to try to better understand the process, and I wondered whether what I am seeing is normal.

While some buckets seem to follow the exact pattern described (i.e. they get created as hot, get filled with 1-2 hours of data, move to warm and make their way through to being deleted), I also have multiple buckets (old and new) that span 50+ days, including ones that are hot. Is this normal? I think these are the problem ones...

bucketId                startEpoch        endEpoch          timespanDays  eventCount  hostCount  modTime              sizeOnDiskMB  state
sourceX~1353~X-X-X-X-X  20/10/2016 15:36  11/12/2016 16:33  52.081192     5686665     1          12/12/2016:09:08:28  894.238281    warm
sourceX~1354~X-X-X-X-X  09/12/2016 23:50  10/12/2016 08:48  0.373391      5966823     1          12/10/2016:08:48:38  870.082031    warm
sourceX~1355~X-X-X-X-X  10/12/2016 08:50  10/12/2016 16:26  0.316701      5878314     1          12/10/2016:16:27:20  853.476563    warm
sourceX~1356~X-X-X-X-X  10/12/2016 16:27  11/12/2016 00:14  0.323762      5955055     1          12/11/2016:00:14:26  876.789063    warm
sourceX~1357~X-X-X-X-X  10/12/2016 18:25  11/12/2016 09:41  0.636377      5900307     1          12/11/2016:09:42:00  858.257813    warm
…
sourceX~1444~X-X-X-X-X  20/12/2016 07:05  20/12/2016 10:53  0.158715      5408170     1          12/20/2016:10:54:13  885.84375     warm
sourceX~1445~X-X-X-X-X  20/09/2016 19:22  20/12/2016 12:24  90.751366     3594043     1          12/20/2016:14:38:59  500.539063    hot
sourceX~1447~X-X-X-X-X  20/12/2016 11:20  20/12/2016 12:21  0.04206       5511911     1          12/20/2016:12:21:21  897.332031    warm
sourceX~1448~X-X-X-X-X  20/12/2016 12:22  20/12/2016 13:11  0.034294      4385224     1          12/20/2016:13:12:04  593.207031    warm
sourceX~1449~X-X-X-X-X  20/12/2016 12:24  20/12/2016 13:41  0.053623      2996650     1          12/20/2016:14:38:59  418.238281    hot
sourceX~1450~X-X-X-X-X  20/12/2016 13:41  20/12/2016 14:38  0.039861      4860534     1          12/20/2016:14:39:08  675.105469    hot

So in this scenario:

sourceX~1354~X-X-X-X-X should be deleted next, as it has the oldest endEpoch, and it was.

sourceX~1353~X-X-X-X-X and sourceX~1445~X-X-X-X-X span 50-90 days. Why is this? Shouldn't all buckets be similar?
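
A small sketch for picking out the wide-span buckets from dbinspect-style output. The rows are copied from the listing in this post (bucketId, timespanDays, state only). The function name `wide_buckets` and the one-day threshold are arbitrary choices for illustration.

```python
# (bucketId, timespanDays, state), copied from the dbinspect listing.
rows = [
    ("sourceX~1353~X-X-X-X-X", 52.081192, "warm"),
    ("sourceX~1354~X-X-X-X-X", 0.373391, "warm"),
    ("sourceX~1357~X-X-X-X-X", 0.636377, "warm"),
    ("sourceX~1445~X-X-X-X-X", 90.751366, "hot"),
    ("sourceX~1449~X-X-X-X-X", 0.053623, "hot"),
]


def wide_buckets(rows, max_span_days=1.0):
    """Return the buckets whose time span exceeds the given number of days."""
    return [row for row in rows if row[1] > max_span_days]


flagged = wide_buckets(rows)
for bucket_id, span_days, state in flagged:
    print(f"{bucket_id}  {span_days:.1f} days  {state}")
```

Because removal goes by the newest event in a bucket, each flagged bucket keeps its old events searchable until its endEpoch becomes the oldest in the index.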

Q: Is this normal? Should I just set a time limit in indexes.conf? Something like maxHotSpanSecs=3600 so each bucket is only 1 hour?
