How to reduce the size of specific indexes?

Builder

Greetings,

My indexers have run out of space. I have been reducing maxHotSpanSecs, but the disks keep filling up. I also seem to have data being indexed or reindexed in ways my inputs.conf does not call for. I'm going to figure that out after this.

While looking at what is taking up space, I noticed that the indexes in /opt/splunk/var/lib/splunk/indexnamedb also do not match what I intended.

What is the best way to reduce the on-disk footprint of certain (less important) indexes? Do I create separate, smaller volumes for them? Do I reduce the size the index can grow to? Can I make their data expire at a different rate?

Thanks!

Splunk Employee

maxHotSpanSecs sets the upper bound on the time span a hot bucket can cover before it rolls over to warm.
However, both hot and warm buckets consume disk space, so lowering it does not free any space by itself. This is the data roll progression:

Hot (lives at the specified hot/warm mount point; can be written to)
    -- after maxHotSpanSecs or other config -->
Warm (lives at the specified hot/warm mount point; read-only)
    -- after maxWarmDBCount -->
Cold (lives at the specified cold path, which could be on the same disk as hot/warm or a different one; read-only)
    -- after maxTotalDataSizeMB -->
Frozen (deleted by default, but could be archived somewhere or sent into Hadoop)

It sounds like what you want is a hard cap on the footprint taken up on disk, so I would suggest setting maxTotalDataSizeMB to the maximum amount of space you want that index to take up on disk.
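
As a rough sketch of what such a cap could look like in indexes.conf: the index name web_lowpri and the 50 GB figure below are made up for illustration, and the existing path settings for the index are assumed to stay as they are.

```ini
# indexes.conf on the indexer (hypothetical index name and size)
[web_lowpri]
# Cap the whole index (hot + warm + cold) at roughly 50 GB:
# 50 * 1024 = 51200 MB.
# Once the cap is exceeded, the oldest buckets roll to frozen,
# which deletes them unless archiving is configured.
maxTotalDataSizeMB = 51200
```

Less important indexes can each get their own, smaller value this way, while the important ones keep a larger cap.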

New Member

I'm running into the same situation here: we have 300GB for our Web index, which keeps filling up. Any suggestions on how to limit it to only 30 days of data and free up some space?

Moderator

Hello shawno

This question was posted almost 2 years ago. If the accepted answer is not able to help answer your question, please post a new question to get maximum exposure and help.

Thanks

Ultra Champion

There is a delicate balance between frozenTimePeriodInSecs and maxTotalDataSizeMB: buckets are frozen as soon as either limit is reached, whichever comes first.

For example, we have:

[_internal]
repFactor = auto
frozenTimePeriodInSecs = 34186698
maxTotalDataSizeMB = 2500000

For each index, you have to be comfortable with both of these settings.

We recently ran into a disturbing situation where frozenTimePeriodInSecs was increased to 4 months, but the default maxTotalDataSizeMB (about 500GB) wasn't large enough for four months of data, so the size cap was hit before the age limit. The customer obviously flipped out ;-)
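
To make the sizing concrete, here is a hedged sketch of how the two settings could be lined up. The index name and the 10 GB/day on-disk growth figure are assumptions for illustration, not numbers from the situation above; measure your own growth before picking values.

```ini
# Hypothetical index; assumes about 10 GB of on-disk growth per day.
[web]
# Age limit: 120 days * 86400 s/day = 10368000 s (about four months).
frozenTimePeriodInSecs = 10368000
# Size limit: 120 days * 10 GB/day = 1200 GB = 1228800 MB.
# Keep it at or above this figure so that the age limit, not the
# size cap, decides when buckets freeze.
maxTotalDataSizeMB = 1228800
```

The same arithmetic works for shorter retention: a 30-day age limit would be 30 * 86400 = 2592000 seconds.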

Builder

I actually brought down maxWarmDBCount, but it is having minimal effect on my storage. My cold storage is on a multi-petabyte array, so I'm not worried about it (yet). I am really at a loss for what is going on. I have started to monitor all indexes, sourcetypes, sources, and hosts very closely to see if anything is out of place or producing more data than I would expect.

I think Splunk was reindexing data from the Heavy Forwarder as syslog rolled files from the usual host.log into host..gz. Splunk is throwing that .gz data into the Main index because it really doesn't know what to do with it. SIGH. How about forgetaboutit!
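
If the rotated .gz files are what is being picked up again, one option is to exclude them on the forwarder. A minimal sketch, assuming the forwarder uses a monitor input on the rsyslog directory; the path below is a placeholder, not taken from this setup:

```ini
# inputs.conf on the Heavy Forwarder (placeholder path)
[monitor:///var/log/remote]
# Skip any file whose full path ends in .gz, so rotated,
# compressed copies of host.log are not read again.
blacklist = \.gz$
```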

What I really need is tighter integration from Splunk to my rsyslog so that the log files disappear as they are consumed by Splunk.

I have WAY ratcheted down my hot and warm storage hoping that as the .gz data passes through and times out, I will regain my disks. Now my servers are spinning away with load averages in the 10s...

Builder

One more comment: I had to make the above changes on each indexer directly. At first I was making the change on the deployment server, forgetting that this is not a deployment server function.
