Why is my current index size larger than the max index size?

tchaeunsang1
New Member

Hello guys, I want to learn about indexes, so I created an index and gave it a maximum size of 20MB. My data is collected in real time, but what I see is that the current size exceeds the max size. When I restart Splunk, I get 1MB as the current size. From what I know, that is because the data rolled from hot to warm, but why doesn't the same thing happen when the current size exceeds the maximum size?
Thanks.


spavin
Path Finder

Hi @tchaeunsang1,

The index you created may have looked something like this:

[testindexsize]
homePath = $SPLUNK_DB\$_index_name\db
coldPath = $SPLUNK_DB\$_index_name\colddb
thawedPath = $SPLUNK_DB\testindexsize\thaweddb
homePath.maxDataSizeMB = 0 
coldPath.maxDataSizeMB = 0
maxTotalDataSizeMB = 20

This creates an index with a maximum size of 20MB, with no other limits on how big the hot/warm or cold DBs can get.

When you add data, it goes into a hot bucket. The bucket size (maxDataSize) defaults to "auto", which is 750MB. That means your data will grow into a 750MB bucket before rolling to warm (assuming you have data constantly coming in).

Once it rolls to warm, that's when maxTotalDataSizeMB kicks in. Splunk sees that the index is taking up too much space, so it starts rolling buckets to get back down to 20MB. That means the warm bucket is removed (frozen, which deletes it by default), and you're back under 20MB.

The rolling process happens when:

  • The DB size gets higher than maxTotalDataSizeMB (it takes into account hot buckets, but will only roll from warm to cold and freeze cold, so your hot bucket is safe)
  • The most recent timestamp in the bucket is older than the frozenTimePeriodInSecs
  • splunkd is restarted - this will roll all hot buckets into warm buckets
  • You manually run: splunk _internal call /data/indexes/testindexsize/roll-hot-buckets

Try repeating your experiment with maxDataSize = 1.

This setting is the maximum size, in MB, that a hot bucket can reach before a roll to warm is triggered.

You should then see your index keeping much closer to the 20MB limit.
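
As a rough sketch, the test stanza with that change might look like the following. It assumes the same index name and paths as the example above, and the 1MB value is only meant for this kind of experiment: very small buckets are generally not a good idea on a production index.

```
[testindexsize]
homePath = $SPLUNK_DB\$_index_name\db
coldPath = $SPLUNK_DB\$_index_name\colddb
thawedPath = $SPLUNK_DB\testindexsize\thaweddb
homePath.maxDataSizeMB = 0
coldPath.maxDataSizeMB = 0
maxTotalDataSizeMB = 20
# Roll a hot bucket to warm once it reaches about 1MB
# (the default, "auto", is 750MB)
maxDataSize = 1
```

With small buckets, the hot bucket rolls often, so the maxTotalDataSizeMB check has warm buckets it can act on and the index should stay near the 20MB limit.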


tchaeunsang1
New Member

Thanks a lot for your answer! I actually changed the bucket size to 10MB, so every time the current size reaches the max data size, hot data rolls to warm and the current size decreases by 10MB (the bucket size).
