recommended index sizes

awurster
Contributor

hi guys -

i have a stand-alone splunk server that i'm trying to size appropriately. we have a fixed 3TB volume to work with.

i am wondering how large or small to make the various indexes, especially the built-in ones: summary, _internal, etc.

it seems like the default sizes (500,000 MB per index) would theoretically allow the indexes to overrun the volume. so i guess my questions are:

1 - can / should we resize the internal indexes (i.e. _internal, history, _audit) to be more aware of the given storage volumes?
2 - what percentage should we reserve for summary indexing? 25% of desired index (and/or main)?

cheers,

andrew

bmacias84
Champion

That all depends on your requirements for the data stored in your indices. How is your storage broken out across your hot, warm, cold, and frozen (archive) buckets? Will you be summary indexing all your data, and at what granularity: hourly, daily, weekly, monthly? Do you have a retention/security policy for certain data sources/types?

The answer varies dramatically depending on your requirements.

The 500,000 MB is the default maximum size of a single index across all its buckets (hot, warm, cold), set by maxTotalDataSizeMB in indexes.conf.

Additional Reading:

How Splunk stores indexes

Set a retirement and archiving policy

Set up multiple indexes

How indexing works

Estimate Index Size <-- Splunk Wiki on how to perform estimations.

bmacias84
Champion

In most cases the data will have rolled to frozen and been deleted before the maximum index size is approached. Be careful when you modify indexes.conf: changing the limits may cause buckets to roll over and data to be deleted.

awurster
Contributor

thank you. so i think it's fair to say that the sum of all your indexes should ideally not exceed the size of your available disk space / volume(s). it seems very unlikely for the internal indexes and so on to really use up much space, however your main / primary indexes should never exceed 100% of available space - perhaps even 90 or 95% is better.
i'm somewhat comparing this to when you partition new disk(s) during an initial OS install (i.e. swap, home, os, etc). the installation process in most cases won't let you allocate more than 100%.
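
The budgeting idea above can be sketched as a quick check. This is only a rough illustration: the index names and per-index caps are made-up numbers standing in for whatever you would set as maxTotalDataSizeMB, the 3 TB figure is taken as 3,000,000 MB, and the 90% headroom is just the figure suggested above.

```python
# Hypothetical per-index caps in MB (the values you might set as
# maxTotalDataSizeMB in indexes.conf). Illustrative numbers only.
VOLUME_MB = 3 * 1000 * 1000   # the fixed 3 TB volume, taken as decimal MB
HEADROOM = 0.90               # plan to commit at most 90% of the volume

index_caps_mb = {
    "main": 2_000_000,
    "summary": 500_000,
    "_internal": 100_000,
    "_audit": 50_000,
}


def check_budget(caps_mb, volume_mb, headroom):
    """Return (total_mb, budget_mb, fits) for the given per-index caps."""
    total = sum(caps_mb.values())
    budget = int(volume_mb * headroom)
    return total, budget, total <= budget


total, budget, fits = check_budget(index_caps_mb, VOLUME_MB, HEADROOM)
status = "ok" if fits else "over-committed"
print(f"allocated {total} MB against a budget of {budget} MB: {status}")
```

If the check reports over-committed, the sum of the caps could exceed the volume before any single index hits its own limit.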

bmacias84
Champion

In my env I have different types of storage for HOT (local SSD), WARM (tier 2 SAN), and COLD (tier 3 SAN). In the end it comes down to knowing your data and configuring indexes based on retention/security/importance. That includes settings like maxHotSpanSecs (upper bound on the timespan of a hot bucket) and maxHotIdleSecs (how long a hot bucket can go without receiving data before it rolls to warm). Hope this helps.

bmacias84
Champion

@awurster, "what happens when an indexer runs out of space on disk?" Your indexers will pause (stop indexing), which has the potential for data loss. You can minimize possible data loss by using indexer acknowledgement and by increasing the input and output queueSize for streamed data sources. _internal and summary indexes are just indexes; the same rules apply to them and they will be paused too. Once the disk space issue has been resolved, your indexer will continue indexing. By default an indexer pauses when free disk space drops to 2000 MB. http://docs.splunk.com/Documentation/Splunk/5.0/Indexer/Setlimitsondiskusage
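
If you want an early warning before that pause kicks in, something like the sketch below could run from cron. It is not a Splunk tool, just plain Python using shutil.disk_usage. The 2000 MB threshold is the default mentioned above, while the path, the warn_factor multiplier, and the status labels are assumptions for illustration.

```python
import shutil

PAUSE_THRESHOLD_MB = 2000  # default quoted above; check your own setting


def free_mb(path):
    """Free space, in MB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def space_status(free_space_mb, threshold_mb=PAUSE_THRESHOLD_MB, warn_factor=2):
    """Classify free space relative to the pause threshold.

    warn_factor is a hypothetical early-warning multiplier: warn while
    free space is still above the threshold but within warn_factor times it.
    """
    if free_space_mb < threshold_mb:
        return "paused"
    if free_space_mb < threshold_mb * warn_factor:
        return "warning"
    return "ok"


if __name__ == "__main__":
    path = "."  # hypothetical: point this at the volume holding your index data
    mb = free_mb(path)
    print(f"{path}: {mb} MB free -> {space_status(mb)}")
```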

awurster
Contributor

thanks.

i guess in that case my question is more towards "what happens when an indexer runs out of space on disk?" and then "if something like main or another regular index fills up - what happens to retention of data in other key places like _internal or summary?"

just want to avoid any disasters once the disk fills up.
