recommended index sizes

Contributor

hi guys -

i have a stand-alone splunk server that i'm trying to size appropriately. we have a fixed 3TB volume to work with.

i am wondering how large or small to make the various indexes, especially the built-in ones: summary, _internal, etc.

it seems like the default sizes would theoretically allow for overrun on the volume (500,000 MB). so i guess my questions are:

1 - can / should we resize the internal indexes (i.e. _internal, history, _audit) to be more aware of the given storage volumes?
2 - what percentage should we reserve for summary indexing? 25% of desired index (and/or main)?

cheers,

andrew

1 Solution

Champion

That all depends on your requirements for the data stored in your indexes. Also, how is your storage broken out across your hot, warm, cold, frozen, and archive buckets? Will you be summary indexing all your data, and how will that be broken out: hourly, daily, weekly, monthly? Do you have a retention/security policy for certain data sources/types?

This varies dramatically depending on your requirements.

The 500,000 MB is the default maximum size a single index can reach across all of its buckets (hot, warm, cold).

Additional Reading:

How Splunk stores indexes

Set a retirement and archiving policy

Set up multiple indexes

How indexing works

Estimate Index Size <-- Splunk Wiki on how to perform estimations.

Champion

In most cases the data will have rolled to frozen and been deleted before the max DB size is approached. Be careful if you modify your indexes.conf: buckets may roll over as a result, which can cause data to be deleted.


Contributor

thank you. so i think it's fair to say that the sum of all your indexes should ideally not exceed the size of your available disk space / volume(s). it seems very unlikely for the internal indexes and so on to really use up much space, however your main / primary indexes should never exceed 100% of available space - perhaps even 90 or 95% is better.
i'm somewhat comparing this to when you partition new disk(s) during an initial OS install (i.e. swap, home, os, etc). the installation process in most cases won't let you allocate more than 100%.
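
as a rough sanity check of that rule of thumb, here is a minimal Python sketch that adds up the per-index size caps and compares the total to a share of the volume. only the 500,000 MB default and the 3TB volume come from this thread; the other index caps, the 90% headroom figure, and the helper name `check_budget` are made-up illustration values, not Splunk recommendations.

```python
# Hypothetical sizing check: do the planned per-index caps fit on the volume?
# Only the 500,000 MB default cap and the 3 TB volume come from the thread;
# every other number below is an illustrative assumption.

VOLUME_MB = 3 * 1000 * 1000   # 3 TB counted as 3,000,000 MB (assumes decimal units)
HEADROOM_PCT = 90             # plan to fill at most 90% of the volume (assumed)

# Planned maximum size per index, in MB (hypothetical values).
index_caps_mb = {
    "main": 500_000,
    "summary": 125_000,
    "_internal": 50_000,
    "_audit": 20_000,
    "history": 5_000,
}

def check_budget(caps_mb, volume_mb, headroom_pct):
    """Return (total_mb, budget_mb, fits) for the given per-index caps."""
    total_mb = sum(caps_mb.values())
    budget_mb = volume_mb * headroom_pct // 100
    return total_mb, budget_mb, total_mb <= budget_mb

total, budget, fits = check_budget(index_caps_mb, VOLUME_MB, HEADROOM_PCT)
print(f"caps total {total} MB, budget {budget} MB, fits: {fits}")
```

with these example numbers the caps fit comfortably, but seven indexes left at the 500,000 MB default would add up to 3,500,000 MB and overrun the same volume, which is the situation i was worried about.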


Champion

In my env I have different types of storage for hot (local SSD), warm (tier 2 SAN), and cold (tier 3 SAN). In the end it comes down to knowing your data and configuring indexes based on retention/security/importance, including settings like maxHotSpanSecs (upper bound on the timespan of a hot bucket) and maxHotIdleSecs (how long a hot bucket can sit idle before it rolls to warm). Hope this helps.


Champion

@awurster, "what happens when an indexer runs out of space on disk?" Your indexers will pause (stop indexing), which has the potential for data loss. You can minimize possible data loss by using indexer acknowledgement and by increasing the input and output queueSize for streamed data sources. _internal and summary indexes are just indexes; the same rules apply to them, and they will be paused too. Once the disk space issue has been resolved, your indexer will continue indexing. By default, an indexer pauses when free disk space drops to 2000 MB. http://docs.splunk.com/Documentation/Splunk/5.0/Indexer/Setlimitsondiskusage
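
If you want an early warning before that pause kicks in, a small external check along these lines might help. This is only a sketch: the 2000 MB figure is the default quoted above, the helper names `free_mb` and `would_pause` are made up, the path is a placeholder, and whether Splunk pauses exactly at or just below the limit is an assumption here. It is not how Splunk itself implements the check.

```python
import shutil

# Pause threshold quoted in the post above (default, in MB).
MIN_FREE_MB = 2000

def free_mb(path):
    """Free space on the filesystem that holds `path`, in MB."""
    return shutil.disk_usage(path).free // (1024 * 1024)

def would_pause(free_space_mb, min_free_mb=MIN_FREE_MB):
    """True if free space is below the threshold (assumed 'below', not 'at')."""
    return free_space_mb < min_free_mb

# Placeholder path: point this at the volume that holds your index data.
index_volume = "."
available = free_mb(index_volume)
print(f"{available} MB free, indexing would pause: {would_pause(available)}")
```

Running something like this from cron with a higher warning threshold would give you time to free space or trim retention before indexing actually stops.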

Contributor

thanks.

i guess in that case my question is more towards "what happens when an indexer runs out of space on disk?" and then "if something like main or another regular index fills up - what happens to retention of data in other key places like _internal or summary?"

just want to avoid any disasters once the disk fills up.
