Getting Data In

Per Index Configuration

edwardrose
Contributor

Hello All,

I am trying to clean up our indexes and their sizes to ensure that we are keeping the correct amount of data for each index.  I have about 5 to 10 really busy indexes that bring in most of the data.

 

pan_logs ~200GB/day
syslog ~10GB/day
checkpoint (coming soon) ~250GB/day
wineventlog ~650GB/day
network ~180GB/day

So my question is: when I create an index configuration, for example for wineventlog:

 

 

[wineventlog]
homePath = volume:hot/wineventlog/db
homePath.maxDataSizeMB = 19500000
coldPath = volume:cold/wineventlog/colddb
coldPath.maxDataSizeMB = 58500000
thawedPath = /splunk/cold/wineventlog/thaweddb
maxHotBuckets = 10
maxDataSize = auto_high_volume
maxTotalDataSizeMB = 78000000
disabled = 0
repFactor = auto

 

 

So 30 days of hot/warm would be 19.5TB, and 90 days of cold data would be 58.5TB, for a total of 78TB. Those totals would then be divided by the number of indexers we have (20), so each indexer should host about 975GB of hot/warm and 2.925TB of cold data. And Splunk would start to roll data to frozen (/dev/null) when the total (hot/warm + cold) reached 78TB. Is that correct? And do I need to specify maxTotalDataSizeMB if I am already using the homePath and coldPath settings?
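That arithmetic can be sketched as a quick back-of-the-envelope check in Python (not anything Splunk ships; decimal units and perfectly even data distribution are assumed):

```python
MB_PER_TB = 1_000_000   # decimal units, matching the round conf values

# Values from the [wineventlog] stanza above
home_mb = 19_500_000    # homePath.maxDataSizeMB
cold_mb = 58_500_000    # coldPath.maxDataSizeMB
total_mb = 78_000_000   # maxTotalDataSizeMB
daily_gb = 650          # ~650 GB/day of wineventlog data
indexers = 20

# hot/warm + cold should add up to the total cap
assert home_mb + cold_mb == total_mb

# Retention implied by the caps
print(home_mb / 1_000 / daily_gb, "days of hot/warm")   # 30.0
print(cold_mb / 1_000 / daily_gb, "days of cold")       # 90.0

# Per-indexer share, if the data were spread perfectly evenly
print(home_mb / indexers / 1_000, "GB hot/warm per indexer")  # 975.0
print(cold_mb / indexers / MB_PER_TB, "TB cold per indexer")  # 2.925
```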

 

Thanks

ed


isoutamo
SplunkTrust

Hi

This is how I have done this.

You are already using volumes, which is an excellent (IMHO mandatory) practice. You should define the total size of the volumes as 78TB (maxVolumeDataSizeMB) minus some overflow space, because from time to time there will be more data in the indexes than the sizes you defined before Splunk starts to migrate/freeze those buckets. The actual overflow needed depends on whether you have a single node or a cluster, how many hot buckets you have configured, etc. There are also some bugs that make more free space necessary than before in order to avoid a full disk.

Then define the index max size (maxTotalDataSizeMB), which defines the total hot/warm + cold size (the default is 500GB). After that, fine-tune it with the hot/warm and cold sizes.

And the reality is that the size of your index depends on which limit is hit first (volume size, index size, hot/warm vs. cold size, or number of buckets).

r. Ismo
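The "which limit is hit first" point can be illustrated with a small hypothetical helper (illustrative only; Splunk enforces these limits internally, this just shows how they interact):

```python
def effective_cap_mb(max_total_mb, home_mb, cold_mb, volume_share_mb):
    """Illustration: an index stops growing (oldest buckets roll to
    frozen) at whichever configured limit is reached first."""
    return min(max_total_mb,          # maxTotalDataSizeMB
               home_mb + cold_mb,     # homePath + coldPath caps
               volume_share_mb)       # space available in the volume(s)

# Per-indexer wineventlog numbers: the index caps would allow
# 3.9 TB per node, but if the node's volumes only offer 3.4 TB,
# the volume limit wins.
cap = effective_cap_mb(max_total_mb=3_900_000,
                       home_mb=975_000, cold_mb=2_925_000,
                       volume_share_mb=3_400_000)
print(cap)  # 3400000
```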


edwardrose
Contributor

I did forget to add the following:

 

[default]
frozenTimePeriodInSecs = 10368000
homePath.maxDataSizeMB = 3000000
coldPath.maxDataSizeMB = 10598400

[volume:_splunk_summaries]
path = /splunk/cold/splunk_summaries

[volume:hot]
path = /splunk/hot
maxVolumeDataSizeMB = 3400000

[volume:cold]
path = /splunk/cold
maxVolumeDataSizeMB = 10957620

 

This is where I get confused. We have a total of 68TB of hot storage divided among the 20 indexers, so each indexer has a 3.4TB hot volume. And we have 220TB of cold storage, with each indexer having 11TB. I gave the [default] homePath 3TB, leaving 400GB of extra room in the hot volume, and the coldPath about 10.6TB, leaving roughly 360GB of extra room in the cold volume.
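A quick check of those volume figures (decimal units; purely illustrative arithmetic, not Splunk code):

```python
MB_PER_TB = 1_000_000
indexers = 20

# Cluster-wide raw storage divided across the 20 indexers
assert 68 * MB_PER_TB / indexers == 3_400_000     # 3.4 TB hot per node
assert 220 * MB_PER_TB / indexers == 11_000_000   # 11 TB cold per node

# Headroom between the [default] per-index caps and the volume caps
print(3_400_000 - 3_000_000)      # 400000 MB = 400 GB hot headroom
print(10_957_620 - 10_598_400)    # 359220 MB, i.e. ~360 GB cold headroom
```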

But if, from my example, 30 days of hot/warm data for wineventlog is 19.5TB, does Splunk automatically divide that between all 20 indexers and apply the homePath limit to the total amount of data across all 20 indexers?


isoutamo
SplunkTrust

First, you should also add maxVolumeDataSizeMB to the _splunk_summaries volume.

Your volume:hot says its max size is 3.4TB per indexer. These values are always per individual indexer, not totals for the cluster. The total capacity depends on how many indexers you have in your cluster.

In an indexer cluster, the total storage used by an index depends on your SF + RF (search factor and replication factor) and on whether you have a single-site or multisite cluster. But as I said, all of those settings in the cluster manager's indexes.conf apply to each individual host in the cluster, not to the cluster as a whole. So each node could hold that full 1.95TB in its coldPath, or one node could have e.g. 1TB, a second 1.5TB, and another 1.95TB. It depends on how evenly your data is distributed across the indexers, how the buckets are replicated, and so on.

I hope that this explains it.

r. Ismo
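In other words, because the stanza values are enforced per indexer, a cluster-wide target has to be divided by the node count before it goes into indexes.conf. A hypothetical sketch with the wineventlog figures from this thread:

```python
MB_PER_TB = 1_000_000

# Cluster-wide target for this index's hot/warm data
cluster_target_mb = 19_500_000   # ~19.5 TB across the whole cluster
indexers = 20

# What homePath.maxDataSizeMB should be if each node takes an even share
per_node_mb = cluster_target_mb // indexers
print(per_node_mb)  # 975000 -> 975 GB per indexer

# As written, 19500000 would let EACH node grow to 19.5 TB, so the
# theoretical cluster-wide ceiling (before volume limits bite) is:
print(cluster_target_mb * indexers / MB_PER_TB, "TB")  # 390.0 TB
```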
