Dear everyone,
I have a Splunk indexer cluster (2 indexers) with:
Replication Factor = 2
Search Factor = 2
I'm supposed to size an index A in indexes.conf. Then I found this useful website: https://splunk-sizing.soclib.net/
My question about this website is how to calculate the "Daily Data Volume" (average uncompressed raw data).
So, how can I calculate this? Can I use an SPL search on the Search Head to calculate it?
Thanks & best regards.
Hi @thanh_on
The "Daily Data Volume" in this case is the amount of daily ingest.
You can get this by going to https://yourSplunkInstance/en-US/manager/system/licensing
Or by running the following search:
index=_internal
[ rest splunk_server=local /services/server/info
| return host] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d
| eval _time=_time - 43200
| bin _time span=1d
| stats latest(b) AS b by slave, pool, _time
| timechart span=1d sum(b) AS "volume" fixedrange=false
| fields - _timediff
| foreach "*"
[ eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]
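If you only need a single average figure to plug into the sizing site, a simpler variant should also do (just a sketch, assuming your _internal index still holds ~30 days of license_usage.log from the license manager; it skips the midnight-offset handling of the search above, so treat it as a rough average only):
index=_internal source=*license_usage.log* type="RolloverSummary" earliest=-30d@d
| eval GB=round(b/1024/1024/1024, 3)
| timechart span=1d sum(GB) AS daily_GB
| stats avg(daily_GB) AS avg_daily_GB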
🌟 Did this answer help you? If so, please consider accepting it as a solution; your feedback encourages the volunteers in this community to continue contributing.
Hi @livehybrid ,
Thank you for your answer.
FYI, I have separated indexes per device or vendor because each device has different data retention policies.
Because of that, I need to calculate the "Daily Data Volume" and configure a stanza for each index in indexes.conf.
For example:
[idx_fgt]
(180 days searchable)
[idx_windows]
(365 days searchable)
Can I use *license_usage.log* per index for this situation?
Thanks & best regards.
@thanh_on
Yes, you can use license_usage.log to calculate the daily data volume per index.
Here is a simple query to check it by index (values converted to GB):
index=_internal source=*license_usage.log* type="Usage" idx=* | timechart span=1d sum(b) by idx | foreach * [ eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 2) ]
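If you would rather have a single average per index to plug into the sizing site, something like this should also work (just a sketch, again assuming ~30 days of _internal retention):
index=_internal source=*license_usage.log* type="Usage" earliest=-30d@d idx=*
| bin _time span=1d
| stats sum(b) AS bytes by idx, _time
| stats avg(bytes) AS avg_daily_bytes by idx
| eval avg_daily_GB=round(avg_daily_bytes/1024/1024/1024, 2)
| fields idx, avg_daily_GB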
Also remember that you should not have your environment sized "tightly" - RF and SF should be smaller than your number of indexers. Otherwise your cluster will not be able to rebalance data in case of indexer failure.
Hi @thanh_on ,
you can find this by viewing the license consumption for each day; that's the total indexing volume of the day across all the indexers of the cluster.
Ciao.
Giuseppe
Hi @gcusello ,
Thank you for your answer.
FYI, I have separated indexes per device or vendor because each device has different data retention policies.
Because of that, I need to calculate the "Daily Data Volume" and configure a stanza for each index in indexes.conf.
For example:
[idx_fgt]
(180 days searchable)
[idx_windows]
(365 days searchable)
Do you have any suggestions?
Thanks & best regards.
Hi @thanh_on ,
different data retention requirements are one of the main reasons to have different indexes.
In this case, you have to define the frozenTimePeriodInSecs option in indexes.conf for each index you created.
In your case, 180 days is 15552000 seconds and 365 days is 31536000 seconds:
[idx_fgt]
<other settings>
frozenTimePeriodInSecs = 15552000
[idx_windows]
<other settings>
frozenTimePeriodInSecs = 31536000
Once this period has passed, data can be deleted or moved offline (copied to a different location).
Remember that retention is applied at the bucket level within each index; in other words, you could have data that exceeds the retention period because it sits in a bucket that still contains at least one event with a timestamp inside the retention period.
When the most recent event in a bucket exceeds the retention period, the whole bucket is deleted or moved offline.
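If you want to see this behaviour on your own buckets, you could run something like the following (just a sketch based on dbinspect; the exact fields available can vary with your Splunk version):
| dbinspect index=idx_fgt
| eval latest_event_age_days=round((now() - endEpoch) / 86400, 1)
| table bucketId, state, startEpoch, endEpoch, latest_event_age_days
| sort - latest_event_age_days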
Ciao.
Giuseppe
Dear @gcusello
Thank you for your advice.
I think frozenTimePeriodInSecs alone is not enough.
We also need to define homePath.maxDataSizeMB, coldPath.maxDataSizeMB, and maxTotalDataSizeMB, then sum all the index capacities to define the disk capacity for our retention policies.
For example below:
[idx_fgt]
<other settings>
homePath.maxDataSizeMB = 101200 # ~100GB
coldPath.maxDataSizeMB = 256000 # 250GB
maxTotalDataSizeMB = 357200
frozenTimePeriodInSecs = 15552000
[idx_windows]
<other settings>
homePath.maxDataSizeMB = 201200 # ~200GB
coldPath.maxDataSizeMB = 356000 # ~350GB
maxTotalDataSizeMB = 557200
frozenTimePeriodInSecs = 31536000
Summing [idx_fgt] and [idx_windows], we get for each indexer instance:
~300GB capacity for volume ../hot_warm/
~600GB capacity for volume ../cold/
Our final goal is to calculate the additional disk capacity needed on each indexer instance. That's why, as in the title, we need to calculate the Daily Data Volume 😄
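For reference, this is roughly how I estimate it (just a sketch, assuming the common rule of thumb that indexed data takes about 50% of the raw volume on disk: roughly 15% compressed rawdata plus 35% tsidx files, which varies a lot by data type):
| makeresults
| eval daily_raw_GB=10, retention_days=180, copies=2    ``` copies = RF = SF = 2 in this cluster ```
| eval disk_per_day_GB=daily_raw_GB*0.5    ``` ~15% rawdata + ~35% tsidx, rule of thumb ```
| eval per_copy_GB=disk_per_day_GB*retention_days
| eval cluster_total_GB=per_copy_GB*copies    ``` total across both indexers ```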
Do you have any more suggestions?
Thanks & best regards.
Hi @thanh_on ,
yes, obviously: I hinted only at the retention period.
Just one more hint:
don't define the capacity of each index; instead, create a volume that will contain all your indexes and define the maximum volume size.
In this way, you can dynamically manage the index sizes.
For volume creation and configuration see at https://docs.splunk.com/Documentation/Splunk/9.4.2/Admin/Indexesconf#indexes.conf.spec
This is an example:
### This example demonstrates the use of volumes ###
# volume definitions; prefixed with "volume:"
[volume:hot1]
path = /mnt/fast_disk
maxVolumeDataSizeMB = 100000
[volume:cold1]
path = /mnt/big_disk
# maxVolumeDataSizeMB not specified: no data size limitation on top of the
# existing ones
[volume:cold2]
path = /mnt/big_disk2
maxVolumeDataSizeMB = 1000000
# index definitions
[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold1/idx1
# thawedPath must be specified, and cannot use volume: syntax
# choose a location convenient for reconstitution from archive goals
# For many sites, this may never be used.
thawedPath = $SPLUNK_DB/idx1/thaweddb
[idx2]
# note that the specific indexes must take care to avoid collisions
homePath = volume:hot1/idx2
coldPath = volume:cold2/idx2
thawedPath = $SPLUNK_DB/idx2/thaweddb
[idx3]
homePath = volume:hot1/idx3
coldPath = volume:cold2/idx3
thawedPath = $SPLUNK_DB/idx3/thaweddb
[idx4]
datatype = metric
homePath = volume:hot1/idx4
coldPath = volume:cold2/idx4
thawedPath = $SPLUNK_DB/idx4/thaweddb
metric.maxHotBuckets = 6
metric.splitByIndexKeys = metric_name
Ciao.
Giuseppe
Dear @gcusello ,
Thank you for your advice,
Following your recommendation, I plan to do these steps (please correct me if I'm wrong):
1) Use the cluster's daily license usage as the Daily Data Volume (for example, 10GB per day)
2) Use the Splunk sizing site to calculate capacity against the retention policy requirements
3) Configure indexes.conf with volumes, as you recommended, for example:
[volume:hotwarm]
path = /mnt/hotwarm_disk
maxVolumeDataSizeMB = 102400 #100G
[volume:cold]
path = /mnt/cold_disk
maxVolumeDataSizeMB = 204,800 #200G
#Frozen Disk: /mnt/frozen_disk is 410G
[idx]
homePath = volume:hotwarm/defaultdb/db
coldPath = volume:cold/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
frozenTimePeriodInSecs = 7776000 #90 days searchable
coldToFrozenDir = /mnt/frozen_disk/defaultdb/frozendb
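To verify the actual index sizes against these limits afterwards, something like this should work from the Search Head (just a sketch using the REST endpoint; it needs permission to query the indexers):
| rest splunk_server=* /services/data/indexes
| stats sum(currentDBSizeMB) AS totalMB by title
| eval totalGB=round(totalMB/1024, 2)
| sort - totalGB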
Thanks & Best regards.
Wait a second. Something doesn't add up here. Even ignoring the syntax of that 200MB cold volume limit, if you set hot/warm to 100GB, cold to 200GB you'll get at most 300GB of space. In ideal conditions that's 30*10GB (in reality you need some buffer for acceleration summaries and pushing a filesystem to 100% usage is not a healthy practice anyway) but for your one index for which you've shown the config you have 90 days retention policy. Ok, you wrote that you have multiple indexes with different retention requirements but remember to take them all into account.
Hi @thanh_on ,
let us know if we can help you more, or, please, accept one answer to help the other people in the Community.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉