Getting Data In

How to calculate Daily Data Volume on a Splunk cluster

thanh_on
Path Finder

Dear everyone,

I have a Splunk indexer cluster (2 indexers) with:
Replication Factor = 2
Search Factor = 2

I need to size an index (index A) in indexes.conf. I found this useful website: https://splunk-sizing.soclib.net/

My question about this website is how to calculate the "Daily Data Volume" (average uncompressed raw data).

So, how can I calculate this? Can I use an SPL search on the Search Head to calculate it?

Thanks & best regards.

1 Solution

livehybrid
Super Champion

Hi @thanh_on 

The "Daily Data Volume" in this case is the amount of daily ingest. 

You can get this by going to https://yourSplunkInstance/en-US/manager/system/licensing

Or by running the following search:

index=_internal 
    [ rest splunk_server=local /services/server/info 
    | return host] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d 
| eval _time=_time - 43200 
| bin _time span=1d 
| stats latest(b) AS b by slave, pool, _time 
| timechart span=1d sum(b) AS "volume" fixedrange=false 
| fields - _timediff 
| foreach "*" 
    [ eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


thanh_on
Path Finder

Hi @livehybrid ,

Thank you for your answer.

FYI, I have separated the indexes per device or vendor because each device has a different data retention policy.

Because of that, I need to calculate the "Daily Data Volume" and configure a stanza for each index in indexes.conf.

For example:

[idx_fgt]

(180 days searchable)

[idx_windows]

(365 days searchable)

Can I use *license_usage.log* per index for this situation?


Thanks & best regards.


Prewin27
Communicator

@thanh_on 
Yes, you can use license_usage.log to calculate daily data volume per index.

A simple query to check per index (values converted to GB):

index=_internal source=*license_usage.log* type="Usage" idx=*
| timechart span=1d sum(b) BY idx
| foreach "*"
    [ eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 2)]

PickleRick
SplunkTrust

Also remember that you should not have your environment sized "tightly" - RF and SF should be smaller than your number of indexers. Otherwise your cluster will not be able to rebalance data in case of indexer failure.
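As a rough illustration (a hedged sketch, not your actual configuration): with the manager-node settings below you would want at least three peer indexers, so the cluster can restore both factors after a single peer failure.

# server.conf on the cluster manager (illustrative sketch only)
[clustering]
mode = manager
replication_factor = 2
search_factor = 2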

gcusello
SplunkTrust

Hi @thanh_on ,

you can find this by viewing the license consumption for each day; that is the total indexing volume for the whole day across all the indexers of the cluster.
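For example, a minimal sketch of such a search (assuming the license usage logs are forwarded to _internal and searchable from your Search Head):

index=_internal source=*license_usage.log* type="RolloverSummary" earliest=-30d@d
| timechart span=1d sum(b) AS bytes
| eval daily_GB=round(bytes/1024/1024/1024, 2)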

Ciao.

Giuseppe

thanh_on
Path Finder

Hi @gcusello ,

Thank you for your answer.

FYI, I have separated the indexes per device or vendor because each device has a different data retention policy.

Because of that, I need to calculate the "Daily Data Volume" and configure a stanza for each index in indexes.conf.

For example:

[idx_fgt]

(180 days searchable)

[idx_windows]

(365 days searchable)

 

Do you have any suggestions?

 

Thanks & best regards.

0 Karma

gcusello
SplunkTrust

Hi @thanh_on ,

different data retention requirements are one of the main reasons to have separate indexes.

In this case, you have to define the frozenTimePeriodInSecs option in indexes.conf for each index you created.

In your case, 180 days is 15552000 seconds and 365 days is 31536000 seconds:

[idx_fgt]
<other settings>
frozenTimePeriodInSecs = 15552000

[idx_windows]
<other settings>
frozenTimePeriodInSecs = 31536000

After this period, data is deleted or moved offline (copied to a different location).

Remember that retention policies are applied at the bucket level within each index; in other words, you can have data that exceeds the retention period because it sits in a bucket that still contains at least one event with a timestamp inside the retention period.

When the newest (most recent) event in a bucket exceeds the retention period, the whole bucket is deleted or moved offline.
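If you want to check this at the bucket level, a quick sketch (using the idx_fgt name from your example) is:

| dbinspect index=idx_fgt
| eval newest_event_age_days=round((now() - endEpoch) / 86400)
| table bucketId state startEpoch endEpoch newest_event_age_days sizeOnDiskMB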

Ciao.

Giuseppe


thanh_on
Path Finder

Dear @gcusello 

Thank you for your advice.

I think frozenTimePeriodInSecs alone is not enough.

We also need to define homePath.maxDataSizeMB, coldPath.maxDataSizeMB, and maxTotalDataSizeMB, then sum all index capacities to define the disk capacity for our retention policies.

For example below:

[idx_fgt]
<other settings>
homePath.maxDataSizeMB = 101200 # ~100GB
coldPath.maxDataSizeMB = 256000 # 250GB
maxTotalDataSizeMB = 357200
frozenTimePeriodInSecs = 15552000

[idx_windows]
<other settings>
homePath.maxDataSizeMB = 201200 # ~200GB
coldPath.maxDataSizeMB = 356000 # ~350GB
maxTotalDataSizeMB = 557200
frozenTimePeriodInSecs = 31536000

 
Summing [idx_fgt] and [idx_windows], we get for each indexer instance:

~300GB capacity for volume ../hot_warm/
~600GB capacity for volume ../cold/

Our final goal is to calculate the additional disk capacity needed on each indexer instance. That's why, as in the title, we need to calculate the Daily Data Volume 😄
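For example, a rough sizing sketch in SPL, assuming ~10GB/day of raw ingest, 90 days of retention, and the common rule of thumb that indexed data on disk is about 50% of raw (roughly 15% compressed rawdata plus 35% index files); with RF=SF=2 both copies carry index files, so the per-copy figure is doubled for the cluster:

| makeresults
| eval daily_raw_gb=10, retention_days=90, replication_factor=2
| eval disk_gb_per_copy=daily_raw_gb * retention_days * 0.5
| eval disk_gb_cluster=disk_gb_per_copy * replication_factor
| table daily_raw_gb retention_days disk_gb_per_copy disk_gb_cluster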

Do you have any more suggestions?

Thanks & best regards.


gcusello
SplunkTrust

Hi @thanh_on ,

yes, of course: my hint was only about the retention period.

Just one more hint:

don't define the capacity of each index; instead, create a volume that will contain all your indexes and define the maximum volume size.

In this way, you can manage index sizes dynamically.

For volume creation and configuration, see https://docs.splunk.com/Documentation/Splunk/9.4.2/Admin/Indexesconf#indexes.conf.spec

This is an example:

### This example demonstrates the use of volumes ###

# volume definitions; prefixed with "volume:"

[volume:hot1]
path = /mnt/fast_disk
maxVolumeDataSizeMB = 100000

[volume:cold1]
path = /mnt/big_disk
# maxVolumeDataSizeMB not specified: no data size limitation on top of the
# existing ones

[volume:cold2]
path = /mnt/big_disk2
maxVolumeDataSizeMB = 1000000

# index definitions

[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold1/idx1

# thawedPath must be specified, and cannot use volume: syntax
# choose a location convenient for reconstitution from archive goals
# For many sites, this may never be used.
thawedPath = $SPLUNK_DB/idx1/thaweddb

[idx2]
# note that the specific indexes must take care to avoid collisions
homePath = volume:hot1/idx2
coldPath = volume:cold2/idx2
thawedPath = $SPLUNK_DB/idx2/thaweddb

[idx3]
homePath = volume:hot1/idx3
coldPath = volume:cold2/idx3
thawedPath = $SPLUNK_DB/idx3/thaweddb

[idx4]
datatype = metric
homePath = volume:hot1/idx4
coldPath = volume:cold2/idx4
thawedPath = $SPLUNK_DB/idx4/thaweddb
metric.maxHotBuckets = 6
metric.splitByIndexKeys = metric_name

Ciao.

Giuseppe

thanh_on
Path Finder

Dear @gcusello ,

Thank you for your advice,

Following your recommendation, I will do these steps (please correct me if I'm wrong):

1) Use the cluster's daily license usage as the Daily Data Volume (example: 10GB per day)

2) Use the Splunk sizing site to calculate capacity against the retention policy requirement


3) Configure indexes.conf as you recommended, for each volume:

[volume:hotwarm]
path = /mnt/hotwarm_disk
maxVolumeDataSizeMB = 102400 #100G

[volume:cold]
path = /mnt/cold_disk
maxVolumeDataSizeMB = 204,800 #200G

#Frozen Disk: /mnt/frozen_disk is 410G

[idx]
homePath = volume:hotwarm/defaultdb/db
coldPath = volume:cold/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
frozenTimePeriodInSecs = 7776000 #90 days searchable
coldToFrozenDir = /mnt/frozen_disk/defaultdb/frozendb

 
Thanks & Best regards.


PickleRick
SplunkTrust

Wait a second. Something doesn't add up here. Even ignoring the syntax of that 200MB cold volume limit, if you set hot/warm to 100GB and cold to 200GB, you get at most 300GB of space. In ideal conditions that is 30 days * 10GB (in reality you need some buffer for acceleration summaries, and pushing a filesystem to 100% usage is not a healthy practice anyway), but the one index for which you've shown the config has a 90-day retention policy. OK, you wrote that you have multiple indexes with different retention requirements, but remember to take them all into account.


gcusello
SplunkTrust

Hi @thanh_on ,

let us know if we can help you more, or please accept one answer for the other people in the Community.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

