Getting Data In

How to calculate Daily Data Volume on a Splunk cluster

thanh_on
Path Finder

Dear everyone,

I have a Splunk indexer cluster (2 indexers) with:
Replication Factor = 2
Search Factor = 2

I need to size an index (index A) in indexes.conf. I found this useful website: https://splunk-sizing.soclib.net/

My question about this website is how to calculate the "Daily Data Volume" (average uncompressed raw data).

So, how can I calculate this? Can I use an SPL search on the Search Head to calculate it?

Thanks & best regards.

1 Solution

livehybrid
Super Champion

Hi @thanh_on 

The "Daily Data Volume" in this case is the amount of daily ingest. 

You can get this by going to https://yourSplunkInstance/en-US/manager/system/licensing

Or by running the following search:

index=_internal 
    [ rest splunk_server=local /services/server/info 
    | return host] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d 
| eval _time=_time - 43200 
| bin _time span=1d 
| stats latest(b) AS b by slave, pool, _time 
| timechart span=1d sum(b) AS "volume" fixedrange=false 
| fields - _timediff 
| foreach "*" 
    [ eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


thanh_on
Path Finder

Hi @livehybrid ,

Thank you for your answer.

FYI, I have separated the indexes per device or vendor because each device has a different data retention policy.

Because of that, I need to calculate the "Daily Data Volume" and configure a stanza for each index in indexes.conf.

For example:

[idx_fgt]

(180 days searchable)

[idx_windows]

(365 days searchable)

Can I use *license_usage.log* per index for this situation?


Thanks & best regards.


Prewin27
Communicator

@thanh_on 
Yes, you can use license_usage.log to calculate daily data volume per index.

A simple query to check per index (values converted to GB):

index=_internal source=*license_usage.log* type="Usage" idx=*
| timechart span=1d sum(b) BY idx
| foreach "*"
    [ eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 2)]

PickleRick
SplunkTrust

Also remember that you should not have your environment sized "tightly" - RF and SF should be smaller than your number of indexers. Otherwise your cluster will not be able to rebalance data in case of indexer failure.
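As a rough illustration (a hedged sketch, not your actual configuration): with the manager-node settings below you would want at least three peer indexers, so the cluster can restore both factors after a single peer failure.

# server.conf on the cluster manager (illustrative sketch only)
[clustering]
mode = manager
replication_factor = 2
search_factor = 2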

gcusello
SplunkTrust

Hi @thanh_on ,

you can find this by viewing the license consumption for each day; that is the total indexing volume for the whole day across all the indexers of the cluster.
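For example, a minimal sketch of such a search (assuming the license usage logs are forwarded to _internal and searchable from your Search Head):

index=_internal source=*license_usage.log* type="RolloverSummary" earliest=-30d@d
| timechart span=1d sum(b) AS bytes
| eval daily_GB=round(bytes/1024/1024/1024, 2)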

Ciao.

Giuseppe

thanh_on
Path Finder

Hi @gcusello ,

Thank you for your answer.

FYI, I have separated the indexes per device or vendor because each device has a different data retention policy.

Because of that, I need to calculate the "Daily Data Volume" and configure a stanza for each index in indexes.conf.

For example:

[idx_fgt]

(180 days searchable)

[idx_windows]

(365 days searchable)

 

Do you have any suggestions?

 

Thanks & best regards.

0 Karma

gcusello
SplunkTrust

Hi @thanh_on ,

different data retention requirements are one of the main reasons to have separate indexes.

In this case, you have to define the frozenTimePeriodInSecs option in indexes.conf for each index you created.

In your case, 180 days is 15552000 seconds and 365 days is 31536000 seconds:

[idx_fgt]
<other settings>
frozenTimePeriodInSecs = 15552000

[idx_windows]
<other settings>
frozenTimePeriodInSecs = 31536000

After this period, data is deleted or moved offline (copied to a different location).

Remember that retention policies are applied at the bucket level within each index; in other words, you can have data that exceeds the retention period because it sits in a bucket that still contains at least one event with a timestamp inside the retention period.

When the newest (most recent) event in a bucket exceeds the retention period, the whole bucket is deleted or moved offline.
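If you want to check this at the bucket level, a quick sketch (using the idx_fgt name from your example) is:

| dbinspect index=idx_fgt
| eval newest_event_age_days=round((now() - endEpoch) / 86400)
| table bucketId state startEpoch endEpoch newest_event_age_days sizeOnDiskMB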

Ciao.

Giuseppe


thanh_on
Path Finder

Dear @gcusello 

Thank you for your advice.

I think frozenTimePeriodInSecs alone is not enough.

We also need to define homePath.maxDataSizeMB, coldPath.maxDataSizeMB, and maxTotalDataSizeMB, then sum all index capacities to define the disk capacity for our retention policies.

For example below:

[idx_fgt]
<other settings>
homePath.maxDataSizeMB = 101200 # ~100GB
coldPath.maxDataSizeMB = 256000 # 250GB
maxTotalDataSizeMB = 357200
frozenTimePeriodInSecs = 15552000

[idx_windows]
<other settings>
homePath.maxDataSizeMB = 201200 # ~200GB
coldPath.maxDataSizeMB = 356000 # ~350GB
maxTotalDataSizeMB = 557200
frozenTimePeriodInSecs = 31536000

 
Summing [idx_fgt] and [idx_windows], we get for each indexer instance:

~300GB capacity for volume ../hot_warm/
~600GB capacity for volume ../cold/

Our final goal is to calculate the additional disk capacity needed on each indexer instance. That's why, as in the title, we need to calculate the Daily Data Volume 😄
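For example, a rough sizing sketch in SPL, assuming ~10GB/day of raw ingest, 90 days of retention, and the common rule of thumb that indexed data on disk is about 50% of raw (roughly 15% compressed rawdata plus 35% index files); with RF=SF=2 both copies carry index files, so the per-copy figure is doubled for the cluster:

| makeresults
| eval daily_raw_gb=10, retention_days=90, replication_factor=2
| eval disk_gb_per_copy=daily_raw_gb * retention_days * 0.5
| eval disk_gb_cluster=disk_gb_per_copy * replication_factor
| table daily_raw_gb retention_days disk_gb_per_copy disk_gb_cluster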

Do you have any more suggestions?

Thanks & best regards.


gcusello
SplunkTrust

Hi @thanh_on ,

yes, of course: my hint was only about the retention period.

Just one more hint:

don't define the capacity of each index; instead, create a volume that will contain all your indexes and define the maximum volume size.

In this way, you can manage index sizes dynamically.

For volume creation and configuration, see https://docs.splunk.com/Documentation/Splunk/9.4.2/Admin/Indexesconf#indexes.conf.spec

This is an example:

### This example demonstrates the use of volumes ###

# volume definitions; prefixed with "volume:"

[volume:hot1]
path = /mnt/fast_disk
maxVolumeDataSizeMB = 100000

[volume:cold1]
path = /mnt/big_disk
# maxVolumeDataSizeMB not specified: no data size limitation on top of the
# existing ones

[volume:cold2]
path = /mnt/big_disk2
maxVolumeDataSizeMB = 1000000

# index definitions

[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold1/idx1

# thawedPath must be specified, and cannot use volume: syntax
# choose a location convenient for reconstitution from archive goals
# For many sites, this may never be used.
thawedPath = $SPLUNK_DB/idx1/thaweddb

[idx2]
# note that the specific indexes must take care to avoid collisions
homePath = volume:hot1/idx2
coldPath = volume:cold2/idx2
thawedPath = $SPLUNK_DB/idx2/thaweddb

[idx3]
homePath = volume:hot1/idx3
coldPath = volume:cold2/idx3
thawedPath = $SPLUNK_DB/idx3/thaweddb

[idx4]
datatype = metric
homePath = volume:hot1/idx4
coldPath = volume:cold2/idx4
thawedPath = $SPLUNK_DB/idx4/thaweddb
metric.maxHotBuckets = 6
metric.splitByIndexKeys = metric_name

Ciao.

Giuseppe

thanh_on
Path Finder

Dear @gcusello ,

Thank you for your advice,

Following your recommendation, I will do these steps (please correct me if I'm wrong):

1) Use the cluster's daily license usage as the Daily Data Volume (example: 10GB per day)

2) Use the Splunk sizing site to calculate capacity against the retention policy requirement


3) Configure indexes.conf as you recommended, for each volume:

[volume:hotwarm]
path = /mnt/hotwarm_disk
maxVolumeDataSizeMB = 102400 #100G

[volume:cold]
path = /mnt/cold_disk
maxVolumeDataSizeMB = 204,800 #200G

#Frozen Disk: /mnt/frozen_disk is 410G

[idx]
homePath = volume:hotwarm/defaultdb/db
coldPath = volume:cold/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
frozenTimePeriodInSecs = 7776000 #90 days searchable
coldToFrozenDir = /mnt/frozen_disk/defaultdb/frozendb

 
Thanks & Best regards.


PickleRick
SplunkTrust

Wait a second. Something doesn't add up here. Even ignoring the syntax of that 200MB cold volume limit, if you set hot/warm to 100GB and cold to 200GB, you get at most 300GB of space. In ideal conditions that is 30 days * 10GB (in reality you need some buffer for acceleration summaries, and pushing a filesystem to 100% usage is not a healthy practice anyway), but the one index for which you've shown the config has a 90-day retention policy. OK, you wrote that you have multiple indexes with different retention requirements, but remember to take them all into account.


gcusello
SplunkTrust

Hi @thanh_on ,

let us know if we can help you more, or please accept one answer for the other people in the Community.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

