Hi All,
I have a clustered environment, replication factor 3, search factor 2.
My current indexes.conf looks like:
#replicate all indexes
[default]
repFactor=auto
frozenTimePeriodInSecs=31536000
# Hot/warm data
[volume:primary]
path = /data/splunk/hotwarm
maxVolumeDataSizeMB = 500000
[volume:cold]
path = /data/splunk/cold
maxVolumeDataSizeMB = 3000000
[applications]
homePath = volume:primary/applications/db
coldPath = volume:cold/applications/colddb
thawedPath = $SPLUNK_DB/applications/thaweddb
maxWarmDBCount = 100
#Roll to frozen after 3 months
frozenTimePeriodInSecs = 7776000
##Invokes archiving script once rolled to frozen
coldToFrozenScript = "<path to script that archives data to external storage>"
[anotherindex]
homePath = volume:primary/anotherindex/db
coldPath = volume:cold/anotherindex/colddb
thawedPath = $SPLUNK_DB/anotherindex/thaweddb
maxWarmDBCount = 100
#Roll to frozen after 3 months
frozenTimePeriodInSecs = 7776000
##Invokes archiving script once rolled to frozen
coldToFrozenScript = "<path to script that archives data to external storage>"
I have been tasked to achieve the following.
Hot/Warm: retain data for 15 days (as opposed to the bucket-count limit currently configured) and then roll to cold automatically.
Cold: Retain data till cold volume size reaches 90 % OR after data ages to 2.5 months (whichever comes first)
Frozen: 9 months
Is the new requirement achievable? Especially the retention in cold, based on the overall size of the volume (not the size per index, but the overall volume size). Also, what is the best way to enforce time-based retention in hot/warm so the buckets roll over to cold after 15 days?
Many thanks.
There are no time-constraint configurations on Hot/Warm because it makes no sense for there to be any. This is your fast/expensive disk and you should use all of it and constrain only by size (leaving 5%-10% unallocated for housekeeping). So:
Hot/Warm: Retain for 15 days
is not possible and would be silly anyway.
Cold: Retain data till cold volume size reaches 90 % OR after data ages to 2.5 months (whichever comes first)
is very doable, but you did not tell us how big your volume is, so you will have to do your own math. I am of the opinion that frozenTimePeriodInSecs should never be used (unless you must for compliance reasons), and I never use it. In any case, you know how to do that part. For the sizing, do your math and use the stanza at the bottom.
Frozen: Retain for 9 months
is all up to you; you have to code/employ your own housekeeping policy/scripts because once Splunk freezes, that data is dead to it.
[volume:cold]
path = /data/splunk/cold
maxVolumeDataSizeMB = DoYour90percentMathHere
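For the frozen step, Splunk invokes coldToFrozenScript with the path of the bucket directory about to be frozen as its single argument, and removes the bucket itself once the script exits successfully. A minimal sketch of such a script follows; the /mnt/archive destination is a hypothetical assumption, substitute your own external storage.

```python
import os
import shutil

# Assumption: an external storage mount point - replace with your real archive location.
ARCHIVE_ROOT = "/mnt/archive/splunk_frozen"

def archive_bucket(bucket_path, archive_root=ARCHIVE_ROOT):
    """Copy a frozen bucket to external storage, grouped by index name.

    Splunk passes the bucket directory (e.g.
    .../cold/applications/colddb/db_<newest>_<oldest>_<id>) as the script's
    first argument; the index name sits two directory levels above the bucket.
    """
    index_name = os.path.basename(os.path.dirname(os.path.dirname(bucket_path)))
    dest = os.path.join(archive_root, index_name, os.path.basename(bucket_path))
    if not os.path.isdir(dest):  # make re-runs idempotent
        shutil.copytree(bucket_path, dest)
    return dest

# Entry point when Splunk calls the script:
#   import sys; archive_bucket(sys.argv[1])
```

If the script exits non-zero, Splunk will retry the freeze later, so it is worth making the copy idempotent as above.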
Hi Woodcock,
Thanks for your reply. Using all of the disk space before rolling to cold caused us an issue: Splunk stopped indexing data when it hit the minimum free disk space threshold defined in server.conf, hence we had to opt for the maxWarmDBCount modification. It would solve my problem if I could roll the bucket without Splunk stopping indexing and without changing the threshold to 0. I guess Splunk would behave the same way if it hits 100% of the allocated disk space in cold as well?
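For reference, the threshold being hit here is minFreeSpace under the [diskUsage] stanza of server.conf on the indexers; Splunk pauses indexing when free space on an indexing volume drops below it. The fragment below shows the default value (5000 MB), for illustration only:

```ini
# server.conf (indexer) - default shown for illustration
[diskUsage]
minFreeSpace = 5000
```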
Here is the output from indexes.conf:
# Hot/warm data
[volume:primary]
path = /data/splunk/hotwarm
maxVolumeDataSizeMB = 500000
# Cold data
[volume:cold]
path = /data/splunk/cold
maxVolumeDataSizeMB = 3000000
And here is the output from operating system:
Filesystem               1K-blocks    Used       Available   Use%  Mounted on
/dev/nvme2n1             515928320    246888244  242809292   51%   /data/splunk/hotwarm
/splunkColdStorage-cold  3096207552   494324772  2444580612  17%   /data/splunk/cold
In case I wanted to change maxVolumeDataSizeMB = 500000 to 90% of the physical hot/warm volume (df reports 1K blocks while maxVolumeDataSizeMB is in MB, so 90% of 515928320 KB is roughly 453452 MB), will Splunk still stop indexing, given the physical volume will still hold 10% free space?
Similarly, I could change the parameter in cold from maxVolumeDataSizeMB = 3000000 to maxVolumeDataSizeMB = 2700000.
Many thanks
(leaving 5%-10% unallocated for housekeeping).
This will actually fix my issue; I am just not sure how to enforce it. Do I change the maxVolumeDataSizeMB parameter to 90% of the volume size?
Thanks.
Exactly. That is how you do it. Splunk needs room for housekeeping (it cannot delete files instantly and it needs time to figure out if it needs to delete any and which ones) and SO DOES THE OS! I would never use more than 95% of any disk at any time.
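As a worked example using the df figures posted above (remember that df reports 1K blocks while maxVolumeDataSizeMB is in MB), the 90% values can be derived like this:

```python
# Sketch: derive maxVolumeDataSizeMB as 90% of the physical volume,
# starting from the 1K-block counts that `df` reported in the thread.
def ninety_percent_mb(df_1k_blocks: int) -> int:
    """Take 90% of a df 1K-block count and convert it to whole MB."""
    return df_1k_blocks * 9 // 10 // 1024

hotwarm_mb = ninety_percent_mb(515928320)   # /data/splunk/hotwarm
cold_mb = ninety_percent_mb(3096207552)     # /data/splunk/cold
print(f"hot/warm: maxVolumeDataSizeMB = {hotwarm_mb}")  # -> 453452
print(f"cold:     maxVolumeDataSizeMB = {cold_mb}")     # -> 2721276
```

Those numbers would then go into the [volume:primary] and [volume:cold] stanzas in place of 500000 and 3000000.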
Thanks woodcock.