Deployment Architecture

Rolling Hot Data to Cold Quicker

tbalouch
Path Finder

Hey Guys,

Below is a small example of my indexes.conf file, and it looks like my HOT DB partition is running low on space. It is currently at about 75%. I would like to have this partition roll its hot data to cold faster in order to get below 75% disk usage. My cold storage is only at about 10%, so it would make life easier. Any ideas which setting would help me achieve this for my current setup?

[volume:hot]
path = /storage/splunk
# 12.75TB
maxVolumeDataSizeMB = 12750000

[volume:cold]
path = /storage1/splunk
# 12.75TB
maxVolumeDataSizeMB = 12750000


MuS
SplunkTrust

Hi tbalouch,

Buckets are rolled from hot to warm when their size reaches the limit set by maxDataSize, or when their lifetime exceeds maxHotSpanSecs (or when you roll them with a manual command). So you can set the maxHotSpanSecs option in indexes.conf as well.

After that, buckets are rolled from warm to cold once the number of warm bucket directories hits the maxWarmDBCount parameter. If this value is exceeded, the oldest warm directories roll to cold on your cold path - the default for this value is 300. This value can be set at the global or per-index level.
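As a sketch, a per-index stanza combining both settings might look like this (the index name main and the concrete values below are only illustrative - tune them to your own daily volume and retention needs):

```ini
# indexes.conf -- illustrative values only
[main]
# Roll a hot bucket to warm once it is 24 hours old,
# even if it has not yet reached maxDataSize.
maxHotSpanSecs = 86400
# Keep at most 50 warm bucket directories;
# the oldest ones roll to cold on the cold path.
maxWarmDBCount = 50
```

If I remember correctly, you can also force a roll immediately with the manual command mentioned above, something like `splunk _internal call /data/indexes/<index>/roll-hot-buckets` - check the REST API docs for your version before relying on it.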

Read the docs and the wiki about Bucket Rotation And Retention carefully.

Also pay attention to the fact that if your hot/warm buckets are on a different disk/filesystem than your cold buckets, you will get some additional IO load on the indexer.

hope this helps ...

cheers, MuS

damode
Motivator

Hi @MuS,

If I want to roll buckets straight from hot to cold, would it be advisable to set maxWarmDBCount = 0, with the remaining settings below for index=main?
maxHotSpanSecs = 2592000 [hot bucket - 30 days]
maxDataSize = 750 [750MB]
maxHotIdleSecs = 0
frozenTimePeriodInSecs = 10368000 [120 days]
maxTotalDataSizeMB = 1120000

Thanks,
Deven


MuS
SplunkTrust

Surely you could do something like this, but whether you should do it is the question to answer here ...

  • In your example you have set maxHotSpanSecs to 30 days. This means that in a worst-case scenario you can lose up to 30 days' worth of data from your hot buckets - can your business accept that risk?
  • The option maxDataSize is set to the default of 750MB; does 30 days of data fit in there?
  • Add many more questions to answer here .....

It is really hard to provide a one-size-fits-all solution, because all environments are different, all data is different, all use cases are different, and all business requirements are different.

Sorry that I'm not able to provide the final solution that suits your needs perfectly.

cheers, MuS


damode
Motivator

Hi @MuS,

Sorry, I didn't quite get your point. How would I run the risk of losing data from hot buckets? Wouldn't it roll to a cold bucket, from where I can still search data that is more than 30 days old?

My client's requirement is 30 days active and 90 days cold storage. Based on this, I set the above retention policy, thinking the data would stay in the hot bucket for 30 days, roll to a cold bucket where it stays for 90 days, and then roll to frozen.

If I set the bucket size to 1GB, based on the average indexed data per day for the main index, with the settings below,
maxDataSize = 1000
maxHotBuckets = 3
maxWarmDBCount = 31
homePath.maxDataSizeMB = 32000 (data size equivalent of 30 days + extra)
coldPath.maxDataSizeMB = 90000 (data size equivalent of 90 days)
maxTotalDataSizeMB = 122000
frozenTimePeriodInSecs = 10368000

Would my retention policy atleast work better with these settings ?

Thanks,
Dev


MuS
SplunkTrust

Hot buckets are open files that Splunk is writing to, and I'm sure you know the risks of open files if, say, the server crashes ... there is potential data loss.

If you have an average of 1GB per day, then it is more like this in indexes.conf:

maxDataSize = 250
maxHotBuckets = 4
maxWarmDBCount = 116

The remaining options look okay. You could also stick to your options, but you would still need to change maxHotBuckets = 3 to maxHotBuckets = 1. But I saw you found this answer https://answers.splunk.com/answers/205125/are-there-any-search-and-performance-pitfalls-with.html as well, so you know about the risk associated with setting maxHotBuckets = 1.
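To spell out the arithmetic behind those numbers (a rough sketch, using the ~1GB/day average mentioned above): at maxDataSize = 250 MB, roughly 4 buckets fill per day, so 30 days of data is about 120 buckets; with 4 of them hot (maxHotBuckets = 4), the remaining 116 sit in warm, which is where maxWarmDBCount = 116 comes from:

```python
# Back-of-the-envelope bucket math for the suggested settings.
# Assumptions: ~1 GB (1000 MB) indexed per day, 30 days kept in hot/warm.
daily_mb = 1000          # average indexed data per day, in MB
max_data_size_mb = 250   # maxDataSize: a hot bucket rolls to warm at this size
hot_warm_days = 30       # how long data should stay in hot/warm
max_hot_buckets = 4      # maxHotBuckets: buckets open for writing at once

buckets_per_day = daily_mb / max_data_size_mb          # 4.0
total_buckets = buckets_per_day * hot_warm_days        # 120.0
warm_db_count = int(total_buckets) - max_hot_buckets   # 116

print(buckets_per_day, total_buckets, warm_db_count)   # 4.0 120.0 116
```

The same arithmetic lets you re-derive maxWarmDBCount whenever the daily volume or bucket size changes.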

Another question: is the hot/warm and cold storage on different storage tiers? If not, the split actually does not make a lot of sense - again, just my 2 cents here, and I hope this makes a bit more sense now 😉

cheers, MuS

damode
Motivator

Thanks, MuS!

Yes, you are right: since the buckets are on the same storage, splitting doesn't make much sense.
I am just going to focus more on maxTotalDataSizeMB.
Sorry, another question, maybe a silly one:
should the maxTotalDataSizeMB value be based on the uncompressed data size or the indexed data size?


tbalouch
Path Finder

Anyone have any idea?
