Deployment Architecture

Splunk data retention policy: What is the best way to reconfigure our site's retention policy to ensure backups are taking place?

jrwebst
Explorer

So I have taken over as the Splunk administrator for Splunk Enterprise from a colleague who has left the business. His setup of Splunk appears to be incorrect, at best. Currently he has so much data set up in the warm buckets that it never rolls over to cold. I am trying to create a data retention policy that will hopefully better allow us to keep our data backed up. We have a file system backup that runs every night, and I understand that you cannot back up the actual /db (hot/warm) buckets of Splunk and can only back up the colddb directory, correct?

My main goal is that we need to keep at least 1-1.5 years of data on hand at any given time. We only need up to 1 year of that searchable, so anything older than that, up to 1.5 years, can be moved to the frozen/archive bucket. However, I am struggling to see how to make sure data rolls into the cold bucket at the right time so that our file system backups actually capture it. I feel a bit lost when it comes to some of this and am slowly wading through documentation. Is there anything you guys can do to point me in the right direction with some tweaks that I can make to at least get close to this sort of retention policy and get our file system backups on track? I will be around all day if there is some bit of information that I didn't put in here. Any help at all would be greatly appreciated. Thank you so much in advance.

DalJeanis
Legend

Okay, first, don't panic, it's not as bad or as complicated as you might think. There are a lot of ways to skin a cat.

Second, for backup purposes, you don't need to think in terms of "Splunk is the entire database". If you are more comfortable with some other technology, then nothing prevents you from backing up the logs before or after they are indexed. (Well, you can't do it afterwards if the prior guy has set them up to be deleted after indexing, but that's a different discussion.)

Anyway, use the method that you are comfortable with. There is plenty of time to sweep out the prior guy's closets and throw out his bowling trophies later.

Third, as an emergency measure, you can always COPY ALL THE DATA to a summary index and back THAT up. Depending on your licensing and so on, it may take a bit of time to get it done for free, but IT CAN BE DONE. Think in terms, on this forum, of "I am moving (or copying) my data from one server or index to another, how do I do it?" I've seen dozens of articles on that with the precise steps.

Fourth, don't panic, you'll be fine. We're here to help.

jrwebst
Explorer

@DalJeanis Thanks for the encouragement. I think I just feel a tad overwhelmed with a lot of this, as the whole hot/warm/cold/frozen/thawed bucket concept is slightly foreign.

So from the wiki link: http://wiki.splunk.com/Deploy:BucketRotationAndRetention

I have been able to see most of the variables and settings that I need to set to configure the movements from hot to warm to cold. But I still have a couple of questions. Once an object is in the "frozen" bucket, what setting drives how long it stays frozen? For instance, what if I only want to keep data in frozen for 6 months and then have it deleted?

Also, if I ever decide to rebuild the data and move it from frozen to thawed, what happens to the thawed data? Does it stay in the thawed bucket indefinitely? Or is it reintroduced back to frozen after a certain amount of time?

Again, I apologize if these are relatively noob-like questions; I just feel like I have been thrown into the deep end here. Thanks again for all of your amazing help.


DalJeanis
Legend

I'm in no way an expert on those things; I'm more of a power user at the moment. However, I can tell you that data rolls forward in units (i.e. buckets). Your configuration and settings determine how often, both in time and size, Splunk closes the hot bucket, labels it "warm", and starts a new hot bucket. Likewise, they determine how often and how fast a bucket slides to cold, then frozen, then gone. A bucket of indexed data doesn't move to a colder status until the last thing in it qualifies, and then doesn't get deleted until the last thing in it qualifies for deletion. Here's a link that describes the process -
http://docs.splunk.com/Documentation/Splunk/6.4.0/Indexer/HowSplunkstoresindexes
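
To make the "last thing in it qualifies" point concrete, here is a minimal sketch of the time-based freeze check, assuming a one-year searchable window. The `Bucket` class and both function names are made up for illustration and are not part of any Splunk API; the only real setting referenced is `frozenTimePeriodInSecs` in indexes.conf, which is the value `retention_seconds` computes.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Bucket:
    """Simplified stand-in for an index bucket (hypothetical, not a Splunk object)."""
    oldest_event: int  # epoch seconds of the earliest event in the bucket
    newest_event: int  # epoch seconds of the latest event in the bucket


def retention_seconds(days: int) -> int:
    """Convert a retention period in days into a frozenTimePeriodInSecs value."""
    return days * SECONDS_PER_DAY


def eligible_to_freeze(bucket: Bucket, now: int, frozen_period_secs: int) -> bool:
    """A bucket qualifies only when even its newest event is past retention."""
    return now - bucket.newest_event > frozen_period_secs


one_year = retention_seconds(365)  # 31_536_000
now = 1_700_000_000                # arbitrary "current time" for the example

# Oldest event is 400 days old, but the newest is only 300 days old.
mixed = Bucket(now - 400 * SECONDS_PER_DAY, now - 300 * SECONDS_PER_DAY)
# Every event in this bucket is older than 365 days.
old = Bucket(now - 500 * SECONDS_PER_DAY, now - 366 * SECONDS_PER_DAY)

print(eligible_to_freeze(mixed, now, one_year))  # False
print(eligible_to_freeze(old, now, one_year))    # True
```

Because a whole bucket freezes at once, some events stay searchable longer than the nominal retention period. Note that this only models the age rule; an index that hits its size cap (`maxTotalDataSizeMB`) can freeze its oldest buckets earlier than that.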

You already have the right link for general understanding-
http://wiki.splunk.com/Deploy:BucketRotationAndRetention

Here's an answer with pretty plain description as well -
https://answers.splunk.com/answers/396146/what-is-the-difference-between-frozen-bucket-and-t.html

Autoarchiving is covered here -
http://docs.splunk.com/Documentation/Splunk/6.6.0/Indexer/Automatearchiving

Thawing is covered here -
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Restorearchiveddata
https://answers.splunk.com/answers/433129/what-is-the-proper-way-to-move-buckets-from-cold-t.html

And here's something about refreezing -
https://answers.splunk.com/answers/432509/after-thawing-data-how-do-you-re-freeze-it.html
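
On how long data stays frozen: as far as I know, no Splunk setting controls that. Once a bucket has been archived (for example via `coldToFrozenDir`), Splunk stops managing it, so aging out the archive is up to you. Thawed buckets are likewise left alone until you remove them by hand. Below is a rough sketch of how you might pick archived buckets to purge, assuming the archive keeps the standard bucket directory names (`db_<newest>_<oldest>_<id>`). The function names and the sample directory names are hypothetical, and the code only lists candidates; it deletes nothing.

```python
import re
from typing import List, Optional, Tuple

SECONDS_PER_DAY = 86_400

# db_<newest>_<oldest>_<id>, optionally followed by _<guid> on clustered indexers;
# replicated copies start with rb_ instead of db_.
BUCKET_NAME = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)(?:_[0-9A-Fa-f-]+)?$")


def parse_bucket_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (newest_epoch, oldest_epoch), or None if it is not a bucket name."""
    match = BUCKET_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def expired_archives(names: List[str], now: int, keep_days: int) -> List[str]:
    """List archived buckets whose newest event is older than keep_days."""
    cutoff = now - keep_days * SECONDS_PER_DAY
    expired = []
    for name in names:
        times = parse_bucket_name(name)
        if times is not None and times[0] < cutoff:
            expired.append(name)
    return expired


now = 1_700_000_000  # arbitrary "current time" for the example
names = [
    "db_1650000000_1640000000_12",  # newest event roughly 579 days old
    "db_1699000000_1690000000_13",  # newest event roughly 12 days old
    "lost+found",                   # not a bucket, ignored
]

# About 1 year searchable + 6 months archived = roughly 548 days in total.
print(expired_archives(names, now, keep_days=548))
```

In practice you would feed this the output of `os.listdir()` on the archive directory from a scheduled job, and only delete after confirming the nightly file system backup has already captured those buckets.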


jrwebst
Explorer

Oh amazing. Thank you so much. Lots of interesting reading.
