Splunk Enterprise

Best practice for archiving frozen data

mike_k
Path Finder

I am in the process of pulling together a design for a new Splunk deployment.

The deployment will be on the small side and use a single server deployment running on RHEL.

Looking through the Splunk documentation, I can see two key decisions I need to make regarding archiving old Splunk data:

  1. When rolling data from Cold to Frozen I can use either "coldToFrozenDir" or "coldToFrozenScript" to do the actual copying of data.
  2. Whether cold data is rolled to frozen on either a locally attached drive or to a file share server.

On point 1. above:

Are there advantages to using the custom script over the standard coldtofrozendir? (I'm just looking for a straight-forward data copy option. ).

I believe I've read somewhere that if coldtoFrozenScript fails to copy data then it retries again after several minutes. Is this also the case for the coldtoFrozenDir option ... if it fails to successfully copy it will continue to retry?

On Point 2. above:

For the location of my frozen data, I also have the option of a large separate locally attached drive on the Splunk server (separate to my hot/warm/cold drives) or I could copy it to a remote SMB/NFS file server on the local LAN. In either case I would still be archiving the frozen data to tape over time as well. Are there benefits in copying the data to a locally attached drive over a file server? or is it better just to keep frozen data off the Splunk server entirely?

Thanks in advance.

1 Solution

livehybrid
SplunkTrust
SplunkTrust

Hi @mike_k 

Regarding coldToFrozenDir vs coldToFrozenScript: the primary advantage of coldToFrozenScript is the ability to manipulate the data before archiving. Splunk provides an example Python script ($SPLUNK_HOME/bin/coldToFrozenExample.py) that shrinks each bucket by removing the index files and gzipping the raw data. coldToFrozenDir simply moves the uncompressed bucket directory as-is, which consumes significantly more storage space.
If coldToFrozenDir fails (e.g., due to a permissions issue or the destination disk being full), Splunk will keep retrying the move.
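To illustrate the kind of work a freezing script does, here is a minimal sketch in the spirit of coldToFrozenExample.py. This is not the shipped script (which handles more edge cases); it assumes Splunk invokes it with the bucket path as its first argument, deletes everything except the rawdata directory, and gzips a plain journal file if one is present:

```python
import gzip
import os
import shutil
import sys

def archive_bucket(bucket_dir):
    """Shrink a bucket before archiving: keep only rawdata, gzip the journal.

    Simplified sketch of what $SPLUNK_HOME/bin/coldToFrozenExample.py does;
    index files (.tsidx etc.) can be rebuilt from rawdata when thawing.
    """
    rawdata = os.path.join(bucket_dir, "rawdata")
    if not os.path.isdir(rawdata):
        raise SystemExit(f"{bucket_dir} has no rawdata/ - not a valid bucket")
    # Remove index/metadata files; only the raw journal is needed to thaw.
    for name in os.listdir(bucket_dir):
        path = os.path.join(bucket_dir, name)
        if name == "rawdata":
            continue
        if os.path.isfile(path):
            os.remove(path)
        else:
            shutil.rmtree(path)
    # Gzip an uncompressed journal to reclaim space.
    journal = os.path.join(rawdata, "journal")
    if os.path.isfile(journal):
        with open(journal, "rb") as src, gzip.open(journal + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(journal)

if __name__ == "__main__" and len(sys.argv) > 1:
    archive_bucket(sys.argv[1])  # Splunk passes the bucket path as argv[1]
```

A script like this must exit successfully only when the bucket has been safely archived, because Splunk deletes the bucket once the script returns 0.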


When it comes to a local drive vs a remote file share (NFS/SMB), the trade-off is broadly cost vs reliability. Writing to a locally attached drive (or block storage presented locally) avoids network latency and mount-stability issues, and I would personally consider it the more reliable option. With a remote file share, if an NFS mount drops, hangs, or becomes latent while Splunk is attempting to freeze a bucket, the bucket-rolling threads can hang, with a knock-on effect across your indexer. CIFS/SMB is generally not recommended or supported for Splunk storage on Linux; it is only supported on Windows.

If you are already archiving to tape over time, using a locally attached drive as a staging area for frozen data is probably the safest and most resilient approach.
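For reference, the relevant indexes.conf settings look like this (the index name and paths here are hypothetical examples; you set one option or the other per index — if both are set, I believe coldToFrozenDir takes precedence):

```ini
# indexes.conf - hypothetical example
[main]
# Option 1: let Splunk copy expiring buckets to a directory as-is
coldToFrozenDir = /mnt/frozen_archive/main

# Option 2 (instead of the above): hand each expiring bucket to a script
# coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/coldToFrozenExample.py"
```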

It's worth noting that if the frozen destination becomes unavailable or fills up, Splunk will be unable to freeze buckets; those buckets will remain in the cold database, eventually filling your cold partition, which could halt indexing.


Remember that you will need to thaw frozen data before you can search it again in the future; you cannot simply copy the frozen files back into the cold DB directory.
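On a standalone indexer, thawing roughly looks like the following (bucket name and paths are made-up examples; the archived bucket goes into the index's thaweddb directory and `splunk rebuild` recreates the index files from rawdata):

```
# Copy the archived bucket into the index's thaweddb directory...
cp -r /mnt/frozen_archive/main/db_1388368577_1388331707_12 \
      $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/

# ...then rebuild its index files so the bucket becomes searchable again:
$SPLUNK_HOME/bin/splunk rebuild \
      $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/db_1388368577_1388331707_12
```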

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


mike_k
Path Finder

@livehybrid , thanks for that information. I did not realise that coldToFrozenDir did not remove metadata or compress data it was copying to Frozen. That provides a good incentive to use the script. I'll take a look at the example script.

In terms of local versus remote storage, so if  I'm using an NFS share and it hangs it could affect all bucket rolling (so it'll impact hot-to-warm and warm-to-cold bucket rolling, not just cold-to-frozen bucket rolling)? That would definitely provide an incentive to use local storage for frozen.



PickleRick
SplunkTrust
SplunkTrust

Are you sure about that "just copy" policy of coldToFrozenDir? Thawing buckets frozen this way does take quite a lot of effort for indexers so I'd assume freezing does something with them.

Additionally, network file systems (in general, not just NFS) are not recommended due to possible communication/availability issues and such, but sometimes they are unavoidable. I would strongly recommend avoiding CIFS/SMB if you're using Linux indexers, because CIFS client support in Linux is... highly underwhelming. Especially if you have a mounted share and the server goes offline, you're in for a lot of "fun".

There are also two additional factors to take into consideration when thinking about freezing your data.

1. In a clustered environment every indexer freezes its own buckets (and if you're freezing to a shared network directory you can run into naming collisions!). So you might need to do some deduplication to avoid using too much space.

2. If you're using smartstore freezing has additional quirks (don't remember the details though TBH).
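On the naming-collision point above: bucket directories carry names like db_&lt;newest&gt;_&lt;oldest&gt;_&lt;id&gt;, so copies of the same bucket frozen by different peers into one shared directory can clash. One common workaround in a coldToFrozenScript is to prefix the archived name with the indexer's hostname; a sketch (the archive path is a hypothetical example):

```python
import os
import shutil
import socket
import sys

ARCHIVE_ROOT = "/mnt/frozen_archive"  # hypothetical shared archive mount

def freeze(bucket_dir, dest_root=ARCHIVE_ROOT):
    # Copies of a bucket carry the same db_<newest>_<oldest>_<id> directory
    # name on every peer, so prefix the archive copy with this indexer's
    # hostname to avoid collisions in a shared destination.
    dest = os.path.join(dest_root,
                        socket.gethostname() + "_" + os.path.basename(bucket_dir))
    shutil.copytree(bucket_dir, dest)
    return dest

if __name__ == "__main__" and len(sys.argv) > 1:
    freeze(sys.argv[1])  # Splunk invokes the script with the bucket path
```

Note this prevents collisions but does not deduplicate: replicated copies of the same bucket still get archived once per peer, so a separate dedupe pass may still be needed.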


mike_k
Path Finder

@PickleRick , so are you saying that you think that using the "coldToFrozenDir" option does remove metadata and compress the raw data during the freezing process?? I have to admit that I have not tested this to see what happens.


isoutamo
SplunkTrust
SplunkTrust

As @livehybrid said, it depends on your needs, but if, as you said, copying buckets to another place is enough, then keep it KISS and use coldToFrozenDir.
But if you realise that you need to do something more than just move buckets from one place to another, then you need the script version. Of course, you then need to write that script yourself if you cannot find a suitable one somewhere.
