Bug or expected behaviour: frozenTimePeriodInSecs reached and bucket frozen before being replicated successfully to other peers

Engager

I hope someone can shed some light on the freezing of buckets before a bucket could be replicated due to streaming errors:

I have a coldToFrozenScript that copies buckets to a location for long-term archiving. My scripts are installed on all the cluster peers and are working as expected. As part of testing I've encountered the following edge case and would like some clarity on whether this is expected Splunk behaviour.

I have an event generator that generates thousands of events for a single day, written to an index on the cluster. This index has an aggressive rolling period: frozenTimePeriodInSecs = 60. As part of my testing I also restart cluster members (cronjob to call ./splunk restart on each peer) every 15 minutes or so (causing a streaming error on the host 'sending' the original bucket).

What I have encountered is this: when a bucket is busy streaming from the source to a replication peer, AND that destination peer is shut down (causing a replication failure), AND the frozenTimePeriodInSecs limit for that bucket is reached on the source, the source indexer will happily freeze the bucket, so it is no longer eligible for replication.

You will end up with the number of frozen copies for the bucket across the cluster being less than repFactor.

Bear in mind that I'm also assuming that the number of frozen copies of a bucket across the index cluster will always match the repFactor. If the above happens this will not be the case...
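
One way to test that assumption is to count the frozen copies of each bucket across the peers and flag any bucket with fewer copies than repFactor. The following is a minimal sketch, not part of the original scripts. It assumes clustered bucket directories are named `db_<newest>_<oldest>_<localid>_<guid>` for originating copies and `rb_...` for replicated copies, and that all listings belong to a single index. The function names and the input structure (peer name mapped to a list of frozen bucket directory names) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed clustered bucket naming: db_ (originating) or rb_ (replicated),
# then newest time, oldest time, local id, and the originating peer's GUID.
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)_([0-9A-Fa-f-]+)$")


def bucket_key(dirname):
    """Return an identity shared by all copies of one bucket, or None."""
    match = BUCKET_RE.match(dirname)
    if match is None:
        return None
    _newest, _oldest, local_id, guid = match.groups()
    return (local_id, guid.upper())


def under_replicated(frozen_by_peer, rep_factor):
    """Map each bucket with fewer than rep_factor frozen copies to its peers."""
    peers_by_bucket = defaultdict(set)
    for peer, dirnames in frozen_by_peer.items():
        for name in dirnames:
            key = bucket_key(name)
            if key is not None:
                peers_by_bucket[key].add(peer)
    return {
        key: sorted(peers)
        for key, peers in peers_by_bucket.items()
        if len(peers) < rep_factor
    }
```

Called with the frozen directory listings of every peer and the cluster's repFactor, `under_replicated` returns only the buckets that hit the edge case described above, together with the peers that do hold a copy.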

Because freezing timeouts are evaluated and executed on each individual peer, I suspect the above is normal behaviour and thus an edge case?

The documentation says:
"In the case of an indexer cluster, when a peer freezes a copy of a bucket, it notifies the master. The master then stops doing fix-ups on that bucket. *It operates under the assumption that the other peers will eventually freeze their copies of that bucket as well*."

The issue is that the freeze happens on the originating peer before the bucket can be streamed successfully to the replication peers.

Splunk Employee

Hey twigat,

First, let me congratulate you on being thorough and testing things! Lab time, testing and observing can seem like lost arts at times, these days!

I would 100% call this an edge case, in that a frozenTimePeriodInSecs that low, while convenient for your testing, is not a realistic setting for a cluster, unless you were trying to empty out an index (tuck that trick away in your toolbox, because it may come in handy if you ever need to purge a clustered index!).

When an originating peer is streaming its hot bucket to a peer and gets interrupted, it waits until the bucket rolls from hot to warm to replicate it. This is why you may see this:

https://answers.splunk.com/answers/217020/why-is-cluster-master-reporting-cannot-fix-search.html

Where the CM will report "Cannot replicate as bucket hasn't rolled yet."

In your case, when you bounce the server, it interrupts a stream but also rolls buckets from hot to warm. Normally the bucket would replicate at some point as the CM does its rounds and works its magic, but before it can, the bucket is frozen. Indexers manage freezing completely independently of the cluster!

Depending on your cluster size and bucket sizes, simply backing frozenTimePeriodInSecs off to 300 or 900 or, heck, even 1440, while still not fit for a real-world cluster, should give you a more realistic version of what you can expect to see in prod conditions with realistic frozen time settings.

Also keep in mind that rolling duplicated buckets to frozen takes up more space on your archive than necessary. I would suggest you take a look at Hadoop Data Roll, even if just for a look at the Archiver logic. You don't even need to run a full instance of Hadoop; you can just use the Hadoop libs to roll to S3! Anyway, I only bring it up because of its ability to ensure only one copy of a replicated bucket makes it to your archive, which usually has its own replication setup. That is usually the desired outcome for clients.

https://docs.splunk.com/Documentation/Splunk/latest/Indexer/HowHadoopDataRollworks

Anyway, hope this helps! Great job pushing buttons and testing for outcomes!

Engager

Brilliant, thanks!

This helps a lot and confirms my suspicions about what is going on. I don't think I'll encounter this edge case in production, but I'll account for it nonetheless just to be safe.

Thanks for the tip regarding archiving. The scripts that I'm testing are for a managed Splunk archiving system I've built in Python. These scripts do the reconciliation of buckets (removing all the replicas/copies of a source bucket, i.e. dedup) and ensure only a single master copy is stored, thus saving on space. I'm trying to build something 'enterprise grade', as you could run into data loss issues when using coldToFrozenDir and standard operating system tools to copy or move buckets mid-freeze.

Take the following scenario when using a coldToFrozenDir for an index: Splunk freezes a bucket, copying the bucket to the path specified in coldToFrozenDir. If your buckets are large and you have an OS script/cronjob that copies or moves the buckets out to a storage location on a different mount point (a very common use case), you run the risk that data could be truncated at the target location, because the script is copying a source file that Splunk is still writing to as it freezes the bucket. The copy will not 'wait' for the write to complete. If the target is on the same filesystem, you are covered with a move and will end up with a complete file due to the way inodes are handled in Linux (and, I suspect, other *nixes).
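
The same-filesystem rename behaviour can also be put to work on the destination side. The sketch below is an illustration under stated assumptions, not the poster's implementation: it copies a file to a temporary name inside the target directory and only then renames it into place, so a reader of the archive never sees a half-written file under its final name. The function name and the `.partial` suffix are hypothetical. It does not detect that the source is still being written by Splunk, so it complements rather than replaces some form of coordination with the freeze itself.

```python
import os
import shutil


def copy_then_rename(src, dest_dir):
    """Copy src into dest_dir under a temporary name, then rename into place."""
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, os.path.basename(src))
    tmp_path = final_path + ".partial"  # hypothetical suffix for in-flight copies
    with open(src, "rb") as fin, open(tmp_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # push the data to disk before the rename
    # Both paths are in the same directory, so the rename stays on one filesystem.
    os.replace(tmp_path, final_path)
    return final_path
```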

I think a sure way to prevent this is to stop the indexer before copying the buckets out of the coldToFrozenDir to your archive location, as Splunk will then not be writing to any files and it would be 'safe' to copy to a different filesystem such as an NFS/HDFS share.

The scripts I'm testing provide a safe way to do all of the above while Splunk is running and freezing buckets. The coldToFrozenScript generates a lockfile for each bucket, and the consolidation (dedup) scripts and bucket-moving scripts check for these lockfiles.
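
The lockfile idea could look roughly like the sketch below. This is a guess at the general shape, not the actual scripts: the `.lock` suffix, the function names, and the directory layout are all hypothetical. The lock is created with `O_CREAT | O_EXCL`, which fails if the file already exists, so two processes cannot both believe they own the same bucket.

```python
import os
import shutil
from contextlib import contextmanager

LOCK_SUFFIX = ".lock"  # hypothetical naming convention


@contextmanager
def bucket_lock(archive_dir, bucket_name):
    """Hold an exclusive lockfile for one bucket while it is being written."""
    lock_path = os.path.join(archive_dir, bucket_name + LOCK_SUFFIX)
    # O_CREAT | O_EXCL raises FileExistsError if the lock is already held.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        yield lock_path
    finally:
        os.remove(lock_path)


def is_locked(archive_dir, bucket_name):
    """Return True while a lockfile exists for the bucket."""
    return os.path.exists(os.path.join(archive_dir, bucket_name + LOCK_SUFFIX))


def freeze_bucket(bucket_path, archive_dir):
    """Copy a bucket directory into archive_dir while holding its lock."""
    bucket_name = os.path.basename(os.path.normpath(bucket_path))
    os.makedirs(archive_dir, exist_ok=True)
    with bucket_lock(archive_dir, bucket_name):
        shutil.copytree(bucket_path, os.path.join(archive_dir, bucket_name))


def movable_buckets(archive_dir):
    """List bucket directories that have no lockfile and are safe to move."""
    return sorted(
        name
        for name in os.listdir(archive_dir)
        if os.path.isdir(os.path.join(archive_dir, name))
        and not is_locked(archive_dir, name)
    )
```

A coldToFrozenScript built on this would call `freeze_bucket` with the bucket path Splunk passes as the script's first argument, while the dedup and mover jobs would only touch what `movable_buckets` returns. A crash mid-copy leaves a stale lockfile behind, which a real system would need to detect and clean up.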

This allows you to manage archiving on a cluster of any size without needing to shut down nodes to guarantee a safe copy. The dedup, coldToFrozenScript and copy scripts also use a modular plug-in system for verification of buckets (full source/destination hash checking, file size, etc.), as well as for encrypting, moving, or uploading buckets to S3 etc. after dedup.
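
As an illustration of what a source/destination hash check might involve, here is a minimal sketch, assuming a bucket is a directory tree and that two copies match when every relative path and every file's contents are identical. The function names are hypothetical and this is not the plug-in interface described above.

```python
import hashlib
import os


def tree_digest(root, chunk_size=1024 * 1024):
    """Return a SHA-256 digest over relative paths and file contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as f:
                # Read in chunks so large journal files are not loaded at once.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


def copies_match(src_bucket, dest_bucket):
    """Return True when both bucket directories hash to the same digest."""
    return tree_digest(src_bucket) == tree_digest(dest_bucket)
```

Running `copies_match` after a copy and before deleting the source would catch the truncation scenario described earlier, at the cost of reading both copies in full.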

It also has extensive logging, so I plan to develop a Splunk app when I have some time to report on bucket health and bucket status throughout the archiving system, with metrics such as disk space saved by consolidation/dedup, etc.

The aim is to automate as much as possible and to be modular/flexible.

Thanks for your help mmodestino!
