sorry if this has been asked elsewhere -
I need to be able to monitor the integrity of the buckets - and detect 'irregular behavior' outside of when a warm or cold bucket is rolled to the next age group. The question is essentially, what is regular behavior, from the perspective of the OS?
E.g. will Splunk create a cold bucket with roughly the same name as its warm precursor, then delete the warm bucket? Or does it simply rename the warm bucket and move it (and the contained tsidx files) to the cold directory?
Or perhaps something more granular, along the lines of modifying the actual tsidx files and then dealing with the bucket/directory?
Rule #1 - once a bucket becomes warm (or cold), it becomes largely read-only. The bucket won't have more data added to it. There could be some corner cases around this, like if Splunk has to do a bucket fsck and repair. You would probably be best off only trying to check integrity on the rawdata files. These contain your actual data, whereas the tsidx and other files in a bucket are mostly just pointers into your rawdata.
Rule #2 - when a bucket moves from hot to warm to cold, the name of the directory (and its parent) are really all that changes about that bucket. The content of the bucket itself does not say "I'm warm", but rather the location of its subdirectory on disk and the name of its subdirectory.
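To illustrate Rule #2, a monitoring tool could classify a bucket purely from its path. This is a minimal sketch assuming the conventional on-disk layout (hot and warm buckets under the homePath directory named db, cold buckets under colddb, hot buckets named hot_v1_<id>); the example paths below are hypothetical.

```python
from pathlib import Path

def bucket_state(bucket_dir: str) -> str:
    """Infer a bucket's lifecycle state from its path alone.

    Assumes the conventional layout: hot/warm buckets live under a
    directory named 'db' (homePath), cold buckets under 'colddb'
    (coldPath). Hot buckets are named 'hot_v1_<id>'; warm and cold
    buckets share the 'db_<newest>_<oldest>_<id>' naming.
    """
    p = Path(bucket_dir)
    if p.name.startswith("hot_"):      # hot buckets also sit under db/
        return "hot"
    if p.parent.name == "colddb":
        return "cold"
    if p.parent.name == "db":
        return "warm"
    return "unknown"
```

Note that the warm and cold directory names are identical; only the parent directory distinguishes them, which is exactly why FIM tools that key on absolute paths get confused by the roll.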
I suspect this will be tricky because most FIM products don't understand the concept of a rename of a file or of its parent directory. You can probably do very good checking against cold buckets, but warm and hot will bring struggles.
OH! You mean like tripwire sorts of things.
The "truth" for a Splunk bucket is the file bucket/rawdata/journal.gz, plus potentially any additional files named bucket/rawdata/<integer>.
These files should not change after the directory has rolled to warm, unless something very unusual happens with Index Replication/Clustering, e.g. we realize the bucket we have is totally wrong and we get a new copy from another peer.
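Given that contract, an integrity baseline only needs to cover those rawdata files. A sketch, assuming only the file layout described above (journal.gz plus optional integer-named files under rawdata); other files such as tsidx are deliberately ignored since Splunk may rebuild them:

```python
import hashlib
from pathlib import Path

def rawdata_checksums(bucket_dir: str) -> dict:
    """Hash the files that are the 'truth' of a bucket:
    rawdata/journal.gz plus any integer-named files in rawdata.
    Derived files (tsidx, bloom filters, etc.) are skipped because
    Splunk may legitimately rebuild them."""
    rawdata = Path(bucket_dir) / "rawdata"
    sums = {}
    for f in sorted(rawdata.iterdir()):
        if f.name == "journal.gz" or f.name.isdigit():
            sums[f.name] = hashlib.sha256(f.read_bytes()).hexdigest()
    return sums
```

Snapshot these checksums when a bucket rolls to warm, then compare on a schedule; any change outside the clustering corner case above is your "irregular behavior."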
I don't really have experience with FIM (File Integrity Monitoring?) but I can tell you the behavior and the expected contract.
When buckets move from warm to cold there are two cases:
Case 1: warm and cold are on the same filesystem.
Unless you are making use of a warmToColdScript (very rarely a good idea), the directory is renamed from the warm location to the cold location, so it should atomically disappear from the warm location and appear in the cold location. The directory will of course have the same name and contents in this case. Scenarios where the contents might change are essentially independent of this move.
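For a FIM tool, the important property is that the same-filesystem move is a single rename of the directory, so a watcher never observes the bucket in neither location. A small demonstration of that primitive (the paths are made up for the example):

```python
import os
import tempfile

# Same-filesystem move: os.rename maps to rename(2) on POSIX,
# which is atomic within one filesystem. This mimics the warm->cold
# roll in Case 1 using a throwaway directory tree.
base = tempfile.mkdtemp()
name = "db_1700000000_1690000000_7"   # hypothetical bucket name
src = os.path.join(base, "db", name)
dst = os.path.join(base, "colddb", name)
os.makedirs(src)
os.makedirs(os.path.join(base, "colddb"))

os.rename(src, dst)  # one atomic step: gone from warm, present in cold
```

The directory's inode (and therefore its mtime and contents) is unchanged by the rename, which is why checksums taken in warm remain valid in cold.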
Case 2: warm and cold are on different filesystems.
Again this is the story when not using a script. If using a script the script is responsible for the entire process and we have no idea what happens.
Here is the normal case: A directory is created in the cold location with the same name and the suffix -inflight. We copy the contents of the source directory into the target -inflight. When this is complete we rename the target -inflight to the conventional name in the cold location. Then we rename the source directory to have the -inflight suffix. Then we delete the directory from the source location. Most of these steps will back out if there is any error.
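The sequence above can be sketched with stdlib calls. This is a simplified illustration of the described steps only (no error backout, and the -inflight suffix is an internal detail that may change); it is not something to run against a live index, since Splunk performs this itself:

```python
import os
import shutil

def move_bucket_cross_fs(src_bucket: str, cold_dir: str) -> str:
    """Sketch of the cross-filesystem warm->cold sequence described
    above. Simplified: a real implementation backs out on errors."""
    name = os.path.basename(src_bucket)
    inflight = os.path.join(cold_dir, name + "-inflight")
    final = os.path.join(cold_dir, name)

    shutil.copytree(src_bucket, inflight)            # 1. copy into <name>-inflight
    os.rename(inflight, final)                       # 2. rename target to its final name
    os.rename(src_bucket, src_bucket + "-inflight")  # 3. mark the source as inflight
    shutil.rmtree(src_bucket + "-inflight")          # 4. delete the source directory
    return final
```

Note the ordering: the bucket exists under its conventional name in cold (step 2) before it stops existing under that name in warm (step 3), which is the contract stated below.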
The CONTRACT is that the bucket name appears in the target (cold) location either at the same time or prior to it disappearing from the source (warm) location. This is what we will continue to do. The other details such as -inflight may change.
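A monitoring check built on that contract could diff periodic snapshots of the two locations: a bucket that left warm must be visible in cold. A minimal sketch (it deliberately ignores the warm-to-frozen path mentioned further down, which a real check would have to account for):

```python
import os

def check_contract(prev_warm: set, warm_dir: str, cold_dir: str) -> list:
    """Compare a previous snapshot of warm bucket names against the
    current state of warm and cold. Per the contract, any bucket that
    disappeared from warm should now be present in cold. Buckets
    frozen directly from warm would be false positives here."""
    warm_now = set(os.listdir(warm_dir))
    cold_now = set(os.listdir(cold_dir))
    violations = []
    for name in prev_warm - warm_now:
        if name not in cold_now:
            violations.append(name)
    return violations
```

Because the contract only promises ordering (appears in cold at or before disappearing from warm), a snapshot-based check like this is safe against races: at worst a bucket shows up in both snapshots, never in neither.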
There are scenarios not covered by this answer, such as buckets being created in cold in some versions of Index Replication / Clustering (eg 5.x). However those scenarios do not violate the principle of warm buckets transitioning to cold. There could also be situations where manual administrator recovery results in other stories with buckets appearing, though typically restoring buckets from backup would place them in the 'thawed' location.
Also, you must realize that the warm state is not guaranteed to last any significant period of time, so there may be cases where a warm bucket is more or less instantly moved to cold. Additionally, you can configure your retention policies so that a bucket goes directly from warm to frozen without ever being moved to cold.
If you are trying to build constraints around the CONTENTS of a bucket, it will be a tough road. If Splunk believes a bucket has become corrupt, it will attempt a rebuild regardless of the storage location. If a Splunk administrator performs a |delete action, then new content will be added to the buckets. There could be cases where bloom filters are added to buckets on administrative action (e.g. the bloom filter configuration is altered), or when an explicit check and potential rebuild are requested by administrators as maintenance or recovery actions (e.g. the command splunk fsck). Additionally, Index Replication/Clustering must make buckets searchable on demand if other nodes in the system go offline.
Thanks folks. If I could 'accept' both answers I would 😉