
Customizing indexes.conf to keep no cold data

Communicator

Our requirement is that there is no cold data. Once the data comes in, it will be kept warm for 90 days and then moved to the frozen directory. We have done sizing: we have 3.2 TB for the warm volume and 3.8 TB for the frozen volume, for all indexes. Indexing volume is 60 GB/day.
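A rough check of that sizing, assuming the commonly cited Splunk rule of thumb that indexed data takes about 50% of its raw size on disk (roughly 15% compressed rawdata plus 35% index files), and that frozen buckets keep only the compressed rawdata. Both ratios are assumptions; the real numbers depend on the data and should be measured on your own indexes.

```latex
\begin{aligned}
\text{raw data in 90 days} &= 60\,\tfrac{\text{GB}}{\text{day}} \times 90\,\text{days} = 5400\,\text{GB} \\
\text{hot/warm on disk} &\approx 0.50 \times 5400\,\text{GB} = 2700\,\text{GB} \approx 2.7\,\text{TB} \quad (< 3.2\,\text{TB}) \\
\text{frozen growth} &\approx 0.15 \times 60\,\tfrac{\text{GB}}{\text{day}} = 9\,\tfrac{\text{GB}}{\text{day}}
  \;\Rightarrow\; \frac{3800\,\text{GB}}{9\,\text{GB/day}} \approx 422\,\text{days of frozen data}
\end{aligned}
```

Under these assumptions the 3.2 TB warm volume has some headroom for 90 days, but not much if the actual compression is worse than the rule of thumb.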

Here is my stanza:

[firewall]
homePath = volume:primary/firewall/db
maxHotBuckets = 3
maxTotalDataSizeMB = 204800
enableDataIntegrityControl = 0
enableTsidxReduction = 0
maxWarmDBCount = 300
coldPath = volume:primary/firewall/colddb
frozenTimePeriodInSecs = 7776000
coldToFrozenDir = "/splunk_frozen/frozen_logs/firewall"
thawedPath = $SPLUNK_DB/firewall/thaweddb

Do I still need to define coldPath?
My coldToFrozenDir is giving an error, even though the directory exists with 'wr' permissions.

What is the best approach to achieve this?

SplunkTrust

Okay, I'm going to be polite about this, but also very clear.

The system of hot to warm to cold to frozen is the way that Splunk works. To use an analogy: if you decide to arbitrarily remove the intake manifold from the carburetion system of your '67 Chevy, then something will happen. It may be okay, it may not. Your car may burn up. More likely, if you do it carefully and with all due consideration, it will mostly work, but there will be occasional glitches in your ignition and you will have no idea precisely what is wrong, because you made an arbitrary change, based on some arbitrary directive, to a system that you did not design.

If your executives or managers are giving you technical direction to avoid best practices in Splunk, then you should get that direction in writing, and prepare for triaging the inevitable glitches. Most likely, those glitches won't be terrible. Most likely. However, your organization has decided to do something that no one else has done, making a technical change away from best practices for no apparent reason. No one knows what might happen.

Remember, time-based freezing happens when the most recent event in a bucket ages out (frozenTimePeriodInSecs). Along the way, warm buckets roll to cold when warm limits such as maxWarmDBCount or homePath.maxDataSizeMB are reached, then wait in cold until they age out, at which point they are moved to frozen and their index files removed. This is all handled under the covers. If you skip the intermediate step, then whatever happens under the covers will happen. Or not.

Also, and this may or may not be important to your organization, when you configure something in a way that voids best practices, it also may void some of your support when you run into weird issues. You might occasionally hear a line that sounds like, "We don't know what the effects of that strange thing you did might be, so we're done here..." Or you might not.

On the other hand, there is no reason that best-practices-compliant cold storage could not be arbitrarily short and arbitrarily small, within reason. Just allocate 100 Gig to cold for that one day of transit. In essence, you could give cold storage a 1-day grace period in which it can do whatever it is designed to do on the way from warm to frozen, and thereby avoid any possible weird issues from doing something that hasn't been well thought out based on ignorance of the actual engineering of Splunk under that manifold.
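A minimal sketch of the small cold allocation described above, assuming a separate mount for cold storage. The volume name `cold` and the paths `/splunk_cold` and `/splunk_frozen/frozen_logs/firewall` are assumptions. Cold has no time-based retention setting of its own, so the "grace period" is bounded by size: when the volume exceeds maxVolumeDataSizeMB, splunkd freezes its oldest buckets.

```ini
# Hypothetical volume for a short cold "transit" stage (name and path are assumptions)
[volume:cold]
path = /splunk_cold
# Roughly 100 GB, per the suggestion above
maxVolumeDataSizeMB = 102400

[firewall]
homePath = volume:primary/firewall/db
coldPath = volume:cold/firewall/colddb
# thawedPath cannot reference a volume
thawedPath = $SPLUNK_DB/firewall/thaweddb
# 90 days * 86400 s/day = 7776000 s
frozenTimePeriodInSecs = 7776000
coldToFrozenDir = /splunk_frozen/frozen_logs/firewall
```

For buckets to actually stay warm for the full 90 days, the warm limits (maxWarmDBCount, homePath.maxDataSizeMB, and the size of the primary volume) also have to be large enough to hold 90 days of data; otherwise buckets roll to cold earlier.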

SplunkTrust

There is a warning that appears when a bucket rolls directly from hot or warm to frozen.

So long as you don't care about the warning, it should be fine...

But the warning is there for a reason. It indicates you’re not effectively using your storage.

So long as the OP is OK with not using cold, this is perfectly fine, IMHO.

However, you have to specify something for coldPath regardless, or Splunk won't start. You don't want that path to be small, though, because that could derail Splunk if it needs to write a cold bucket for whatever reason AND runs out of disk along the way.

So I say: put coldPath on the same volume as homePath and set frozenTimePeriodInSecs to 100 days (8640000 seconds)...

SplunkTrust

Yes, @jkat54, they could certainly put them in the same place if they chose. Personally, I'd tend to put it elsewhere, with enough space to allow the system to move stuff in and out as per normal operations, but that's a nit. The point is, don't blaze unnecessary trails unless you have a good reason. Out there be snakes and dragons and chiggers.


SplunkTrust

(Please note: I started writing this, but it started becoming rather silly, and it ended up making me laugh, so I finished it up in that vein. I think there still IS a point to it, but please take it in the manner it was intended: a more or less humorous take on the situation. Or at least one that started mostly serious but ended up rather silly. 🙂)

I'm not trying to convince you that you have a silly business rule, but you do have a business rule that codifies an arbitrary technical distinction, and that is pretty silly.

In my $job-1, we had hot, warm and cold and they all lived on the same disks, all on SSD. We managed volumes in such a way that all data counted against limits in total. In that case, there was no obvious way anyone would even know there was "cold". It acted like warm, it worked like warm, it performed like warm... It was an arbitrary distinction to call it "cold", and indeed my brain generally treats the two as synonyms.

Hot, obviously, was different because it was being written to, but the other two in the middle are essentially a disk management tool, allowing you to manage their performance characteristics separately. The point being that you didn't have to manage them differently, and if you didn't, there was no distinction between them. Also, if they live on the same filesystem, rolling warm to cold is just a directory rename (a metadata change) and is practically instant.

My suggestion is to ask whoever made this decree from above to restate the requirement as "what we want to have happen" (nothing searchable on slow disk? I honestly can't imagine what their goal for this dictate is) and not "how we want you to do it" (thou shalt not type the word 'cold' into the configurations).

Find out the reasons why they want no "cold", what they think that means, and what happened in the past that gave them an irrational fear of "cold". Also ask: what if tomorrow Splunk 7.1.4 came out and changed all the terms to "boiling", "tepid", "chilled" and "calcified"? Would they then rewrite the business rules to say "No Chilled Data! It's evil! Only Boiling, Tepid and Calcified!" Or would they think that's silly?

Or maybe you can mock up one of the Splunk docs pages and change all the "cold" to "rutabaga" and print it out and show it to them (management loves glossy printouts) and get them to rewrite their requirements that "There shall be no rutabagas in Splunk's storage" and then you can totally say "Gotcha, no rutabagas. I can do that."

Of course, this won't help you now, so feel free to follow the examples given and see if you can make this work.

But if you can't, IMO, just don't tell anyone rutabagas exist. "Nope, no rutabagas here. Nothing to see. Move along." Then show them, oh, I don't know, a pretty Trellis of some data or maybe the TARDIS visualization: "Look! Pretty!" Then maybe they'll let working rutabagas continue working, none the wiser.

SplunkTrust

You'd need coldPath for sure. If you put cold buckets on their own volume, you can set maxVolumeDataSizeMB for that volume to 1 (1 MB), so that almost any bucket that rolls from warm to cold is rolled on to frozen very soon afterwards.
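A sketch of that suggestion, assuming a dedicated mount for cold. The volume name `tinycold` and the path `/splunk_cold` are hypothetical. A bucket is still written to cold before it is frozen, so the underlying disk needs real free space for at least one bucket in transit, which is the risk raised earlier about a small cold path.

```ini
# Hypothetical dedicated cold volume (name and path are assumptions)
[volume:tinycold]
path = /splunk_cold
# 1 MB cap: as soon as a bucket lands here the volume is over its limit,
# so splunkd freezes the oldest cold buckets when it next enforces the limit
maxVolumeDataSizeMB = 1

[firewall]
homePath = volume:primary/firewall/db
coldPath = volume:tinycold/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# 90 days * 86400 s/day = 7776000 s
frozenTimePeriodInSecs = 7776000
coldToFrozenDir = /splunk_frozen/frozen_logs/firewall
```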


Communicator

We are using the same volume (volume:primary) for hot, warm and cold buckets (homePath and coldPath).

Our requirement is to freeze the data based on time.

Please review this:

[firewall]
homePath = volume:primary/firewall/db
coldPath = volume:primary/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
maxHotSpanSecs = 90 days (which is also default value)
frozenTimePeriodInSecs = 100days
coldToFrozenDir = "/splunk_frozen"

How does this look?


SplunkTrust

You can't put "90 days"; the value has to be a number of seconds.

The setting is called frozenTimePeriodInSecs, not "InDays", for that very reason. The same goes for maxHotSpanSecs.
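For reference, a sketch of the same stanza with the durations written as seconds. It also drops the quotes around coldToFrozenDir: as far as I know, .conf values are not unquoted, so quotation marks may end up as part of the path, which could explain the directory error from the original question. Treat that as a hypothesis to confirm in splunkd.log, along with write permission for the OS user that runs splunkd.

```ini
[firewall]
homePath = volume:primary/firewall/db
coldPath = volume:primary/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# 90 days * 86400 s/day = 7776000 s (also the default)
maxHotSpanSecs = 7776000
# 100 days * 86400 s/day = 8640000 s
frozenTimePeriodInSecs = 8640000
# No quotes around the path (assumption: quotes would be kept literally)
coldToFrozenDir = /splunk_frozen
```

Note that maxHotSpanSecs only bounds the time span of events within a single hot bucket; retention is governed by frozenTimePeriodInSecs and the size limits.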


SplunkTrust

Hello there,
To your question: you have to define a cold path to create an index.
You can, however, create a very small cold volume and point the cold path to that volume...
I'm not sure, though, why you would need or want to skip it.
Can you elaborate on the requirement? Why skip cold? What is the reason behind it?


SplunkTrust

This would be the way to go. From my understanding of how Splunk rolls data, you must have a cold path, so a workaround would be to give the cold data a very small retention so that it rolls off quickly.
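A sketch of that workaround at the index level. As far as I know, there is no separate time-based retention setting for cold; frozenTimePeriodInSecs applies to the whole index, so the practical lever is a per-index size cap on cold. The 10240 MB value is purely illustrative (an assumption), and it should stay larger than a single bucket.

```ini
[firewall]
homePath = volume:primary/firewall/db
coldPath = volume:primary/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# Cap the cold portion of this index (illustrative value; 0 would mean no limit).
# When the cap is exceeded, the oldest cold buckets are frozen.
coldPath.maxDataSizeMB = 10240
# 90 days * 86400 s/day = 7776000 s
frozenTimePeriodInSecs = 7776000
coldToFrozenDir = /splunk_frozen/frozen_logs/firewall
```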


Communicator

OK, so a business requirement is the reason to keep only hot and warm data. That's it.


Champion

To avoid retaining cold data, you can define a warmToColdScript for each index that simply deletes the buckets or archives them.
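A minimal sketch of where such a hook would be configured. The script path is hypothetical and the script itself is not shown. The indexes.conf spec describes warmToColdScript as a legacy setting kept for backwards compatibility, and a script configured here takes over responsibility for moving the bucket from splunkd, so check the spec for your Splunk version before relying on it.

```ini
[firewall]
homePath = volume:primary/firewall/db
coldPath = volume:primary/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# Hypothetical script that archives or deletes each bucket rolling out of warm
warmToColdScript = /opt/splunk/bin/scripts/warm_to_cold_archive.sh
```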
