Solved: Re: Splunk indexer tiers - Can we have a second tier for "cold" data only?

nicholasgrabows · ‎11-14-2012

We have a Splunk indexer cluster of 10+ nodes. However, the disks on these machines are not large enough to hold all our data for longer than 90 days, and we'd prefer 180 days. As such, we're looking into several options. Purchasing larger disks and adding SAN storage are two areas we are exploring. Another option is simply to add more machines. As luck would have it, we have quite a few spare machines. However, these machines have very limited CPU power and the disks are not very fast, so we are concerned that adding them to the indexer tier might slow down most searches. As an alternative, someone suggested adding a second indexer tier. This second tier of low-performing servers would hold only cold data, meaning most searches would hit the main indexer tier, but searches that touch cold data would engage the secondary tier. It was explained that Splunk can handle the movement of hot/warm data from the main tier to cold storage on the secondary tier. Does anyone know if this option is real? If so, is there any documentation on it?

sdvorak_splunk · ‎11-14-2012

This is probably a matter of opinion, but if you already have enough horsepower for indexing your data (which I am guessing you do), then adding more indexers seems like a waste, and I'm not sure it would help your immediate problem. Instead, I would recommend using Splunk's ability to place your cold path on different storage (perhaps NFS), and then setting up your data archive policy so that data rolls to cold storage as needed based on free space. Understand that searches that hit cold storage may be slower, depending on the speed of the storage the cold data resides on.
You didn't mention your Splunk version, but I believe you will need to be on 4.2+ for this option to be available to you.
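
For anyone who wants to see what the cold-path split might look like, here is a minimal Python sketch that prints an indexes.conf stanza. The setting names (homePath, coldPath, thawedPath, homePath.maxDataSizeMB, frozenTimePeriodInSecs) are the ones I recall from indexes.conf.spec, so check them against the docs for your version. The index name, mount points, and size cap are made-up placeholders, not values from this thread.

```python
import configparser
import io

SECONDS_PER_DAY = 86400


def build_index_stanza(index_name, fast_root, slow_root, retention_days, home_max_mb):
    """Return an indexes.conf-style stanza as text.

    Hot/warm buckets (homePath) stay on fast local disk; cold buckets
    (coldPath) go to slower storage such as an NFS mount.
    All paths and sizes passed in are hypothetical examples.
    """
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep setting names case-sensitive (configparser lowercases by default)
    cp[index_name] = {
        "homePath": f"{fast_root}/{index_name}/db",
        "coldPath": f"{slow_root}/{index_name}/colddb",
        "thawedPath": f"{fast_root}/{index_name}/thaweddb",
        # Cap on hot/warm size; once exceeded, the oldest warm buckets roll to cold.
        "homePath.maxDataSizeMB": str(home_max_mb),
        # Age at which buckets are frozen (removed or archived).
        "frozenTimePeriodInSecs": str(retention_days * SECONDS_PER_DAY),
    }
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()


stanza = build_index_stanza(
    index_name="primary",                     # hypothetical index name
    fast_root="/opt/splunk/var/lib/splunk",   # assumed local fast disk
    slow_root="/mnt/nfs_cold/splunk",         # assumed NFS mount for cold data
    retention_days=180,
    home_max_mb=500000,                       # assumed cap, tune to your local disk
)
print(stanza)
```

The idea is that only the coldPath line points at the slower storage, so searches over recent data never touch it.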

sowings · ‎11-15-2012

Maybe Shuttl might help?

nicholasgrabows · ‎11-15-2012

One option I was considering... would appreciate some thoughts on this.
What if we created a second index and moved all data older than 90 days into it? Then all searches could just look like "(index=primary OR index=secondary) earliest=-180d blah blah blah". Is it possible to pin certain indexes to only certain indexer nodes?

nicholasgrabows · ‎11-15-2012

We are using summary indexing heavily, and I guess we could get smarter about the older data, but the use case is a "we're not sure what to look for until the time comes" kind of thing. So SI isn't sufficient in this case.

sowings · ‎11-15-2012

Are you absolutely certain that you'd be accessing data older than 90 days? You might consider summary indexing for longer time ranges (the raw data could be rolled off at that point) and keep only 90 days "live".

nicholasgrabows · ‎11-15-2012

sdvorak_splunk, thanks for the response. I'll leave this open for a few more days to see if anyone else has any thoughts.

sdvorak_splunk · ‎11-15-2012

To answer the two questions you originally asked, I believe the answer is simply no. I have never seen an architecture that allows for a second tier of indexers. The only thing you could potentially do here is migrate data to frozen, send the frozen raw data to another set of indexers, and reindex it there. You could then search across both sets of indexers from your search head. This would be painful to manage, and I wouldn't recommend it.

Alternatively, you can add indexers to your current indexing tier/pool, which will spread new data out over the entire indexing tier. But as you previously noted, if you use slower indexers, the performance of your entire indexing tier will be impacted.

But again, the most cost-effective and simple solution would be to add more storage to your existing indexers.
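
As a rough way to size that extra storage, here is a small Python sketch of the arithmetic. Only the 10 indexers and the 90 and 180 day retention targets come from this thread. The daily volume and the on-disk ratio (index plus compressed raw data relative to raw size) are placeholder assumptions, so plug in your own measured numbers.

```python
def disk_per_indexer_gb(daily_raw_gb, disk_ratio, retention_days, indexer_count):
    """Estimate the disk each indexer needs, assuming data is spread evenly.

    daily_raw_gb   -- raw data indexed per day across the whole tier
    disk_ratio     -- on-disk size as a fraction of raw size (assumption)
    retention_days -- how long data must stay searchable
    indexer_count  -- number of indexers sharing the load
    """
    total_gb = daily_raw_gb * disk_ratio * retention_days
    return total_gb / indexer_count


DAILY_RAW_GB = 500   # hypothetical daily volume, not from the thread
DISK_RATIO = 0.5     # assumed ratio; measure your own indexes
INDEXERS = 10        # "10+" in the original post

current = disk_per_indexer_gb(DAILY_RAW_GB, DISK_RATIO, 90, INDEXERS)
target = disk_per_indexer_gb(DAILY_RAW_GB, DISK_RATIO, 180, INDEXERS)
extra = target - current

print(f"90 days:  {current:.0f} GB per indexer")
print(f"180 days: {target:.0f} GB per indexer")
print(f"extra:    {extra:.0f} GB per indexer")
```

Whatever numbers you use, the estimate is linear in retention, so going from 90 to 180 days doubles the per-indexer footprint under these assumptions.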

nicholasgrabows · ‎11-14-2012

Thanks for the possible alternatives. What about the 2 questions posed in the original post?

sdvorak_splunk · ‎11-14-2012

Given your need for fast, reliable access to historical data, I would say the best solution is to add more disk to your current indexers.

nicholasgrabows · ‎11-14-2012

We worry about the reliability of NAS. Also, we view fast access to the data as critical. Basically we need relatively fast access to a large amount of old data.

alacercogitatus · ‎11-14-2012

NFS storage would seem the best option here.
