Getting Data In

Splunk indexer tiers - Can we have a second tier for "cold" data only?

nicholasgrabows
Path Finder

We have a Splunk indexer cluster of 10+ machines. However, the disk capacity on these machines is not large enough to hold all our data for longer than 90 days; we'd prefer 180 days. As such, we're looking into several options. Purchasing larger disks and adding SAN storage are two areas we are exploring. Another option is simply to add more machines. As luck would have it, we have quite a few spare machines. However, these machines have very limited CPU power and their disks are not very fast, so we are concerned that adding them to the indexer tier might slow down most searches. As an alternative, someone suggested adding a second indexer tier. This second tier of low-performing servers would hold only cold data, meaning most searches would hit the main indexer tier, but searches that touch cold data would engage this secondary tier. It was explained that Splunk can handle the movement of hot/warm data from the main tier to cold storage on the secondary tier. Does anyone know if this option is real? If so, is there any documentation on it?

1 Solution

sdvorak_splunk
Splunk Employee

So, this is probably a matter of opinion, but if you already have enough horsepower to index your data (which I am guessing you do), then adding more indexers seems like a waste, and I'm not sure it would help your immediate problem. Instead, I would recommend using Splunk's ability to place an index's cold path on different storage (perhaps NFS), and then setting up your data retention policy so that buckets roll to cold storage as needed based on free space. Understand that searches that hit cold storage may perform worse, depending on the speed of the storage the cold path resides on.
You didn't mention your Splunk version, but I believe you will need to be on 4.2+ for this option to be available to you.
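
As a rough sketch of what a separate cold path could look like, the Python snippet below renders an indexes.conf stanza. The setting names (homePath, coldPath, thawedPath, homePath.maxDataSizeMB, frozenTimePeriodInSecs) are standard indexes.conf settings, but the index name, the mount points, and the size cap are hypothetical placeholders, and capping the hot/warm size is only one way to approximate "roll to cold based on space". Check the indexes.conf spec for your version before using anything like this.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def render_index_stanza(name, home_root, cold_root, retention_days, home_max_mb):
    """Return the text of an indexes.conf stanza with cold buckets on separate storage."""
    lines = [
        f"[{name}]",
        # Hot/warm buckets stay on the fast local disks.
        f"homePath = {home_root}/{name}/db",
        # Cold (and thawed) buckets live on the slower, larger volume.
        f"coldPath = {cold_root}/{name}/colddb",
        f"thawedPath = {cold_root}/{name}/thaweddb",
        # Cap hot/warm size so older buckets roll to cold once the cap is hit.
        f"homePath.maxDataSizeMB = {home_max_mb}",
        # Buckets older than this are frozen (deleted unless archiving is configured).
        f"frozenTimePeriodInSecs = {retention_days * SECONDS_PER_DAY}",
    ]
    return "\n".join(lines)


# All values below are made up for illustration.
stanza = render_index_stanza(
    name="main_example",
    home_root="/opt/splunk/var/lib/splunk",
    cold_root="/mnt/cold_storage/splunk",
    retention_days=180,
    home_max_mb=500000,
)
print(stanza)
```

With 180 days of retention, the rendered frozenTimePeriodInSecs comes out to 15552000.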

sowings
Splunk Employee

Maybe Shuttl might help?

nicholasgrabows
Path Finder

One option I was considering... would appreciate some thoughts on this.
What if we created a second index and moved all data older than 90 days into it? Then all searches could just look like "(index=primary OR index=secondary) earliest=-180d blah blah blah". Is it possible to pin certain indexes to only certain indexer nodes?

nicholasgrabows
Path Finder

We are using Summary indexing heavily and I guess we could get smarter about the older data... but the use case is a "we're not sure what to look for until the time comes" kind of thing. So SI isn't sufficient in this case.

sowings
Splunk Employee

Are you absolutely certain that you'd be accessing that old data past 90 days? You might consider summary indexing for longer scopes (the raw data could be rolled off at that point) but keep only 90 days "live".

nicholasgrabows
Path Finder

sdvorak_splunk, thanks for the response. I'll leave this open for a few more days to see if anyone else has any thoughts.

sdvorak_splunk
Splunk Employee

To answer the two questions you originally asked: I believe the answer is simply no. I have never seen an architecture that allows for a second tier of indexers. The only thing you could potentially do here is roll data to frozen, send the frozen raw data to another set of indexers, and reindex it there. You could then search across both sets of indexers from your search head. This would be painful to manage, and I wouldn't recommend it.

Alternatively, you can add indexers to your current indexing tier/pool, which will spread new data out over the entire tier. But as you previously noted, if you use slower indexers, the performance of your entire indexing tier will be impacted.

But again, the most cost-effective and simplest solution would be to add more storage to your existing indexers.
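
For a back-of-the-envelope idea of how much extra disk going from 90 to 180 days means, something like the Python sketch below may help. Every number in it is an assumption: the daily volume is invented, the 0.5 on-disk ratio is only a commonly quoted rule of thumb (measure your own indexes instead), the data is assumed to be spread evenly across indexers, and any replication overhead is ignored.

```python
def required_disk_gb(daily_raw_gb, retention_days, on_disk_ratio, indexer_count):
    """Estimate the disk needed per indexer, in GB, for a given retention period."""
    total_on_disk_gb = daily_raw_gb * retention_days * on_disk_ratio
    # Assumes data is distributed evenly across all indexers.
    return total_on_disk_gb / indexer_count


# Hypothetical inputs: 500 GB/day of raw data across 10 indexers.
current_gb = required_disk_gb(500, 90, 0.5, 10)   # 90 days of retention
target_gb = required_disk_gb(500, 180, 0.5, 10)   # 180 days of retention
extra_gb = target_gb - current_gb

print(f"90 days:  {current_gb:.0f} GB per indexer")
print(f"180 days: {target_gb:.0f} GB per indexer")
print(f"extra:    {extra_gb:.0f} GB per indexer")
```

Under these made-up inputs, doubling retention doubles the per-indexer footprint from 2250 GB to 4500 GB, so the added storage needs to cover roughly another 2250 GB per indexer.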

nicholasgrabows
Path Finder

Thanks for the possible alternatives. What about the 2 questions posed in the original post?

sdvorak_splunk
Splunk Employee

Given your need for fast, reliable access to historical data, I would say the best solution is to add more disk to your current indexers.

nicholasgrabows
Path Finder

We worry about the reliability of NAS. Also, we view fast access to the data as critical. Basically we need relatively fast access to a large amount of old data.

alacercogitatus
SplunkTrust

NFS storage would seem the best option here.
