I've been asked to find historical index volume information, going back 5 years, to make projections for future infrastructure and license needs.
_internal is of no use, because it's cleared after 30 days. We track disk space, and I can find the disk space info for the Cold bucket on the indexer, but it's set to roll off after 60 days so that's out as well.
I understand that anything like that would be slightly lower than the actual figure, since there are several indexes whose data would have rolled off, but I'm just trying to find a rough baseline to track program growth.
This is pretty far beyond my SPL abilities, so I would be grateful for any help!
I don't quite understand you. It seems you're asking how to check how much disk space your Splunk deployment used some time ago when you have no data left to check. Sorry, but that's as good as guessing from the positions of the stars or the direction of the wind.
What you can do with the data at hand is check how much space each of your indexes occupies, see how that relates to the period covered by the events in those indexes, and calculate how much each index uses per day. You have to do the same with any accelerated datamodels, since, as @gcusello mentioned, they also use up quite a lot of space (the good news is that accelerated summaries usually cover only a limited timespan).
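For example, something like this rough sketch (it reads the index-level size and event-time fields that the data/indexes REST endpoint exposes, so it averages over the whole lifetime of each index):

| rest /services/data/indexes
| eval firstEvent = strptime(substr(minTime, 1, 19), "%Y-%m-%dT%H:%M:%S")
| eval lastEvent = strptime(substr(maxTime, 1, 19), "%Y-%m-%dT%H:%M:%S")
| eval daysCovered = round((lastEvent - firstEvent) / 86400, 1)
| eval MBperDay = round(currentDBSizeMB / daysCovered, 1)
| table title currentDBSizeMB daysCovered MBperDay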
The metadata is gone, but the indexed data is still there. I'm contractually obligated to maintain the last 5 years' worth. I was just wondering if there is a way to look at the old data and figure out how big it was back then.
Apologies for being confusing. I was having a hard time putting that request into words.
Ahhh. So you do have the data itself; you just want to know how much of it falls into a specific timerange?
You can use the dbinspect command to list buckets. From those you can choose a subset by date and sum their sizes.
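For example, a minimal sketch (dbinspect reports each bucket's sizeOnDiskMB along with startEpoch/endEpoch; here I'm grouping buckets by the year of their earliest event, so buckets spanning a year boundary are only approximated):

| dbinspect index=*
| eval bucketYear = strftime(startEpoch, "%Y")
| stats sum(sizeOnDiskMB) AS diskMB BY index, bucketYear
| sort index, bucketYear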
But it will not tell you how much space your datamodel summaries used to occupy.
Hi @jessieb_83,
as you can see at http://splunk-sizing.appspot.com/#sf=1, you should build a capacity plan that takes the following data as inputs:
- your daily indexing volume,
- your retention period,
- your architecture (standalone or clustered, and if clustered the Replication Factor).
For your calculation, consider that usually the real occupation of your data is around 50% of the raw indexed volume (roughly 15% for the compressed rawdata plus 35% for the index files).
In this way, without a Cluster, you can calculate:
needed storage = daily indexing volume * retention period (in days) * 50%
If you have a Cluster, you have to multiply by the Replication Factor.
In addition, if you have DataModels, you have to add to the Hot_Warm data around 3.4 times your daily indexing volume for the Data Model data of one year.
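For example, with purely hypothetical numbers: indexing 100 GB/day with 5 years of retention (1825 days) and no Cluster gives 100 GB * 1825 * 50% ≈ 91 TB; with a Replication Factor of 2 this becomes about 182 TB; and one year of accelerated Data Models adds around 3.4 * 100 GB ≈ 340 GB on top.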
Anyway, the best approach is to involve at least a Splunk Architect or Splunk Professional Services in the architecture design.
Ciao.
Giuseppe
I had not seen this before. It will be very helpful in our future planning. Thank you!
I don't think SPL is your problem - you will only be able to go back as far as your longest retention period (6 months?).
For the normal indexes (not including _internal or the Drive monitor index), our retention policy is contractually specified at 5 years. I just want to see if we can extrapolate, from what exists, how big it was back then.