Monitoring Splunk

How to find historical index volumes?

jessieb_83
Path Finder

I've been asked to find historical index volume information,  going back 5 years, to make projections for future infrastructure and license needs.

_internal is of no use, because it's cleared after 30 days. We track disk space, and I can find the disk space info for the Cold bucket on the indexer, but it's set to roll off after 60 days so that's out as well.

I understand that anything like that would be slightly lower than the actual values, as there are several indexes whose data would have rolled off, but I'm just trying to find a rough baseline to track program growth.

This is pretty far beyond my SPL abilities, so I would be grateful for any help!

0 Karma

PickleRick
SplunkTrust
SplunkTrust

I don't quite understand you. It seems you're asking us how to check how much disk space your Splunk used some time ago when you have no data to check it with. Sorry, that's as good as guessing from the position of the stars or the direction of the wind.

What you can do with the data at hand is check how much space each of your indexes occupies, see how that relates to the period covered by the events in those indexes, and calculate how much your indexes use per day (see the sketch below). You have to do the same with any accelerated datamodels since, as @gcusello mentioned, they also use up quite a lot of space (the good news is that the accelerated summaries usually cover only a limited timespan).
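
A minimal sketch of that per-index calculation, assuming you can run REST searches against the data/indexes endpoint (the currentDBSizeMB, minTime and maxTime fields come from that endpoint; the timestamp format may vary in your environment, so adjust the strptime pattern if needed):

| rest /services/data/indexes
| stats sum(currentDBSizeMB) as sizeMB min(minTime) as firstEvent max(maxTime) as lastEvent by title
| eval days=round((strptime(lastEvent,"%Y-%m-%dT%H:%M:%S%z") - strptime(firstEvent,"%Y-%m-%dT%H:%M:%S%z")) / 86400, 0)
| eval MB_per_day=round(sizeMB / days, 1)

That only tells you the average over whatever is still on disk, but it gives you a per-day figure to project from.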

0 Karma

jessieb_83
Path Finder

The metadata is gone, but the indexed data is still there. I'm contractually obligated to maintain the last 5 years worth. I was just wondering if there was a way to look at the old data and see if I can figure out how big it was back then.

Apologies for being confusing. I was having a hard time putting that request into words.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Ahhh. So you do have the data itself, you just want to know how much of it falls into a specific timerange?

You can use the dbinspect command to list buckets. From those you can choose a subset by date and sum their size.

But it will not tell you how much space your datamodel summaries used to occupy.
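
A rough sketch of that approach (the "-5y@d" to "-4y@d" slice is only an example; bucket boundaries rarely align exactly with your target dates, so treat the sum as approximate):

| dbinspect index=*
| where startEpoch >= relative_time(now(), "-5y@d") AND startEpoch < relative_time(now(), "-4y@d")
| stats sum(sizeOnDiskMB) as sizeMB by index
| addcoltotals labelfield=index label=TOTAL

Run it once per year-slice to see how the volume has grown over time.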

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @jessieb_83,

as you can see at http://splunk-sizing.appspot.com/#sf=1, you should build a capacity plan that takes the following data as inputs:

  • average daily data ingestion (usually I take the license value to be on the safe side),
  • retention of hot and warm data (usually one month, but it depends on the most frequent searches),
  • full retention of your data,
  • number of Indexers,
  • Replication Factor, if you have an Indexer Cluster.

For your calculation, consider that the real disk occupation of your data is usually:

  • raw data: 15% of the original data,
  • index data: 35% of the original data.

In this way, without a Cluster, you can calculate:

  • Hot + Warm Data: License * 0.5 * Hot_Warm_Retention / Indexers,
  • Cold Data: License * 0.5 * Cold_Retention / Indexers.

If you have a Cluster, you have to multiply by the Replication Factor.

In addition, if you have DataModels, you have to add to the Hot_Warm Data:

  • License * 3.4

for one year of Data Model data (see the worked example below).
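
To make the arithmetic concrete, here is a purely illustrative example (all figures are assumptions, not your real values): 100 GB/day license, 30 days of Hot/Warm retention, 5 years (about 1,825 days) of total retention, 2 Indexers, no Cluster:

  • Hot + Warm Data: 100 * 0.5 * 30 / 2 = 750 GB per Indexer,
  • Cold Data: 100 * 0.5 * 1,795 / 2 ≈ 44,900 GB (about 45 TB) per Indexer,
  • accelerated Data Models: 100 * 3.4 = 340 GB for one year of summaries, added to the Hot + Warm Data.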

Anyway, the best approach is to involve at least a Splunk Architect or Splunk Professional Services in the Architecture Design.

Ciao.

Giuseppe

jessieb_83
Path Finder

I had not seen this before. It will be very helpful in our future planning. Thank you!

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

I don't think SPL is your problem - you will only be able to go back as far as your longest retention period (6 months?).

0 Karma

jessieb_83
Path Finder

For the normal indexes (not including _internal or the Drive monitor index), our retention policy is contractually specified as 5 years. I just want to see if we can extrapolate, from what exists, how big it was back then.

0 Karma