
Index Growth

npatellis
Explorer

Is it possible to trend the growth of indexes? We are trying to project future storage needs, so we need a way to track how our indexes grow over time. Any ideas?


npatellis
Explorer

Thanks. Good workaround for a feature I assumed would be included.

I was hoping the Splunk development team would recognize the need to measure index growth with built-in functionality!

bbingham
Builder

You could quickly set up a scripted input that runs something as simple as "df -hk" on your file system. The only drawback is that its output gets indexed, so it counts against your license usage. If a rough estimate is enough and you don't want to use your license, you could use the Deployment Monitor app, which watches and summarizes the metrics log. It shows how much data you are indexing daily and gives some nice graphs from which you can roughly see what's going on.
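A scripted input is just a program whose standard output Splunk indexes on a schedule. As a minimal sketch of the df idea, the Python script below reports total, used, and free space (in KB) for the filesystem holding the index data. The path /opt/splunk/var/lib/splunk is an assumption (the default index location when Splunk is installed under /opt/splunk), and the function and field names are hypothetical. The inputs.conf stanza that schedules the script is not shown.

```python
#!/usr/bin/env python3
"""Hypothetical scripted input: report disk usage for the filesystem that
holds Splunk's index data, as one key=value line per run (like `df -k`)."""
import os
import shutil
import sys

# Assumption: default index location ($SPLUNK_HOME/var/lib/splunk with
# SPLUNK_HOME=/opt/splunk). Adjust for your installation.
INDEX_PATH = "/opt/splunk/var/lib/splunk"


def disk_usage_line(path):
    """Return total/used/free space in KB for the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple of byte counts
    return (
        f'path="{path}" total_kb={usage.total // 1024} '
        f"used_kb={usage.used // 1024} free_kb={usage.free // 1024}"
    )


if __name__ == "__main__":
    if os.path.isdir(INDEX_PATH):
        # Splunk indexes whatever a scripted input writes to stdout.
        print(disk_usage_line(INDEX_PATH))
    else:
        print(f"index path not found: {INDEX_PATH}", file=sys.stderr)
```

Each run emits one short line, so used_kb can later be charted with timechart. The events still count toward license usage, but one line per run is a small amount.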

If you're looking for more of a trend based on tangible data: in our environment, the data on disk takes up about 46% of its raw size after compression. You can use the eval command to do some quick math on the raw data volume and graph a rough estimate of disk usage.

index="_internal" source="*metrics.log" per_index_thruput | timechart span=1d sum(kb) as total | streamstats sum(total) as RollingTotal | eval TotalDiskUsage=((RollingTotal/1024)*.46)

This pulls the per-index throughput from metrics.log, groups it by day, keeps a running total of the daily throughput, divides by 1024 to convert KB to MB, and then multiplies by the 46% ratio we see in our environment. It's a quick-and-dirty way to get a close estimate of growth over time.

Here's a sample of what that search will do:

    _time                    total (KB)       RollingTotal (KB)  TotalDiskUsage (MB)
1   4/19/11 12:00:00.000 AM  1036183.974615   1036183.974615     470
2   4/20/11 12:00:00.000 AM  10996271.902688  12032455.877303    5400
3   4/21/11 12:00:00.000 AM  10385829.949624  22418285.826927    10000
4   4/22/11 12:00:00.000 AM  10058124.908306  32476410.735233    15000
5   4/23/11 12:00:00.000 AM  8189837.568376   40666248.303609    18000
6   4/24/11 12:00:00.000 AM  8615154.066561   49281402.370170    22000
7   4/25/11 12:00:00.000 AM  16328828.575640  65610230.945810    29000
8   4/26/11 12:00:00.000 AM  14284598.373282  79894829.319092    36000
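To sanity-check the arithmetic outside Splunk, here is a small Python sketch that mirrors the search. It takes a running sum of the daily KB totals (what streamstats does), converts the result to MB, and multiplies by the assumed .46 ratio. The daily values are the total column from the sample output above, and the function name is hypothetical.

```python
from itertools import accumulate

# Ratio of on-disk size to raw size assumed in the post (46%).
DISK_RATIO = 0.46


def estimate_disk_mb(daily_kb, ratio=DISK_RATIO):
    """Running total of daily KB (streamstats sum), converted to MB (/1024)
    and scaled by the compression ratio (the eval step)."""
    return [(running_kb / 1024) * ratio for running_kb in accumulate(daily_kb)]


# Daily "total" column (KB per day) from the sample output.
daily_kb = [
    1036183.974615, 10996271.902688, 10385829.949624, 10058124.908306,
    8189837.568376, 8615154.066561, 16328828.575640, 14284598.373282,
]

for day, mb in enumerate(estimate_disk_mb(daily_kb), start=1):
    print(f"day {day}: ~{mb:,.0f} MB on disk")
```

The last value comes out to about 35,890 MB, which matches the 36000 shown (rounded) in the TotalDiskUsage column.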

In the last 8 days, as a rough estimate, I've used about 36GB of actual disk. From there you can quickly work out your average daily growth by appending this to the search:

| delta TotalDiskUsage as Difference | stats avg(Difference)
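The appended delta/stats step averages the day-over-day changes. Below is a minimal Python sketch of that calculation, plus a naive linear projection for storage planning. The projection helper is hypothetical. It assumes growth continues at the average rate and that nothing is removed by retention. The inputs are the rounded TotalDiskUsage values from the sample output.

```python
def average_daily_growth(cumulative_mb):
    """Mean of day-over-day changes, like `delta ... | stats avg(Difference)`.

    delta has no previous value for the first row, so n values give n-1 changes.
    """
    diffs = [later - earlier for earlier, later in zip(cumulative_mb, cumulative_mb[1:])]
    return sum(diffs) / len(diffs)


def project_mb(current_mb, avg_daily_mb, days_ahead):
    """Hypothetical linear projection: assumes the average daily growth holds."""
    return current_mb + avg_daily_mb * days_ahead


# Rounded TotalDiskUsage column (MB) from the sample output in this thread.
usage_mb = [470, 5400, 10000, 15000, 18000, 22000, 29000, 36000]

avg = average_daily_growth(usage_mb)
print(f"average growth: ~{avg:,.0f} MB/day")
print(f"projected in 30 days: ~{project_mb(usage_mb[-1], avg, 30):,.0f} MB")
```

With these values the average is (36000 - 470) / 7, roughly 5,076 MB (about 5 GB) per day.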

If you want to be conservative, you could assume much less compression, for example only about 10% savings (a multiplier of about .90 instead of .46). If you'd like to exclude the size of Splunk's internal indexes, use:

index="_internal" source="*metrics.log" per_index_thruput NOT series="_*" | timechart span=1d sum(kb) as total | streamstats sum(total) as RollingTotal | eval TotalDiskUsage=((RollingTotal/1024)*.46) | delta TotalDiskUsage as Difference
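In per_index_thruput events, series holds the index name. The NOT series="_*" clause therefore drops Splunk's underscore-prefixed internal indexes (such as _internal) before the daily totals are summed. The Python sketch below applies the same filter to per-index rows. The (day, series, kb) row shape, the function name, and the sample values are hypothetical, for illustration only.

```python
from collections import defaultdict


def daily_kb_excluding_internal(rows):
    """Sum KB per day, skipping series (index names) that start with "_",
    roughly what `NOT series="_*"` does before the timechart.

    rows: iterable of (day, series, kb) tuples.
    """
    totals = defaultdict(float)
    for day, series, kb in rows:
        if not series.startswith("_"):
            totals[day] += kb
    return dict(sorted(totals.items()))


# Hypothetical sample rows, for illustration only.
sample = [
    ("2011-04-19", "main", 100.0),
    ("2011-04-19", "_internal", 50.0),
    ("2011-04-20", "web", 30.0),
    ("2011-04-20", "main", 20.0),
]
print(daily_kb_excluding_internal(sample))
```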

Hope this helps!
