Is it possible to trend the growth of indexes? We are trying to project future storage needs and need to be able to trend the growth of indexes. Any ideas?
Thanks. Good workaround for a feature which I assumed would be included.
I was hoping that the Splunk development team would recognize the need to measure index growth via built-in functionality!
You could quickly set up a scripted input that runs something as simple as "df -hk" on your file system. The only downside is that it eats into your license usage. If a rough guess is enough and you don't want to use your license, you could use the Deployment Monitor app, which watches and summarizes the metrics log. It shows the amount of data you are indexing daily and has some nice graphs that give you a rough picture of what's going on.
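For the scripted-input route, here is a minimal sketch in Python rather than a call to df. It assumes the script is registered as a scripted input so that whatever it prints to stdout gets indexed. The field names (path, used_mb, total_mb) and the default path are made up for illustration; point it at the volume that actually holds your indexes.

```python
import shutil
import time

def disk_usage_event(path="/"):
    """Build one key=value event describing disk usage for `path`."""
    usage = shutil.disk_usage(path)           # named tuple: total, used, free (bytes)
    used_mb = usage.used // (1024 * 1024)     # whole megabytes in use
    total_mb = usage.total // (1024 * 1024)   # whole megabytes on the volume
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    # Hypothetical field names; key=value pairs are easy to extract at search time.
    return f'{timestamp} path="{path}" used_mb={used_mb} total_mb={total_mb}'

if __name__ == "__main__":
    # A scripted input indexes whatever the script writes to stdout.
    print(disk_usage_event("/"))
```

Each run emits a single small event, so the license cost mentioned above should stay low if the input only runs a few times a day.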
If you're looking for more of a trend based on tangible data, I can tell you that in our environment we see about a 46% compression rate on the data. You could then use the eval command for some quick math to pull the source data size and graph a rough estimate of the size on disk:
index="_internal" source="*metrics.log" per_index_thruput | timechart span=1d sum(kb) as total | streamstats sum(total) as RollingTotal | eval TotalDiskUsage=((RollingTotal/1024)*.46)
This pulls the total index throughput, groups the traffic by day, adds the daily totals together into a running total, divides by 1024 to get MB instead of KB, and then multiplies by the 46% compression rate we see in our environment. It's a quick and dirty way to get a close estimate of growth over time.
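If it helps to see the arithmetic outside of Splunk, here is a rough Python sketch of the same steps. The 0.46 multiplier is the figure from our environment and yours will differ. The function name is made up for illustration, and the daily KB figures are the first three daily totals from the sample output in this answer.

```python
from itertools import accumulate

COMPRESSION_MULTIPLIER = 0.46  # figure from the answer's environment; an assumption for yours

def estimate_disk_mb(daily_kb, multiplier=COMPRESSION_MULTIPLIER):
    """Return the cumulative on-disk estimate in MB after each day."""
    rolling_kb = accumulate(daily_kb)                      # like: streamstats sum(total)
    return [kb / 1024 * multiplier for kb in rolling_kb]   # like: eval (RollingTotal/1024)*.46

# Daily throughput in KB, as produced by: timechart span=1d sum(kb)
daily_kb = [1036183.974615, 10996271.902688, 10385829.949624]

for day, mb in enumerate(estimate_disk_mb(daily_kb), start=1):
    print(f"day {day}: roughly {mb:.0f} MB on disk")
```

Swapping in a different multiplier is the only change needed to try a more pessimistic compression assumption.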
Here's a sample of what that search will do:
   _time                     total             RollingTotal      TotalDiskUsage
1  4/19/11 12:00:00.000 AM   1036183.974615    1036183.974615    470
2  4/20/11 12:00:00.000 AM   10996271.902688   12032455.877303   5400
3  4/21/11 12:00:00.000 AM   10385829.949624   22418285.826927   10000
4  4/22/11 12:00:00.000 AM   10058124.908306   32476410.735233   15000
5  4/23/11 12:00:00.000 AM   8189837.568376    40666248.303609   18000
6  4/24/11 12:00:00.000 AM   8615154.066561    49281402.370170   22000
7  4/25/11 12:00:00.000 AM   16328828.575640   65610230.945810   29000
8  4/26/11 12:00:00.000 AM   14284598.373282   79894829.319092   36000
In the last 8 days, as a rough estimate, I've used up about 36 GB of actual disk. You could quickly figure out the day-to-day change, or its average, by appending the following to the search:
| delta TotalDiskUsage as Difference | stats avg(Difference)
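For projecting future storage needs from that average, a small Python sketch of the same math might look like the following. It assumes growth stays roughly linear, which is a simplification. The function names are made up for illustration, and the input values are the rounded TotalDiskUsage figures (in MB) from the sample output.

```python
def average_daily_growth(disk_mb):
    """Mean of day-over-day differences, like `delta` followed by `stats avg`."""
    differences = [later - earlier for earlier, later in zip(disk_mb, disk_mb[1:])]
    if not differences:
        raise ValueError("need at least two daily values")
    return sum(differences) / len(differences)

def project_usage(current_mb, growth_mb_per_day, days):
    """Linear projection: assumes the average daily growth holds steady."""
    return current_mb + growth_mb_per_day * days

# TotalDiskUsage values (MB) from the sample output
disk_mb = [470, 5400, 10000, 15000, 18000, 22000, 29000, 36000]

growth = average_daily_growth(disk_mb)
print(f"average growth: {growth:.0f} MB/day")
print(f"estimate in 30 days: {project_usage(disk_mb[-1], growth, 30):.0f} MB")
```

Eight days is a short baseline, so it is worth rerunning this over a longer window before committing to a storage purchase.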
If you want to be conservative, you could always change the compression ratio down to something like 10% (keep in mind that a larger multiplier in the eval gives a larger disk estimate). If you'd like to exclude the Splunk internal index size, use:
index="_internal" source="*metrics.log" per_index_thruput NOT series="_*" | timechart span=1d sum(kb) as total | streamstats sum(total) as RollingTotal | eval TotalDiskUsage=((RollingTotal/1024)*.46) | delta TotalDiskUsage as Difference
Hope this helps!