I know how to get my indexing volume per index. Here's what I use.
index="_internal" source="*metrics.log" per_index_thruput | eval GB=kb/(1024*1024) | stats sum(GB) as total by series date_mday | sort total | fields + date_mday,series,total | reverse
I know how to profile an index by what days it contains, what its size is, etc.
| dbinspect timeformat="%s" index=INDEXNAME
| rename state as category
| stats min(earliestTime) as earliestTime max(latestTime) as latestTime sum(sizeOnDiskMB) as MB by category
| convert timeformat="%m/%d/%Y" ctime(earliestTime) as earliestTime ctime(latestTime) as latestTime
What I really need to know is the "conversion rate". In other words, if I have 5 GB hitting a given index per day, how much space does that actually take up on disk? It's hard to get that with "df" because the indexes are always changing in size, buckets are moving along the hot->warm->cold->frozen train, and so on.
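For what it's worth, dbinspect may be able to get at that ratio directly, assuming your version reports rawSize (the uncompressed raw data in a bucket, in bytes) alongside sizeOnDiskMB. This is only a sketch, but something along these lines should give a rough disk-to-raw ratio broken out by bucket state:

| dbinspect timeformat="%s" index=INDEXNAME
| stats sum(rawSize) as rawBytes sum(sizeOnDiskMB) as diskMB by state
| eval rawMB=round(rawBytes/1024/1024,0)
| eval disk_to_raw_ratio=round(diskMB/rawMB,2)

Here disk_to_raw_ratio approximates how many MB land on disk for each MB of raw data indexed, which is the "conversion rate" in question; treat the hot row with suspicion, since those buckets are still being written to.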
I've done a bit of digging on this myself, comparing the uncompressed size of the raw data (gzip -dc warm_bucket/rawdata/journal.gz | wc -c) to the total size of the bucket (du -s), and that has let me estimate some per-bucket compression ratios. It's not the easiest answer to get to, since it requires direct shell access, means traversing many buckets, and may not be accurate for hot buckets, but it's the best I've been able to come up with (rough sketch below). HTH.
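In case it saves someone some typing, here is roughly what that loop looks like. It assumes the default bucket layout under /opt/splunk/var/lib/splunk/INDEXNAME/db (adjust the path for your install) and only walks the warm db_* buckets:

for bucket in /opt/splunk/var/lib/splunk/INDEXNAME/db/db_*; do
    # uncompressed size of the raw journal, in bytes
    raw=$(gzip -dc "$bucket/rawdata/journal.gz" | wc -c)
    # skip empty or odd buckets so we don't divide by zero
    [ "$raw" -gt 0 ] || continue
    # total size of the bucket on disk, in bytes (du -sk reports KB)
    disk=$(( $(du -sk "$bucket" | awk '{print $1}') * 1024 ))
    # print the per-bucket disk/raw ratio
    awk -v b="$bucket" -v r="$raw" -v d="$disk" 'BEGIN { printf "%s raw=%d disk=%d disk/raw=%.2f\n", b, r, d, d/r }'
done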