Here's one option. The general rule of thumb for Splunk disk storage is 1/2 X daily indexing volume X retention days. Example: 1/2 X 5 GB per day X 365 days = 912.5 GB of storage. To see how much data you are indexing per sourcetype for a given time period:
index=_internal metrics kb series!=_* "group=per_sourcetype_thruput" | stats sum(indexed_mb) by series
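As a rough sketch, the 1/2 X daily indexing volume X retention days rule can be worked through in a search as well. The field names daily_gb, retention_days and storage_gb are made up for illustration, the input values are the ones from the example, and this assumes a Splunk version that has the makeresults command (it is not available in 6.1):

```
| makeresults
| eval daily_gb = 5
| eval retention_days = 365
| eval storage_gb = 0.5 * daily_gb * retention_days
| table daily_gb retention_days storage_gb
```

Swap in your own daily volume and retention period. The 0.5 factor is only the rule of thumb, so treat the result as a planning estimate rather than a measurement.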
Another option might be using the dbinspect command:
| dbinspect index=my_index
If you can estimate the percentage of the index that your sourcetype takes up, you can get a reasonable estimate of its disk usage. Reference: http://docs.splunk.com/Documentation/Splunk/6.1.4/SearchReference/Dbinspect
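A minimal sketch of that estimate, assuming the sourcetype accounts for about 25% of my_index. The 0.25 share is a placeholder to replace with your own figure, and assumed_share, index_mb and sourcetype_mb are illustrative names. sizeOnDiskMB is the per-bucket size field that dbinspect reports:

```
| dbinspect index=my_index
| stats sum(sizeOnDiskMB) AS index_mb
| eval assumed_share = 0.25
| eval sourcetype_mb = round(index_mb * assumed_share, 2)
```

This sums the on-disk size of every bucket in the index and scales it by the assumed share, so the result is only as good as that percentage.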
In 6.1 that didn't seem to work at all for me. I found success with the following:
index=_internal metrics kb group=per_sourcetype_thruput | eval sizeMB = round(kb/1024,2)| stats sum(sizeMB) by series | sort -sum(sizeMB) | rename sum(sizeMB) AS "Size on Disk (MB)"