Ok, so let's start by assuming that you are ingesting 300 GB/day. Since Splunk compresses the raw data that it stores, what that page is saying is that 300 GB/day of ingested logs could end up taking roughly 150 GB/day on disk. (Note that this compression ratio is a general estimate; how well your data compresses depends heavily on the type of data you're ingesting and on how many terms Splunk pulls out into metadata.)
Now let's assume you have a single server, and you just want to store these logs (in Splunk) for 365 days. That would be 150 GB/day * 365 days -> 53.5 TB of storage at the end of the first year (counting 1 TB as 1024 GB), at which point the oldest logs will start being frozen based on time (by default, deleted) as new ones come in. If you want to ramp up to that amount: the first month you'll need 150 GB/day * 30 days -> 4.4 TB, by the end of the second month you'll need 150 GB/day * 60 days -> 8.8 TB in total (as you're keeping the first month in addition to the second), and so on.
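If it helps to play with the numbers, here is a minimal Python sketch of the arithmetic above. The function name and parameters are mine, not anything from Splunk, and it assumes a flat 50% on-disk ratio and 1 TB = 1024 GB (which is what reproduces the figures in this post).

```python
GB_PER_TB = 1024  # binary units; this is what yields 53.5 TB rather than 54.75 TB


def disk_needed_tb(ingest_gb_per_day, days_elapsed,
                   on_disk_ratio=0.5, retention_days=365):
    """Estimated disk (TB) in use after `days_elapsed` days of ingestion.

    Assumes a constant daily ingest and that data older than
    `retention_days` is frozen (deleted), so usage stops growing there.
    """
    days_on_disk = min(days_elapsed, retention_days)
    return ingest_gb_per_day * on_disk_ratio * days_on_disk / GB_PER_TB


for days in (30, 60, 365, 500):
    print(f"after {days} days: {disk_needed_tb(300, days):.1f} TB")
```

Once you pass the retention period the number stops growing, since each new day of data pushes out the oldest one.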
If you're not deleting data at the end of your Splunk retention, then you'll need to figure in the amount of time that you keep the data in frozen storage on disk. Splunk itself doesn't manage the lifecycle of data after it has been frozen.
When you introduce features like Indexer Clustering, this gets more complex, as you now store multiple copies of the compressed raw data (the replication factor, or RF, with each copy estimated at 15% of the original size) and of the search metadata (the search factor, or SF, with each copy estimated at 35% of the original size) across multiple servers, which gives you nice safety guarantees. Let's say you have a cluster with SF=3 and RF=3 and you keep the same amount of data: each day now takes 300 GB * (3 * 15% + 3 * 35%) = 450 GB, so your 53.5 TB has turned into 160.4 TB of total disk space over the year. If you have a cluster of 7 indexers, that means around 22.9 TB per indexer for the data storage alone.
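The clustered estimate can be sketched the same way. Again, this is just my own helper for illustration, assuming the 15% / 35% estimates above hold for your data, that 1 TB = 1024 GB, and that the data spreads evenly across the indexers (in practice it won't be perfectly even).

```python
def cluster_disk_tb(ingest_gb_per_day, retention_days, rf, sf, indexers,
                    rawdata_ratio=0.15, index_ratio=0.35):
    """Return (total TB, TB per indexer) for an indexer cluster.

    rawdata_ratio: estimated size of one compressed raw copy vs. ingest.
    index_ratio:   estimated size of one searchable (index file) copy.
    Both ratios are rough estimates and vary with the data.
    """
    gb_per_day = ingest_gb_per_day * (rawdata_ratio * rf + index_ratio * sf)
    total_tb = gb_per_day * retention_days / 1024  # 1 TB = 1024 GB
    return total_tb, total_tb / indexers


total, per_indexer = cluster_disk_tb(300, 365, rf=3, sf=3, indexers=7)
print(f"total: {total:.1f} TB, per indexer: {per_indexer:.1f} TB")
```

Setting rf=1, sf=1 and indexers=1 gets you back to the single-server number, which is a handy sanity check.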
We can add other features to your environment like Data Model Acceleration, Report Acceleration, and Summary Indexing, where Splunk runs searches automatically or on a schedule and gathers statistics about your data. Your searches can then leverage those statistics to run faster or cover longer time periods on demand, at the cost of execution time in off hours as well as some additional disk space to store the summary data.
There's also a possibility of reducing the space required on disk through TSIDX Reduction, where after a certain age Splunk throws away portions of the search metadata. That saves disk, at the cost of possibly having to rebuild that metadata during a search whose time period reaches back past that age threshold.
If you want to play with some of the basic storage options, there's a tool available at https://splunk-sizing.appspot.com/
But I don't know of any good way to estimate storage requirements for leveraging any of these other features that I've mentioned.