What is the best method for gauging the amount of data a log source feeds in? For example, letting the system send data to the indexer(s) for a set period of time, e.g. 24 hours?

That's the method I use. Depending on the source, you may not need to leave it running for the full 24 hours; sometimes an hour or two is enough for an estimate.
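
If you go with a shorter sample, the arithmetic for scaling it up to a daily figure is simple. Here is a minimal sketch in Python. The function name `estimate_daily_volume` and the sample numbers are made up for illustration and are not tied to any particular product. It assumes the source logs at a roughly steady rate, so a source that is busy only during business hours will be over- or under-estimated depending on when you sample.

```python
def estimate_daily_volume(sample_bytes: float, sample_hours: float) -> float:
    """Scale the bytes seen in a sample window up to a 24-hour estimate.

    Assumes a roughly constant ingest rate over the day (an assumption,
    not something every log source satisfies).
    """
    if sample_hours <= 0:
        raise ValueError("sample_hours must be positive")
    return sample_bytes * (24.0 / sample_hours)


if __name__ == "__main__":
    # Hypothetical measurement: 150 MiB received during a 2-hour sample.
    sample_bytes = 150 * 1024 * 1024
    daily_bytes = estimate_daily_volume(sample_bytes, 2.0)
    print(f"Estimated daily volume: {daily_bytes / 1024**3:.2f} GiB")
```

If the estimate from a short window looks off, sampling a second window at a different time of day and comparing the two is a cheap sanity check before committing to the full 24 hours.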

