Well... the typical approach is to get a piece of paper (or a spreadsheet) and list all the (kinds of) sources you're going to be getting events from. Then work with that: think about who should have access to the data, how much of it you'll be ingesting, how long it will have to be retained, and which use cases you'll be implementing on top of it (because that can also affect how you distribute data across indexes).

It's a bit complicated (that's why the Splunk Certified Architect certification is something you get only after you've already certified as an Admin; the course itself can be taken earlier, but there's not much point in doing so), and it's hard to cover all the possible caveats of proper index architecture in a short post.

But bear in mind that indexes in Splunk do not define the data contained within them in any way (at least from a technical point of view). They are just... "sacks" that data gets thrown into. You can, if needed, have several different "kinds" of data within one index, and usually (unless you hit some strange border case) it doesn't matter and doesn't carry a significant performance penalty. You simply select the data you want at search time by specifying metadata fields (like host, source, or sourcetype) as your search terms.

One exception (which I won't dig into, since we're apparently not talking about it at the moment): there are two types of indexes - event indexes and metrics indexes.
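As a quick sketch of what "selecting by metadata fields" looks like in practice (the index, sourcetype, and host names here are made up for illustration), you'd pick one "kind" of data out of a shared index like this:

```
index=shared_os sourcetype=linux_secure host=web-*
| stats count by host, source
```

Retention and access, on the other hand, are controlled per index, not per sourcetype - retention via settings like frozenTimePeriodInSecs in indexes.conf, and access by listing which indexes each role may search in authorize.conf. That's the main technical reason data with different retention or access requirements usually ends up in separate indexes.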