Eeh, I think that the metadata regarding which sourcetypes exist where is on a more granular level - in the index buckets. That is where you find files like Hosts.data, Sources.data and SourceTypes.data.
From that I assume that you'd need to open every bucket in every index in order to determine if the sourcetype is present in that index.
However, if you can instruct your users to type index=blaha instead of sourcetype=blaha, you would get a performance boost. How much will depend on other factors - one of the most important will be the time range for the query.
Still, I would not recommend doing this, since you'd have to micromanage the disk usage for all those indexes, unless you have a really strong use case for the access restrictions.
You can create as many as you want; however, more indexes do not mean better performance. If you keep your data in many different indexes it's rather the opposite: if you don't specify an index in your search, Splunk will need to open each index to check whether the events you're searching for are in there.
Dividing up data across several indexes is not something you do for performance reasons, rather it's something you do if you want either different periods for how long data will be kept, or different access permissions (for instance user A is allowed to access index X but not index Y, whereas user B is allowed to access index Y but not index X).
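As a rough sketch of those two reasons in configuration terms: retention is set per index in indexes.conf, and index access is granted per role in authorize.conf. The index names (index_x, index_y), role names and retention values below are made up for illustration, so adjust them to your environment.

```ini
# indexes.conf (index names and retention values are hypothetical)
[index_x]
homePath   = $SPLUNK_DB/index_x/db
coldPath   = $SPLUNK_DB/index_x/colddb
thawedPath = $SPLUNK_DB/index_x/thaweddb
# roughly 30 days before events are rolled to frozen
frozenTimePeriodInSecs = 2592000

[index_y]
homePath   = $SPLUNK_DB/index_y/db
coldPath   = $SPLUNK_DB/index_y/colddb
thawedPath = $SPLUNK_DB/index_y/thaweddb
# roughly 365 days before events are rolled to frozen
frozenTimePeriodInSecs = 31536000

# authorize.conf (role names are hypothetical)
# user A's role may search index_x only
[role_team_a]
srchIndexesAllowed = index_x
srchIndexesDefault = index_x

# user B's role may search index_y only
[role_team_b]
srchIndexesAllowed = index_y
srchIndexesDefault = index_y
```

Comments are kept on their own lines here, since a trailing comment after a value would be read as part of the value.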
Ok, let's say we want to add an index for each sourcetype (a 1 to 1 index/sourcetype ratio) to allow for the most granular security rules for access to events.
Let's assume I define a maximum of 100 indexes.
Assuming a user always specifies a sourcetype in their searches, will Splunk still check each index for that sourcetype on every search?
Does it just check some metadata field for each index rather than search through all events in an index?
Is the performance impact significant?
First, keep in mind that besides indexes, you can also segregate data access with a search term restriction for each role.
Second, if you split your indexes so that each holds just one sourcetype, it's not enough to specify "sourcetype=x"; for better performance, also add "index=X". Otherwise, as Ayn already wrote, your search will cause Splunk to check every other index to see whether the terms are present there, and that takes some time as well.
Moreover, keep in mind the kind of searches you'll have to make. Are you just searching ONE sourcetype at a time? No cross-sourcetype correlation? In that case, again, having data split across several indexes is not the best case.
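For the per-role search term restriction mentioned above, a minimal sketch would be a srchFilter in authorize.conf, which gets applied to every search the role runs. The role name, index name and sourcetype values here are only placeholders, not something from your setup.

```ini
# authorize.conf (role, index and sourcetype names are hypothetical)
[role_web_team]
# everything stays in one shared index...
srchIndexesAllowed = main
srchIndexesDefault = main
# ...and the role is limited to these sourcetypes by a search filter
srchFilter = sourcetype=access_combined OR sourcetype=apache_error
```

This way you keep a single index to manage, and the restriction is done on the sourcetype rather than by creating one index per sourcetype.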
This means that even if you specify the sourcetype, two factors will impact performance:
Yes and no - Splunk will not initially check the actual raw events in the index, but it will check the lexicon which is a data structure within each bucket in an index. The lexicon holds metadata about the events the index contains, such as any fields set at index-time. Splunk will check the lexicon in each bucket that falls within the selected timerange for your search.