Here is what we have:
8 indexers / 4 search heads, each with 24 cores, 256GB memory and 7.6TB disk
I am trying to understand which of the following gives better search performance:
[access permissions and retention period are the same (35 days)]
Option-1: A single index holding multiple sourcetypes, each ingesting anywhere between 75 and 150GB per day.
Option-2: A separate index for each sourcetype that exceeds 75GB per day.
The Splunk documentation never says how big an index can be, or when it is ideal to create separate indexes (other than for access permissions and retention periods).
My second question: what is the real harm in having too many indexes? What is the largest number of indexes you have worked with on a single Splunk installation?
When you create a custom index, the maximum index size (maxTotalDataSizeMB) defaults to 500000 MB, but you can decrease or increase this value as long as you have sufficient storage space on your server.
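To make Option-2 concrete, here is a minimal Python sketch that renders indexes.conf stanzas for each sourcetype above 75GB/day and sizes maxTotalDataSizeMB for the 35-day retention. It assumes (loudly) that ingest is spread evenly over the 8 indexers, that data on disk is roughly 50% of raw volume (a common rule of thumb, not a guarantee), and that replicated copies are not counted. The index names (`mycustomindex_<sourcetype>`) and the sample volumes are hypothetical. Keep in mind that buckets are frozen when either the age limit or the size limit is hit, so an undersized maxTotalDataSizeMB silently shortens retention.

```python
import math

RETENTION_DAYS = 35      # from the question: same 35-day retention everywhere
SECONDS_PER_DAY = 86400


def per_indexer_max_mb(gb_per_day: float, indexers: int = 8,
                       days: int = RETENTION_DAYS,
                       disk_ratio: float = 0.5) -> int:
    """Estimate maxTotalDataSizeMB for ONE indexer (the limit is per instance).

    Assumptions: even distribution across indexers, on-disk size is about
    disk_ratio * raw size, replication copies not included.
    """
    gb_on_disk = gb_per_day * days * disk_ratio / indexers
    return math.ceil(gb_on_disk * 1024)


def stanza(name: str, gb_per_day: float) -> str:
    """Render an indexes.conf stanza for a hypothetical dedicated index."""
    return "\n".join([
        f"[{name}]",
        f"homePath = $SPLUNK_DB/{name}/db",
        f"coldPath = $SPLUNK_DB/{name}/colddb",
        f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
        f"frozenTimePeriodInSecs = {RETENTION_DAYS * SECONDS_PER_DAY}",
        f"maxTotalDataSizeMB = {per_indexer_max_mb(gb_per_day)}",
    ])


def plan(volumes: dict[str, float], threshold_gb: float = 75,
         prefix: str = "mycustomindex_") -> str:
    """Option-2: one index per sourcetype whose daily volume exceeds threshold_gb."""
    stanzas = [stanza(prefix + st, gb)
               for st, gb in volumes.items() if gb > threshold_gb]
    return "\n\n".join(stanzas)


if __name__ == "__main__":
    # Hypothetical daily volumes in GB per sourcetype.
    print(plan({"sourcetype1": 150, "sourcetype2": 90, "sourcetype3": 40}))
```

With these inputs, sourcetype3 (40GB/day) stays in a shared index, while sourcetype1 gets a per-indexer cap of 336000 MB.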
Aside from access control and different retention policies, another reason to set up multiple indexes is that it can speed up search in some circumstances. This has to do with the way search works:
If a high-volume/high-noise data source and a low-volume data source feed into the same index, and you mostly search for events from the low-volume source, the search will be slower than necessary because the indexer also has to search through all the data from the high-volume source. To mitigate this, create a dedicated index for each data source, send each source's data to its own index, and then specify which index to search. You will probably notice faster searches.
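A back-of-envelope Python model of the effect described above, under a crude assumption: a search must touch every bucket of the targeted index that overlaps the time range, so the work scales with that index's total volume. In reality tsidx lookups and bloom filters cut the cost, so treat the ratio as an upper bound rather than a prediction. The source names and volumes are hypothetical.

```python
def gb_scanned(volumes: dict[str, float], target: str, days: int,
               dedicated: bool) -> float:
    """Rough GB of index data a search for `target` must touch.

    Crude model (assumption): if `target` shares an index with other sources,
    every source's data in the time range is touched; with a dedicated index,
    only the target's own data is.
    """
    per_day = volumes[target] if dedicated else sum(volumes.values())
    return per_day * days


if __name__ == "__main__":
    # Hypothetical: a noisy 150GB/day source next to a quiet 5GB/day source.
    volumes = {"noisy_source": 150.0, "quiet_source": 5.0}
    shared = gb_scanned(volumes, "quiet_source", days=7, dedicated=False)
    own = gb_scanned(volumes, "quiet_source", days=7, dedicated=True)
    print(shared, own, shared / own)
```

The ratio grows with the imbalance between sources, which is why the benefit shows up mostly when you search the low-volume source.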
Provided your network and indexers have the bandwidth and processing power to handle the throughput, it all comes down to searching:
1) It takes longer to search one large index, but you don't have to specify a particular index in the search.
2) If you use separate indexes for different sourcetypes, then you have to specify them in each search, but the searches are faster.
I prefer option 2 because it allows the flexibility to search everything, or specific indexes/sourcetypes as required. However, it does add the user overhead of specifying particular indexes in some searches.
Will search all of my custom indexes.
index=mycustomindex_sourcetype1 OR index=mycustomindex_sourcetype2 OR index=mycustomindex_sourcetype10
Will search only indexes for sourcetypes 1, 2 and 10.
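If the overhead of typing index lists is the main drawback, a small helper can generate the filter for scripts or dashboards. This is a hypothetical Python sketch that assumes the `mycustomindex_<sourcetype>` naming used above; it only builds the SPL string and does not talk to Splunk.

```python
def index_clause(sourcetypes: list[str], prefix: str = "mycustomindex_") -> str:
    """Build an OR-joined SPL index filter for the given sourcetypes."""
    if not sourcetypes:
        raise ValueError("need at least one sourcetype")
    return " OR ".join(f"index={prefix}{st}" for st in sourcetypes)


if __name__ == "__main__":
    print(index_clause(["sourcetype1", "sourcetype2", "sourcetype10"]))
```

Run as shown, it prints the same three-index filter as the example search above.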