We are planning to move to an indexer cluster environment.
For testing, we currently have a single indexer with all of our apps, such as the "Splunk Add-on for AWS", installed, and all data is arriving as expected.
If we move to a cluster environment with a master node and multiple indexer nodes, do I need to install all the apps on every indexer? For example, do I need to install the "Splunk Add-on for AWS" on all the indexers?
Yes, you should have all the same apps on all the cluster peers.
For the Splunk Add-on for AWS, I would set up a separate heavy forwarder to collect the data and send it to the indexer cluster. This also puts less strain on the indexer cluster, because the data collection no longer runs on the indexers.
In response to your other question: if you put the same app on all five servers and enable its inputs on all five, every server pulls the same data and you end up with five copies of it. This is why you should use a separate heavy forwarder to send data to the cluster: only one set of data is pulled into your system. This does introduce a single point of failure, but you can copy the configuration to a second heavy forwarder and keep the inputs disabled there until you need it.
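To make the arithmetic concrete, here is a minimal Python sketch. It assumes every server with the inputs enabled pulls the same events independently, and that the cluster keeps `replication_factor` copies of each indexed event (the original plus replicas). The function name and the replication factor of 3 are hypothetical example values, not settings from this thread.

```python
def copies_per_event(servers_with_inputs: int, replication_factor: int) -> tuple[int, int]:
    """Return (duplicates visible in search, raw copies stored on disk).

    Assumes each server with the inputs enabled ingests the same events
    independently, and the cluster keeps replication_factor copies of
    every indexed event (the original plus replicas).
    """
    if servers_with_inputs < 1 or replication_factor < 1:
        raise ValueError("both arguments must be at least 1")
    # Each enabled input ingests its own copy, and search returns all of them.
    search_duplicates = servers_with_inputs
    # Replicas are not extra search results, but they do take disk space.
    stored = servers_with_inputs * replication_factor
    return search_duplicates, stored


# Inputs enabled on all five indexers vs. one heavy forwarder, replication factor 3.
print(copies_per_event(5, 3))  # (5, 15)
print(copies_per_event(1, 3))  # (1, 3)
```

The first number is the one users notice: five identical events per search hit. Only the independently ingested copies count toward license volume; replicated copies do not.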
So the setup would be: the app is installed on the HF to collect data and send it to the indexer cluster, and the app is also installed on all the indexers to provide props, transforms, and indexes (at minimum the indexes.conf)?
Or is it better to install the AWS app only on the HF and manually configure awsindex through something like etc/system/local/indexes? I would imagine that installing the AWS app on all nodes is the better approach?
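One point worth weighing here: the cluster master does not manage etc/system/local on the peers, so an index defined there has to be maintained by hand on every peer. The alternative is to ship the index definition in an app placed in the master's etc/master-apps directory (etc/manager-apps on Splunk 9.0 and later) and push it with `splunk apply cluster-bundle`. The Python sketch below only renders such a minimal indexes.conf stanza and writes it into an app directory. The app name `org_aws_indexes` used in the usage note is hypothetical, and the index-name check only approximates Splunk's naming rules.

```python
import re
from pathlib import Path

# Approximation of Splunk index-name rules: lowercase letters, digits,
# '_' and '-', not starting with '_' or '-'.
_INDEX_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def render_index_stanza(index_name: str) -> str:
    """Render a minimal indexes.conf stanza for an index on cluster peers."""
    if not _INDEX_NAME.fullmatch(index_name):
        raise ValueError(f"unsupported index name: {index_name!r}")
    base = f"$SPLUNK_DB/{index_name}"
    lines = [
        f"[{index_name}]",
        f"homePath = {base}/db",
        f"coldPath = {base}/colddb",
        f"thawedPath = {base}/thaweddb",
        # Without repFactor = auto the cluster does not replicate this index.
        "repFactor = auto",
    ]
    return "\n".join(lines) + "\n"


def write_index_app(apps_dir: Path, app_name: str, index_name: str) -> Path:
    """Write the stanza to <apps_dir>/<app_name>/local/indexes.conf."""
    conf = apps_dir / app_name / "local" / "indexes.conf"
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.write_text(render_index_stanza(index_name))
    return conf


if __name__ == "__main__":
    print(render_index_stanza("awsindex"), end="")
```

On the master you would call something like `write_index_app(Path("/opt/splunk/etc/master-apps"), "org_aws_indexes", "awsindex")` (path and app name are examples) and then apply the bundle, so every peer receives the identical definition.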
Simply put: yes. All cluster peers must have the same configuration, deployed to them through the cluster master. Please refer to the Cluster-Management Class and the documentation.
Regarding the other comments about duplicating data when an app resides on multiple indexers: the only configuration item that controls how many copies of the data the cluster keeps is the replication factor, which is set for the cluster as a whole; each index opts into replication with repFactor = auto in indexes.conf. Data is indexed twice only if some app contains inputs.conf settings that ingest the same data more than once.

Hence the best practice for distributed environments is to keep all inputs.conf settings separate from the original app and deploy them through the deployment server to the forwarders. As a general rule of thumb, only forwarders should have inputs.conf files in their ./etc tree, unless a specific use case calls for an exception (the splunktcp receiving port on the indexers is a common one). Alternatively, the inputs.conf stanzas can be disabled everywhere other than inside the ./etc/deployment-apps/ subdirectory on the deployment server.
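If you take the "disable the stanzas everywhere else" route, a quick audit helps confirm that no indexer still has an active input. Below is a minimal Python sketch under these assumptions: the inputs.conf file parses as INI with the standard `configparser` module (Splunk-specific syntax such as backslash line continuations is not handled), and only `1` and `true` count as "disabled". The function name and the stanza names in the sample are illustrative only.

```python
import configparser

# Assumption: the boolean spellings treated as "disabled = yes".
TRUE_VALUES = {"1", "true"}


def enabled_stanzas(conf_text: str) -> list[str]:
    """Return the inputs.conf stanzas that would still be enabled.

    A [default] stanza's 'disabled' value applies to every stanza that
    does not set its own.
    """
    parser = configparser.ConfigParser(
        interpolation=None,  # Splunk values may contain '%' and '$'
        strict=False,        # tolerate repeated stanzas/keys (last one wins)
        delimiters=("=",),   # Splunk uses '='; ':' can appear inside values
    )
    parser.read_string(conf_text)
    default_disabled = "0"
    if parser.has_section("default"):
        default_disabled = parser.get("default", "disabled", fallback="0")
    enabled = []
    for stanza in parser.sections():
        if stanza == "default":
            continue
        value = parser.get(stanza, "disabled", fallback=default_disabled)
        if value.strip().lower() not in TRUE_VALUES:
            enabled.append(stanza)
    return enabled


sample = """
[default]
index = aws

[aws_cloudtrail://hypothetical_trail]
sourcetype = aws:cloudtrail
disabled = 1

[aws_s3://hypothetical_bucket]
sourcetype = aws:s3
"""
print(enabled_stanzas(sample))  # ['aws_s3://hypothetical_bucket']
```

Run it against the inputs.conf files under each indexer's app directories; anything it returns is an input that would ingest data on that node and duplicate what the forwarders already send.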