I have a Splunk cluster consisting of a master, 2 search heads and 2 indexers. The indexers receive logs from forwarders as well as through the AWS plugin. How do I achieve zero (or near-zero) downtime during an upgrade of this cluster and ensure no data loss?
You can't avoid restarting standalone search heads, so you will need to prepare for some disruption there (unavoidable without a SH cluster), but it's quick: 5 minutes or so. Do it during a quiet period. Also make sure you upgrade the SHs before the indexers (the master goes before both)!
You don't mention what version you are running, but if it's 7.1 or later then your indexer cluster can be upgraded with minimal disruption to search/indexing operations if you follow the guidelines here:
https://docs.splunk.com/Documentation/Splunk/8.0.2/Indexer/Searchablerollingupgrade
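For a rough idea of what that procedure looks like on a cluster your size, here is a small Python sketch that just prints the order of operations for the indexer tier. The host names (`cm01`, `idx01`, `idx02`) and the helper `rolling_upgrade_plan` are made up for illustration, and it assumes the master and search heads have already been upgraded. The CLI commands are the ones I recall from the searchable rolling upgrade doc linked above, so check them against the doc for your version before running anything.

```python
# Hypothetical host names (assumptions); replace with your own.
MASTER = "cm01"
INDEXERS = ["idx01", "idx02"]


def rolling_upgrade_plan(master, indexers):
    """Return ordered (host, action) steps for upgrading the indexer tier.

    Assumes the master and the search heads were upgraded beforehand.
    """
    steps = [
        # Confirm the cluster is healthy before touching any peer.
        (master, "splunk show cluster-status --verbose"),
        # Put the cluster into rolling-upgrade mode.
        (master, "splunk upgrade-init cluster-peers"),
    ]
    for peer in indexers:
        steps += [
            # Take one peer down gracefully, upgrade it, bring it back.
            (peer, "splunk offline"),
            (peer, "install the new Splunk package"),
            (peer, "splunk start"),
            # Wait until the peer has rejoined before moving to the next one.
            (master, "splunk show cluster-status --verbose"),
        ]
    # Leave rolling-upgrade mode once every peer is done.
    steps.append((master, "splunk upgrade-finalize cluster-peers"))
    return steps


for host, action in rolling_upgrade_plan(MASTER, INDEXERS):
    print(f"{host}: {action}")
```

The point of doing one peer at a time is that the other indexer keeps receiving and serving data while its partner is down, so with only 2 indexers don't start on the second until the first has fully rejoined.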
You also don't mention what your data sources are. If they are file monitors or Windows events, then there is minimal risk of data loss during the upgrade, as the forwarders will simply hold pending logs until the indexers are available again and then send any data that was paused.
If you are using syslog, it depends on whether you are sending data directly to Splunk or via a syslog server + UF. The former will likely cause gaps in your logs; the latter should not.