We have a remote location with a low-bandwidth connection. We'd like to have an on-site indexer that all the on-site machines forward their logs to, and have that indexer send the logs on to the main indexer cluster. We don't want the stand-alone indexer to be part of the cluster, to prevent log data flowing back to it over the narrow pipe during replication. Is this possible? Should we use a light/heavy forwarder instead to collect the logs and send them on to the cluster? The main concern here is bandwidth utilization, and the best way to consolidate/compress the data before it hits the wire.
Short answer: yes. Technically, though, what you're describing is called a heavy forwarder, not an indexer. Set it up to forward to the indexers, choose whether it keeps a local copy of the indexed data depending on your preferences and use case (store and forward), and point all your local forwarders at it.
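As a rough illustration of that setup, here is a small Python sketch that renders the kind of outputs.conf a heavy forwarder would use. The setting names (defaultGroup, indexAndForward, server) are standard outputs.conf settings; the host names, the group name main_cluster, and port 9997 are placeholders I made up, so swap in your own. The heavy forwarder would also need a receiving input configured so the local forwarders can send to it.

```python
def render_outputs_conf(indexers, group="main_cluster", keep_local_copy=False):
    """Build outputs.conf text for a heavy forwarder.

    indexers: list of "host:port" strings (hypothetical values here).
    keep_local_copy: True for store and forward, False to forward only.
    """
    lines = [
        "[tcpout]",
        f"defaultGroup = {group}",
        # indexAndForward controls whether a local indexed copy is kept.
        f"indexAndForward = {'true' if keep_local_copy else 'false'}",
        "",
        f"[tcpout:{group}]",
        f"server = {', '.join(indexers)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder indexer addresses, not real hosts.
    print(render_outputs_conf(["idx1.example.com:9997", "idx2.example.com:9997"]))
```

The local forwarders would get a similar outputs.conf whose server line points at the heavy forwarder instead of the cluster.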
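The problem here is that, by default, Splunk isn't going to compress the data until it writes it to disk, so you'll be sending uncompressed data along with the extra metadata added by the additional queues the heavy forwarder passes the data through. There is a compressed setting for the tcpout stanzas in outputs.conf that is worth testing if bandwidth is the main concern.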
Unless you want that server there as a type of gatekeeper or fallback in case of network problems, it may be easier to just point all of your forwarders at your main cluster.
There is another issue you should be aware of. If the data is brought into the local indexer, i.e. the heavy forwarder, and then forwarded to another indexer, the data will be counted twice on your license manager.
So 5GB of logs forwarded to the off-site indexer will count as 10GB of data. The indexer being forwarded to can't determine that the data came from an indexer or heavy forwarder, so it counts it again.
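To put numbers on that, here is a quick back-of-the-envelope sketch in Python. It simply takes the claim above at face value (each hop that indexes the data meters it once); the function name and parameters are made up for illustration.

```python
def licensed_volume_gb(raw_gb, indexing_hops):
    """Volume charged against the license if every indexing hop
    meters the same raw data once (assumption from the post above)."""
    return raw_gb * indexing_hops


if __name__ == "__main__":
    # 5GB indexed locally, then indexed again at the main cluster.
    print(licensed_volume_gb(5, 2))  # prints 10
```

If the local box forwards without indexing, indexing_hops would be 1 under the same assumption.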