Setup

We have a cluster of compute nodes; call them node01-node05. They all run jobs that create data we'd like to put into Splunk. Jobs are farmed out based on available resources. Splunk indexers also co-exist on node01-node05, so we could index each job's output with the local Splunk installation. The problem is that we'd see severe data skew: node01-node04 may be busy for a long while, so every job that feeds Splunk gets farmed out to the one remaining node, and all of that data lands on a single indexer. The solution is to redistribute the data with a forwarder (which, while not perfectly load balanced, is better). We could use a dedicated forwarder to redistribute, but then we'd pay the network penalty twice: once to the forwarder and once back to the cluster. So I made each indexer also forward data.
inputs.conf:

[batch://path/to/files]
move_policy = sinkhole
index = my_index
sourcetype = my_sourcetype
crcSalt = <SOURCE>
_TCP_ROUTING = rest_of_the_splunk_cluster

[splunktcp://9997]
outputs.conf:

[tcpout]
heartbeatFrequency = 15
maxQueueSize = 10000

[tcpout:rest_of_the_splunk_cluster]
server = node01:9997, node02:9997, node03:9997, node04:9997, node05:9997
autoLB = true
autoLBFrequency = 5
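To see why time-based auto load balancing spreads data well but not perfectly, here is a toy Python model. It assumes the forwarder picks a new receiver uniformly at random at the start of each `autoLBFrequency` window and sends everything in that window to it; Splunk's real selection logic may differ, and `simulate_autolb` is a hypothetical helper, not part of any Splunk API. The peer list is written as seen from node01, with node01 itself left out.

```python
import random
from collections import Counter


def simulate_autolb(servers, duration_s, events_per_s, lb_frequency_s, seed=0):
    """Toy model of time-based auto load balancing (an assumption, not Splunk's code).

    At the start of every lb_frequency_s-second window, a target is picked
    uniformly at random from `servers`, and that whole window's events go to it.
    Returns a dict mapping server -> events received.
    """
    rng = random.Random(seed)
    counts = Counter({server: 0 for server in servers})
    target = None
    for second in range(duration_s):
        if second % lb_frequency_s == 0:
            target = rng.choice(servers)  # switch receiver once per window
        counts[target] += events_per_s
    return dict(counts)


if __name__ == "__main__":
    # Seen from node01: every other node is a candidate receiver.
    peers = ["node02:9997", "node03:9997", "node04:9997", "node05:9997"]
    result = simulate_autolb(peers, duration_s=600, events_per_s=100, lb_frequency_s=5)
    for server, events in sorted(result.items()):
        print(f"{server}: {events} events")
```

Without redistribution, all 60,000 events in this run would land on the node that ran the jobs. With it, each peer gets a share, but the shares vary because whole windows go to one receiver at a time.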
Question

OK, so now that we've done that: this works. If we're on node01, the data will be distributed to node02-node05. But what is the impact, if any, of having node01 listed as a server in its own outputs.conf? It won't send the data to itself, but does it try and fail? Does it time out?
The simple answer is to leave that entry out of each node's outputs.conf. My issue is that instead of 5 nodes I have a gaggle, so every node would need its own server list. To further complicate matters, all of our software deployments and upgrades rely on Puppet, which doesn't make this sort of per-host configuration any easier.
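The "leave yourself out" logic itself is small; the hard part is getting it into the deployment tool. In practice it would likely live in a Puppet template keyed on each node's hostname, but the logic can be sketched in Python. The function name, the assumption that node names match each host's short hostname, and the settings copied from the outputs.conf above are all illustrative, not a tested Puppet module.

```python
import socket


def render_outputs_conf(all_nodes, this_host=None, port=9997,
                        group="rest_of_the_splunk_cluster"):
    """Render an outputs.conf whose server list omits the local node.

    `all_nodes` holds short hostnames such as "node01". If `this_host` is not
    given, the short form of socket.gethostname() is assumed to match.
    """
    if this_host is None:
        this_host = socket.gethostname().split(".")[0]
    peers = [node for node in all_nodes if node != this_host]
    if not peers:
        raise ValueError("no peers left to forward to")
    server_list = ", ".join(f"{node}:{port}" for node in peers)
    return (
        "[tcpout]\n"
        "heartbeatFrequency = 15\n"
        "maxQueueSize = 10000\n"
        "\n"
        f"[tcpout:{group}]\n"
        f"server = {server_list}\n"
        "autoLB = true\n"
        "autoLBFrequency = 5\n"
    )


if __name__ == "__main__":
    cluster = [f"node{i:02d}" for i in range(1, 6)]  # node01 .. node05
    print(render_outputs_conf(cluster, this_host="node01"))
```

Because the list is derived from one shared cluster list plus the local hostname, adding a node means editing a single list rather than every host's file.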
This can now be accomplished by installing the Splunk universal forwarder alongside Splunk itself. The only required change to the forwarder is the splunkd (management) port it binds to. To change it, add a web.conf to $SPLUNK_FORWARDER_HOME/etc/system/local/ that says:
[settings]
mgmtHostPort = <SERVERIP:newPort>
Boom. Boots upside yo head.
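Before dropping that web.conf in place, it helps to confirm the new port doesn't collide with the indexer's splunkd, which listens on 8089 by default. Here is a small Python sketch under those assumptions: `render_forwarder_web_conf` and `port_in_use` are hypothetical helpers, 8090 is an arbitrary example port, and the returned text would be written to `$SPLUNK_FORWARDER_HOME/etc/system/local/web.conf`.

```python
import socket

SPLUNKD_DEFAULT_MGMT_PORT = 8089  # splunkd's default management port


def port_in_use(host, port, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def render_forwarder_web_conf(server_ip, new_port,
                              indexer_mgmt_port=SPLUNKD_DEFAULT_MGMT_PORT):
    """Render the forwarder's web.conf, refusing to reuse the indexer's port."""
    if new_port == indexer_mgmt_port:
        raise ValueError(f"port {new_port} is already used by the indexer's splunkd")
    return f"[settings]\nmgmtHostPort = {server_ip}:{new_port}\n"


if __name__ == "__main__":
    ip, port = "127.0.0.1", 8090  # hypothetical choice of new port
    if port_in_use(ip, port):
        print(f"{ip}:{port} is taken; pick another port")
    else:
        print(render_forwarder_web_conf(ip, port), end="")
```

The port check only reflects what is listening at that moment, so it is a sanity check at install time rather than a guarantee.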
If you're not yet on 4.2, you could do this with a non-universal forwarder (i.e., a standard Splunk light forwarder) by installing the second instance in a separate location. The disadvantage is that you can't do this on Windows machines. You also need to muck with, copy, and rename /etc/init.d/splunk and its associated run-level symlinks so the two instances don't conflict.