Apart from seeing data from the forwarders arrive in an index, is there any way I can see which indexer a forwarder is currently sending data to, whether via a command, API call, or log entry?
On a forwarder that isn't using indexer discovery, you can run "splunk list forward-server", and you can also see a "Connected to " message in splunkd.log.
With indexer discovery, neither of these shows up the same way.
We are on the right track!
garethatiag is right, iz broke! The TcpOutputProc warning is telling us that the connection to the indexers is down, hence the blocking for x seconds.
On the master node, try searching _internal for:
index="_internal" component=CMIndexerDiscovery host=cmaster
You are looking for a message like:
CMIndexerDiscovery - Registering new forwarder <GUID> (total: 1). Heartbeat assigned for next check: 30 seconds
If you see nothing, then you need to check the forwarder's config of the master URI.
If you do see something then move to checking the logs on the forwarder again:
egrep 'ERROR|WARN' splunkd.log
egrep 'HttpPubSubConnection' splunkd.log
Also, make sure you have the pass4SymmKey matching on both the master and forwarder.
Is this a single site or multi-site cluster?
Beyond that, if you paste your master and forwarder config stanza we can proof read for ya 😉
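If you export the matching _internal events from the master to a text file, the registration messages described above can be tallied with a short script. This is a minimal Python sketch, not anything Splunk ships: the function name registered_forwarders is made up for illustration, and the regex assumes the message keeps the exact wording quoted above.

```python
import re

# Assumes the wording quoted in this thread:
# "CMIndexerDiscovery - Registering new forwarder <GUID> (total: 1). ..."
REGISTERED = re.compile(
    r"CMIndexerDiscovery - Registering new forwarder (?P<guid>\S+) "
    r"\(total: (?P<total>\d+)\)"
)


def registered_forwarders(lines):
    """Return the forwarder GUIDs seen in registration messages, in first-seen order."""
    guids = []
    for line in lines:
        match = REGISTERED.search(line)
        if match and match.group("guid") not in guids:
            guids.append(match.group("guid"))
    return guids
```

An empty result for a forwarder you expect to see points back at the forwarder's master URI or pass4SymmKey settings, as described above.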
Hi, I just discovered a CLI command you can run to see the active receivers (indexers) your UF/HF is talking to:
$SPLUNK_HOME/bin/splunk list forward-server
It lists the peer nodes, held in memory, that the UF/HF is talking to.
On the forwarder, you can see the search peers using tstats quite effectively.
| tstats count where index=_* by splunk_server
That's getting very close to the correct answer!
index=_internal host=clustermaster component=CMIndexerDiscovery
By looking for those messages from the cluster master, I could see that the forwarder wasn't talking to the cluster master correctly for some reason.
Revisiting the doco config examples I found that I didn't explicitly set a forwarder password. I had the cluster password set under [clustering] but not one under the [indexer_discovery] stanza.
The doco pages make it seem like it is optional (the "If specified here" part!).
[indexer_discovery]
pass4SymmKey =
* Security key shared between master node and forwarders.
* If specified here, the same value must also be specified on all forwarders
connecting to this master.
"The pass4SymmKey attribute specifies the security key used with communication between the master and the forwarders. Its value must be the same for all forwarders and the master node. You must explicitly set this value for each forwarder."
I had wrongly assumed that I could use the existing index cluster password that search heads use.
It seems to silently fail with no indication of why.
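Since the failure is silent, a quick presence check on the config text can catch it before a restart. The sketch below is an illustration only: it assumes the .conf content parses as plain INI (real Splunk .conf files do not always), the function name missing_discovery_key is hypothetical, and it only checks that a key is present, not that the values match across hosts. It also assumes the forwarder side uses an [indexer_discovery:<name>] stanza in outputs.conf, while the master uses [indexer_discovery] in server.conf.

```python
import configparser


def missing_discovery_key(conf_text, stanza="indexer_discovery"):
    """List problems with pass4SymmKey in indexer discovery stanzas of one .conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the case of keys such as pass4SymmKey
    parser.read_string(conf_text)

    # Match both [indexer_discovery] (master) and [indexer_discovery:<name>] (forwarder).
    stanzas = [
        name for name in parser.sections()
        if name == stanza or name.startswith(stanza + ":")
    ]
    if not stanzas:
        return [f"no [{stanza}] stanza found"]

    problems = []
    for name in stanzas:
        # An absent key and an empty value are treated the same way.
        if not parser.get(name, "pass4SymmKey", fallback="").strip():
            problems.append(f"[{name}] has no pass4SymmKey")
    return problems
```

Run it against the text of server.conf on the master and outputs.conf on the forwarder. A key under [clustering] alone does not satisfy the check, which is the situation described in this post.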
Once I added the additional password on both the cluster master and the forwarder, I was able to see it report in with its GUID (which you can find historically from prior logs):
index="_internal" host=myforwarders GUID source="/opt/splunkforwarder/var/log/splunk/splunkd.log" | stats values(guid) by serverName
This, however, seems to be a one-time message.
You can't actively see which indexer a forwarder is currently talking to from those messages.
When it is working correctly, however, the old "Connected to idx=10.11.11.1:9997" messages return.
Nice work!!
Grepping splunkd.log from the CLI on your forwarder, or searching index=_internal for TcpOutputProc should allow you to audit any TCP activity the forwarder has been up to. It is the same process responsible for the "Connected to" messages you are referring to.
Here is an example from one of my lab forwarders. Although I am not using indexer discovery, I assume the TcpOutputProc should still be responsible for setting up the connections.
splunker@n00b-splkufwd-01:/opt/splunkforwarder/var/log/splunk$ grep TcpOutputProc splunkd.log
10-09-2016 15:26:20.515 +0000 INFO TcpOutputProc - Detected connection to 10.10.10.10:9997 closed
10-09-2016 15:26:20.515 +0000 INFO TcpOutputProc - Will close stream to current indexer 10.10.10.10:9997
10-09-2016 15:26:20.515 +0000 INFO TcpOutputProc - Closing stream for idx=10.10.10.10:9997
10-09-2016 15:26:51.027 +0000 INFO TcpOutputProc - Connected to idx=10.10.10.10:9997
10-10-2016 15:07:49.195 +0000 INFO TcpOutputProc - begin to shut down auto load balanced connection strategy
10-10-2016 15:07:49.311 +0000 INFO TcpOutputProc - Shutting down auto load balanced connection strategy
10-10-2016 15:07:49.311 +0000 INFO TcpOutputProc - Auto load balanced connection strategy shutdown finished
10-10-2016 15:07:49.311 +0000 INFO TcpOutputProc - Received shutdown control key.
10-10-2016 15:07:52.746 +0000 INFO TcpOutputProc - Initializing with fwdtype=lwf
10-10-2016 15:07:52.753 +0000 INFO TcpOutputProc - found Whitelist forwardedindex.0.whitelist , RE : .*
10-10-2016 15:07:52.753 +0000 INFO TcpOutputProc - found Blacklist forwardedindex.1.blacklist , RE : _.*
10-10-2016 15:07:52.753 +0000 INFO TcpOutputProc - found Whitelist forwardedindex.2.whitelist , RE : (_audit|_introspection|_internal)
10-10-2016 15:07:52.753 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.10.10.10:9997
10-10-2016 15:07:52.753 +0000 INFO TcpOutputProc - tcpout group n00b-splkidx-02 using Auto load balanced forwarding
10-10-2016 15:07:52.753 +0000 INFO TcpOutputProc - Group n00b-splkidx-02 initialized with maxQueueSize=512000 in bytes.
10-10-2016 15:07:52.842 +0000 INFO TcpOutputProc - Connected to idx=10.10.10.10:9997
Let me know if you still can't find the logs and I will set up indexer discovery and test!
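To pull just the most recent connection target out of a log like the one above, something along these lines could work. It is a rough Python sketch: the function name last_connected_indexer is made up, and the regex assumes the timestamp layout and "Connected to idx=" wording shown in the excerpt.

```python
import re

# Matches lines such as:
# 10-09-2016 15:26:51.027 +0000 INFO TcpOutputProc - Connected to idx=10.10.10.10:9997
CONNECTED = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"INFO\s+TcpOutputProc - Connected to idx=(?P<idx>[^\s,]+)"
)


def last_connected_indexer(lines):
    """Return (timestamp, host:port) from the last 'Connected to idx=' line, or None."""
    last = None
    for line in lines:
        match = CONNECTED.match(line)
        if match:
            last = (match.group("ts"), match.group("idx"))
    return last
```

Pass it an open file handle for splunkd.log or any list of lines. It only reports the last connect event and does not check whether that connection was closed afterwards, so treat the result as a hint rather than proof of a live connection.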
I don't see anything like that. The forwarder just stops working as soon as indexer discovery is turned on.
index=_internal host=forwarder-38* source=*splunkd.log TcpOutputProc
10-07-2016 11:41:08.118 +1100 INFO TcpOutputProc - Connected to idx=xxxxx:9997
.............
10-07-2016 13:45:22.105 +1100 WARN TcpOutputProc - Forwarding to indexer group Production blocked for 3700 seconds.
This UF is running v6.4.1.
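The "blocked for N seconds" warning above is the symptom worth watching for. As a hedged illustration, this Python sketch scans log lines for it and keeps the worst value per tcpout group; the function name blocked_groups is hypothetical and the regex assumes the exact wording of the warning shown here.

```python
import re

# Matches lines such as:
# ... WARN TcpOutputProc - Forwarding to indexer group Production blocked for 3700 seconds.
BLOCKED = re.compile(
    r"WARN\s+TcpOutputProc - Forwarding to indexer group (?P<group>.+?) "
    r"blocked for (?P<seconds>\d+) seconds"
)


def blocked_groups(lines):
    """Map each tcpout group name to the largest 'blocked for N seconds' value seen."""
    worst = {}
    for line in lines:
        match = BLOCKED.search(line)
        if match:
            group = match.group("group")
            seconds = int(match.group("seconds"))
            worst[group] = max(seconds, worst.get(group, 0))
    return worst
```

An empty dictionary means no such warning was found in the lines given, which by itself says nothing about whether forwarding is healthy.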
In the example you have provided, forwarding to the indexer is simply not working.
Do you have SSL enabled on port 9997? Do you have multiple ports open on the indexer?
Indexer discovery has some limitations. I found that multiple ports (I had 9997/9998 as Splunk TCP ports) can confuse indexer discovery...
Also, do you have any errors besides the warning about the output failing to go through?