Hi Splunkers, today I have a strange situation that requires some thorough data sharing on my side, so please forgive me if this gets long.
We are managing a Splunk Enterprise infrastructure previously run by another company.
We are in charge of as-is management and, at the same time, of performing the migration to a new environment.
The new Splunk environment setup is done, so now we need to migrate the data flow.
Following Splunk best practice, we need to temporarily run a double data flow:
We have already handled a double data flow for another customer, managed using the Route and filter data doc and support here on the community. So the point is not that we don't know how it works; the issue is that something is not going as expected.
So, how is the current environment configured? Key elements below.
First, how is the cloud HF configured for TCP data routing?
In $SPLUNK_HOME/etc/system/local/inputs.conf, two stanzas are configured to receive data on ports 9997 and 9998; the configuration is roughly:
[<log sent on HF port 9997>]
_TCP_ROUTING = Indexer1_group
[<log sent on HF port 9998>]
_TCP_ROUTING = Indexer2_group
Then, in $SPLUNK_HOME/etc/system/local/outputs.conf we have:
[tcpout]
defaultGroup=Indexer1_group
[tcpout:Indexer1_group]
disabled=false
server=Indexer1:9997
[tcpout:Indexer2_group]
disabled=false
server=Indexer2:9997
So, the current behavior is: data received on port 9997 goes to Indexer1_group, data received on port 9998 goes to Indexer2_group, and everything else (e.g. network data inputs) follows defaultGroup and goes to Indexer1_group.
At this point, we need to insert the new environment hosts; in particular, we need to link a new set of HFs. In this phase, as already mentioned, we need to send data both to the old environment and to the new one. We could discuss avoiding the extra HF set, but there are reasons for using it, and the architecture has been approved by Splunk itself.
So, how did we try to do this?
Our changed configuration is below.
inputs.conf:
[<log sent on HF port 9997>]
_TCP_ROUTING = Indexer1_group, newHFs_group
[<log sent on HF port 9998>]
_TCP_ROUTING = Indexer2_group, newHFs_group
outputs.conf:
[tcpout]
defaultGroup=Indexer1_group, newHFs_group
[tcpout:Indexer1_group]
disabled=false
server=Indexer1:9997
[tcpout:Indexer2_group]
disabled=false
server=Indexer2:9997
[tcpout:newHFs_group]
disabled=false
server=HF1:9997, HF2:9997, HF3:9997
In a nutshell, we tried to clone each data flow: one copy to the old environment and one copy to the new HFs.
So, what went wrong?
In particular, we should see the following behavior:
All logs not collected on ports 9997 and 9998, like network data inputs, are sent equally to Indexer1 and the new HFs: a copy to Indexer1 and a copy to the new HFs. So, if we output N logs, we should see 2N logs sent: N to Indexer1 and N to the new HFs.
What we are actually seeing is:
All logs not collected on ports 9997 and 9998, like network data inputs, are auto-load-balanced and split between Indexer1 and the new HFs. So, if we output N logs, we see only N sent: roughly 80% go to Indexer1 and the remaining 20% to the new HFs.
I have stressed several times that some of the logs not collected on ports 9997 and 9998 are the network ones, because the auto load balancing and log splitting is happening mostly with them.
That is indeed strange, because according to the specs for both files (inputs.conf and outputs.conf) your setup should work as expected.
I suppose you checked with btool what the effective configuration is for your inputs and outputs (especially that nothing overwrites your _TCP_ROUTING)?
One thing I'd try would be to add the _TCP_ROUTING entries to the [default] stanza and to [WinEventLog] (if applicable; I suppose in your case it's not).
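For reference, a btool check along these lines (a sketch, assuming a standard install where $SPLUNK_HOME points at your Splunk directory) would show each effective setting together with the file it comes from:

```
# Show the effective inputs config and the source file of every setting (--debug),
# then narrow down to the routing keys
$SPLUNK_HOME/bin/splunk btool inputs list --debug | grep _TCP_ROUTING

# Same for outputs.conf, to confirm which tcpout groups actually apply
$SPLUNK_HOME/bin/splunk btool outputs list --debug
```

The leftmost column of the --debug output tells you which app's inputs.conf is winning for each stanza.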
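As a sketch, that [default] change in the HF's inputs.conf would look like the fragment below (group names taken from your outputs.conf; apply at your own risk, since [default] affects every input on the host):

```
[default]
_TCP_ROUTING = Indexer1_group, newHFs_group
```

This makes the double routing the fallback for any input stanza that does not set its own _TCP_ROUTING.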
You were right. It turned out that there were additional _TCP_ROUTING settings in some custom local inputs.conf files. After fixing those, it worked.
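For anyone hitting the same symptom, the kind of override we found looks like the fragment below (the app name and monitor path are hypothetical, for illustration only): a specific input stanza in an app's local inputs.conf carries its own _TCP_ROUTING, so those events follow that routing and never get the second copy.

```
# etc/apps/custom_app/local/inputs.conf -- hypothetical illustration
[monitor:///var/log/network]
_TCP_ROUTING = Indexer1_group   # this stanza's own routing applied, so these
                                # events were never cloned to newHFs_group
```

btool with --debug is the quickest way to spot stanzas like this.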