Hello everyone,
Here's the situation:
indexer1, deployment server role
indexer2
forwarder1
I distributed via the deployment server a new outputs.conf with:
[tcpout]
defaultGroup = indexer1,indexer2
[tcpout:indexer1]
server = xx.xx.xx.xx:9997
[tcpout:indexer2]
server = indexer2.com:9997
There is a VS between forwarder1 and indexer2.
I activated DEBUG in log.cfg for TcpOutputProc.
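For reference, the change in $SPLUNK_HOME/etc/log.cfg was along these lines (as far as I know the category line is already there with INFO by default), followed by a splunkd restart:
category.TcpOutputProc=DEBUG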
The log on forwarder1 tells me only :
12-03-2021 15:08:15.743 +0100 DEBUG TcpOutputProc - channel not registered yet
12-03-2021 15:08:15.743 +0100 DEBUG TcpOutputProc - Connection not available. Waiting for connection ...
and
12-03-2021 15:28:27.862 +0100 WARN TcpOutputProc - Cooked connection to ip=ip_vs_indexer2:9997 timed out
A tcptraceroute shows [open] between the forwarder and the VS, but doesn't tell me any more than that.
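For reference, the check was roughly this (the IP being the VS address):
tcptraceroute xx.xx.xx.xx 9997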
Does this mean I have some network issue?
Do you have any suggestions?
Thanks
Ema
It was a "network issue":
Seems that adding a route with
ip route add
did the trick!
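Roughly of this form (subnet, gateway and interface are placeholders for our actual values):
ip route add <indexer2_subnet> via <gateway_ip> dev <interface>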
Now I just have to understand why the first connection is rejected with:
Message rejected. Received unexpected message of size=218824692 bytes from src=forwarder:45814 in streaming mode. Maximum message size allowed=67108864. (::) Possible invalid source sending data to splunktcp port or valid source sending unsupported payload
The queue is 5 MB, which is usually sufficient.
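To be clear, that's the output queue in outputs.conf, something like this (a sketch, not my exact file):
[tcpout]
maxQueueSize = 5MB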
Something else must be wrong, but we've made progress!
Thanks!
Regards,
Ema
Not really sure of that.
Between the firewall and indexer2:
From the firewall to indexer2:
R = reset (or so I suppose)
then S = SYN
and just after that, from indexer2 to the firewall:
S = SYN
No ACK whatsoever.
I tried another thing: from the forwarder,
telnet to firewall:9997 > connected to firewall:9997
then inside that connection, I tried telnet indexer2 9997
=> "Connection closed by foreign host"
What does this mean? If indexer2 is answering that, is there a log I can check for more details? (I didn't find any help in any of the /var/log/...log files)
As there is no nftables on the box, I can't seem to identify what is answering that, or why...
Something's not right here.
The normal 3-way handshake should look like this in your case:
Firewall -> Indexer SYN
Firewall <- Indexer SYN/ACK
Firewall -> Indexer ACK
If you're getting an RST at the very beginning, even before any SYNs, something is definitely broken, since an RST should never be sent on its own, only as a response to an earlier packet.
So I suppose you must have missed something.
If your sequence looks like this:
Firewall -> Indexer SYN
Firewall <- Indexer RST
That means a port on Indexer is not open for listening on the IP you're trying to connect to (or filtered with a firewall rule which sends RST instead of simply dropping the packet).
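A quick way to verify that on indexer2 itself (standard Linux tooling, adjust to your distro) is:
ss -ltn | grep 9997
If nothing comes back, splunkd isn't listening on 9997 at all (receiving not enabled, or bound to a different address).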
Yep, unfortunately, that's not very helpful.
I see some data from what is supposedly the firewall's outbound IP to my indexer2.
Specifically, it's [R] then [S], then my indexer2 sends its [S].
=> so it should be ok.
I've changed SSLCommon in log.cfg from INFO to DEBUG.
no help...
Any clue?
Thanks,
Ema
You see a full three-way handshake from your firewall's IP? SYN-SYN/ACK-ACK?
Or are you getting an RST? (I'm not sure what [R] and [S] mean in your case.)
Hi again,
Ah sorry, VS is our shorthand for "virtual server" (probably a misnomer), which is basically a virtual entry point with an IP and one or more pools of destinations.
I confirm my 2 indexers are completely different installations (not even the same version of Splunk: 8.1.3 and 8.2.3).
If I had 2 targets in my pool, the VS would behave as a load balancer. As I only have one, it's usually just called "round-robin" routing.
I don't know what it's officially supposed to be called. It's masking a firewall most of the time.
I'm able to capture some data with tcpdump between the "other side of the VS" (I was told I need to use the firewall's outbound IP) and indexer2.
So maybe a TLS issue is more probable. I did change the log level for TLS on indexer2 to investigate.
Without success I'm afraid. I'm at a loss as to which parameter or option I'm supposed to check or modify...
I'll be grateful for any suggestion.
Thanks,
Ema
The most basic and obvious test would be to run tcpdump/wireshark on the indexer2 while trying to connect to it from the forwarder. This way you'll see what's happening on the network level.
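Something along these lines run on indexer2 should do (interface and source IP are placeholders):
tcpdump -i any -nn host <forwarder_or_vs_ip> and port 9997
That will show whether the SYNs arrive at all and what, if anything, is sent back.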
Hello @gcusello & @PickleRick ,
Thank you for helping me in that matter.
I'm afraid @PickleRick got it right : I'd like to duplicate the events, each event being indexed once on each indexer. The target is NOT to load balance them between the 2 indexers.
telnet says "Connected to indexer2.dns", with that DNS name pointing to the VS in front of indexer2, which supposedly receives on 9997 and forwards to indexer2:9997.
A tcptraceroute to indexer2.dns stops at VS_ip as well.
There is a firewall between the forwarder and indexer2. Is it possible the traffic is received correctly but not passed on?
Thanks,
Ema
We don't know the rest of your config and infrastructure. Possible causes include:
- lack of communication between the VS (whatever that is XD) and the actual indexer
- some TLS-related miscommunication (so that the basic TCP connection succeeds but the TLS negotiation fails); I'm not sure it would manifest with this exact error message, though (a quick check is sketched below)
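If you want to quickly probe the TLS part (assuming the receiving port is actually configured for SSL; if it's plain splunktcp, skip this), something like this from the forwarder's side could help:
openssl s_client -connect indexer2.com:9997
A completed handshake with a certificate dump means TLS itself is fine; an immediate error or a hang points elsewhere.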
BTW, are your indexers two completely separate installations? Have you considered clustering and replication at the indexer level? That way you wouldn't need to send data twice, your indexers would stay consistent with each other, and the data would only count against your license once.
Hi @emallinger,
sorry I misunderstood!
yes, check the firewall: if telnet cannot reach indexer2, something in between is blocking it.
Ciao.
Giuseppe
Hello @gcusello ,
[Settings -- Forwarding and Receiving -- Receiving] => Yes, port 9997 is open (confirmed with ss -lt).
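For reference, on the indexer side that corresponds to a stanza like this in inputs.conf (created via the UI in my case, so the exact file/app may differ):
[splunktcp://9997]
disabled = 0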
I restarted all the machines multiple times, as I was not sure a reload was enough for a new forwarding flow.
The new configuration was received correctly.
It's a test environment, so no need to worry.
Thanks,
Ema
Hi @emallinger,
Did you check the connection between the servers using telnet on ports 8089 and 9997?
Also, I noticed a small difference between your outputs.conf and the one I usually use:
[tcpout]
defaultGroup = default-autolb-group
[tcpout-server://indexer1:9997]
[tcpout-server://indexer2:9997]
[tcpout:default-autolb-group]
server = indexer1:9997,indexer2:9997
disabled=false
Please check it.
Giuseppe
@gcusello Your config does load balancing between the two indexers - each event goes to only one of the indexing servers at a time. If I understand correctly, the OP wants each event delivered to both indexers simultaneously.
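To illustrate the difference (a sketch only, group names and ports are placeholders): a single target group with two servers load-balances, while listing two target groups in defaultGroup clones every event to both.
# load balancing - each event goes to one indexer
[tcpout]
defaultGroup = default-autolb-group
[tcpout:default-autolb-group]
server = indexer1:9997,indexer2:9997
# cloning - each event goes to both indexers
[tcpout]
defaultGroup = group1,group2
[tcpout:group1]
server = indexer1:9997
[tcpout:group2]
server = indexer2:9997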
Hi @emallinger,
some quick questions to debug your situation:
first, did you enable receiving on the indexers? [Settings -- Forwarding and Receiving -- Receiving]
Then, did you configure Splunk to restart after updates from the Deployment Server? By default, a restart isn't configured.
Then, did you test the connection using telnet on ports 9997 and 8089?
Did your forwarder receive the new configuration?
Just for your information: for a test it's acceptable, but in production it isn't best practice to use an indexer as the Deployment Server.
Ciao.
Giuseppe