Hello everyone,
Here's the situation:
indexer1, deployment server role
indexer2
forwarder1
I distributed via the deployment server a new outputs.conf with:
[tcpout]
defaultGroup = indexer1,indexer2
[tcpout:indexer1]
server = xx.xx.xx.xx:9997
[tcpout:indexer2]
server = indexer2.com:9997
There is a VS between forwarder1 and indexer2.
I activated DEBUG in log.cfg for TcpOutputProc.
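For reference, the change in $SPLUNK_HOME/etc/log.cfg was along these lines (as far as I know the category line is already there with INFO by default), followed by a splunkd restart:
category.TcpOutputProc=DEBUG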
The log on forwarder1 tells me only :
12-03-2021 15:08:15.743 +0100 DEBUG TcpOutputProc - channel not registered yet
12-03-2021 15:08:15.743 +0100 DEBUG TcpOutputProc - Connection not available. Waiting for connection ...
and
12-03-2021 15:28:27.862 +0100 WARN TcpOutputProc - Cooked connection to ip=ip_vs_indexer2:9997 timed out
A tcptraceroute shows [open] between the forwarder and the VS, but doesn't tell me any more than that.
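For reference, the check was roughly this (the IP being the VS address):
tcptraceroute xx.xx.xx.xx 9997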
Does this mean I have some network issue?
Do you have any suggestions?
Thanks
Ema
It was a "network issue":
Seems that adding a route with
ip route add
did the trick!
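Roughly of this form (subnet, gateway and interface are placeholders for our actual values):
ip route add <indexer2_subnet> via <gateway_ip> dev <interface>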
Now I just have to understand why the first connection is rejected with:
Message rejected. Received unexpected message of size=218824692 bytes from src=forwarder:45814 in streaming mode. Maximum message size allowed=67108864. (::) Possible invalid source sending data to splunktcp port or valid source sending unsupported payload
The queue is 5 MB, which is usually sufficient.
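To be clear, that's the output queue in outputs.conf, something like this (a sketch, not my exact file):
[tcpout]
maxQueueSize = 5MB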
Something else must be wrong, but we've made progress!
Thanks!
Regards,
Ema
Not really sure of that.
Between the firewall and indexer2:
From the firewall to indexer2:
R = reset (or so I suppose)
then S = SYN
and just after that, from indexer2 to the firewall:
S = SYN
No ACK whatsoever.
I tried another thing: from the forwarder,
telnet to firewall:9997 > connected to firewall:9997
then inside that connection, I tried telnet indexer2 9997
=> "Connection closed by foreign host"
What does this mean? If indexer2 is answering that, is there a log I can check for more details? (I didn't find any help in any of the /var/log/...log files)
As there is no nftables on the box, I can't seem to identify what is answering that, or why...
Something's not right here.
The normal 3-way handshake should look like this in your case:
Firewall -> Indexer SYN
Firewall <- Indexer SYN/ACK
Firewall -> Indexer ACK
If you're getting an RST at the very beginning, even before any SYNs, something is definitely broken, since an RST should never be sent on its own, only as a response to an earlier packet.
So I suppose you must have missed something.
If your sequence looks like this:
Firewall -> Indexer SYN
Firewall <- Indexer RST
That means a port on Indexer is not open for listening on the IP you're trying to connect to (or filtered with a firewall rule which sends RST instead of simply dropping the packet).
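A quick way to verify that on indexer2 itself (standard Linux tooling, adjust to your distro) is:
ss -ltn | grep 9997
If nothing comes back, splunkd isn't listening on 9997 at all (receiving not enabled, or bound to a different address).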
Yep, unfortunately, that's not very helpful.
I see some data from what is supposedly the firewall's outbound IP to my indexer2.
Specifically, it's [R] then [S], then my indexer2 sends its [S].
=> so it should be ok.
I've changed SSLCommon in log.cfg from INFO to DEBUG.
no help...
Any clue?
Thanks,
Ema
You see a full three-way handshake from your firewall's IP? SYN-SYN/ACK-ACK?
Or are you getting an RST? (I'm not sure what [R] and [S] mean in your case.)
Hi again,
Ah sorry, VS is our shorthand for "virtual server" (probably a misnomer), which is basically a virtual entry point with an IP and one or more pools of destinations.
I confirm my 2 indexers are completely different installations (not even the same version of Splunk: 8.1.3 and 8.2.3).
If I had 2 targets in my pool, the VS would behave as a load balancer. As I only have one, it's usually just called "round-robin" routing.
I don't know what it's officially supposed to be called. It's masking a firewall most of the time.
I'm able to capture some data with tcpdump between the "other side of the VS" (I was told I need to use the firewall's outbound IP) and indexer2.
So maybe a TLS issue is more probable. I did change the log level for TLS on indexer2 to investigate.
Without success I'm afraid. I'm at a loss as to which parameter or option I'm supposed to check or modify...
I'll be grateful for any suggestion.
Thanks,
Ema
The most basic and obvious test would be to run tcpdump/wireshark on the indexer2 while trying to connect to it from the forwarder. This way you'll see what's happening on the network level.
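Something along these lines run on indexer2 should do (interface and source IP are placeholders):
tcpdump -i any -nn host <forwarder_or_vs_ip> and port 9997
That will show whether the SYNs arrive at all and what, if anything, is sent back.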
Hello @gcusello & @PickleRick ,
Thank you for helping me in that matter.
I'm afraid @PickleRick got it right : I'd like to duplicate the events, each event being indexed once on each indexer. The target is NOT to load balance them between the 2 indexers.
telnet says "Connected to indexer2.dns", with that DNS name pointing to the VS in front of indexer2, which supposedly receives on 9997 and forwards to indexer2:9997.
A tcptraceroute to indexer2.dns stops at VS_ip as well.
There is a firewall between the forwarder and indexer2. Is it possible the traffic is received correctly but not passed on?
Thanks,
Ema
We don't know the rest of your config and infrastructure. Possible causes include:
- lack of communication between the VS (whatever that is XD) and the actual indexer
- some TLS-related miscommunication (so that the basic TCP connection succeeds but the TLS negotiation fails); I'm not sure it would manifest with this exact error message, though (a quick check is sketched below)
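If you want to quickly probe the TLS part (assuming the receiving port is actually configured for SSL; if it's plain splunktcp, skip this), something like this from the forwarder's side could help:
openssl s_client -connect indexer2.com:9997
A completed handshake with a certificate dump means TLS itself is fine; an immediate error or a hang points elsewhere.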
BTW, are your indexers two completely separate installations? Have you considered clustering and replication at the indexer level? That way you wouldn't need to send data twice, your indexers would stay consistent with each other, and the data would only count against your license once.
Hi @emallinger,
sorry I misunderstood!
yes, check the firewall: if telnet cannot reach indexer2, something in between is blocking it.
Ciao.
Giuseppe
Hello @gcusello ,
[Settings -- Forwarding and Receiving -- Receiving] => Yes, port 9997 is open (confirmed with ss -lt).
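For reference, on the indexer side that corresponds to a stanza like this in inputs.conf (created via the UI in my case, so the exact file/app may differ):
[splunktcp://9997]
disabled = 0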
I restarted all the machines multiple times, as I was not sure a reload was enough for a new forwarding flow.
The new configuration was received correctly.
It's a test environment, so no need to worry.
Thanks,
Ema
Hi @emallinger,
Did you check the connection between the servers using telnet on ports 8089 and 9997?
Also, I noticed a small difference between your outputs.conf and the one I usually use:
[tcpout]
defaultGroup = default-autolb-group
[tcpout-server://indexer1:9997]
[tcpout-server://indexer2:9997]
[tcpout:default-autolb-group]
server = indexer1:9997,indexer2:9997
disabled=false
Please check it.
Giuseppe
@gcusello Your config does load balancing between the two indexers - each event goes to only one of the indexing servers at a time. If I understand correctly, the OP wants each event delivered to both indexers simultaneously.
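To illustrate the difference (a sketch only, group names and ports are placeholders): a single target group with two servers load-balances, while listing two target groups in defaultGroup clones every event to both.
# load balancing - each event goes to one indexer
[tcpout]
defaultGroup = default-autolb-group
[tcpout:default-autolb-group]
server = indexer1:9997,indexer2:9997
# cloning - each event goes to both indexers
[tcpout]
defaultGroup = group1,group2
[tcpout:group1]
server = indexer1:9997
[tcpout:group2]
server = indexer2:9997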
Hi @emallinger,
some quick questions to debug your situation:
first, did you enable receiving on the indexers? [Settings -- Forwarding and Receiving -- Receiving]
Then, did you configure Splunk to restart after updates from the Deployment Server? By default, a restart isn't configured.
Then, did you test the connection using telnet on ports 9997 and 8089?
Did your forwarder receive the new configuration?
Just for your information: for a test it's acceptable, but in production it isn't best practice to use an indexer as the Deployment Server.
Ciao.
Giuseppe