Greetings, everyone. We have a moderately sized distributed deployment with 3 pooled search heads, and all 3 are listed under a single FQDN that round-robins through their IPs. We are in the process of upgrading from Splunk v4 to Splunk v5. On v5, when a user logs in through the DNS round-robin (DNSRR) FQDN, they are logged back out almost immediately after a successful login. This did not happen in v4.
Has anyone else encountered this sort of issue? I'm not sure how to even approach it.
You can't load-balance Splunk with DNS round robin. You must use a mechanism that preserves session affinity; otherwise you will be forced to log in again whenever your browser is directed to a different server. I am surprised that it worked in the older version, but my guess is that your DNS configuration has changed slightly, e.g., the TTL on the records may previously have been set much longer.
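To see why affinity matters, here is a toy Python model. It assumes, as described above, that each search head keeps login sessions only in its own memory. The server addresses, the `SearchHead` class, and the hash-based picker are hypothetical illustrations, not Splunk internals. Round robin sends the second request to a server that never saw the login, while hashing the client address keeps every request on the same server.

```python
import hashlib
import itertools

# Hypothetical search-head addresses; substitute your own.
SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


class SearchHead:
    """Toy model (not Splunk internals): sessions live in local memory only."""

    def __init__(self, addr):
        self.addr = addr
        self.sessions = set()

    def login(self, user):
        self.sessions.add(user)

    def is_logged_in(self, user):
        return user in self.sessions


def pick_round_robin(counter):
    # Each new connection gets the next server in the rotation.
    return SERVERS[next(counter) % len(SERVERS)]


def pick_affinity(client_ip):
    # Hash the client address so the same client always maps to the same server.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def session_survives(picker, requests=5):
    heads = {addr: SearchHead(addr) for addr in SERVERS}
    heads[picker()].login("alice")
    # Every later request must land on a server that knows the session.
    return all(heads[picker()].is_logged_in("alice") for _ in range(requests))


rr_counter = itertools.count()
print("round robin:", session_survives(lambda: pick_round_robin(rr_counter)))
print("affinity:   ", session_survives(lambda: pick_affinity("192.0.2.50")))
```

Source-IP hashing is only one way to get affinity; many load balancers also offer cookie-based persistence. Either way, the goal is to keep a browser on one search head for the life of its session.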
Really? I'd have thought the local DNS cache would keep the IP of the last server used. Darn, sounds like I need to start begging for a load balancer. Does anyone know whether GSLB (global server load balancing) will work? Anything to avoid exhausting a tight budget, lol.
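The TTL specifies how long the client should cache a DNS entry. So if your old record was set to 86400 seconds while your new one is 30 seconds, you'll have problems: the client re-resolves the name every 30 seconds and can be handed a different search head each time.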
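The TTL effect can be sketched with a simplified simulation. It assumes the authoritative server hands out the next IP on every query and the client caches a single answer for exactly TTL seconds; real resolvers, operating systems, and browsers add their own caching layers. The addresses and class names are hypothetical. The simulated session lasts 10 minutes with one request every 10 seconds.

```python
from dataclasses import dataclass

SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical addresses


class RoundRobinDNS:
    """Toy authoritative server: each query returns the next IP in rotation."""

    def __init__(self, ips):
        self.ips = ips
        self.next_index = 0

    def query(self):
        ip = self.ips[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.ips)
        return ip


@dataclass
class CacheEntry:
    ip: str
    expires_at: float


class ClientCache:
    """Toy client-side cache that honors the record's TTL (in seconds)."""

    def __init__(self, dns, ttl):
        self.dns = dns
        self.ttl = ttl
        self.entry = None

    def resolve(self, now):
        if self.entry is None or now >= self.entry.expires_at:
            # Cached answer expired: ask DNS again, possibly getting a new IP.
            self.entry = CacheEntry(self.dns.query(), now + self.ttl)
        return self.entry.ip


def servers_hit(ttl, session_seconds=600, request_every=10):
    """Return the set of server IPs a single client talks to during a session."""
    cache = ClientCache(RoundRobinDNS(SERVERS), ttl)
    return {cache.resolve(t) for t in range(0, session_seconds, request_every)}


print(len(servers_hit(ttl=86400)))  # 1: one lookup covers the whole session
print(len(servers_hit(ttl=30)))     # 3: re-resolves every 30 s and rotates
```

With the long TTL the client stays pinned to one search head by accident; with the short TTL it rotates through all of them, which matches the immediate logouts described above.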
DNS-based GSLB usually sets the TTL very low, specifically so that the DNS server can redirect clients to a different host without the old IP staying stuck in a client cache for too long. However, for this to work correctly it must not round-robin answers to the same client: it needs to keep returning the same address to any given client until a failure occurs, and only then hand that client a new target IP.
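Here is a rough sketch of that per-client stickiness. It assumes a responder that remembers which server it gave each querying address and reassigns only when that server is marked down. `StickyGSLB`, the addresses, and the 30-second TTL are hypothetical. Real GSLB products implement health checks and persistence in their own ways. Also, the querying address they see is usually the client's recursive resolver rather than the browser itself.

```python
SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical addresses
LOW_TTL = 30  # seconds; kept short so a failover reaches clients quickly


class StickyGSLB:
    """Toy DNS responder (hypothetical) that pins each client to one healthy server."""

    def __init__(self, ips):
        self.ips = list(ips)
        self.healthy = set(ips)
        self.assignment = {}  # querying address -> assigned server IP
        self.rr_index = 0     # rotation used only to spread *new* clients

    def _next_healthy(self):
        for _ in range(len(self.ips)):
            ip = self.ips[self.rr_index % len(self.ips)]
            self.rr_index += 1
            if ip in self.healthy:
                return ip
        raise RuntimeError("no healthy servers")

    def mark_down(self, ip):
        # In a real product a health check would do this.
        self.healthy.discard(ip)

    def answer(self, client):
        ip = self.assignment.get(client)
        if ip not in self.healthy:
            # New client, or its server failed: pick another and remember it.
            ip = self._next_healthy()
            self.assignment[client] = ip
        return ip, LOW_TTL


gslb = StickyGSLB(SERVERS)
client = "192.0.2.50"
print([gslb.answer(client)[0] for _ in range(3)])  # ['10.0.0.11', '10.0.0.11', '10.0.0.11']
gslb.mark_down("10.0.0.11")
print(gslb.answer(client))  # ('10.0.0.12', 30)
```

The low TTL and the stickiness work together. The client re-asks often, but it keeps getting the same answer until its server actually fails.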
Excellent, I will give GSLB a shot. Luckily, I may have located some cheap load balancers as well, so I can tackle it in our next sprint. Thanks so much for your help!