I'm working on a procedure to move from an old indexer to a new indexer without losing any events. The configuration is pretty simple: there are a number of Universal Forwarders which send to one indexer, and we'd like to replace it with another indexer. Eventually the old indexer will be disabled. Everything's basically vanilla.
Questions
My goal in asking this question is to tap into the knowledge of the Splunk community to determine:
Sanity check. Is this approach reasonable? Is there a better approach?
Can I really replicate between indexers by forwarding like this?
Am I correct in thinking the Universal Forwarders won't cause any problems by blocking?
Any further input is much appreciated.
Requirements:
No lost events.
I don't want to change configuration on the live forwarders on the fly; the process should be more or less invisible to them, apart from a DNS change.
A bit of Splunk downtime is acceptable.
Procedure
I thought the best procedure would be something like this:
Spin up a new server "splunk2". It will already have the same configuration as the old server.
Shut down the old server "splunk1". The Universal Forwarders begin to queue and might eventually block. But since they only forward their own local events, this won't cause any cascading blocking (dropEventsOnQueueFull is set to the default of -1).
Roll hot buckets to warm. I think this may happen automatically, but I'm unclear as to whether it happens at startup or at shutdown.
Copy all indexes from splunk1 to splunk2. Since the hot buckets have already been rolled, this would only include warm buckets and colder. Now the servers contain exactly the same indexes.
Add the new server splunk2 as an Output on the old server splunk1.
Start splunk2 and make sure it comes up with the indexes moved over.
Start splunk1. It should quickly receive all the queued events from the forwarders and forward those events to splunk2 as well as indexing them locally.
Start searching again on splunk2. Do sanity checks. Up to this point, the procedure can be rolled back by simply removing the forwarding output on splunk1 and discarding splunk2.
Update DNS to point to splunk2. As the change propagates to the forwarders, they will cut splunk1 out of the loop and send their events directly to splunk2. However, since any events that still arrive at the old server will be forwarded on to splunk2, nothing will be lost. This is the point of no return: once it's done, there's no easy way to roll back (I suppose the indexes would have to be merged or something).
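Wait for the TTL of the DNS record to expire, plus dnsResolutionInterval, so that the change is guaranteed to have propagated. After the TTL expires, all cached records on intermediary DNS servers should have expired, and dnsResolutionInterval is how often the forwarders re-resolve the indexer's hostname, so after both have passed every forwarder should be pointing at splunk2.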
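To make the wait in the last step concrete, here is a rough sketch of the arithmetic. The 3600-second TTL and the change time are made-up values for illustration, and the 300-second default for dnsResolutionInterval is what I believe outputs.conf uses, so check your own settings. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def earliest_safe_shutdown(dns_change_time, ttl_seconds,
                           dns_resolution_interval_seconds=300):
    """Earliest time at which every forwarder should have re-resolved to splunk2.

    Worst case: a resolver cached the old record just before the change
    (one full TTL), and a forwarder re-resolved just before that cache
    expired (one full dnsResolutionInterval).
    """
    wait = timedelta(seconds=ttl_seconds + dns_resolution_interval_seconds)
    return dns_change_time + wait


# Hypothetical values: DNS changed at noon, record TTL of one hour.
changed_at = datetime(2024, 1, 1, 12, 0, 0)
safe_at = earliest_safe_shutdown(changed_at, ttl_seconds=3600)
print(safe_at)  # 2024-01-01 13:05:00
```

This only gives a lower bound on when to disable splunk1; it says nothing about forwarders that are still holding events in their queues at that moment.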
Thanks for any advice you care to share!