I'm currently ingesting data from DB Connect. While it was ingesting, I ran a search on a search head behind an ELB, and an error came up. It seemed to be a problem with one of my peers. I accidentally refreshed the page, so I didn't manage to capture the error message.
I checked my Cluster Master and, indeed, one of the peers is down. I can ping the instance but I can't access it via SSH. We ran into the same situation just the other day, and AWS sent us the CloudWatch log for the instance. It showed a memory spike on the instance.
Are there any recommendations on what to do?
To be honest, this does not sound like a Splunk question. You should probably head over to the AWS forums and ask your question there, and consider opening a case with AWS support.
However, I must admit I am a bit confused by your description.
You mention you are using an ELB - I presume because you are running a Search Head Cluster?
So I have to ask: is it an indexer peer that is failing, or an SHC member?
I'm running a clustered environment, sir Nick, and it is one of the indexer peers that is failing. It started working again after I restarted the instance, but I'm worried that it might happen again.
Thanks for the response!
I would start by looking at the logs you can get from the AWS console. When a machine 'crashes', this log can often give you insight into anything it spat out on the console just before it died.
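If you save that console output to a text file, a rough way to check whether the memory spike ended with the kernel killing a process is to scan it for out-of-memory messages. This is only a minimal sketch, assuming a Linux instance; the function name `find_oom_lines` and the exact message patterns are my own assumptions, and kernel wording varies between versions.

```python
import re

# Kernel messages that commonly accompany an out-of-memory event on Linux.
# The wording differs between kernel versions, so treat this pattern as a
# starting point rather than a complete list.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process",
    re.IGNORECASE,
)


def find_oom_lines(console_text):
    """Return the console lines that mention the kernel OOM killer."""
    return [
        line.strip()
        for line in console_text.splitlines()
        if OOM_PATTERN.search(line)
    ]
```

For example, calling `find_oom_lines(open("console.log").read())` on a saved log (the file name is hypothetical) returns the matching lines. If `splunkd` shows up in them, that would point towards the indexer itself running out of memory rather than a problem on the AWS side.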
I'd suggest getting AWS to help you look into it if it happens again - Since your indexers are clustered hopefully you have enough replicated copies to keep your data searchable while they look into it.
Of course, you could be overwhelming the instance - you could consider increasing the instance size?
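Before resizing, it may help to see how much memory headroom the peer actually has while DB Connect is ingesting. Below is a minimal sketch, assuming a Linux indexer whose `/proc/meminfo` exposes `MemAvailable` (kernels 3.14 and later); the helper names `parse_meminfo` and `available_percent` are hypothetical, not part of Splunk or AWS.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Key: <number> [unit]" lines.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_percent(meminfo):
    """Percentage of total memory the kernel estimates is still available."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
```

Running `available_percent(parse_meminfo(open("/proc/meminfo").read()))` periodically on the peer during ingestion would show whether available memory steadily drops towards zero, which would support moving to a larger instance.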