Hello,
I have recently inherited a Splunk Enterprise (v6.6) instance with some serious issues. The architecture is distributed, with the Search Head, Indexer and Heavy Forwarder each residing on a different host. The primary problem is that after a short period of time the queues (parsing, aggregator, typing and index) reach 100% and result in the error mentioned in the title.
Upon investigating the index directory where the errors are reported, I found 100+ .lock files that seem to replicate as file.lock, file.lock.lock, file.lock.lock.lock and so on.
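To put numbers on how far those lock chains go, something like the following could help. It is only a minimal sketch: the function names are made up for illustration, the directory is whatever you pass in, and it just reads file names without touching or deleting anything.

```python
import re
from collections import Counter
from pathlib import Path

# Matches one or more trailing ".lock" suffixes, e.g. "file.lock.lock".
LOCK_CHAIN = re.compile(r"((?:\.lock)+)$")


def lock_depth(name: str) -> int:
    """Return how many times ".lock" is repeated at the end of a file name."""
    match = LOCK_CHAIN.search(name)
    return len(match.group(1)) // len(".lock") if match else 0


def summarize_locks(index_dir: str) -> Counter:
    """Count lock files under index_dir (recursively), grouped by chain depth."""
    depths = Counter()
    for path in Path(index_dir).rglob("*.lock"):
        if path.is_file():
            depths[lock_depth(path.name)] += 1
    return depths
```

Calling summarize_locks() on the affected index directory returns a mapping of chain depth to file count, so a result with many entries at depth 3 or higher would confirm that the locks keep stacking rather than being cleaned up.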
The machines running Splunk have more than enough RAM, CPU and IOPS. I have manually run splunk-optimize with no effect. I am lost on what to do next and am almost considering deleting the index (not preferred) to resolve this issue.
Any help would be much appreciated.
Where do the queues pile up? On the indexer or on the heavy forwarder?
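If it is not obvious, one way to check is to read the group=queue lines from metrics.log on each host and compare peak fill per queue. The sketch below assumes those lines carry name, max_size_kb and current_size_kb fields; that layout is from memory and not verified against v6.6, and queue_fill is just an illustrative name.

```python
import re
from collections import defaultdict

# Assumed line shape (not verified for v6.6):
# ... INFO Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=511, ...
KV = re.compile(r"(\w+)=([^,\s]+)")


def queue_fill(lines):
    """Return {queue_name: peak fill percentage} from metrics.log lines."""
    peak = defaultdict(float)
    for line in lines:
        fields = dict(KV.findall(line))
        if fields.get("group") != "queue":
            continue  # ignore pipeline, thruput and other metric groups
        try:
            max_kb = float(fields["max_size_kb"])
            cur_kb = float(fields["current_size_kb"])
        except (KeyError, ValueError):
            continue  # line does not match the assumed layout
        if max_kb > 0:
            name = fields.get("name", "unknown")
            peak[name] = max(peak[name], 100.0 * cur_kb / max_kb)
    return dict(peak)
```

Feeding it the lines of metrics.log (normally under $SPLUNK_HOME/var/log/splunk) from the heavy forwarder and from the indexer separately should show which host and which queue reaches 100% first.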
It starts at the Heavy Forwarder and, once that is maxed out, flows on to the indexer until it is 100% m