All event-processing queues suddenly fill up, leading to a complete indexer stoppage.
All connectivity is fine. Adding some suspects here:
My main suspect is the log4j source type.
I have this in props.conf for the log4j source type:
[log4j]
EXTRACT-spectitle = (?i)specificationtitle\s+(?P<spectitle>\S+)
EXTRACT-specvendor = (?i)specificationvendor\s+(?P<specvendor>\S+)
EXTRACT-impltitle = (?i)implementationtitle\s+(?P<impltitle>\S+)
EXTRACT-implversion = (?i)implementationversion\s+(?P<implversion>\S+)
EXTRACT-implvendor = (?i)implementationvendor\s+(?P<implvendor>\S+)
MAX_TIMESTAMP_LOOKAHEAD = 100
Can someone help me out or suggest how to get rid of this?
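One note on the stanza above: the EXTRACT-* settings are search-time field extractions and do not run in the indexing pipeline, so they are unlikely to be what blocks the queues. What does cost the parsing pipeline per event is timestamp recognition and line breaking. A minimal sketch of explicit index-time settings is below; the timestamp pattern and line breaker are assumptions based on a typical log4j layout (events starting with "2023-05-14 10:15:30,123") and must be adjusted to the actual events.

[log4j]
# Search-time EXTRACT-* settings stay as they are; the settings below are
# index-time and reduce the per-event work in the parsing pipeline.
# The timestamp and line-breaking patterns here are assumptions.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 25
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
TRUNCATE = 10000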
You need a detailed Health Check from a qualified Splunk PS shop (we do provide this service), because there could be many, MANY causes (usually more than one). Increasing queue sizes is DEFINITELY NOT the answer; it is a pointless interim measure that can cause other problems. Usually the problem is poor disk performance on the indexers. Occasionally (especially if it was a DIY operation) the initial configuration of the Search Head and Indexers was never adjusted properly. Limited inodes and THP (Transparent Huge Pages) being enabled are two other common causes that can be checked in the Health Checks on the Monitoring Console. Parallel pipelines always help (if you are only using one), but they are almost never the full answer. This is not the kind of thing that can be done properly here in answers.
Your indexers can't keep up with the workload.
Assuming the CPUs on your indexers are not maxed out during the peak times you showed in the screenshots, my first suggestion would be to increase the parallel ingestion pipelines on your forwarders and indexers.
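For reference, parallel ingestion pipelines are controlled by the parallelIngestionPipelines setting in server.conf. A minimal sketch follows, assuming the hosts have spare CPU cores; each extra pipeline consumes additional cores and memory, so only raise it when cores are genuinely idle during peak ingestion.

# server.conf on the indexer (and, if needed, on the forwarders)
# Default is 1; each additional pipeline uses extra CPU and memory.
[general]
parallelIngestionPipelines = 2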