Indexer search errors

zijian · ‎11-01-2023

Hi,

After an improper reboot, one of our three clustered indexers is showing search errors and large CPU fluctuations in the main splunkd process. Details follow.

In Splunk Web search:

remote search process failed on peer
Search results might be incomplete: the search process on the peer:[Affected indexer] ended prematurely. Check the peer log, such as $SPLUNK_HOME/var/log/splunk/splunkd.log and as well as the search.log for the particular search.
[Affected indexer] Search process did not exit cleanly, exit_code=111, description="exited with error: Application does not exist: Splunk_SA_CIM".
Please look in search.log for this peer in the Job Inspector for more info.

In splunkd.log of affected indexer:

WARN SearchProcessRunner [31756 PreforkedSearchesManager-0] - preforked process=0/101006 with search=0/127584 exited with code=111

ERROR SearchProcessRunner [31756 PreforkedSearchesManager-0] - preforked search=0/127584 on process=0/101006 caught exception: used=1, bundle=7471316304185390773, workload_pool=, generation=11, age=7.418, runtime=7.203, search_started_ago=7.204, search_ended_ago=0.000

ERROR SearchProcessRunner [31756 PreforkedSearchesManager-0] - preforked process=0/101006 with search=0/127584 and
cmd=splunkd\x00search\x00--id=remote_SH-ES_scheduler__splunkadmin__SplunkEnterpriseSecuritySuite__RMD5852d4ed30e6a890b_at_1698892200_90939\
x00--maxbuckets=0\x00--ttl=60\x00--maxout=0\x00--maxtime=0\x00--lookups=1\x00--streaming\x00--sidtype=normal\x00--outCsv=true\x00--acceptSrsLevel=1\

died on exception (exit_code=111):
Application does not exist: SplunkEnterpriseSecuritySuite

WARN PeriodicReapingTimeout [30157 DispatchReaper] - Spent 10650ms reaping search artifacts in /splunk/var/run/splunk/dispatch

WARN DispatchReaper [30157 DispatchReaper] - The number of search artifacts in the dispatch directory is higher than recommended (count=6608, warning threshold=5000) and could have an impact on search performance. Remove excess search artifacts using the "splunk clean-dispatch" CLI command, and review artifact retention policies in limits.conf and savedsearches.conf. You can also raise this warning threshold in limits.conf / dispatch_dir_warning_size.

WARN DispatchManager [13827 TcpChannelThread] - quota enforcement for user=splunk_user1, sid=soc_user1_c29jX2Njb191c2VyMQ__SplunkEnterpriseSecuritySuite__RMD57f02abc0263583b0_1697962710.21728, elapsed_ms=23865, cache_size=1591 took longer than 15 seconds. Poor search start performance will be observed. Consider removing some old search job artifacts.

Regards,

Zijian

Pranav_Support

Can anyone explain if the following issues could be interconnected?

Storage Limit: Splunk’s storage is nearing its limit. Could this be affecting the performance or functionality of other components?
Permission Error: An error message indicates that the “Splunk_SA_CIM” app either does not exist or lacks sufficient permissions. Could this be causing issues with data access or processing?
Transparent Huge Pages (THP) Status: THP is not disabled. It’s known that THP can interfere with Splunk’s memory management. Could this be contributing to the problems?
Memory and Ulimit: Could memory constraints or ulimit settings be causing errors?
Remote Search Process Failure: There was a failure in the remote search process on a peer, leading to potentially incomplete search results. The search process on the peer (Affected indexer) ended prematurely. The error message suggests that the application “Splunk_SA_CIM” does not exist. Could this be related to the aforementioned “Splunk_SA_CIM” error?

Could these issues be interconnected, and if so, how? Could resolving one issue potentially alleviate the others?
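
For the THP item above, a minimal sketch of how the current setting could be checked on a Linux host. It assumes the standard sysfs path /sys/kernel/mm/transparent_hugepage/enabled, which can differ on some distributions; the function names are illustrative and are not part of Splunk.

```python
# Assumption: the kernel exposes THP state at this sysfs path (standard on
# most Linux distributions, but not guaranteed everywhere).
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path=THP_PATH):
    """Read the active THP mode, or None if the file cannot be read."""
    try:
        with open(path) as handle:
            return active_thp_mode(handle.read())
    except OSError:
        return None
```

Calling read_thp_mode() on the indexer returns the active mode as a string; any value other than "never" means THP is still enabled in some form.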

PickleRick

The thread you're responding to is relatively old and is not directly related to your question.

To keep the Answers tidy and focused and to ensure visibility of your issue please submit your question(s) as a new thread.

gcusello · ‎11-02-2023

Hi @zijian,

I see two or three possible issues:

You may not have enough disk space for dispatched artifacts. How much free space do you have on the Splunk file system?
Your storage may not be performant enough: Splunk requires at least 800 IOPS (1200 is better). Check your storage performance with a tool such as Bonnie++.
Do you have sufficient resources (CPUs)? Splunk requires at least 12 CPUs, and more than 16 if you have Premium Apps like ES or ITSI.
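
As a quick first pass on the disk and CPU questions above, something like the following could be run on the affected indexer. It is only a sketch: the 12-CPU figure is the one quoted in this reply, the 20% free-space floor is an arbitrary placeholder and not a Splunk recommendation, and the script does not measure IOPS (a tool such as Bonnie++ is still needed for that).

```python
import os
import shutil

MIN_CPUS = 12            # figure quoted in this reply
MIN_FREE_FRACTION = 0.2  # hypothetical floor, chosen only for illustration


def check_resources(path, min_cpus=MIN_CPUS,
                    min_free_fraction=MIN_FREE_FRACTION):
    """Report free disk space under `path` and the visible CPU count."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    cpus = os.cpu_count() or 0  # cpu_count() may return None
    return {
        "free_gb": round(usage.free / 1e9, 1),
        "free_fraction": round(free_fraction, 3),
        "disk_ok": free_fraction >= min_free_fraction,
        "cpus": cpus,
        "cpu_ok": cpus >= min_cpus,
    }
```

For example, check_resources("/splunk") would report on the file system that holds the dispatch directory shown in the log; substitute your own Splunk mount point.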

Anyway, after these checks, open a case with Splunk Support: using a diag, they can analyze your system and give you a detailed answer.

Ciao.

Giuseppe

isoutamo · ‎11-02-2023

Hi

As @gcusello said, you must check the disk space on those nodes (both the SH and IDX sides). Look especially at what you have under /opt/splunk/var.

You should also clean up your dispatch directory, as the logs say, and/or raise the warning threshold (dispatch_dir_warning_size) in limits.conf.
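
To see how far over the threshold the dispatch directory is before and after cleaning, a small count like this may help. It is a sketch that assumes each search artifact is one subdirectory of the dispatch directory; the default threshold of 5000 is the value reported in the DispatchReaper warning above, and the function names are illustrative.

```python
import os


def count_artifacts(dispatch_dir):
    """Count subdirectories, assuming one subdirectory per search artifact."""
    with os.scandir(dispatch_dir) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def over_threshold(count, threshold=5000):
    """True if the count exceeds the warning threshold from the log."""
    return count > threshold
```

On the affected indexer this would be called as count_artifacts("/splunk/var/run/splunk/dispatch"), the path shown in the PeriodicReapingTimeout warning.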

Do you have one (or a few) users who run most of those queries and DMAs? If so, check that the user has enough quota defined on their role.

You should use the MC (Monitoring Console) to see what is happening on your system. If you haven't set it up yet, now is the time to do so (probably on a separate, dedicated node).

r. Ismo
