Monitoring Splunk

Splunk stops randomly

Engager

I have been having an issue where the Splunk service stops randomly. Here are some entries from splunkd.log from right before it went down.
We mostly use Splunk with the Carbon Black add-on to generate reports.

02-22-2019 18:26:55.084 +0800 ERROR StreamGroup - unexpected rc=1 from IndexableValue->index
02-22-2019 18:26:55.084 +0800 ERROR STMgr - dir='C:\Program Files\Splunk\var\lib\splunk\_internaldb\db\hot_v1_70' out of memory failure rc=1 warm_rc[-2,8] from st_txn_start
02-22-2019 18:26:55.084 +0800 ERROR StreamGroup - unexpected rc=1 from IndexableValue->index
02-22-2019 18:26:55.084 +0800 ERROR STMgr - dir='C:\Program Files\Splunk\var\lib\splunk\_internaldb\db\hot_v1_70' out of memory failure rc=1 warm_rc[-2,8] from st_txn_start
02-22-2019 18:26:55.084 +0800 ERROR StreamGroup - unexpected rc=1 from IndexableValue->index
02-22-2019 18:26:55.084 +0800 ERROR STMgr - dir='C:\Program Files\Splunk\var\lib\splunk\_internaldb\db\hot_v1_70' out of memory failure rc=1 warm_rc[-2,8] from st_txn_start
02-22-2019 18:26:55.084 +0800 ERROR StreamGroup - unexpected rc=1 from IndexableValue->index
02-22-2019 18:26:55.084 +0800 ERROR STMgr - dir='C:\Program Files\Splunk\var\lib\splunk\_internaldb\db\hot_v1_70' out of memory failure rc=1 warm_rc[-2,8] from st_txn_start
02-22-2019 18:26:55.084 +0800 ERROR StreamGroup - unexpected rc=1 from IndexableValue->index
02-22-2019 18:26:55.084 +0800 ERROR STMgr - dir='C:\Program Files\Splunk\var\lib\splunk\_internaldb\db\hot_v1_70' out of memory failure rc=1 warm_rc[-2,8] from st_txn_start
02-22-2019 18:26:55.084 +0800 ERROR StreamGroup - unexpected rc=1 from IndexableValue->index
02-22-2019 18:26:55.615 +0800 ERROR TailReader - Ignoring path="C:\Program Files\Splunk\var\log\splunk\splunkd.log" due to: bad allocation
02-22-2019 18:26:55.646 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:26:56.787 +0800 ERROR IntrospectionGenerator:resource_usage -  KVStorageProvider - Internal read failed with error code '13053' and message 'No suitable servers found: `serverSelectionTimeoutMS` expired: [socket timeout calling ismaster on '127.0.0.1:8191']'
02-22-2019 18:26:57.662 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:26:59.474 +0800 WARN  IntrospectionGenerator:resource_usage -   RU - Failure shapshoting all processes, skipping this collection cycle. Status code is 1455
02-22-2019 18:26:59.677 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:01.693 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:02.412 +0800 WARN  SearchResultsFiles - Error while reading C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD59d4672721e98f163_at_1550830500_46\metadata.csv: Insufficient system resources exist to complete the requested service.
02-22-2019 18:27:02.412 +0800 WARN  DispatchSearchMetadata - could not read metadata file: C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD59d4672721e98f163_at_1550830500_46\metadata.csv
02-22-2019 18:27:02.427 +0800 WARN  SearchResultsFiles - Error while reading C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550681280_630\metadata.csv: Insufficient system resources exist to complete the requested service.
02-22-2019 18:27:02.427 +0800 WARN  DispatchSearchMetadata - could not read metadata file: C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550681280_630\metadata.csv
02-22-2019 18:27:02.662 +0800 WARN  SearchResultsFiles - Error while reading C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550767680_1113\metadata.csv: Insufficient system resources exist to complete the requested service.
02-22-2019 18:27:02.662 +0800 WARN  DispatchSearchMetadata - could not read metadata file: C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550767680_1113\metadata.csv
02-22-2019 18:27:02.677 +0800 WARN  SearchResultsFiles - Error while reading C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550811142_4\metadata.csv: Insufficient system resources exist to complete the requested service.
02-22-2019 18:27:02.677 +0800 WARN  DispatchSearchMetadata - could not read metadata file: C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550811142_4\metadata.csv
02-22-2019 18:27:02.677 +0800 WARN  SearchResultsFiles - Error while reading C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550824415_3\metadata.csv: Insufficient system resources exist to complete the requested service.
02-22-2019 18:27:02.677 +0800 WARN  DispatchSearchMetadata - could not read metadata file: C:\Program Files\Splunk\var\run\splunk\dispatch\scheduler__nobody__sos__RMD5b76b5354b306efbb_at_1550824415_3\metadata.csv
02-22-2019 18:27:03.709 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:05.724 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:06.302 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:06.302 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:06.302 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:06.302 +0800 ERROR ExecProcessor - Couldn't create output pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:06.662 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:06.662 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:06.662 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:06.662 +0800 ERROR ExecProcessor - Couldn't create output pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:07.130 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:07.130 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:07.130 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:07.130 +0800 ERROR ExecProcessor - Couldn't create output pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:07.740 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:08.630 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:08.630 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:08.630 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:08.630 +0800 ERROR ExecProcessor - Couldn't create output pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:08.990 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:08.990 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:08.990 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:08.990 +0800 ERROR ExecProcessor - Couldn't create output pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:09.349 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:09.349 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-MonitorNoHandle.exe"": The operation completed successfully.
02-22-2019 18:27:09.349 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:09.349 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-MonitorNoHandle.exe"": The operation completed successfully.
02-22-2019 18:27:09.380 +0800 WARN  IntrospectionGenerator:resource_usage -   RU - Failure shapshoting all processes, skipping this collection cycle. Status code is 1455
02-22-2019 18:27:09.708 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:09.708 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:09.708 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:09.708 +0800 ERROR ExecProcessor - Couldn't create output pipe for command "python "C:\Program Files\Splunk\etc\apps\TA-Cb_Defense\bin\carbonblack_defense.py"": The operation completed successfully.
02-22-2019 18:27:09.755 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:11.771 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:13.787 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:15.802 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:17.818 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:19.443 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:19.443 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-admon.exe"": The operation completed successfully.
02-22-2019 18:27:19.443 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:19.443 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-admon.exe"": The operation completed successfully.
02-22-2019 18:27:19.802 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:19.802 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-netmon.exe"": The operation completed successfully.
02-22-2019 18:27:19.802 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:19.802 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-netmon.exe"": The operation completed successfully.
02-22-2019 18:27:19.833 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:20.161 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:20.161 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-powershell.exe"": The operation completed successfully.
02-22-2019 18:27:20.161 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:20.161 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-powershell.exe"": The operation completed successfully.
02-22-2019 18:27:20.521 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:20.521 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-regmon.exe"": The operation completed successfully.
02-22-2019 18:27:20.521 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:20.521 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-regmon.exe"": The operation completed successfully.
02-22-2019 18:27:20.880 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:20.880 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-winevtlog.exe"": The operation completed successfully.
02-22-2019 18:27:20.880 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:20.880 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-winevtlog.exe"": The operation completed successfully.
02-22-2019 18:27:21.239 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:21.239 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-winprintmon.exe"": The operation completed successfully.
02-22-2019 18:27:21.239 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:21.239 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-winprintmon.exe"": The operation completed successfully.
02-22-2019 18:27:21.599 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:21.599 +0800 WARN  ExecProcessor - Couldn't create stderr pipe for command ""C:\Program Files\Splunk\bin\splunk-powershell.exe" --ps2": The operation completed successfully.
02-22-2019 18:27:21.599 +0800 WARN  Thread - ExecProcessor: about to throw a ThreadException: _beginthreadex: The paging file is too small for this operation to complete.; 78 threads active
02-22-2019 18:27:21.599 +0800 ERROR ExecProcessor - Couldn't create output pipe for command ""C:\Program Files\Splunk\bin\splunk-powershell.exe" --ps2": The operation completed successfully.
02-22-2019 18:27:21.849 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:24.208 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation
02-22-2019 18:27:26.224 +0800 WARN  JournalSlice - Exception while compressing slice: bad allocation

The above are the last lines of the log. Let me know if other logs are needed to understand the problem.

Esteemed Legend

Assuming that this is Linux, check your OS logs for the OOM killer at work:

https://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

Engager

This server runs Windows Server 2012. I have added more RAM and am still monitoring it.

Splunk Employee

If Splunk stops, check for crash logs at $SPLUNK_HOME/var/log/splunk/crash*.log to verify whether Splunk is crashing because of an internal problem. If you don't find any crash logs, check the system logs to see whether the kernel killed the splunkd service, for reasons such as running out of memory or a file system problem.
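
The crash-log check can be scripted. The following is a minimal sketch, not an official Splunk tool: it assumes crash logs are the files whose names begin with "crash" in var/log/splunk under the install directory, and the function name `find_crash_logs` is made up for illustration.

```python
from pathlib import Path


def find_crash_logs(splunk_home):
    """Return files named crash* under <splunk_home>/var/log/splunk, newest first.

    Assumption: crash logs are the files whose names start with "crash".
    """
    log_dir = Path(splunk_home) / "var" / "log" / "splunk"
    if not log_dir.is_dir():
        return []  # wrong path or no log directory: nothing to report
    files = [p for p in log_dir.iterdir()
             if p.is_file() and p.name.startswith("crash")]
    # Sort by modification time so the most recent crash comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

For the install in this thread, the call would be `find_crash_logs(r"C:\Program Files\Splunk")`. An empty list suggests the process was stopped from outside rather than crashing on its own, which is when the system logs become the next place to look.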

If the kernel killed the splunkd service because of memory, check whether splunkd as a whole is consuming the memory or whether one particular search is using all of it. Also make sure the machine has enough memory for Splunk to run, at least Splunk's basic recommendation. The requirement grows if you run more apps or more inputs.
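
On Windows, one rough way to see which process holds the memory is to rank the output of `tasklist /FO CSV` by its memory column. The sketch below assumes the usual five-column layout (image name, PID, session name, session number, memory usage such as "12,345 K"). The helper names are hypothetical, and the figure is a point-in-time snapshot, so it is worth capturing repeatedly while the problem builds up.

```python
import csv


def parse_tasklist_csv(text):
    """Parse `tasklist /FO CSV` output into (image, pid, mem_kb) tuples.

    Assumption: column 0 is the image name, column 1 the PID and
    column 4 the memory usage formatted like "12,345 K".
    """
    reader = csv.reader(text.splitlines())
    next(reader, None)  # skip the header row
    rows = []
    for row in reader:
        if len(row) < 5:
            continue  # ignore blank or malformed lines
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if row[1].isdigit() and digits:
            rows.append((row[0], int(row[1]), int(digits)))
    return rows


def top_memory(rows, n=5):
    """Return the n rows with the largest memory usage, largest first."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]
```

Feeding it the text captured from `subprocess.run(["tasklist", "/FO", "CSV"], capture_output=True, text=True).stdout` gives a per-PID ranking. Working from PIDs rather than image names matters here because several processes can share the name splunkd.exe.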

SplunkTrust

At least half the entries in that log snippet shout "you're out of memory".
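
To put a number on that, here is a small sketch that counts how many log lines carry one of the memory-exhaustion messages seen in the excerpt above. The marker list is copied from that excerpt and is not exhaustive. Status code 1455 is the Windows error whose message is "The paging file is too small for this operation to complete." The function name is made up for illustration.

```python
from collections import Counter

# Substrings taken from the splunkd.log excerpt in this thread that point
# at memory exhaustion. This list is illustrative, not exhaustive.
MEMORY_MARKERS = (
    "out of memory failure",
    "bad allocation",
    "paging file is too small",
    "Insufficient system resources",
    "Status code is 1455",
)


def count_memory_lines(lines):
    """Return (non_blank_line_count, Counter mapping marker -> line count)."""
    hits = Counter()
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        for marker in MEMORY_MARKERS:
            if marker in line:
                hits[marker] += 1
                break  # count each line at most once
    return total, hits
```

Calling `count_memory_lines(open(path, errors="replace"))` on a copy of splunkd.log returns the total line count and a per-message tally. Dividing `sum(hits.values())` by the total gives the share of lines that mention memory pressure.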