
splunkd keeps crashing with uberAgent app

Engager

Splunk version 7.1.2
uberAgent version: 5.0.1

We have Splunk Search Head + Splunk Indexer + Splunk Heavy Forwarder all running on Windows 2012R2.

We also have the uberAgent app installed on the Search Head and the uberAgent_Indexer app installed on the Indexer. uberAgent appears to be crashing the Splunk service on the Indexer frequently. The issue seems to be related to uberAgent, because the crashes stop once the app is disabled.

However, we would expect that even if the uberAgent app is buggy, it would not bring down Splunk completely, because that stops anyone from using the Search Head to search anything (even indexes not related to uberAgent).

Something is very odd here: the Splunk service on the Indexer seems to recover on its own most of the time, because uberAgent crashes the service e.g. 10 times a day without our intervention, so something must be restarting the Splunk service. Unfortunately, sometimes the service is not restarted, and then any searches from the Search Head stop working. In that case we have to restart the Splunk service on the Indexer and re-add the Indexer to the Distributed Search servers to make searches work again (otherwise the Indexer's status is shown as "Sick").

As a side effect of the frequent crashes, dump files are created alongside the crash log files. Each dump file takes about 2GB of disk space, and since these are not cleaned up automatically, they filled the disk and caused Splunk to crash due to "not enough disk space". It was the disk space issue that led us to investigate, and we found the root cause: uberAgent was crashing Splunk and the dump files added up to 100GB+ of data.

10/08/2018  04:40             4,449 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-04-40-23.log
10/08/2018  05:11             8,507 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-11-39.log
10/08/2018  05:40             8,615 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-40-24.log
10/08/2018  05:50             8,508 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-50-30.log
10/08/2018  05:56             4,078 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-56-30.log
10/08/2018  06:50             4,546 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-06-50-25.log
10/08/2018  07:00             2,435 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-06-50-28.log
10/08/2018  06:50             8,611 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-06-50-34.log
10/08/2018  07:06             4,359 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-07-06-30.log
10/08/2018  07:26             6,603 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-07-25-31.log
10/08/2018  10:10             5,211 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-10-10-37.log
10/08/2018  10:35             2,706 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-10-35-17.log
10/08/2018  12:10             8,595 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-12-10-31.log
10/08/2018  12:40             4,372 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-12-40-20.log
10/08/2018  13:35             4,450 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-13-35-32.log
...

Sample crash log:

[build 8f0ead9ec3db] 2018-08-10 04:40:23
 Access violation, cannot write at address [0x0000000000000000]
 Exception address: [0x00007FF766E0FA53]
 Crashing thread: rjreaderthread
    MxCsr:  [0x0000000000001F80]
    SegDs:  [0x000000000000002B]
    SegEs:  [0x000000000000002B]
    SegFs:  [0x0000000000000053]
    SegGs:  [0x000000000000002B]
    SegSs:  [0x000000000000002B]
    SegCs:  [0x0000000000000033]
    EFlags:  [0x0000000000010202]
    Rsp:  [0x0000000E249FB420]
    Rip:  [0x00007FF766E0FA53] ?
    Dr0:  [0x0000000000000000]
    Dr1:  [0x0000000000000000]
    Dr2:  [0x0000000000000000]
    Dr3:  [0x0000000000000000]
    Dr6:  [0x0000000000000000]
    Dr7:  [0x0000000000000000]
    Rax:  [0x0000000000000000]
    Rcx:  [0x0000000E5BDC4AC0]
    Rdx:  [0x0000000E11BA0AC0]
    Rbx:  [0x0000000E5BDC4A50]
    Rbp:  [0x0000000E4FA4EB80]
    Rsi:  [0x0000000000000000]
    Rdi:  [0x0000000E249FB558]
    R8:  [0x00007FFBD216F610]
    R9:  [0x00007FFBD216F618]
    R10:  [0x5000BB77A2A6EB15]
    R11:  [0x0000BB705D19EB74]
    R12:  [0x0000000E4D015A38]
    R13:  [0x0000000E506C22C8]
    R14:  [0x0000000E11BA0B00]
    R15:  [0x0000000E4F65C228]
    DebugControl:  [0x0000000E591E4E74]
    LastBranchToRip:  [0x0000000000000000]
    LastBranchFromRip:  [0x0000000000000000]
    LastExceptionToRip:  [0x0000000000000000]
    LastExceptionFromRip:  [0x0000000000000000]

 OS: Windows
 Arch: x86-64

 Backtrace:
  [0x00007FF766E0FA53] ?
Args:  [0x0000000E4F65C1F0]  [0x0000000E00000002]  [0x0000000000000063]
  [0x00007FF766CEDA7A] ?
Args:  [0x0000000E249FB558]  [0x00007FFBD20D419B]  [0x0000000E4D809480]
  [0x00007FF766ABEA09] ?
Args:  [0x0000000E001FBDA0]  [0x0000000E00000006]  [0x0000000000000063]
  [0x00007FF76666C8FA] ?
Args:  [0x0000000E4FA4EB80]  [0x0000000E4FA4EB80]  [0x00000000FFFFFFFF]
  [0x00007FF766668ED3] ?
Args:  [0x0000000E5AA94830]  [0x0000000E5AA94830]  [0x0000000E5AA7B940]
  [0x00007FF766D4E922] ?
Args:  [0x0000000000000000]  [0x00007FFBF21416A0]  [0x00007FFBF21416A0]
  [0x00007FFBD212BE1D] crt_at_quick_exit + 125/784
Args:  [0x00007FFBF21416A0]  [0x0000000E5AA7B940]  [0x0000000000000000]
  [0x00007FFBF21416AD] BaseThreadInitThunk + 13/48
Args:  [0x0000000000000000]  [0x0000000000000000]  [0x0000000000000000]
  [0x00007FFBF2AC54F4] RtlUserThreadStart + 52/1008
Args:  [0x0000000000000000]  [0x0000000000000000]  [0x0000000000000000]
 Crash dump written to: E:\Programs\Splunk\var\log\splunk\E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-04-40-23.dmp

Splunk ran as local administrator
HXP33715 /Windows Server 2012 R2
GetLastError(): 8
Threads running: 15
Executable module base: 0x00007FF7662F0000
Runtime: 65.111172s
argv: [splunkd search --id=remote_hxp33714_scheduler__nobody__uberAgent__RMD5e28e2a5bd72887c9_at_1533872164_93340 --maxbuckets=0 --ttl=60 --maxout=0 --maxtime=0 --lookups=1 --streaming --sidtype=normal --outCsv=true --user=splunk-system-user --pro --roles=admin:db_connect_user:dbx_user:itoa_admin:itoa_analyst:itoa_user:power:splunk-system-role:user]
Thread: "rjreaderthread", did_join=1, ready_to_run=Y, main_thread=N
First 4 bytes of Thread token @0xe5aa94844:
00000000  8c 0d 00 00                                       |....|
00000004


x86 CPUID registers:
         0: 0000000D 756E6547 6C65746E 49656E69
         1: 000306F0 04010800 FFFA3203 0FABFBFF
         2: 76036301 00F0B5FF 00000000 00C30000
         3: 00000000 00000000 00000000 00000000
         4: 00000121 01C0003F 0000003F 00000000
         5: 00000000 00000000 00000000 00000000
         6: 00000077 00000002 00000009 00000000
         7: 00000000 000027AB 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300401 0000007F 00000000 00000000
         B: 00000000 00000001 00000100 00000004
         C: 00000000 00000000 00000000 00000000
         D: 00000007 00000340 00000340 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000021 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 2D354520 30393632 20347620
  80000004: 2E322040 48473036 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302A 00000000 00000000 00000000
terminating...
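
To check how many of the crash logs actually point at uberAgent searches, the argv line of each log can be parsed. Below is a minimal sketch in Python. It assumes every crash log carries an argv line like the sample above and that scheduler search ids follow the `..._scheduler__<user>__<app>__<hash>_at_...` layout seen there. The function names are made up for illustration and are not part of Splunk.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the --id=... argument on the "argv:" line of a splunkd crash log.
SID_RE = re.compile(r"--id=(\S+)")


def search_id_from_crash_log(text):
    """Return the search id found on the argv line, or None if there is none."""
    for line in text.splitlines():
        if line.startswith("argv:"):
            match = SID_RE.search(line)
            return match.group(1) if match else None
    return None


def app_from_search_id(sid):
    """Best-effort guess of the app name from a scheduler search id.

    Assumption: fields are separated by double underscores, as in
    remote_<host>_scheduler__<user>__<app>__<hash>_at_<epoch>_<n>.
    Returns None for ids that do not look like that.
    """
    parts = sid.split("__")
    if len(parts) >= 4 and parts[0].endswith("scheduler"):
        return parts[2]
    return None


def tally_crashes_by_app(log_dir):
    """Count crash logs in log_dir per app; unparseable logs count as 'unknown'."""
    counts = Counter()
    for path in Path(log_dir).glob("*crash-*.log"):
        sid = search_id_from_crash_log(path.read_text(errors="replace"))
        app = app_from_search_id(sid) if sid else None
        counts[app or "unknown"] += 1
    return counts
```

Running `tally_crashes_by_app` against the folder that holds the crash logs would show whether the crashes come only from uberAgent scheduled searches or from other apps as well.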

Builder

The uberAgent app is built in Simple XML with just a little JavaScript. It only uses technologies officially supported by Splunk. Unfortunately, only Splunk can fix the crashes. They are also the only ones who can figure out, with the help of the dump files, where exactly splunkd crashed. You need to contact Splunk support and open a ticket for this.

Regarding the dump files that quickly fill up your disk: if you are referring to dump files created by Windows Error Reporting (WER), you can disable the creation of these files by deleting the registry key HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps (details).
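
If the files are instead the `.dmp` files that splunkd itself reports in the "Crash dump written to" line of the crash log, a scheduled cleanup could keep them from filling the disk until the crashes are fixed. What follows is only a rough sketch in Python, not an official Splunk or uberAgent tool. The function name, the `keep_newest` default and the `*crash-*.dmp` pattern are assumptions based on the file names in the question. Keep at least the newest dumps, because Splunk support will likely ask for them.

```python
from pathlib import Path


def prune_crash_dumps(log_dir, keep_newest=3, dry_run=True):
    """Remove all but the newest `keep_newest` crash dumps in log_dir.

    Returns the list of stale paths. With dry_run=True (the default)
    nothing is deleted, so the result can be reviewed first.
    """
    dumps = sorted(
        Path(log_dir).glob("*crash-*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    stale = dumps[keep_newest:]
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale
```

Call it once with `dry_run=True` on the folder from the crash log to see which files would go, then with `dry_run=False` from a scheduled task.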

Disclaimer: I work for vast limits, the uberAgent company.
