
Why does splunkd go down?

Path Finder

Hi,

splunkd.log and crash.log fill up with the entries below, and then splunkd goes down.


What does this mean?

crash.log

(Out of file descriptors!)
[build 119532] 2012-04-26 18:40:56

File descriptors open:
0: /opt/splunk/var/log/splunk/crash-2012-04-26-18:40:56.log
1: /opt/splunk/var/log/splunk/splunkd_stdout.log
2: /opt/splunk/var/log/splunk/splunkd_stderr.log
3: /opt/splunk/var/log/splunk/splunkd.log
4: socket:[40632058]
5: socket:[40632059]
6: socket:[40632060]
(...snipped...)
1020: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1939/Strings.data
1021: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1940/SourceTypes.data
1022: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1940/Strings.data
1023: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1941/SourceTypes.data
(Total 1024)
Received fatal signal 6 (Aborted).
 Cause:
   Signal sent by PID 6885 running under UID 0.
 Crashing thread: indexerPipe
 Registers:
    RIP:  [0x00000030D1830265] gsignal + 53 (/lib64/libc.so.6)
    RDI:  [0x0000000000001AE5]
    RSI:  [0x0000000000001AEE]
    RBP:  [0x000000004208E940]
    RSP:  [0x000000004208DB08]
    RAX:  [0x0000000000000000]
    RBX:  [0x000000004208DBB0]
    RCX:  [0xFFFFFFFFFFFFFFFF]
    RDX:  [0x0000000000000006]
    R8:  [0x0000000000000080]
    R9:  [0x0101010101010101]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000202]
    R12:  [0x00007FFFD1786A1A]
    R13:  [0x0000000001184250]
    R14:  [0x0000000000000327]
    R15:  [0x00000000011839D0]
    EFL:  [0x0000000000000202]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace:
 Linux / splunkindex1 / 2.6.18-194.el5 / #1 SMP Tue Mar 16 21:52:39 EDT 2010 / x86_64

splunkd.log

04-28-2012 12:01:38.875 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_5861/.rawSize": No such file or directory
04-28-2012 12:01:38.875 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=5861
04-28-2012 12:01:38.878 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_6356/.rawSize": No such file or directory
04-28-2012 12:01:38.878 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=6356
04-28-2012 12:01:38.880 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_6701/.rawSize": No such file or directory
04-28-2012 12:01:38.880 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=6701
04-28-2012 12:01:38.881 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_6987/.rawSize": No such file or directory
04-28-2012 12:01:38.881 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=6987
04-28-2012 12:01:38.882 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_7155/.rawSize": No such file or directory
04-28-2012 12:01:38.882 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=7155
04-28-2012 12:01:38.884 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_7353/.rawSize": No such file or directory
04-28-2012 12:01:38.884 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=7353
04-28-2012 12:01:38.887 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_8029/.rawSize": No such file or directory
04-28-2012 12:01:38.888 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=8029
04-28-2012 12:01:38.889 +0900 INFO  timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_8357/.rawSize": No such file or directory
04-28-2012 12:01:38.890 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=8357
04-28-2012 12:01:38.896 +0900 INFO  HotDBManager - index=dh_os No hot found for event ts=1334372461, closest match=null [expanded span=0] hotbucketsize=87 numbucks=1 maxhot=3
04-28-2012 12:01:38.896 +0900 INFO  databasePartitionPolicy - creating new bucket /data/splunk/var/lib/splunk/dh_os/db/hot_v1_8643
04-28-2012 12:01:38.896 +0900 ERROR JournalSlice - Cannot create new journal slice file: Too many open files, file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_8643/rawdata/0"
04-28-2012 12:01:38.896 +0900 ERROR JournalSlice - Failed to write header for rawdata
04-28-2012 12:01:38.896 +0900 INFO  HotDBManager - index=dh_os No hot found for event ts=1334372461, closest match=null [expanded span=0] hotbucketsize=87 numbucks=1 maxhot=3
04-28-2012 12:01:38.896 +0900 FATAL HotDBManager - hot dir with id already exists in createDir: /data/splunk/var/lib/splunk/dh_os/db/hot_v1_8643

Splunk Employee

As it says at the very top of crash.log, you are out of file descriptors. You need to increase the number of file descriptors available, preferably to "unlimited", possibly using the ulimit command, or by contacting your system administrator.
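To see what the limit currently is, here is a minimal Python sketch using the standard `resource` module. The helper names `nofile_limits` and `fmt` are made up for illustration. Note that it reports the limit inherited by the Python process itself (the same number `ulimit -n` prints in that shell), which is not necessarily the limit the running splunkd started with.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the one the kernel enforces; the hard limit is the
    # ceiling an unprivileged process may raise its soft limit to.
    print("open files: soft=%s hard=%s" % (fmt(soft), fmt(hard)))
```

Run it as the same user, and from the same kind of session, that starts Splunk; a limit changed in one shell does not affect processes that were started elsewhere.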

By the way, it was probably unhelpful to simply paste over a thousand lines of text into a discussion forum where you are asking people to volunteer help, without taking some time to filter even a little for relevance, or asking whether it would be useful.

Path Finder

My open files value is 4,096 (both soft and hard).
Would it improve things if I changed the value to 10240?
But is this really a ulimit problem?


Splunk Employee

SplunkTrust

Unix-like operating systems limit the number of files a single process can have open. In your case, RHEL5 defaults to 1024 per process. A Splunk indexer needs several file descriptors for each open index bucket, plus one descriptor per connected forwarder, so it is easy to exhaust 1024. You will need to scale this value to the workload you are trying to run. This doc is helpful, http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html/Performance_Tuning_Guide/system-... even if specific to RH Directory Server.
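The crash.log above lists exactly 1024 descriptors (0 through 1023), which suggests the running splunkd was held to 1024 no matter what an interactive shell reports. One way to check a live process is to read its entries under /proc. The following Python sketch assumes a Linux /proc filesystem; the helper names are hypothetical, and /proc/<pid>/limits may not exist on older kernels, whereas /proc/<pid>/fd generally does.

```python
import os


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because either may be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    return None


def count_open_fds(pid):
    """Count descriptors a process currently holds.

    Requires permission to read /proc/<pid>/fd (same user or root).
    """
    return len(os.listdir("/proc/%d/fd" % pid))
```

For example, pass the contents of `/proc/<splunkd pid>/limits` to `parse_max_open_files` and compare the soft value with `count_open_fds(<splunkd pid>)`. If the count sits close to the soft limit, the process is about to hit "Too many open files".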