Why does my Splunk server keep crashing?

dmitri47
Engager

-bash-4.1$ cat crash-2018-05-21-09:41:12.log
[build fa31da744b51] 2018-05-21 09:41:12
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 12969 running under UID 18002.
Crashing thread: DistributedSearchResultCollectorThread
Registers:
RIP: [0x00007FA78E16B495] gsignal + 53 (libc.so.6 + 0x32495)
RDI: [0x00000000000032A9]
RSI: [0x00000000000032C9]
RBP: [0x00007FA7916AEC30]
RSP: [0x00007FA78B5FEA08]
RAX: [0x0000000000000000]
RBX: [0x00007FA7887FD000]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x0000000000000200]
R9: [0xFEFEFEFEFEFEFEFF]
R10: [0x0000000000000008]
R11: [0x0000000000000206]
R12: [0x00007FA7915F37C6]
R13: [0x00007FA791793680]
R14: [0x00007FA78B688010]
R15: [0x00007FA78B5FED10]
EFL: [0x0000000000000206]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]

OS: Linux
Arch: x86-64

Backtrace (PIC build):
[0x00007FA78E16B495] gsignal + 53 (libc.so.6 + 0x32495)
[0x00007FA78E16CC75] abort + 373 (libc.so.6 + 0x33C75)
[0x00007FA78E16460E] ? (libc.so.6 + 0x2B60E)
[0x00007FA78E1646D0] __assert_perror_fail + 0 (libc.so.6 + 0x2B6D0)
[0x00007FA7909B0E6F] _ZN9EventLoop3addEP8PolledFd18PollableDescriptorj + 591 (splunkd + 0x1251E6F)
[0x00007FA7909B2BAE] _ZN19InThreadActorNotifyC2EP9EventLoop + 46 (splunkd + 0x1253BAE)
[0x00007FA7909B2E50] _ZN9EventLoop3runEv + 96 (splunkd + 0x1253E50)
[0x00007FA790A6DAF0] _ZN15TcpOutboundLoop3runEv + 16 (splunkd + 0x130EAF0)
[0x00007FA78FFBFF05] _ZN21EventLoopRunnerThread4mainEv + 37 (splunkd + 0x860F05)
[0x00007FA790A6EB1F] _ZN6Thread8callMainEPv + 111 (splunkd + 0x130FB1F)
[0x00007FA78E4D4AA1] ? (libpthread.so.0 + 0x7AA1)
[0x00007FA78E221BCD] clone + 109 (libc.so.6 + 0xE8BCD)
Linux / / 2.6.32-696.28.1.el6.x86_64 / #1 SMP Thu Apr 26 04:27:41 EDT 2018 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2018-05-21 07:50:44.714 -0400 splunkd started (build fa31da744b51)
2018-05-21 08:05:58.776 -0400 splunkd started (build fa31da744b51)
splunkd: /home/build/build-src/minty/src/util/EventLoop.cpp:843: void EventLoop::add(PolledFd*, PollableDescriptor, events_mask_t): Assertion `fd.valid()' failed.
2018-05-21 08:15:40.920 -0400 splunkd started (build fa31da744b51)
2018-05-21 08:30:58.927 -0400 splunkd started (build fa31da744b51)
2018-05-21 08:40:36.969 -0400 splunkd started (build fa31da744b51)
2018-05-21 08:50:37.156 -0400 splunkd started (build fa31da744b51)
2018-05-21 09:05:55.191 -0400 splunkd started (build fa31da744b51)
2018-05-21 09:25:37.188 -0400 splunkd started (build fa31da744b51)
2018-05-21 09:35:45.231 -0400 splunkd started (build fa31da744b51)

/etc/redhat-release: Red Hat Enterprise Linux Server release 6.9 (Santiago)
glibc version: 2.12
glibc release: stable
Last errno: 23
Threads running: 16
Runtime: 327.200553s
argv: [splunkd -p 8089 restart]
Process renamed: [splunkd pid=6741] splunkd -p 8089 restart [process-runner]
Process renamed: [splunkd pid=6741] search --id=scheduler_nobodyf5_RMD54f4818d5a227023d_at_1526910000_56 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --lookups=1 --reduce_freq=10 --user=splunk-system-user --pro --roles=admin:splunk-system-role

Regex JIT disabled due to SELinux

using CLOCK_MONOTONIC
Preforked process=0/65: process_runtime_msec=606, search=0/124, search_runtime_msec=592, new_user=N, export_search=N, args_size=256, completed_searches=0, user_changes=0, cache_rotations=0

Thread: "DistributedSearchResultCollectorThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7fa78ab1ce10:
00000000 00 f7 5f 8b a7 7f 00 00 |.._.....|
00000008

x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000206D2 0A040800 9E982203 1F8BFBFF
2: 76036301 00F0B5FF 00000000 00C10000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000077 00000002 00000009 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000000 000000CD 0000000A
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 28100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 37383632 33762057
80000004: 33204020 4730312E 00007A48 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003028 00000000 00000000 00000000
terminating...
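
For triage, a few fields from a crash log like the one above can be pulled out programmatically: the crashing thread, the failed assertion, the last errno, and the gaps between the "splunkd started" lines. The sketch below is only an illustration. The function name `summarize_crash_log` and the `SAMPLE` excerpt are hypothetical, and the patterns are based solely on the lines shown in this post, so other Splunk versions may format things differently. On Linux, errno 23 is ENFILE (too many open files in system), but the log does not say whether that errno is related to the failed assertion.

```python
import errno
import re
from datetime import datetime

# Hypothetical excerpt: a few lines copied from the crash log in this post.
SAMPLE = """
Crashing thread: DistributedSearchResultCollectorThread
2018-05-21 07:50:44.714 -0400 splunkd started (build fa31da744b51)
2018-05-21 08:05:58.776 -0400 splunkd started (build fa31da744b51)
splunkd: /home/build/build-src/minty/src/util/EventLoop.cpp:843: void EventLoop::add(PolledFd*, PollableDescriptor, events_mask_t): Assertion `fd.valid()' failed.
2018-05-21 08:15:40.920 -0400 splunkd started (build fa31da744b51)
Last errno: 23
"""

START_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ [+-]\d{4} splunkd started",
    re.M,
)


def summarize_crash_log(text):
    """Extract a few triage fields from crash-log text (assumed layout)."""
    summary = {"thread": None, "assertion": None, "errno": None, "errno_name": None}

    m = re.search(r"^Crashing thread: (.+)$", text, re.M)
    if m:
        summary["thread"] = m.group(1).strip()

    m = re.search(r"Assertion `(.+?)' failed", text)
    if m:
        summary["assertion"] = m.group(1)

    m = re.search(r"^Last errno: (\d+)\s*$", text, re.M)
    if m:
        code = int(m.group(1))
        summary["errno"] = code
        # Symbolic name for this errno on the machine running the script.
        summary["errno_name"] = errno.errorcode.get(code)

    # Minutes between consecutive "splunkd started" lines (fractional seconds dropped).
    starts = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in START_RE.findall(text)]
    summary["restart_gaps_min"] = [
        round((b - a).total_seconds() / 60) for a, b in zip(starts, starts[1:])
    ]
    return summary
```

Calling `summarize_crash_log(open(path).read())` on a full crash file (where `path` is whatever crash log you are inspecting) returns a small dictionary that is easier to compare across crashes than the raw dump.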

0 Karma

dmitri47
Engager

https://answers.splunk.com/answers/290645/why-is-our-splunk-624-forwarder-on-linux-crashing.html

Splunk 6.2.4 seems to have introduced a bug that causes splunkd to crash when a monitor watches for files that may be deleted (maybe too fast?).

I see in your output that the crash is related to the Nmon Performance Monitor:

WatchedTailFile-WatchedFileState: path="/opt/splunkforwarder/var/run/nmon/var/csv_repository/Dymas_24_JUL_2015_053319_FILE_444882_20150724070843.nmon.csv", flags=0x24003
The crash is not directly caused by the Nmon app. Until recently, the processing steps created CSV files in the same directory that Splunk watches, and in some cases the nmon2csv converters could create and then delete empty files there, which causes splunkd 6.2.4 to crash (which is totally unexpected and wasn't the case before).

On 5 August 2014 I released a hotfix with a workaround for this: files are now moved from a working directory to the final directory that Splunk watches, which resolves the splunkd issue.

Please update to Nmon Perf Monitor 1.6.04 and your problem will be solved.
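
The workaround described in that answer (build the file somewhere else, then move the finished file into the monitored directory) can be sketched as follows. This is not the Nmon app's actual code. The function name `publish_csv` and its parameters are hypothetical, and the sketch assumes the working directory and the watched directory are on the same filesystem, since `os.replace` is only atomic in that case and fails across filesystems.

```python
import os
import tempfile


def publish_csv(rows, work_dir, watched_dir, name):
    """Write rows to a temp file in work_dir, then move it into watched_dir.

    Returns the final path, or None if there was nothing to write.
    """
    os.makedirs(work_dir, exist_ok=True)
    os.makedirs(watched_dir, exist_ok=True)

    # Create the file outside the directory the monitor is watching.
    fd, tmp_path = tempfile.mkstemp(dir=work_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for row in rows:
            f.write(",".join(row) + "\n")

    # Empty files are deleted here, so the monitor never sees them appear and vanish.
    if os.path.getsize(tmp_path) == 0:
        os.remove(tmp_path)
        return None

    # Assumption: same filesystem, so the finished file appears in one step.
    final_path = os.path.join(watched_dir, name)
    os.replace(tmp_path, final_path)
    return final_path
```

With this pattern the watched directory only ever receives complete, non-empty files, and the short-lived empty files stay in the working directory.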

0 Karma

dmitri47
Engager

It seems this issue dates back to 6.2.4, was fixed after that, and is somehow broken again in 7.x... SMH

0 Karma

lguinn2
Legend

What is the version of Splunk, plus the size of your environment?

0 Karma

dmitri47
Engager

This first started when I upgraded from Splunk 7.0.3 to 7.1.

After noticing that this one server (a Splunk search head) was crashing every 3-5 minutes, I downgraded that whole environment from 7.1 to 7.0.3 (8-9 servers in total).

It was fine for 1-2 days and is now crashing all the time.
Other enclaves work just fine, so Splunk 7.1 itself is stable.

Size = 1 CM server, 2 indexers, 4 searchheads, up to 150 splunkforwarders

0 Karma

ddrillic
Ultra Champion

I would go to Support ...

dmitri47
Engager

Yeah... I will post if and when I get a resolution. Google says that a bunch of other users have had this issue. Why can't Splunk fix it?

0 Karma

tkrishnan
Explorer

@dmitri47 Did you get anywhere with this one? Have you heard anything from Support?

0 Karma

SithLord
Explorer

Soooo... There was a known issue in Splunk 7.0.3 and the upgrade to Splunk 7.1.1 fixed it.
Has been working well since.

0 Karma

tkrishnan
Explorer

Thanks for the super quick answer. We have the same issue with 6.6.3. Did you find an issue number or something for this one, so I can trace it back to my version's known issues documentation?

0 Karma

SithLord
Explorer

I would look here:

http://docs.splunk.com/Documentation/Splunk/7.1.1/ReleaseNotes/Fixedissues

Fixed issues:

2018-05-18 SPL-154138, SPL-154542, SPL-154544, FAST-9662 Searches with multikv extraction use too much memory: potentially orders of magnitude more than previous versions.

solarboyz1
Builder

It appears you have SELinux enabled. Have you followed this:
https://github.com/doksu/selinux_policy_for_splunk

0 Karma

dmitri47
Engager

We have 4 enclaves, with Splunk on all 4. All have the same SELinux settings, but the issue occurs on only 1 server.

0 Karma