
Search head cluster goes down (crashes) on Splunk 7.1.0

aecruzp
Path Finder

Good morning,

   Our search head (SH) cluster keeps going down repeatedly, and we do not know the cause. Could someone help? The crash log is below.

[build 2e75b3406c5b] 2019-03-11 19:36:50
Received fatal signal 6 (Aborted).
 Cause:
   Signal sent by PID 10042 running under UID 501.
 Crashing thread: TcpChannelThread
 Registers:
    RIP:  [0x00007F20C087F625] gsignal + 53 (libc.so.6 + 0x32625)
    RDI:  [0x000000000000273A]
    RSI:  [0x0000000000004266]
    RBP:  [0x00007F20C3F31C00]
    RSP:  [0x00007F207B5FBF88]
    RAX:  [0x0000000000000000]
    RBX:  [0x00007F20C1E16000]
    RCX:  [0xFFFFFFFFFFFFFFFF]
    RDX:  [0x0000000000000006]
    R8:  [0x0000000000000020]
    R9:  [0xFEFEFEFEFEFEFF09]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000206]
    R12:  [0x00007F20C3E80645]
    R13:  [0x00007F20C400E260]
    R14:  [0x0000000000000000]
    R15:  [0x0000000000000000]
    EFL:  [0x0000000000000206]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace (PIC build):
  [0x00007F20C087F625] gsignal + 53 (libc.so.6 + 0x32625)
  [0x00007F20C0880E05] abort + 373 (libc.so.6 + 0x33E05)
  [0x00007F20C087874E] ? (libc.so.6 + 0x2B74E)
  [0x00007F20C0878810] __assert_perror_fail + 0 (libc.so.6 + 0x2B810)
  [0x00007F20C2E9D70C] _ZN16SearchResultsMem8deepCopyERKS_P22ArenaStrPoolCopyHelpermm + 572 (splunkd + 0x102470C)
  [0x00007F20C2E9D959] _ZN16SearchResultsMem6appendERKS_mm + 105 (splunkd + 0x1024959)
  [0x00007F20C2E8D3E6] _ZN18SearchResultsFiles15appendMultiFileERS_b + 454 (splunkd + 0x10143E6)
  [0x00007F20C374C5CB] _ZN15AppendProcessor7executeER18SearchResultsFilesR17SearchResultsInfo + 331 (splunkd + 0x18D35CB)
  [0x00007F20C36CE62D] _ZN15SearchProcessor16execute_dispatchER18SearchResultsFilesR17SearchResultsInfoRK3Str + 237 (splunkd + 0x185562D)
  [0x00007F20C36BEAB7] _ZN14SearchPipeline7executeER18SearchResultsFilesR17SearchResultsInfo + 279 (splunkd + 0x1845AB7)
  [0x00007F20C3AD34C7] _ZN22HandleJobsDataProvider19handleResultsGetterEPN14DispatchSearch28GenericOutputResultsAcceptorERK3StrRK18SearchResultsFilesRK10StrSegmentbRK8Pathname + 3911 (splunkd + 0x1C5A4C7)
  [0x00007F20C3AD5D60] _ZN22HandleJobsDataProvider31executeEventResultPreviewActionERK3StrRK19SearchJobStatusDataRK13HttpArgumentsP6StrSetRK10StrSegmentb + 144 (splunkd + 0x1C5CD60)
  [0x00007F20C3AD5EC7] _ZN22HandleJobsDataProvider17handleStatusQueryERK21UserTimezoneSpecifierP6StrSetRK3StrP24CachedJobStatusReferenceRK10StrSegmentRK10HttpMethodRK13HttpArgumentsRb + 215 (splunkd + 0x1C5CEC7)
  [0x00007F20C3AD86A6] _ZN22HandleJobsDataProvider10handleJobsERK21UserTimezoneSpecifier10HttpMethodRK3StrRK10StrSegmentbRK13HttpArguments + 1094 (splunkd + 0x1C5F6A6)
  [0x00007F20C3AD97C2] _ZN22HandleJobsDataProvider18handleWithTimezoneERK21UserTimezoneSpecifier + 754 (splunkd + 0x1C607C2)
  [0x00007F20C3B06E53] _ZN38DispatchSearchDataProviderWithTimezone21handleWithoutTimezoneEv + 435 (splunkd + 0x1C8DE53)
  [0x00007F20C3A91FD9] _ZN26DispatchSearchDataProvider2goEv + 41 (splunkd + 0x1C18FD9)
  [0x00007F20C2D53E78] _ZN33ServicesEndpointReplyDataProvider9rawHandleEv + 88 (splunkd + 0xEDAE78)
  [0x00007F20C2D492AF] _ZN18RawRestHttpHandler10getPreBodyEP21HttpServerTransaction + 31 (splunkd + 0xED02AF)
  [0x00007F20C31F5930] _ZN32HttpThreadedCommunicationHandler11communicateER17TcpSyncDataBuffer + 272 (splunkd + 0x137C930)
  [0x00007F20C271F04A] _ZN16TcpChannelThread4mainEv + 218 (splunkd + 0x8A604A)
  [0x00007F20C328132F] _ZN6Thread8callMainEPv + 111 (splunkd + 0x140832F)
  [0x00007F20C0BE8A51] ? (libpthread.so.0 + 0x7A51)
  [0x00007F20C093596D] clone + 109 (libc.so.6 + 0xE896D)
 Linux / splunk_searchhead03_cnt / 2.6.32-573.el6.x86_64 / #1 SMP Wed Jul 1 18:23:37 EDT 2015 / x86_64
 Last few lines of stderr (may contain info on assertion failure, but also could be old):
    splunkd: /home/build/build-src/nightlight/src/framework/SearchResultsMem.cpp:354: void SearchResultsMem::deepCopy(const SearchResultsMem&, ArenaStrPoolCopyHelper*, size_t, size_t): Assertion `_defaultMvDelim.empty()' failed.
    2019-03-11 17:56:59.116 -0300 splunkd started (build 2e75b3406c5b)
    splunkd: /home/build/build-src/nightlight/src/framework/SearchResultsMem.cpp:354: void SearchResultsMem::deepCopy(const SearchResultsMem&, ArenaStrPoolCopyHelper*, size_t, size_t): Assertion `_defaultMvDelim.empty()' failed.
    splunkd: /home/build/build-src/nightlight/src/framework/SearchResultsMem.cpp:354: void SearchResultsMem::deepCopy(const SearchResultsMem&, ArenaStrPoolCopyHelper*, size_t, size_t): Assertion `_defaultMvDelim.empty()' failed.

 /etc/redhat-release: Red Hat Enterprise Linux Server release 6.7 (Santiago)
 glibc version: 2.12
 glibc release: stable
Last errno: 0
Threads running: 86
Runtime: 5991.658114s
argv: [splunkd -p 8089 start]
Regex JIT disabled due to SELinux

using CLOCK_MONOTONIC
Thread: "TcpChannelThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f20624650d0:
00000000  00 f7 5f 7b 20 7f 00 00                           |.._{ ...|
00000008
commandForThread=0, nextIdle=0x7f2098b99900, requestAfterThread=0, _tpfd=0x7f2098a29800, writeCorkCount=0, terminateCallback=(nil), ioError=No error, lastError=No error, terminateError=No error
giveCmd @0x7f2062465230: _queuedOn=(nil), ran=N, wantWake=N, wantFailIfLoopDone=N, cmd=0, ok=Y, chan=0x7f2098aaf800
writeDataAvail @0x7f2062465290: _queuedOn=(nil), ran=N, wantWake=N, wantFailIfLoopDone=N, chan=0x7f2098aaf800
wbuf: ptr=0x7f2062465330, size=0x8000, rptr=0x0, wptr=0x0
HttpListeningConnection: _transactionActive=Y, _haveHadTransaction=Y, _alreadyLoggedTimeout=N
HttpTcpConnection: peer=172.16.70.186, _desiredCompressionLevel=6
RestHttpServerTransaction: _restPath="search/jobs/packetcore__packetcore_UEFDS0VUX0NPUkU__RMD56deb342c100fc05e_1552343796.1088_D75AD645-1FB9-499B-9D7B-E9A513BABFA9/results_preview", namespaced=N, context=-/-, session=[user=packetcore, refcnt=7, touched=1552343810, refreshEligible=1552343895, removed=N, id=e85683b136ed38a6c2d92f1f8b5123ca, created=1552323168, refreshed=1552343670, expires=1552347270, initialLife=3600, createdBy=D75AD645-1FB9-499B-9D7B-E9A513BABFA9, portable, ip=172.16.70.186, csrf=16487147686831632066]
HttpServerTransaction: _state=6, _shouldLog=Y, _startTime=1552343810.555153524851
REQUEST: GET /en-US/splunkd/__raw/services/search/jobs/packetcore__packetcore_UEFDS0VUX0NPUkU__RMD56deb342c100fc05e_1552343796.1088_D75AD645-1FB9-499B-9D7B-E9A513BABFA9/results_preview?output_mode=json&search=search%20alert_state_interf%3E0%20%20id%3D%22cl-ml%22%20%20%20%7Crename%20ciudad%20as%20Comuna%2Cclave%20as%20%22Nombre%20Equipo%3APuerta%22%2C%20estado_interf%20as%20%22Estado%20Puerta%22%2C%20alert_state_interf%20as%20Estado_Puerta%20%7Ceval%20Estado_Limite%3Dalert_lim_interf%20%7Ceval%20%22%25%20Utilizacion%20Limite%20Interfaz%22%3D%20%22IN%3D%22.Porcentaje_interf_int.%22%25%20%20OUT%3D%22.Porcentaje_interf_out.%22%25%22%20%7Ceval%20%22Estado%20Trafico%22%3D%20%22IN%3D%22.round(traf_int%2C2).%22%20gbps%20%20OUT%3D%22.%20round(traf_out%2C2).%22%20gbps%22%20%7Ceval%20Estado_Trafico%3Dalert_traffic_interf%20%20%20%20%20%20%20%20%7Ctable%20datetime%2CComuna%2C%22Nombre%20Equipo%3APuerta%22%2C%20%22Estado%20Puerta%22%2C%20Estado_Puerta%7C%20appendpipe%20%5Bstats%20count%20as%20Sin_resultados%20%7C%20where%20Sin_resultados%3D0%5D%20%7C%20stats%20count&_=1552343796942 HTTP/1.1
    Host: 172.18.143.136:8000
    Connection: keep-alive
    Accept: text/javascript, text/html, application/xml, text/xml, */*
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36
    X-Requested-With: XMLHttpRequest
    Accept-Encoding: gzip, deflate
    Accept-Language: es-ES,es;q=0.9
    Cookie: mintjs%3Auuid=ec652fbe-e9b7-4813-80da-60f636105a9e; splunkweb_csrf_token_8000=16487147686831632066; session_id_8000=4136116f7bfc06e09ffa3ab8ddc1fa033dad4b2c; token_key=16487147686831632066; experience_id=b6e225a0-05d2-be97-2826-0106c92b3df7; splunkd_8000=akbksU0^HYSJ2ccEREa06hPvyiFToSp4E6fYTOl1ZAevd2d6Ly^CGuTyt6mbtUoyvpoYxSgpsWQj^94F5NH1qSrllIw^B1pqqrHkF5HbtxYrHW6x1PuQuSDPcDMTmyg2pHW39i8djQwcoN2AqmY
  _bytesReceived=0, _maximumRequestDataSize=0, _totalBytesExpectedOfRequestData=-1
  _bytesLeftInRequestDataChunk=0, _requestTransferEncodingIsChunked=N, _receivingRequestDataForever=N
  _needToSetupRequestGunzip=N, _owedConsume=7305804385234272835, _wantSavedRequestData=N
  _100continue=0, _expectDisconnect=N, _overrideSourceState=0
POST arguments: {}
REPLY: 200
    Set-Cookie: splunkd_8000=akbksU0^HYSJ2ccEREa06hPvyiFToSp4E6fYTOl1ZAevd2d6Ly^CGuTyt6mbtUoyvpoYxSgpsWQj^94F5NH1qSrllIw^B1pqqrHkF5HbtxYrHW6x1PuQuSDPcDMTmyg2pHW39i8djQwcoN2AqmY; Path=/; HttpOnly; Max-Age=3600; Expires=Mon, 11 Mar 2019 23:36:50 GMT
    Set-Cookie: splunkweb_csrf_token_8000=16487147686831632066; Path=/; Max-Age=157680000; Expires=Sat, 09 Mar 2024 22:36:50 GMT
ServicesEndpointReplyDataProvider: _setupState=0, _outputMode=2, _explicitOutputMode=Y
GET args: {["search"] = "search alert_state_interf>0  id="cl-ml"   |rename ciudad as Comuna,clave as "Nombre Equipo:Puerta", estado_interf as "Estado Puerta", alert_state_interf as Estado_Puerta |eval Estado_Limite=alert_lim_interf |eval "% Utilizacion Limite Interfaz"= "IN=".Porcentaje_interf_int."%  OUT=".Porcentaje_interf_out."%" |eval "Estado Trafico"= "IN=".round(traf_int,2)." gbps  OUT=". round(traf_out,2)." gbps" |eval Estado_Trafico=alert_traffic_interf        |table datetime,Comuna,"Nombre Equipo:Puerta", "Estado Puerta", Estado_Puerta| appendpipe [stats count as Sin_resultados | where Sin_resultados=0] | stats count"}
  _allowedMethods={GET,POST,PUT,DELETE,HEAD,OPTIONS}, _preconditionState=0
  _wantsSeparateThread=N, _alreadyBuiltHeaders=N, _needToSendBody=Y
  _bodyBytesWritten=0, _chunkedState=0, _isLastTransaction=N
  _varyBy=0x8, _redirectUrl="", _downloadFilename="", _totalScheduledLength=0
  _willSendDataLater=N, _toSendState=0, _toSendSafe=Y
  _knowCompleteLength=N, _desiredCompressionLevel=6
  _replyIsGzipCompressed=N, _cacheControl=0x0, _maxCacheSeconds=4294967295, _dontIncludeFrameOptions=N
In TcpChannel 0x7f2098a29800, _tcloop=0x7f209baaa288, no async write data, _data._shouldKill=N, r/w_timeouts=5.000/300.000, timeout_count=0
SSL: inactive
rbuf: ptr=0x7f2098a298a0, size=0x2000, rptr=0x0, wptr=0x0
TcpChannelAcceptor: , tcloop=0x7f209baaa288, _disabledReasons=0, _activeCount=9, _inflightSubordinateAccepts=0
HttpListener: ssl=N, _maxActiveConnections=1365, _wellBelowConnectionLimit=Y, _maxThreads=1365
SplunkdHttpListener: PORT: _allowGzip=Y, bind=http://:8000
  conf: _sslopt={rootCAPath="", caCertFile="", certFile="", privateKeyFile="", privateKeyPassword_set=N, commonNameToCheck="", altNameToCheck="", allowSslRenegotiation=Y, sslVersions="SSL3,TLS1.0,TLS1.1,TLS1.2", cipherSuite="", ecdhCurves="", useCompression=N, quietShutdown=NdhFile="", shouldVerifyClientCert=N}, _allowSslRenegotiation=Y, _frameOptionsSameOrigin=Y, _strictTransportSecurityHeader=N, _allowBasicAuth=N, _allowCookieAuth=N
  conf: _streamInWriteTimeout=5.000, _maxContentLength=524288000, _maxThreads=1365, _maxSockets=1365, _forceHttp10=0
_thread=0x7f20624650c0: commandForThread=0, nextIdle=0x7f2098b99900, requestAfterThread=0, _tpfd=0x7f2098a29800, writeCorkCount=0, terminateCallback=(nil), ioError=No error, lastError=No error, terminateError=No error
giveCmd @0x7f2062465230: _queuedOn=(nil), ran=N, wantWake=N, wantFailIfLoopDone=N, cmd=0, ok=Y, chan=0x7f2098aaf800
writeDataAvail @0x7f2062465290: _queuedOn=(nil), ran=N, wantWake=N, wantFailIfLoopDone=N, chan=0x7f2098aaf800
wbuf: ptr=0x7f2062465330, size=0x8000, rptr=0x0, wptr=0x0


x86 CPUID registers:
         0: 0000000B 756E6547 6C65746E 49656E69
         1: 000206C2 22200800 029EE3FF BFEBFBFF
         2: 55035A01 00F0B2FF 00000000 00CA0000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000040 00000040 00000003 00001120
         6: 00000007 00000002 00000009 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300403 00000004 00000000 00000603
         B: 00000000 00000000 0000007D 00000022
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000001 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 20202020 20202020 45202020
  80000004: 35343635 20402020 30342E32 007A4847
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 00003028 00000000 00000000 00000000
terminating...
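The stderr tail above points at a failed assertion in `SearchResultsMem::deepCopy` on a `TcpChannelThread`. To check whether the other crash logs on the cluster members share this signature, here is a minimal Python sketch. It assumes the crash logs are named `crash-*.log` and sit under `$SPLUNK_HOME/var/log/splunk`; the function names are hypothetical. Because the log itself warns that the stderr tail "could be old", a match is only a hint, not proof.

```python
import re
from pathlib import Path
from typing import List

# Assertion text copied from the stderr tail of the crash log above.
ASSERTION = re.compile(
    r"SearchResultsMem::deepCopy\(.*Assertion `_defaultMvDelim\.empty\(\)' failed"
)
CRASHING_THREAD = "Crashing thread: TcpChannelThread"


def looks_like_spl_159979(log_text: str) -> bool:
    """Heuristic: a TcpChannelThread crash plus the deepCopy assertion.
    The stderr tail may be stale, so a match is a hint, not proof."""
    return CRASHING_THREAD in log_text and ASSERTION.search(log_text) is not None


def matching_crash_logs(log_dir: str) -> List[Path]:
    """Return the crash-*.log files in log_dir that match the heuristic.
    A missing directory simply yields an empty list."""
    hits = []
    for path in sorted(Path(log_dir).glob("crash-*.log")):
        if looks_like_spl_159979(path.read_text(errors="replace")):
            hits.append(path)
    return hits
```

For example, `matching_crash_logs("/opt/splunk/var/log/splunk")` would list the matching files, assuming Splunk is installed under `/opt/splunk`.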

aecruzp
Path Finder

I am answering my own question, since a solution was provided as follows.

I quote:

Regarding SPL-159979 - Crashing Thread: TcpChannelThread - Post process search using stats fields with null crashes Splunk

This bug was fixed in the 7.1.6 and 7.2.4 releases.

We confirmed that the 7.2.4.2 release notes document this issue under two bug IDs, so we upgraded Splunk. Since then, the service no longer crashes when running a query or viewing a dashboard panel.

2018-12-03 SPL-163063, SPL-159979 Crashing Thread: TcpChannelThread - Post process search using stats fields with null crashes Splunk
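The fix versions quoted above can be turned into a quick check of whether a given Splunk version (for example, as reported by `splunk version`) already includes the fix. This is a minimal Python sketch. The mapping contains only the two releases named in this thread (7.1.6 and 7.2.4), and the function names are hypothetical. Any other release line is reported as unknown (`None`) rather than guessed.

```python
from typing import Dict, Optional, Tuple

# Fixed releases for SPL-159979 as reported in this thread: 7.1.6 on the
# 7.1.x line and 7.2.4 on the 7.2.x line. Other release lines are not
# covered by the thread, so they are reported as unknown (None).
FIXED_IN: Dict[Tuple[int, int], Tuple[int, int, int]] = {
    (7, 1): (7, 1, 6),
    (7, 2): (7, 2, 4),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '7.2.4.2' into (7, 2, 4, 2)."""
    return tuple(int(part) for part in version.strip().split("."))


def has_spl_159979_fix(version: str) -> Optional[bool]:
    """True if the version is at or above the fixed release on its line,
    False if it is below it, None if the line is not covered above."""
    parts = parse_version(version)
    fixed = FIXED_IN.get(parts[:2])
    if fixed is None:
        return None
    # Compare only major.minor.maintenance; a fourth component such as
    # the ".2" in 7.2.4.2 is a later patch of an already-fixed release.
    return parts[:3] >= fixed
```

With these rules, `has_spl_159979_fix("7.1.0")` (the version in the question) returns `False`, while `"7.1.6"` and `"7.2.4.2"` both return `True`.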

I hope this information helps others, since this issue caused us a lot of problems.

Regards


richgalloway
SplunkTrust

@aecruzp Please accept the answer to help future readers find this solution.
