Why does running "splunk reload deploy-server -class test" crash splunkd (crashing thread:TcpChannelThread)?

Ellen
Splunk Employee

The deployment server was working fine, but now running $SPLUNK_HOME/bin/splunk reload deploy-server -class test suddenly crashes the main splunkd process.

$SPLUNK_HOME/var/log/splunk has a crashxxxx.log for each attempt.

Here is what I see on the command line after a run.

[host1]# /opt/splunk/splunk_621/bin/splunk reload deploy-server -class test
Your session is invalid.  Please login.
Splunk username: admin
Password: 
Login successful, running command...

An unforeseen error occurred:

Exception: <class 'httplib.BadStatusLine'>, Value: ''

Traceback (most recent call last):
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/splunk/clilib/cli.py", line 1145, in main
    parseAndRun(argsList)
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/splunk/clilib/cli.py", line 938, in parseAndRun
    retVal = makeRestCall(cmd=command, obj=subCmd, restArgList=objUnicode(argList), sessionKey=authInfo)
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/splunk/rcUtils.py", line 650, in makeRestCall
    serverResponse, serverContent = simpleRequest(uri, sessionKey=sessionKey, getargs=getargs, postargs=postargs, method=method)
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/splunk/rest/__init__.py", line 470, in simpleRequest
    serverResponse, serverContent = h.request(uri, method, headers=headers, body=payload)
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/httplib2/__init__.py", line 1421, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/httplib2/__init__.py", line 1171, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/opt/splunk/splunk_621/lib/python2.7/site-packages/httplib2/__init__.py", line 1147, in _conn_request
    response = conn.getresponse()
  File "/opt/splunk/splunk_621/lib/python2.7/httplib.py", line 1067, in getresponse
    response.begin()
  File "/opt/splunk/splunk_621/lib/python2.7/httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "/opt/splunk/splunk_621/lib/python2.7/httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''


Please file a case online at http://www.splunk.com/page/submit_issue
1 Solution

Ellen
Splunk Employee

In the diag provided with the support case, $SPLUNK_HOME/var/log/splunk/splunkd_stderr.log contained the following message:

2015-03-24 12:56:06.797 -0700 splunkd started (build 255606)
Gap in numbered regexes: expected attribute=whitelist.1 not found (context: stanza='serverClass:myapp')

A review of $SPLUNK_HOME/etc/system/local/serverclass.conf showed that the whitelist entries were numbered incorrectly, starting at 2 instead of 0:

[serverClass:test]
blacklist.0=1.1.1.4
blacklist.1=1.1.1.3
whitelist.2=1.1.1.1
whitelist.3=1.1.1.18

Once the whitelist sequence was corrected to start at 0, the crashes no longer occurred.
For example:

[serverClass:test]
blacklist.0=1.1.1.4
blacklist.1=1.1.1.3
whitelist.0=1.1.1.1
whitelist.1=1.1.1.18

A known issue, SPL-98561, has been logged to prevent crashes when the whitelist/blacklist number sequence has a gap.
The fix is targeted for a future maintenance release beyond 6.2.2.
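
Until that fix is available, it may help to check serverclass.conf for numbering gaps before reloading the deployment server. The following is a minimal Python sketch, not a Splunk tool: the function name find_numbering_gaps and the regex are hypothetical, and it assumes, based on the fix described above, that each whitelist.N / blacklist.N sequence in a stanza should start at 0 and be contiguous.

```python
import re
from collections import defaultdict

# Matches keys such as "whitelist.0" or "blacklist.12" (illustrative pattern,
# not taken from Splunk's own parser).
NUMBERED_KEY = re.compile(r"^(whitelist|blacklist)\.(\d+)$")


def find_numbering_gaps(conf_text):
    """Return {(stanza, kind): [missing indexes]} for sequences with gaps.

    Assumption: a sequence is valid only if it covers 0..max without holes.
    """
    seen = defaultdict(set)
    stanza = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]  # e.g. "serverClass:test"
            continue
        key = line.split("=", 1)[0].strip()
        match = NUMBERED_KEY.match(key)
        if match and stanza is not None:
            seen[(stanza, match.group(1))].add(int(match.group(2)))

    gaps = {}
    for ident, indexes in seen.items():
        missing = sorted(set(range(max(indexes) + 1)) - indexes)
        if missing:
            gaps[ident] = missing
    return gaps


# The stanza from this thread, before the correction.
broken = """[serverClass:test]
blacklist.0=1.1.1.4
blacklist.1=1.1.1.3
whitelist.2=1.1.1.1
whitelist.3=1.1.1.18
"""

# The same stanza after renumbering the whitelist from 0.
fixed = """[serverClass:test]
blacklist.0=1.1.1.4
blacklist.1=1.1.1.3
whitelist.0=1.1.1.1
whitelist.1=1.1.1.18
"""

print(find_numbering_gaps(broken))  # {('serverClass:test', 'whitelist'): [0, 1]}
print(find_numbering_gaps(fixed))   # {}
```

To check a real file, read its contents and pass the text to the function. The sketch only looks at a single file and does not account for Splunk's configuration layering across multiple directories.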
