All Apps and Add-ons

Splunk Add-on for Tenable: stalls when collecting from Security Center

nickhills
Ultra Champion

Running the Splunk_TA_nessus add-on (5.1.1) against SecurityCenter works fine and collects event data correctly; however, it frequently (approximately weekly) stalls, and requires either disabling/re-enabling the input or restarting the heavy forwarder.

It appears the Python process is still running, but it just stops trying to connect to SecurityCenter.
It feels like the script is getting stuck somewhere. Has anyone else experienced the same?

The specific error which we see reported in tenable:sc:log is:

 error_msg=Unable to process Vuln Query.
 SecurityCenter could not process the vulnerability filter string (SC_ROOT=/opt/sc /opt/sc/bin/showvulns-individual  +orgid "1" +groupid "0" +tool 'listvuln' +datedir "2017-10-09" +scanid '5011' +view 'all' +startoffset '0' +endoffset '0' +repository "1"  -acceptRisk).
 11^list^0^0^-1
If my comment helps, please give it a thumbs up!

fairje
Communicator

I just wanted to throw out a "me too" on this, and also suggest that the issue lies deeper in the code than what nickhillscpl had shared in his "answer" (which is also why I think the issue got buried further down by the error checking they added).

The section of code it is really stemming from is the line:

total_records = sc.get_total_records_for_vuln(scan_id)

This function call looks like this:

def get_total_records_for_vuln(self, scan_id):
    args = {'type': 'vuln',
            'sourceType': 'individual',
            'scanID': scan_id,
            'query_type': 'vuln',
            'query_tool': 'listvuln',
            'query_view': 'all'}
    self._build_query(None, args)
    args['query']['startOffset'] = 0
    args['query']['endOffset'] = 0
    result = self.perform_request('POST', 'analysis', args)
    return int(result['totalRecords'])

Inside this is the "perform_request" function, which is basically what makes the actual REST API call to the SecurityCenter server. I had already modified my own copy to add a little error handling on the request call itself, but that only catches HTTP errors, not the API errors where the server returns a "valid" HTTP response yet rejects the query.
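To illustrate what I mean (with hypothetical names, not the TA's actual helpers), a robust response check has to look at SecurityCenter's own error_code field in the JSON body, because an API-level failure can still come back as HTTP 200:

```python
# Hypothetical sketch only: SecurityCenter's REST API wraps failures inside
# the JSON body (error_code / error_msg) even when the HTTP request itself
# succeeds, so checking the HTTP status alone is not enough.

class SCAPIError(Exception):
    """Illustrative stand-in for the TA's security_center.APIError."""
    def __init__(self, error_code, error_msg):
        super(SCAPIError, self).__init__(error_msg)
        self.error_code = error_code
        self.error_msg = error_msg

def check_sc_response(body):
    """Raise SCAPIError if a parsed SecurityCenter response reports an
    API-level error; otherwise return the 'response' payload."""
    error_code = int(body.get('error_code', 0))
    if error_code != 0:
        raise SCAPIError(error_code, body.get('error_msg', ''))
    return body.get('response')
```

With something like this wrapped around every call, the error code 143 seen in tenable:sc:log would surface as an exception instead of being silently swallowed.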


tommoore
Path Finder

Same issue here with Splunk 7.1.1 and the latest add-on. It randomly quits within two weeks: no errors, and the process is still running.

I ran strace on the stalled process, and it is generating these events over and over:

[pid 18829] <... select resumed> ) = 0 (Timeout)
[pid 18829] select(0, NULL, NULL, NULL, {0, 50000}
[pid 18726] <... select resumed> ) = 0 (Timeout)
[pid 18726] select(0, NULL, NULL, NULL, {0, 50000}
[pid 18835] <... select resumed> ) = 0 (Timeout)
[pid 18835] stat("/opt/splunk/etc/apps/Splunk_TA_nessus/local/tenable_sc_inputs.conf", {st_mode=S_IFREG|0600, st_size=201, ...}) = 0
[pid 18835] stat("/opt/splunk/etc/apps/Splunk_TA_nessus/local/tenable_sc_inputs.conf", {st_mode=S_IFREG|0600, st_size=201, ...}) = 0
[pid 18835] stat("/opt/splunk/etc/apps/Splunk_TA_nessus/local/nessus.conf", {st_mode=S_IFREG|0600, st_size=533, ...}) = 0
[pid 18835] stat("/opt/splunk/etc/apps/Splunk_TA_nessus/local/nessus.conf", {st_mode=S_IFREG|0600, st_size=533, ...}) = 0
[pid 18835] stat("/opt/splunk/etc/apps/Splunk_TA_nessus/local/tenable_sc_servers.conf", {st_mode=S_IFREG|0600, st_size=172, ...}) = 0
[pid 18835] stat("/opt/splunk/etc/apps/Splunk_TA_nessus/local/tenable_sc_servers.conf", {st_mode=S_IFREG|0600, st_size=172, ...}) = 0
[pid 18835] getppid() = 18566
[pid 18835] select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
[pid 18835] select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
[pid 18835] select(0, NULL, NULL, NULL, {0, 4000}
[pid 18836] <... select resumed> ) = 0 (Timeout)
[pid 18836] select(0, NULL, NULL, NULL, {0, 50000}
[pid 18835] <... select resumed> ) = 0 (Timeout)
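For what it's worth, those select(0, NULL, NULL, NULL, {0, N}) calls with no file descriptors are simply how CPython implements short sleeps on Linux, and the 1ms/2ms/4ms sequence looks like an exponential backoff loop that keeps polling forever. An illustrative sketch (not the TA's actual code) of the pattern the trace suggests:

```python
import time

def backoff_poll(check, initial=0.001, max_delay=0.05):
    """Poll check() with exponentially increasing sleeps, capped at
    max_delay (50ms, matching the {0, 50000} timeouts in the trace).
    On Linux, each time.sleep() shows up in strace as
    select(0, NULL, NULL, NULL, {0, usec})."""
    delay = initial
    while not check():
        time.sleep(delay)  # strace: select(0, NULL, NULL, NULL, {0, ...})
        delay = min(delay * 2, max_delay)
    return True
```

If check() never becomes true (say, a response the code is waiting on never arrives), this loop spins on select timeouts indefinitely, which would match a process that is "still running" but doing nothing.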


nickhills
Ultra Champion

Looking into this, I had suspected the issue was in this area of the ta_tenable_sc_data_collector.py script.

scan_results = sub_ckpt.get('scan_results')
for (scan_id, scan_info) in scan_results.items():
    status = scan_info.get('status')
    if status == 'Partial' or status == 'Completed':
        if scan_info.get('total_records') is not None:
            continue
    try:
        scan_result = sc.get_scan_result(scan_id)
        status = scan_result.get('status')
        if status != 'Partial' and status != 'Completed':
            continue

        total_records = sc.get_total_records_for_vuln(scan_id)
        scan_info.update({'status': status,
                          'total_records': total_records,
                          'received': 0})
    except security_center.APIError as e:
        if e.error_code == 143:
            stulog.logger.warn('{} error_msg={}'.format(logger_prefix,
                                                        e.error_msg))
            del sub_ckpt['scan_results'][scan_id]
        elif e.error_code == 147:
            stulog.logger.warn('{} error_msg={}'.format(logger_prefix,
                                                        e.error_msg))
            del sub_ckpt['scan_results'][scan_id]
        else:
            raise e

The new 5.1.2 version has a number of changes that look to improve the error handling: it additionally checks importStatus, tracks a retry_count for errored scans, and consolidates the handled error codes into one branch.

scan_results = sub_ckpt.get('scan_results')
for (scan_id, scan_info) in scan_results.items():
    status = scan_info.get('status')
    import_status = scan_info.get('importStatus')
    if (status == 'Partial' or status == 'Completed') and import_status == 'Finished':
        if scan_info.get('total_records') is not None:
            continue
    try:
        scan_result = sc.get_scan_result(scan_id)
        status = scan_result.get('status')
        import_status = scan_result.get('importStatus')
        if status == 'Error':
            scan_info.update({'retry_count': int(scan_info.get('retry_count', 0)) + 1})
        if (status != 'Partial' and status != 'Completed') or import_status != 'Finished':
            continue

        total_records = sc.get_total_records_for_vuln(scan_id)
        scan_info.update({'status': status,
                          'importStatus': import_status,
                          'total_records': total_records,
                          'received': 0})
    except security_center.APIError as e:
        if e.error_code in (143, 146, 147):
            stulog.logger.warn('{} error_msg={}'.format(logger_prefix,
                                                        e.error_msg))
            del sub_ckpt['scan_results'][scan_id]
        else:
            raise e

I'm not calling it fixed yet, but fingers crossed this resolves the issue.
I will report back in a few days once it's had time to run for a bit!


nickhills
Ultra Champion

I have continued troubleshooting this issue, and have discovered that extending the period between collection attempts improves reliability, but once again I arrived on Monday to find that it had stalled over the weekend, totally silently.

Very frustrating.


supreetsingh75
New Member

I agree. I am not able to troubleshoot any further either; I was looking for errors, but there are none. Will Splunk ever fix this issue? How long an interval are you using to connect to SecurityCenter?


nickhills
Ultra Champion

I collect every 15 mins, which has lessened the rate at which it falls over.

I can get anywhere from 1 to 30 days before it needs to be restarted. About 14 days tends to be the average, and we got a month at Christmas, but last week I had to restart it and it only made it 16 hours.

Very infuriating, but I do have an alert to tell me when it dies now!
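In case it helps anyone, the core of such an alert is just a staleness check: if no tenable:sc events have arrived for several collection intervals, the input has probably stalled. A minimal sketch of that logic (the function name, interval, and grace window are my own choices, not from the add-on):

```python
import time

def input_is_stalled(last_event_epoch, interval_secs=900, grace_intervals=4, now=None):
    """Return True if the newest tenable:sc event is older than a few
    collection intervals (here, 4 x 15 minutes), which suggests the
    modular input has stalled and needs a restart."""
    now = time.time() if now is None else now
    return (now - last_event_epoch) > interval_secs * grace_intervals
```

In Splunk itself the equivalent is a scheduled search over the tenable:sc sourcetype that alerts when the latest event timestamp falls outside the grace window.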


nickhills
Ultra Champion

Sadly, this is still broken in the same way.

The change in 5.1.2 now hides the error from the tenable:sc:log file; however, it still stalls in the same way, except now it fails silently, so even my alerts have stopped working.

Disabling the input, resetting the Start Time (the checkpoint time) to something more recent than the previous configured start time, and re-enabling the input restores collection (for a while).



worshamn
Contributor

There was a 5.1.2 release, and the release notes say it fixes "2017-08-22 ADDON-13413 Tenable input stops pulling vulnerability data", but even after installing it I am still having issues with the latest version. I think I am going to have to open a support ticket, but maybe 5.1.2 will work for you.


nickhills
Ultra Champion

I spotted that a few days ago and have just updated to it. I'm not closing this yet, but will keep you updated.
