When using "Non-Audit" as the data input, I am able to retrieve a few lines, and then the input fails. Other input settings work as intended.
While I was monitoring the output of /opt/splunk/var/log/splunk/splunk_ta_checkpoint-opseclea_modinput.log
I noticed the following:
2016-06-27 16:46:37,129 +0000 log_level=INFO, pid=28652, tid=Thread-9, file=ta_opseclea_data_collector.py, func_name=get_logs, code_line_no=62 | [input_name="fwmgmtp02-nonAudit" connection="fwmgmtp02" data="non_audit"]log_level=2 file:lea_loggrabber.cpp func_name:read_fw1_logfile_collogs code_line_no:2052 :LEA collected logfile handler was invoked
2016-06-27 16:47:02,901 +0000 log_level=ERROR, pid=28652, tid=Thread-1, file=event_writer.py, func_name=_do_write_events, code_line_no=79 | EventWriter encounter exception which maycause data loss, queue leftsize=838
Traceback (most recent call last):
File "/opt/splunk/etc/apps/Splunk_TA_checkpoint-opseclea/bin/splunk_ta_checkpoint_opseclea/splunktalib/event_writer.py", line 62, in _do_write_events
for evt in event:
File "/opt/splunk/etc/apps/Splunk_TA_checkpoint-opseclea/bin/splunk_ta_checkpoint_opseclea/splunktaucclib/data_collection/ta_data_collector.py", line 59, in <genexpr>
index, scu.escape_cdata(event.event)) for event
File "/opt/splunk/etc/apps/Splunk_TA_checkpoint-opseclea/bin/splunk_ta_checkpoint_opseclea/splunktalib/common/util.py", line 71, in escape_cdata
data = data.encode("utf-8", errors="xmlcharrefreplace")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 572: invalid start byte
2016-06-27 16:47:02,914 +0000 log_level=INFO, pid=28652, tid=Thread-1, file=event_writer.py, func_name=_do_write_events, code_line_no=84 | Event writer stopped, queue leftsize=849
2016-06-27 16:47:02,915 +0000 log_level=INFO, pid=28652, tid=Thread-4, file=ta_data_collector.py, func_name=_write_events, code_line_no=122 | [input_name="fwmgmtp02-nonAudit" data="non_audit"] the event queue is closed and the received data will be discarded
2016-06-27 16:47:02,915 +0000 log_level=INFO, pid=28652, tid=Thread-4, file=ta_data_collector.py, func_name=index_data, code_line_no=114 | [input_name="fwmgmtp02-nonAudit" data="non_audit"] End of indexing data for fwmgmtp02-nonAudit_non_audit
It seems that only the Non-Audit setting retrieves bytes that are not valid UTF-8:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 572: invalid start byte
at which point the process hangs and indexing stops. The opsec_lea connection stays up; however, event processing is halted.
The byte value (0xa0) in this case isn't a specific indicator; I've had the same result with 0xf2 and 0xfc. All of these are valid Latin-1 (ISO-8859-1) characters, but none of them can appear where they do in valid UTF-8 (for example, 0xfc is "ü", a u with an umlaut, in Latin-1).
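To illustrate the byte-level situation (the TA itself runs under Splunk's Python 2, but the encoding facts are the same; this Python 3 snippet is only a demonstration, not the TA's code): every single byte value is a legal Latin-1 character, while these particular bytes cannot stand alone in valid UTF-8.

```python
suspect_bytes = (b"\xa0", b"\xf2", b"\xfc")

for raw in suspect_bytes:
    # Latin-1 maps every byte 0x00-0xFF directly to a code point,
    # so this decode can never fail.
    char = raw.decode("latin-1")
    # The same lone byte is rejected by the UTF-8 decoder.
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    print(repr(char), "valid UTF-8:", valid_utf8)
```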
I believe this only occurs with Non-Audit as this is the only input setting that also retrieves Anti-Malware and Anti-Virus events (which I need to index). Although it is most likely a Unicode-handling error, I also found a related issue with fw-loggrabber here:
http://manpages.ubuntu.com/manpages/trusty/man1/fw1_lea2dlf.1.html
Under "Other notes" it points out an "unexpected non-continuation byte" -- almost exactly the same wording as the Splunk errors ("invalid start byte" / "invalid continuation byte").
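For anyone comparing the two messages: CPython distinguishes a byte that cannot begin a UTF-8 sequence from a lead byte followed by a wrong continuation byte. A small Python 3 illustration (again, just a demonstration of the decoder's behaviour, not TA code):

```python
# "invalid start byte": 0xa0 is a continuation byte (10xxxxxx)
# and can never begin a UTF-8 sequence.
try:
    b"\xa0abc".decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason)

# "invalid continuation byte": 0xc3 promises a 2-byte sequence,
# but 0x28 ("(") is not a continuation byte.
try:
    b"\xc3\x28".decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason)
```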
I tried setting the environment variables described there in /opt/splunk/etc/splunk-launch.conf, but had the same issue.
Does anybody else have an idea to enable Non-Audit and keep it stable?
I found a temporary fix, hopefully this will be rolled into the next version of the TA.
Edit line 71 of /opt/splunk/etc/apps/Splunk_TA_checkpoint-opseclea/bin/splunk_ta_checkpoint_opseclea/splunktalib/common/util.py
from:
data = data.encode("utf-8", errors="xmlcharrefreplace")
to:
data = data.decode("latin-1").encode("utf-8", errors="xmlcharrefreplace")
You'll have to restart the heavy forwarder or indexer to make the changes take effect.
I believe util.py expects valid UTF-8 input; however, the output of lea_loggrabber can include non-UTF-8 data, especially when retrieving SmartDefense, Anti-Malware or Anti-Virus log entries.
This fix treats the lea_loggrabber output as binary data regardless of its original encoding, maps every incoming byte to a valid Unicode code point, and then encodes the result as UTF-8. This way util.py can return valid output back up to ta_data_collector.py and then event_writer.py, both of which also expect UTF-8.
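The round trip can be sketched as follows. This is a simplified stand-in for the patched escape_cdata, shown in Python 3 for clarity (the TA runs on Python 2, where data arrives as a byte string), so treat it as an illustration of the idea rather than the TA's actual function:

```python
def escape_cdata(data):
    """Sketch of the patched conversion: treat the input as raw
    bytes, reinterpret it as Latin-1 (which maps every byte
    0x00-0xFF to a code point, so it never fails), then re-encode
    as valid UTF-8."""
    if isinstance(data, bytes):
        data = data.decode("latin-1")
    return data.encode("utf-8", errors="xmlcharrefreplace")

# A log line containing a stray Latin-1 'ü' (0xfc) no longer raises:
print(escape_cdata(b"user=m\xfcller"))  # b'user=m\xc3\xbcller'
```

Plain ASCII input passes through unchanged, since ASCII bytes map to the same code points in Latin-1 and re-encode to identical UTF-8.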
If at all possible, future versions of this TA should include try/except handlers so that it fails gracefully.
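One possible shape for that graceful handling (a hypothetical helper, not anything shipped in the TA): try UTF-8 first so well-formed input passes through untouched, and fall back to Latin-1 only when decoding fails, instead of crashing the event writer.

```python
def safe_to_utf8(raw):
    """Hypothetical defensive conversion: prefer UTF-8, fall back
    to Latin-1 (which accepts any byte) rather than raising."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")
    return text.encode("utf-8")

print(safe_to_utf8("déjà".encode("utf-8")))  # real UTF-8 survives intact
print(safe_to_utf8(b"m\xfcller"))            # stray Latin-1 is repaired
```

One trade-off worth noting: a whole-string fallback can mis-map records that mix valid UTF-8 with a single bad byte; decoding with errors="replace" or handling smaller chunks would be alternatives.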
Credit where credit is due, I found a similar issue and fix here:
http://www.gossamer-threads.com/lists/python/python/623758#623758
Splunk is currently investigating this issue (ADDON-10459).
The above workaround appears to solve the issue for the time being.
I wish I had thought to search for this before I tracked this bug down and patched it myself 🙂
Thanks! This fix worked for us as well. We started experiencing the same issue after moving to v4.0.0 of the TA. I will open a case as well.
Great, I came to exactly the same workaround (but using iso-8859-1 instead of latin-1, which is the same ;))
I just opened a case to get that fixed.
A restart is not necessary... I just disabled and re-enabled the input in the UI, and the change in the .py file was picked up.