Hi all,
Within Splunk ES I've configured a test threat intelligence feed with the following settings:
New > Line oriented
Name: Binary Defense Banlist
type: network
url: https://www.binarydefense.com/banlist.txt
weight: 60
interval: 43200
Max Age: -30d
Max Size: 52428800
Threat Intelligence: checked
File parser: line
Delimiting regular expression: (empty)
Extracting regex: ^(\d.+)$
Ignoring regex: (^#|^\s*$)
fields: ip:$1,description:BinaryDefense_banlist
skip header lines: 0
No encoding, no user agent, sinkhole checked.
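To show what I expect these regexes to do, here is a minimal Python sketch of line-oriented parsing with the same extracting and ignoring patterns. It is only my approximation of the behaviour, not Splunk's actual parser code; the function name `parse_lines` and the sample input are made up for illustration.

```python
import re

# Regexes copied from the feed settings above.
EXTRACT = re.compile(r"^(\d.+)$")
IGNORE = re.compile(r"(^#|^\s*$)")


def parse_lines(text):
    """Apply the ignore/extract regexes line by line (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        if IGNORE.search(line):
            continue  # comment or blank line
        match = EXTRACT.match(line)
        if match is None:
            continue  # line does not start with a digit, so nothing is extracted
        # Mirrors "fields": $1 goes into ip, description is a fixed string.
        records.append({"ip": match.group(1), "description": "BinaryDefense_banlist"})
    return records


# Made-up sample: a comment, a blank line, two IPs and one HTML line.
sample = "# comment header\n\n1.2.3.4\n<div>not an ip</div>\n10.0.0.5\n"
for record in parse_lines(sample):
    print(record)
```

With a pure line parser like this, a line of HTML would simply be skipped, which is why the records I'm actually getting surprise me.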
Some global parse modifier settings:
Certificate attribute breakout = checked
IDNA encode domains = unchecked
Parse domain from URL = unchecked
In debug mode I see that the file is downloaded and then it says:
<timestamp> INFO pid=1050977 tid=MainThread file=get_parser.py:_detect_file_type:139 | stanza="Binary Defense Banlist" status="Automatically detected STIX parsing for file_path /opt/splunk/var/lib/splunk/modinputs/threatlist/Binary Defense Banlist"
It goes on to parse the file and get the records. However, the records contain HTML elements such as </div> and </iframe> as the url value. This is strange, since the feed is just a .txt file. Moreover, why is it being parsed as a STIX document when I explicitly set File parser to line?
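One thing I can check is whether the downloaded file is really the plain-text list, or an HTML page (for example something returned by the proxy). Below is a rough Python sketch of such a check. The marker list is just my own heuristic and the function names are hypothetical; none of this comes from Splunk.

```python
import re

# Tags that suggest markup rather than a plain list of IPs (heuristic, my own guess).
MARKUP = re.compile(
    rb"<\s*/?\s*(?:!doctype|html|head|body|div|iframe|script|\?xml)\b",
    re.IGNORECASE,
)


def looks_like_markup(data: bytes) -> bool:
    """Return True if the first 64 KiB contain an HTML/XML-looking tag."""
    return MARKUP.search(data[:65536]) is not None


def check_file(path: str) -> bool:
    """Read the start of a downloaded feed file and test it for markup."""
    with open(path, "rb") as handle:
        return looks_like_markup(handle.read(65536))


# In-memory examples; on the search head I would call check_file() on the
# file_path shown in the log line above.
print(looks_like_markup(b"# banlist\n1.2.3.4\n5.6.7.8\n"))  # False
print(looks_like_markup(b"<html><body><div>Access denied</div></body></html>"))  # True
```

If this returns True for the file under modinputs/threatlist, the problem would be in what gets downloaded rather than in the parser settings.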
This happens with other threat feeds as well. I've checked with a colleague at another client: with the exact same settings, his feed works and mine doesn't.
Am I missing something? Do you know where else I can look to troubleshoot?
Environment:
Splunk: 8.2.9
ES: 7.0.1
Single search head, behind proxy