Does Enterprise Security Threat Intelligence download feeds support normal web page input


We have a Tor threat intelligence feed that we require to add to Splunk Enterprise.

The intelligence feed is from dan . me . uk / tornodes

The format of the page is typically html followed by a starting tag _BEGIN_TOR_NODE_LIST

Does Splunk Enterprise Threat Intelligence download feeds support a HTML type of input ?

TOR Node List

This page contains a full TOR nodelist (updated at Mon Jun 17 19:31:39 BST 2019) in the format below.
There are tags of BEGIN_TOR_NODE_LIST and END_TOR_NODE_LIST for easy scripting use of this page.

You can also fetch (FULL) or (EXIT only) for a list of ips only, one per line - updated every 30 minutes. Ideal for constructing your own tor banlists.

NOTE: This is a FULL list including more than just exit nodes. If you only wish to block exit nodes you NEED to process the list to include only flags E and/or X!
You WILL upset people if you block the full list as many nodes do not permit exit.

Total number of nodes is: 7680|hidden|9001|0|RV|64883|Tor||hidden2|80|0|EFRDV|195782|Tor|decsription goes here

Internal IP's given for privacy reasons

Trying to use regular expressions to extract the fields fails


I've tried stating the number of lines to skip on the page and tried changing delimiter but it still comes back with parsing failure in the threat management log.

0 Karma


Hi @splunkmachine,

Yes sir you can !

Here's the guide on how to add a webpage as a threat intel source for ES :

Let me know if you're stuck somewhere when walking through it.


0 Karma


Hi David

I've added quite a few URL based intelligence feeds which are typically a web page of IP's however, as my original post yes I'm stuck as I get parsing errors.

I've followed the instructions.
Here's the guide on how to add a webpage as a threat intel source for ES :

I've tried the following to extract the fields.

And listed the fields

I've tried using regular expressions to extract the fields, I've also tried to use a separator.
The download feed consists of 8 fields seperated by '|' symbol which start at line 155 in the web page.
The web page consists of html and each line consisting of the six fields has the following html

The fields are:

Eight field is optional.

I've tested listing the fields in the notation as documented:

Checking the threat management log I see parsing failure.

0 Karma


What about this as a regex :


Is it giving you anything ?

0 Karma


Hi David

I tried your suggestion above which I tried also originally still parsing errors.

I went back to my originally
regex: (?^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})|(?\w+)|(?\d+)|(?\d+)|(?\w+)|(?\d+)|(?\w+\s+\S+)|(?[a-zA-Z&]\w+.*)?\

and removed delimiter this time setting fields
to ip:$1, description:"DAN_TOR-$3-$4-$5"

This worked!

0 Karma


Is there any chance you could post the full configurations for this?

0 Karma