Solved: Splunk Stream: How to manage whitelists and blacklists and also apply them to NetFlow?

mathiask · ‎12-12-2016

How can I manage the Splunk Stream blacklist and whitelist?
We have large whitelists and blacklists with several hundred entries that also change very frequently, so the GUI is not really an option.
I could not figure out where this information is "stored" or how to access it.
How can I also apply the blacklist/whitelist to NetFlow?
As far as I can tell, the whitelist is only applied to captured traffic, not to the parsed NetFlow events.

vshcherbakov_sp · ‎12-12-2016

Hi @mathiask,

Re: 1. Stream has a REST API that can be used to update the blacklist/whitelist. If that's something you'd consider, we can come up with some sample code (Python).

Re: 2. NetFlow is received via a UDP listening socket. Do you want to whitelist/blacklist certain IP addresses that can/cannot send NetFlow to Stream, or do you want to filter NetFlow events by IP address? For the former, you probably want to configure it on a firewall or router; the latter can be achieved by configuring a filter on the netflow stream(s) in the Configure Stream UI.
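
A minimal Python 3 sketch of what such a bulk update could look like. The base URL, the `captureipaddresses` resource name, the `whitelist`/`blacklist` ids, and the `ipAddresses` payload field are assumptions that are not confirmed anywhere in this thread; check them against the Stream REST API documentation for your version. The sketch only builds the request and leaves sending and authentication out.

```python
import json
import urllib.request

# ASSUMPTION: placeholder host and path, not confirmed in this thread.
BASE_URL = "https://splunk.example.com:8000/en-us/custom/splunk_app_stream"
RESOURCE = "captureipaddresses"  # hypothetical resource name


def build_list_update(list_id, addresses, base_url=BASE_URL):
    """Build (but do not send) a PUT request that replaces one IP list."""
    if list_id not in ("whitelist", "blacklist"):
        raise ValueError("list_id must be 'whitelist' or 'blacklist'")
    # De-duplicate and sort so repeated runs produce identical payloads.
    payload = {"id": list_id, "ipAddresses": sorted(set(addresses))}
    return urllib.request.Request(
        url=f"{base_url}/{RESOURCE}/{list_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_list_update("blacklist", ["10.0.0.2", "10.0.0.1", "10.0.0.2"])
```

Sending it would then be a matter of `urllib.request.urlopen(req)` plus whatever authentication the deployment requires.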


dcavuto_splunk · ‎12-13-2016

Mathias,

One of the ways we suggest customers address Stream and NetFlow volume-related concerns is via the use of the "Aggregation" feature. With Aggregation, instead of sampling or prefiltering, you can summarize the data coming in using a user-defined key and aggregate fields, over a custom aggregation interval.

For example, you could define the following as your aggregation key:
Source IP
Destination IP
Destination Port

And the following as your aggregation functions:
count
sum(bytes)
sum(packets)

If you specified an aggregation interval of 600 seconds (for example), then every 10 minutes you'd get a list of the unique triples (sip, dip, dport) and the three aggregation results for each of them. Since this happens before the data is indexed, the effect on your Splunk license is greatly reduced.
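
To make the idea concrete, here is a small Python sketch of the same kind of summarization. It is only an illustration of the concept, not how Stream implements it; the record layout and the `aggregate_flows` name are made up for this example.

```python
from collections import defaultdict


def aggregate_flows(flows, interval=600):
    """Summarize flow records per (interval start, sip, dip, dport)."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "packets": 0})
    for f in flows:
        bucket = f["time"] - f["time"] % interval  # start of the interval
        key = (bucket, f["sip"], f["dip"], f["dport"])
        totals[key]["count"] += 1
        totals[key]["bytes"] += f["bytes"]
        totals[key]["packets"] += f["packets"]
    return dict(totals)


# Hypothetical flow records: two in the first interval, one in the second.
flows = [
    {"time": 5, "sip": "10.0.0.1", "dip": "10.0.0.9", "dport": 443,
     "bytes": 1200, "packets": 3},
    {"time": 90, "sip": "10.0.0.1", "dip": "10.0.0.9", "dport": 443,
     "bytes": 800, "packets": 2},
    {"time": 610, "sip": "10.0.0.1", "dip": "10.0.0.9", "dport": 443,
     "bytes": 500, "packets": 1},
]
summary = aggregate_flows(flows)
```

Three input records collapse into two summary rows here; the reduction in practice depends on how often the same key repeats within an interval.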

This is a great alternative to prefiltering out a lot of potentially useful data, since you still get much of the resolution that you had in the original flow records, but don't have nearly the volume of data.

Best,
-David

David J Cavuto, CISSP
Principal Product Manager, Splunk Stream™

mathiask · ‎12-14-2016

Hi David

We currently have a dedicated solution in place to collect the full netflows.
Side note: a former team colleague actually wrote the tool/code on which your NetFlow Add-on is based: https://splunkbase.splunk.com/app/1658/

Obviously it would be really awesome to have all the netflow information in Splunk and to correlate it etc. etc.
But as I stated, the volume is far too large, and by that I mean by orders of magnitude. Currently we are talking about an average of 100k flows per second with much larger peaks, and the trend is upward.
This results in an estimated Splunk indexed volume of about 10 TB/day. Besides the license costs, this means we are talking about 100 indexers with all the maintenance, storage, etc. This is way out of the current scope.

Even considering aggregation would not solve this.
Assuming we were able to achieve a reduction of 90%, which I don't expect, we would still be talking about 1 TB per day, i.e. 10 servers.
This would be still out of scope and is simply not the use case we are currently pursuing.
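
The sizing arithmetic, as a quick Python sketch. It assumes the roughly 100 GB/day per indexer implied by the figures in this thread; the function name is just for illustration.

```python
import math

GB_PER_INDEXER_PER_DAY = 100  # sizing assumed from the figures in this thread


def indexers_needed(tb_per_day, reduction_pct=0):
    """Indexers required after an optional percentage volume reduction."""
    gb_per_day = tb_per_day * 1000 * (100 - reduction_pct) / 100
    return math.ceil(gb_per_day / GB_PER_INDEXER_PER_DAY)


full_volume = indexers_needed(10)                    # 10 TB/day, no reduction
aggregated = indexers_needed(10, reduction_pct=90)   # optimistic 90% reduction
```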

While I see the potential benefits of analysing it with Splunk instead of a dedicated solution, this is simply not in the scope of our current setup. If one day we really consider doing this, I think we would rather go down the Hadoop (+Splunk?) path.

mathiask · ‎12-14-2016

Just to state this clearly

The Stream REST API would be the best way to go. Python is fine. FYI, we are on Python 3 and therefore do not use the Splunk SDK, but you can give us the Python 2 code and we will convert it. We can then share it back with you.

If you want to share this "offline" then you can send me this at mathias.karlsson@switch.ch

With the REST API we can then directly access Stream through our "processing engine" / "messaging system" where this information is being processed anyways, which is perfect.

FYI
We are using the Splunk KV Store as backend for that processing engine instead of a traditional DB or running a separate mongoDB. This allows us to monitor the messages that are being processed and/or are ready to be delivered directly in Splunk.

mathiask · ‎12-13-2016

Okay, I have to explain the use case in a bit more detail.

I do NOT want to limit the devices able to send the flows. As you mentioned this is done on routing and/or FW.
I want to filter the NetFlow "events" pre-indexing. This is because we have a ridiculous amount of NetFlow (an estimated 5 - 20 TB/day in Splunk), and because of that we currently have a dedicated solution. This might change eventually, but I doubt that running 50 - 200 indexers (at 100 GB/day each) is the right way for us 😉

But a select set of NetFlows is of special interest, and these we would like to have in Splunk today. The most convenient way could be Splunk Stream, provided we can limit the indexed events somehow.
This would point to Stream Filters.

I have several "lists" with different patterns and context information (for later enrichment).
The patterns can be simple IPs, IP ranges, or more complex ones like IP-port matches.
These lists can easily have more than a hundred entries.
We would therefore have either a few streams like "netflow-IP", each with a large list of interesting IPs, or one stream for each pattern like IP-port.

Is there a way to bulk configure streams through the CLI, config files, or a REST API?
Is it possible to configure more "complex" filters like (ip=1.2.3.4 OR ip=2.3.4.5 OR ip=2001:1:2:3::4) AND (port=23 OR port=2323)? Otherwise we might have to create a few hundred streams.
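
To spell out the matching logic meant here, a short Python 3 sketch using the standard `ipaddress` module. This is not Stream filter syntax, only an illustration of the (IP list) AND (port list) semantics; `make_filter` and `telnet_watch` are names invented for the example.

```python
import ipaddress


def make_filter(networks, ports):
    """Return a predicate: (ip in any network) AND (port in ports)."""
    # Single addresses become /32 or /128 networks; ranges can be CIDR strings.
    nets = [ipaddress.ip_network(n, strict=False) for n in networks]
    port_set = set(ports)

    def matches(ip, port):
        addr = ipaddress.ip_address(ip)
        # Membership across IP versions is simply False, so v4 and v6 can mix.
        return port in port_set and any(addr in net for net in nets)

    return matches


telnet_watch = make_filter(["1.2.3.4", "2.3.4.5", "2001:1:2:3::4"], [23, 2323])
```

With this, `telnet_watch("1.2.3.4", 23)` matches, while the same address on port 80 does not.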

Greetings
Mathias

So, back to the first question:
A "local config", CLI, or lookup would sometimes be helpful, since it is usually easier to work with.
But a REST API would be really helpful to manage the configuration and automate it.
