All Apps and Add-ons

Splunk Stream: How do I configure the forwarder and indexers to improve Netflow performance?

Communicator

I have another question regarding the Splunk App for Stream.

I am playing around with the Netflow feature of Stream and convinced our network guys to send us some NetFlow data.
Even though this is only around 10% of our total NetFlow volume (I estimate around 20 Mbps, though it can be a bit bursty), I constantly get the following error:
2016-12-05 11:59:59 ERROR [140212496193280] (SplunkSenderModularInput.cpp:429) stream.SplunkSenderModularInput - Event queue overflow; dropping 10001 events
{"timestamp":"2016-12-05T10:59:58.901309Z","agentMode":0,"level":"ERROR","message":"Event queue overflow; dropping 10001 events"}

My best guess is that the sender queue is "full".
Yes, I have seen the 14-month-old post https://answers.splunk.com/answers/311059/splunk-app-for-stream-if-indexing-queue-blocked-or.html
But I am not sure whether it still applies, since a lot has changed since 2014, when streamfwd was still configured in XML.

I am currently running streamfwd on a Splunk Enterprise heavy forwarder with 4 cores and 4 GB of RAM in our OpenStack environment.
I currently use the following streamfwd.conf:
[streamfwd]
processingThreads = 4
netflowReceiver.0.port = 18001
netflowReceiver.0.protocol = udp
netflowReceiver.0.ip = 10.0.102.240
netflowReceiver.0.decoder = netflow
netflowReceiver.0.decodingThreads = 8

outputs.conf
# Turn off indexing on the forwarder
[indexAndForward]
index = false
# TCP output global
[tcpout]
defaultGroup = cert-cluster
forwardedindex.filter.disable = true
indexAndForward = false
# TCP output cluster group
[tcpout:cert-cluster]
indexerDiscovery = cert-cluster-master
forceTimebasedAutoLB = true
useACK = true
maxQueueSize = 500MB
# indexer discovery group
[indexer_discovery:cert-cluster-master]
master_uri = https://:8089
pass4SymmKey =
I am not sure whether maxQueueSize has any effect on a heavy forwarder.

The data is forwarded to an indexer cluster: 3 peers in the same environment, replicated to another 3 backup peers at a second location.
This produces roughly 500 Kbps to 2.5 Mbps of indexing throughput.

What do I have to tweak where?

  • Forwarder: outputs.conf or streamfwd.conf, and which parameters?
  • Indexers: resources do not seem to be the problem; the maximum load average is below 0.3, etc.

PS: @Splunkers, the streamfwd.conf documentation is quite incomplete.

0 Karma
1 Solution

Splunk Employee

@mathiask,

While it may be worth troubleshooting HWF -> IDX forwarding performance, there's an intrinsic limit on how much data one can push through a modular input interface (i.e., STM -> HWF). The error you're receiving suggests that you're either hitting this limit or that the Stream modinput is being throttled by the HWF.

Have you checked the effective maxKBps setting the heavy forwarder is running with (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf)? It defaults to a small value, and while Stream ships with a limits.conf that sets this parameter to 0 (unlimited), that may not be the value Splunk actually uses if other apps contain conflicting values. You can check the effective value with `splunk btool limits list thruput --debug`, which also shows which app each setting comes from.
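If a conflicting value from another app is winning, one fix is to pin the setting in a local limits.conf on the heavy forwarder, since settings in an app's local directory take precedence over default ones. A minimal sketch; the app name here is illustrative:

```
# $SPLUNK_HOME/etc/apps/<your_app>/local/limits.conf
[thruput]
# 0 = unlimited forwarding throughput; local/ settings override default/
maxKBps = 0
```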

If tuning the modinput and HWF forwarding doesn't solve the problem, you may want to consider deploying Stream Forwarder as a stand-alone agent (http://docs.splunk.com/Documentation/StreamApp/7.0.0/DeployStreamApp/InstallStreamForwarderonindepen...) that sends data to indexers directly via the HTTP Event Collector API. This deployment avoids modinput limitations and can be scaled to send to multiple indexers, either through a load balancer such as Nginx (preferred) or by specifying a list of HEC indexers in the Distributed Forwarder Management UI inside the Splunk Stream app.
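If you go the HEC route, each receiving indexer (or intermediate heavy forwarder) needs the collector enabled with a token the Stream forwarder can use. A minimal sketch; the token name and value here are placeholders, generate a real token under Settings > Data inputs > HTTP Event Collector:

```
# inputs.conf on each HEC-enabled receiver
[http]
disabled = 0
port = 8088

# placeholder token stanza
[http://stream_forwarder]
token = 00000000-0000-0000-0000-000000000000
disabled = 0
```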

PS: I'll pass your feedback to the doc team


Communicator

The heavy forwarder has a default maxKBps limit of 0:
cat system/default/limits.conf | grep maxKB
maxKBps = 0
Stream does not override this; i.e., there is no limits.conf in the app.

I am not sure what else I could do to troubleshoot HWF -> IDX performance; the systems do not seem to be the problem, and according to Splunk everything is green.

Side note: I usually don't forward directly from the source to the indexers. I currently use a set of "heavy cluster forwarders"; this lets me make indexer changes without needing changes on the clients, and also allows filtering and routing. Doing that on the indexers is not really sexy 😛

Okay, I can try a standalone setup.
But you tested 300-400 Mbps. Did you do that with a single standalone forwarder?


Splunk Employee

It seems the modinput throughput is the bottleneck, then (which is not surprising).

Yes, we used the standalone stream forwarder sending to a load-balanced cluster of HTTP Event Collector-enabled indexers (would be heavy forwarders in your case) for our performance tests.

Also, Netflow decoding is CPU-bound, so 300-400 Mbps of Netflow v9 would require 10-16 CPU cores, depending on the CPU specs.
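Those figures work out to roughly 25-40 Mbps per decoding core. As a quick sizing sketch, assuming the worst-case rate of 25 Mbps per core implied by those numbers (an assumption derived from 400 Mbps / 16 cores, not a measured benchmark):

```python
import math

def cores_needed(throughput_mbps, per_core_mbps=25.0):
    """Estimate decoder cores for a given Netflow ingest rate.

    per_core_mbps is an assumed per-core decode rate taken from the
    worst case quoted above (400 Mbps / 16 cores = 25 Mbps per core).
    """
    return math.ceil(throughput_mbps / per_core_mbps)

# The poster's ~20 Mbps feed fits on a single decoding core:
print(cores_needed(20))   # 1
print(cores_needed(400))  # 16
```

By this rough measure, the original ~20 Mbps feed is nowhere near the decoder's CPU ceiling, which is consistent with the modinput path being the bottleneck.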
