Getting Data In

OWL data feed keeps stopping

_joe
Contributor

Good morning,

I am trying to use the OWL Diode Sender Add-on, and the issue I am running into is that the ingest keeps stopping. It works again after I restart it.

 

More details... I have a staged HFW (running the latest 9.4.x release) that is ONLY sending data to OWL, currently over UDP.

The problem is just that the data feed keeps stopping, and then it won't start until I restart the heavy forwarder again. I am currently assuming Splunk thinks the buffers are full or something, so it "pauses indexing events." I log in to the HFW and the status shows green with no errors.

I am not seeing any interesting error or fatal logs. I know the issue is with Splunk because the output logs show 0 KBps.
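(For reference, this is roughly how I'm checking the rate on the HFW itself; the path assumes a default /opt/splunk install:)

$ grep "group=thruput, name=thruput" /opt/splunk/var/log/splunk/metrics.log | tail -5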

This Splunk HFW is also acting as a deployment server and has HTTPS turned on. It has a perpetual (no data ingest) license and is ONLY sending data out to the OWL.

 

Any suggestions would be greatly appreciated.

 

[indexAndForward]
index=false



[tcpout]
indexAndForward = false
defaultGroup = syslog:diode-syslog-udp
#defaultGroup = syslog:diode-syslog-tcp

forwardedindex.2.whitelist = (_audit|_internal|_configtracker|_dsclient|_dsphonehome|_dsappevent)
forwardedindex.filter.disable = true

[syslog:diode-syslog-tcp]
disabled = false
server = 10.0.1.100:5004
maxEventSize=512000


# PRO: Always works
# CON: limited to MTU size
[syslog:diode-syslog-udp]
disabled = false
server = 10.0.1.100:7504
maxEventSize=512000

 


tscroggins
Champion

Hi @_joe ,

As a baseline, verify the limits.conf [thruput] stanza maxKBps setting. The value should be 0 on a heavy forwarder:

$ /opt/splunk/bin/splunk btool limits list thruput
[thruput]
maxKBps = 0
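If it's anything other than 0, you can set it explicitly, e.g. in $SPLUNK_HOME/etc/system/local/limits.conf, and restart splunkd:

[thruput]
maxKBps = 0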

syslog outputs have a fixed maximum queue size of 97 KiB (99,328 bytes). The maximum theoretical thruput per syslog output is bound by the queue size and the round-trip time (RTT) to the destination. For example, if the one-way latency between the Splunk Enterprise sender and the Owl data diode is 1 ms, the RTT is 2 ms, and the maximum thruput is 99,328 * 8 bits / 0.002 seconds = 397,312,000 bps / 8 bits = 49,664,000 Bps or ~47.363 MBps.

You can verify whether your syslog output and the internal indexing queue are blocked by looking at metrics.log. The internal indexing queue sits in front of both outputs and indexes. In the case of a heavy forwarder, the indexing queue may be blocked if one or more outputs are blocked:

index=_internal source=*metrics.log* group=queue name IN (indexqueue diode-syslog-tcp diode-syslog-udp) blocked=true

or optimized using the TERM operator:

index=_internal source=*metrics.log* TERM(group=queue) (TERM(name=indexqueue) OR TERM(name=diode-syslog-tcp) OR TERM(name=diode-syslog-udp)) TERM(blocked=true)

You can return statistics quickly using the tstats command:

| tstats prestats=t max(PREFIX(max_size_kb=)) max(PREFIX(largest_size=)) where index=_internal source=*metrics.log* TERM(group=queue) (TERM(name=indexqueue) OR TERM(name=diode-syslog-tcp) OR TERM(name=diode-syslog-udp)) TERM(blocked=true) by PREFIX(name=)

or over time:

| tstats prestats=t max(PREFIX(max_size_kb=)) avg(PREFIX(largest_size=)) where index=_internal source=*metrics.log* TERM(group=queue) (TERM(name=indexqueue) OR TERM(name=diode-syslog-tcp) OR TERM(name=diode-syslog-udp)) by _time PREFIX(name=)
| timechart max(max_size_kb=) avg(largest_size=) by "name="

If metrics.log isn't available in the _internal index, you can search the log file locally in $SPLUNK_HOME/var/log/splunk/, e.g.:

$ grep blocked=true /opt/splunk/var/log/splunk/metrics.log*

If a queue is blocked, largest_size will exceed max_size_kb and provide hints for optimization.

If your Owl data diode has 1 Gbps available bandwidth, you need a queue size of 1,000,000,000 bps * 0.002 seconds / 8 bits = 250,000 bytes. Queue sizes, like TCP RWIN values, should be a whole multiple of the end-to-end TCP MSS. I can't find a documented value from Owl, so we'll assume 1,460 bytes for standard Ethernet. 250,000 / 1,460 = ~171, so we'll round up to 172 and use a starting queue size of 172 * 1,460 = 251,120 bytes. MSS values are larger if jumbo frames are used and may be smaller if another medium or encapsulation is used.

Actual thruput is less than available bandwidth. Standard 1 GbE, for example, has an actual theoretical maximum thruput of ~126,646,610 Bps or ~120.780 MBps. If your source generates data more quickly than your interface can transmit it, your source must queue. If your average thruput exceeds 120.780 MBps, then your source will queue to infinity. If your average thruput is less than 120.780 MBps, you will introduce queueing delay, but the data will arrive eventually.

Since syslog outputs have a fixed queue size, we can't modify the queue size directly as we can for other queues. Instead, we can scale the local pipeline horizontally by increasing the value of the server.conf [general] stanza parallelIngestionPipelines setting.

To saturate a 1 Gbps link with a RTT of 2 ms, we need a queue size of 251,120 bytes as shown above. 251,120 / 99,328 ~= 2.528, so we'll need at least ceiling(2.528) = 3 parallel ingestion pipelines. In server.conf:

[general]
parallelIngestionPipelines = 3

The default maxQueueSize value for the indexing queue is 500 KB per parallel ingestion pipeline. We shouldn't need to modify it for this example.
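(If metrics.log ever did show the indexing queue itself as the blocked queue, its per-pipeline size can be raised in server.conf; the 1MB value below is only an illustration, not a recommendation:)

[queue=indexQueue]
maxSize = 1MB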

This is a good starting point for syslog output over 1 GbE with a RTT of 2 ms. If you know your actual bandwidth and RTT values, we can adjust the settings as needed.
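As a rule of thumb under the same assumptions (fixed 99,328-byte syslog queue per pipeline, no packet loss):

pipelines ≈ ceiling((bandwidth_bps * RTT_seconds / 8) / 99,328)

e.g. 1,000,000,000 bps * 0.002 s / 8 / 99,328 ≈ 2.52, so 3 pipelines.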

As a side note, the maximum datagram size for UDPv4 is 65,507 bytes. If you're using UDP and the Owl data diode itself doesn't have a smaller limit, you can set maxEventSize directly in $SPLUNK_HOME/etc/apps/owl_diode_sender/local/outputs.conf and forget about it:

[syslog:diode-syslog-udp]
maxEventSize = 65507

If the size of your source events exceeds this limit or there are other limits between the sender and the Owl data diode, use TCP and an appropriately configured line breaker on the receiver.
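On the sending side, that's just a matter of switching the default group in the outputs.conf you posted back to the TCP output:

[tcpout]
defaultGroup = syslog:diode-syslog-tcp

If the receiver on the far side of the diode is another Splunk instance, line breaking would be handled there in props.conf; the sourcetype name below is only a placeholder:

[owl:syslog]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 512000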

Without access to an Owl data diode, I'm making assumptions about MTUs, packet ordering, etc. that may be invalidated by an actual device.
