Getting Data In

Is it normal behavior for Splunk to block queues and stop forwarding data when one of two remote ports is closed?

Engager

Hello,

I use a Splunk heavy forwarder and I would like to send inputs to a remote server.

I have two channels on port 5000 and 5001.

When I close port 5001 on my remote server, for example, Splunk blocks all queues and stops forwarding data to port 5000.
Is this normal?
Is it possible to spool logs to a persistent queue for port 5001 and continue forwarding logs to port 5000 without blocking queues in Splunk?

I don't need to index these logs... In my case I use Splunk like a syslog-ng server.

inputs.conf:

[default]
host = server1

[tcp://5000]
sourcetype=test1
_TCP_ROUTING=test1
persistentQueueSize = 500 MB
queueSize = 10 KB

[tcp://5001]
sourcetype=test2
_TCP_ROUTING=test2
persistentQueueSize = 500 MB
queueSize = 10 KB

outputs.conf:

[tcpout]
defaultGroup = test1, test2
sendCookedData = false
indexAndForward = false
useACK = true

[tcpout:test1]
server = 192.168.0.36:5000

[tcpout:test2]
server = 192.168.0.36:5001

metrics.log:

07-05-2015 11:33:22.851 +0200 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=2327, largest_size=2327, smallest_size=2327

Thanks for your help,

1 Solution

Influencer

On this topic I highly recommend watching the .conf 2014 talk "How splunkd works" by
Amrit Bath and Jag Kerai. If you're going to .conf 2015: as things currently stand, it looks like @amrit and @Jag will reprise that talk, and there should also be a talk on pipeline sets this year that sounds relevant here, though the .conf 2015 talks are still being decided. The community wiki doc on How Indexing Works is a good resource as well. (Some of the diagrams there are currently slightly off and are being updated, but they're close enough.)

In any case, yes, I believe this is expected behavior: under the hood, Splunk is quite literally a series of tubes. In your case it looks something like this crude ASCII art:

tcp:5000 -\ 
           = parsingQ - parsingPipeline - aggQ - ....
tcp:5001 -/              

                            /- tcpout:test1Q -> networkToSplunk:5000
... - indexQ - indexerPipe =
                            \- tcpout:test2Q -> networkToSplunk:5001

Every pipeline is a single thread of execution that pulls data off the previous queue, processes it, and sends it to the next queue. All of your inputs come in on different ports but land in the same parsingQueue, and, several pipelines and queues later, all eventually reach the same indexQueue. There the indexing pipeline routes the data to different output queues (one per group), and the network layer attempts to send that data on to the next server in the chain.

Now, by blocking traffic to port 5001 on the server, data stops coming off the queue that feeds port 5001. Eventually that queue fills and blocks. When the indexer pipeline comes across the next chunk of data that needs to be routed there, it cannot add it to the queue, so, being a single thread of execution, the indexer pipeline blocks, waiting for the ability to add something to the port 5001 output queue. When it stops processing, it stops taking data from the indexQueue, so that queue in turn fills up and blocks, preventing the typing pipeline from writing to it, and so on.

Since both outputs, to port 5000 and to port 5001, are handled by the same indexer pipeline, when one stops, the other will eventually stop as well.
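That backpressure chain can be sketched with bounded queues and a single consumer thread. This is a toy illustration, not Splunk code: one "indexer pipeline" thread routes events to two small output queues, the port 5000 side is drained, the port 5001 side is not, and the shared input queue ends up blocked even though one route is healthy.

```python
import queue
import threading

# Toy sketch of the single-threaded pipeline described above.
out_5000 = queue.Queue(maxsize=5)   # tcpout:test1Q, drained normally
out_5001 = queue.Queue(maxsize=5)   # tcpout:test2Q, downstream "port" closed
index_q = queue.Queue(maxsize=10)   # the shared indexQueue

def network_5000():
    while True:
        out_5000.get()              # port 5000 is open; events flow out

def indexer_pipeline():
    while True:                     # single thread of execution
        event = index_q.get()       # pull from the previous queue...
        target = out_5001 if event % 2 else out_5000
        target.put(event)           # ...blocks forever once out_5001 fills

threading.Thread(target=network_5000, daemon=True).start()
threading.Thread(target=indexer_pipeline, daemon=True).start()

blocked_at = None
for i in range(30):                 # feed alternating test1/test2 events
    try:
        index_q.put(i, timeout=0.5)
    except queue.Full:
        blocked_at = i              # backpressure reached the input side
        break

print("indexQ blocked at event", blocked_at)
```

Once `out_5001` holds five events, the pipeline thread is stuck on `put()`, `index_q` fills behind it, and the producer can no longer enqueue anything, including events bound for the healthy port.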

If these two ports belonged to the same cluster, however, you could instead use one output group with both as destinations. Note that this means any given event could end up in either place, which might not be your intention.
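As a sketch of that single-group alternative (assuming both listeners accept the same data; the group name lb_group is illustrative), outputs.conf would look something like:

[tcpout]
defaultGroup = lb_group
sendCookedData = false
useACK = true

[tcpout:lb_group]
# Splunk load-balances across this list; each event goes to exactly one server.
server = 192.168.0.36:5000, 192.168.0.36:5001

With one group, a blocked destination just drops out of the load-balancing rotation instead of stalling a second output queue.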


Splunk Employee

In short, this is expected behavior. Although your defaultGroup lists two output groups, Splunk treats them as one pipe, so to speak. If one of the connections errors out, it stalls both.

If you really need multiple outputs like this, the best practice is to install another universal forwarder on the box and configure a tcpout there as well. Then you won't have any queue-related blocking if one tcpout is backed up.
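One possible split along those lines (a sketch, reusing the addresses from the question; the exact layout depends on how you hand data between the instances): the original instance keeps the tcp://5000 input and the test1 output, while a second forwarder instance on the same host takes the tcp://5001 input and owns the test2 output, so each stream has its own pipeline and queues.

Instance 1 (existing heavy forwarder), inputs.conf / outputs.conf:

[tcp://5000]
sourcetype = test1
_TCP_ROUTING = test1

[tcpout:test1]
server = 192.168.0.36:5000

Instance 2 (second forwarder on the same box), inputs.conf / outputs.conf:

[tcp://5001]
sourcetype = test2
_TCP_ROUTING = test2

[tcpout:test2]
server = 192.168.0.36:5001

If port 5001 is closed, only instance 2's queues back up; instance 1 keeps forwarding to port 5000.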



SplunkTrust

The splunky way would be to load-balance to a cluster that then replicates the data among itself, rather than cloning at the source.

Engager

Thanks for this nice answer.

Am I the only one who thinks this is a design mistake by Splunk's developers?

I would expect more from the leader in the Gartner Magic Quadrant...


Influencer

Now, I don't work for Splunk, but from what I've heard, the design came from balancing two goals: having as much modularity and parallelism as possible (no knowledge of prior steps other than the transformations of data along the way), while still letting people with basic hardware get OK performance (minimizing duplication) at small scales out of the box.

If you haven't exhausted your CPU cores and memory, you could always install multiple copies of Splunk on the same box and have them work in parallel on different data and different ports. That way you have two completely separate pipelines: the one with the blocked network port would stop working as described above, while the one with the clear network would continue to flow.

I don't know much about Pipeline Sets yet, but the possibility of more fully using my available CPU cores while only maintaining a single splunkd instance is rather exciting, so I'm looking forward to hearing @Jag talk on it at .conf 2015. From the description, it sounds like it might apply to this situation as well.
