I have a single licensed indexer running on a server.
I have also installed a universal forwarder to collect and send data from another site.
There is a 50Mbps link between the sites, but I am only seeing about 15-30kBps from the forwarder to the indexer.
How do I make it go faster? Why is it going so slowly?
CPU, memory, and network are all fine (CPU is hardly used). I can send data manually to the indexer - I scp'd a file there at 10x the transfer speed I'm seeing.
On the forwarder I have checked limits.conf and edited it to override the default 256KBps thruput limit - I've tried:
maxKBps = 1000000000
maxKBps = 0
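For reference, the setting has to live under the [thruput] stanza in limits.conf or it is ignored; what I have on the forwarder looks roughly like this (the system/local path is just where I put the override):

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[thruput]
# 0 disables the forwarder's own bandwidth throttle entirely
maxKBps = 0
```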
I observe in the forwarder's metrics.log that the parsingqueue was getting full, so I increased it (and the other queues), but it is still getting blocked:
07-17-2012 13:59:01.449 +0100 INFO Metrics - group=queue, name=parsingqueue, blocked=true, maxsizekb=102400, currentsizekb=102399, currentsize=133403, largestsize=133403, smallest_size=133365
07-17-2012 13:59:01.449 +0100 INFO Metrics - group=queue, name=tcpoutmyindexer9997, maxsize=51200000, currentsize=51181970, largestsize=51199992, smallest_size=51174808
Interestingly, the tcpout queue seems to be permanently full like this.
Is there anything else limiting the speed? Can the indexer be limiting the speed that the forwarder can send to it?
Any help appreciated.
Have you checked whether there is packet loss between the forwarder and indexer? Run "netstat -s | grep -i retrans" on the forwarder. It gives an absolute counter, so take a couple of samples while the forwarder is sending data and see whether the value increases.
If you have more than 1 forwarder, are others working better or do they all have similar performance problems?
There is no packet loss. The interface on the forwarder is fine. It is communicating well (1-50Mbps) with the local LAN.
The link is also fine as I can see by transferring files.
The problem is that Splunk is not sending the data very fast.
I have some NetFlow data showing ~150kbps.
Interestingly, after restarting Splunk at both ends I now see two connections, each at the SAME speed as before - so the total bandwidth is now ~300kbps. It looks like something is limiting the speed per connection to ~150kbps.
To get the full picture, you'll need to make sure that the forwarder is not being throttled by blocked/saturated queues on the indexer(s). To view the fill percentage of the indexer's queues, you can use the "Distributed Indexing Performance" view of the S.o.S (Splunk on Splunk) app.
I looked at the metrics on the indexer and cannot see high queues. I ran this search:
index=_internal source=*metrics.log group=queue | timechart avg(currentsize) by name
and can see mostly zeros for the period.
I wonder if Splunk may just be really bad at using the bandwidth on a high-latency link.
The RTT is ~280ms - it's literally the other side of the world.
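For what it's worth, plain TCP arithmetic supports that suspicion: a single connection's throughput is bounded by window / RTT. A quick sketch (the 280ms RTT is from this thread; the window sizes are illustrative assumptions, not measured values):

```python
# Bandwidth-delay product: max throughput of one TCP stream = window / RTT.
RTT = 0.280  # seconds, round-trip time reported in this thread

def max_throughput_kbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on a single TCP connection, in kilobits per second."""
    return window_bytes * 8 / rtt_s / 1000

# With a small effective send window (e.g. 8 KB, an assumed value),
# the ceiling lands in the same range as the ~150-300 kbps observed:
print(round(max_throughput_kbps(8 * 1024, RTT)))  # ~234 kbps

# Window needed to fill the 50 Mbps link at this RTT:
window_needed_bytes = 50e6 / 8 * RTT
print(window_needed_bytes / 1e6)  # ~1.75 MB
```

So if the effective TCP window per connection is small, no amount of queue tuning on either end will speed up a single stream over this link.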
Are you using useACK=true in outputs.conf?
It will force the forwarder to wait for acknowledgment from the indexer that indexing is done. This will considerably slow down both the forwarder and the indexer.
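If acknowledgment is enabled and you can tolerate the small risk of data loss on a dropped connection, turning it off looks roughly like this (the group name and host are placeholders for your actual settings):

```ini
# outputs.conf on the forwarder -- stanza name is a placeholder
[tcpout:myindexer]
server = indexer.example.com:9997
useACK = false
```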
If tcpout is full, then it is likely that queues on the indexer are pushing back on the forwarder. You should check which queues are full and blocked on the indexer, as I believe this will have something to do with your problem.
Doesn't appear to be full. There were occasional times when queues got full, but over 24 hours they are mostly low/empty. The network send speed is still low even when the indexer queues are low.
I have now tried two things:
1. Setting up a persistent queue (100MB) on the indexer, to rule it out as the blockage.
2. Turning on compression.
The persistent queue seems to fail - but I think that is because the input is splunktcp rather than tcp, so a persistent queue is not a valid option for it.
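In case it helps anyone else, compression has to be enabled on both ends or the connection fails; roughly like this (stanza names, port, and host are placeholders):

```ini
# outputs.conf on the forwarder -- group name is a placeholder
[tcpout:myindexer]
server = indexer.example.com:9997
compressed = true

# inputs.conf on the indexer
[splunktcp://9997]
compressed = true
```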
Make sure you set maxKBps = 0 in the proper location on the forwarder (which app the file lives in matters, because of configuration precedence):
maxKBps = 0
Then restart the forwarder. You can verify which setting actually wins with "splunk btool limits list thruput --debug".