Getting Data In

How can I send logs using a forwarder to Splunk at the fastest speed possible?

alexlit
Explorer

Hello,

I have a Linux box which has a 10 Gb interface.
Is there any way I can send logs at the fastest rate possible, without throttling?
I have about 200GB of logs.

Thanks,
Alex

0 Karma
1 Solution

s2_splunk
Splunk Employee

If you are talking about using a Universal Forwarder to forward the data from your Linux box, you need to disable the default throughput limit of 256KBps in limits.conf. Create a limits.conf file in /opt/splunkforwarder/etc/system/local and enter this:

[thruput] 
maxKBps = 0  

'0' means no throughput limitation. The forwarder will send data as fast as it can read it from the source(s) and get it out to the indexers. If you are using Universal Forwarder version 6.3.x, you can increase throughput further by configuring multiple pipeline sets.
Details for that are here.
This will cause the forwarder to process multiple input and output streams in parallel, effectively making a single forwarder behave as if it were multiple instances.
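For reference, the pipeline-set setting goes into the forwarder's server.conf, in the same style as the limits.conf snippet above (a sketch; adjust the path to your install):

```
# /opt/splunkforwarder/etc/system/local/server.conf
[general]
parallelIngestionPipelines = 2
```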


alexlit
Explorer

Hi ssievert!
Thanks,
Few more questions:
1. What is the maximum number of pipelines I can have? And to configure a pipeline, do I just need to set parallelIngestionPipelines = 2 in /opt/splunkforwarder/etc/system/local/server.conf?
2. Also, what about "maxQueueSize"? Where does that need to be set? Do I need to set that one too?
What I am trying to do is test my Linux box in terms of CPU when I have the worst possible settings, meaning when the forwarder is forwarding as much as possible. I would like to see how my box will perform. Like I said, I have >200GB.
thank you,

0 Karma

s2_splunk
Splunk Employee

There is no product-enforced limit that I am aware of, but you should be careful about setting it too high. There are multiple consequences with respect to resource usage, both on the forwarder itself and on the indexer(s). Besides memory and CPU usage, it also increases the number of TCP connections established to your indexing tier (2 per processing pipeline), and you may overwhelm your indexers if you don't have enough indexing capacity to process a non-throttled event stream.
I would not mess with the default queue size settings.

Not sure what you are trying to achieve with your test, since maxing out forwarder resource usage will impact the resources available to the workloads you really want to run on the server. The whole design point of the UF is to provide a collection mechanism with the least possible resource overhead. I would rather look for signs of slowdowns in end-to-end event processing first, by comparing event timestamps to index-time timestamps as discussed (for example) here.

Also, what does it mean when you say you have >200GB? 200GB of log data per day?

In any case, watch your splunkd.log during the test for messages that indicate blocked queues, which will tell you when event processing is starting to choke. Also, watch your indexer event processing queues during your test and you will see whether your indexers can keep up with a non-throttled event stream.
Your event processing speed will be constrained by the slowest piece of the pipeline, which in most (if not all) cases will be your indexer's storage subsystem.
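As a rough sketch of what those blocked-queue messages look like: they are Metrics lines in splunkd.log containing blocked=true, and a few lines of Python can filter them out. The sample lines below are illustrative, not captured from a real system:

```python
# Sketch: find Metrics lines that report a blocked queue in splunkd.log.
# These sample lines are hypothetical, for illustration only.
sample_log = [
    "12-10-2015 18:02:53.587 +0000 INFO Metrics - group=queue, name=parsingqueue, "
    "blocked=true, max_size_kb=512, current_size_kb=512",
    "12-10-2015 18:02:53.587 +0000 INFO Metrics - group=queue, name=indexqueue, "
    "max_size_kb=512, current_size_kb=10",
]

def blocked_queues(lines):
    """Return the queue names from Metrics lines that contain blocked=true."""
    names = []
    for line in lines:
        if "group=queue" in line and "blocked=true" in line:
            # Pull the name=<queue> field out of the comma-separated pairs.
            fields = dict(
                pair.strip().split("=", 1)
                for pair in line.split(" - ", 1)[1].split(",")
                if "=" in pair
            )
            names.append(fields.get("name"))
    return names

print(blocked_queues(sample_log))  # -> ['parsingqueue']
```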

0 Karma

alexlit
Explorer

Thanks,
So, you do not recommend changing the queue size? By the way, where would I change it?

Also, yes, I have 200GB of log data per day, in the /var/log directory.
A huge amount of logs, including .gz files.

Thanks,
Alex

0 Karma

s2_splunk
Splunk Employee

Yes, I would not adjust queue sizes unless you thoroughly understand what you are doing. If events are flowing 'freely', default queue sizes should be all you need. The various queues are configured in server.conf.
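If you did want to experiment anyway, queue sizes live in server.conf stanzas like the hypothetical example below; this is a sketch of the syntax only, not a recommendation, and per the advice above the defaults should stay:

```
# /opt/splunkforwarder/etc/system/local/server.conf
# Sketch only -- leave queue sizes at defaults unless you
# thoroughly understand the consequences.
[queue=parsingQueue]
maxSize = 6MB
```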

If a large number of your log source files are .gz, you'll probably see CPU utilization increase noticeably as you increase parallel processing, since the ArchiveProcessor will unzip each file (CPU-bound workload).
FYI, on an unconstrained forwarder you should see data being forwarded at a rate of at least 10MB/s, likely more if your indexers keep up. This would put you in the TB/day range, so if everything is configured properly, 200GB/day is not going to be an issue.

0 Karma

alexlit
Explorer

Thanks,
I would be interested in setting it up so that I can forward 200GB/day.
Do you know how I can configure it correctly?

Thank you
Alex

0 Karma

s2_splunk
Splunk Employee

If you set maxKBps = 0 as outlined earlier, your forwarder should do 200GB/day, if your indexers can keep up.

200GB/day is around 2.5MB/sec. If you don't see that happening, you can try parallel pipelines (start with 2). If you still don't see it, you have bottlenecks elsewhere.
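That figure is easy to sanity-check; a quick back-of-the-envelope calculation (taking 1GB as 1000MB for a rough decimal estimate):

```python
# Back-of-the-envelope: sustained rate needed to move 200 GB in one day.
gb_per_day = 200
mb_per_gb = 1000          # decimal GB, rough estimate
seconds_per_day = 24 * 60 * 60

rate_mb_per_sec = gb_per_day * mb_per_gb / seconds_per_day
print(round(rate_mb_per_sec, 2))  # -> 2.31
```

That ~2.3MB/s is consistent with the "around 2.5MB/sec" ballpark, and well below the 10MB/s an unconstrained forwarder should sustain.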

0 Karma

alexlit
Explorer

Thank you! I will try that.
My indexers are running on Windows Server machines.
How can I tell if my indexers are keeping up?

thanks,
Alex

0 Karma

s2_splunk
Splunk Employee

"How can tell if my indexers are keeping up??"

You can either use the Distributed Management Console (recommended for 6.x) to look at your indexing performance, or install the Splunk on Splunk app and use its dashboards to look at your processing queues.

If your indexers cannot keep up, you will ultimately see messages in your forwarder's splunkd.log indicating blocked processing queues.
I recommend reading up here.

alexlit
Explorer

Thank you very much!

0 Karma

s2_splunk
Splunk Employee

You are welcome. Please accept my answer, so this thread shows as "answered".
Thanks!

0 Karma

alexlit
Explorer

Hi ssievert,
Got a question
I am looking at this log:
12-10-2015 18:02:53.587 +0000 INFO Metrics - group=thruput, name=thruput, instantaneous_kbps=2670.719671, instantaneous_eps=302.548709, average_kbps=2681.834249, total_k_processed=3573461.000000, kb=82792.221680, ev=9379.000000, load_average=0.230000

Is that the value of my throughput?
Thanks
Alex

0 Karma

s2_splunk
Splunk Employee

From that specific host, yes. Look at the host field associated with the event to see which host is reporting it.
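For completeness, the instantaneous_kbps value can be pulled out of such a Metrics line mechanically; a small Python sketch using the exact line quoted above:

```python
# Extract the throughput figure from a thruput Metrics line.
line = (
    "12-10-2015 18:02:53.587 +0000 INFO Metrics - group=thruput, name=thruput, "
    "instantaneous_kbps=2670.719671, instantaneous_eps=302.548709, "
    "average_kbps=2681.834249, total_k_processed=3573461.000000, "
    "kb=82792.221680, ev=9379.000000, load_average=0.230000"
)

# Split off the timestamp/severity prefix, then parse the key=value pairs.
fields = dict(
    pair.strip().split("=", 1)
    for pair in line.split(" - ", 1)[1].split(",")
)
kbps = float(fields["instantaneous_kbps"])
print(round(kbps / 1024, 2), "MB/s")  # -> 2.61 MB/s
```

So this particular host was forwarding at roughly 2.6MB/s when the sample was taken.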

0 Karma