Following on from: http://splunk-base.splunk.com/answers/7001/udp-drops-on-linux
Are any of you seeing drops reported for syslog-ng in /var/log/messages? I believe we are experiencing syslog drops, and I am trying to determine the best way to correct it. My understanding is that if syslog-ng is dropping logs, then it would show in /var/log/messages. I do not see any drops there, only processed message counts. We process on average 25k syslog messages every minute, but sometimes spike to 50k a minute. I do not know, however, whether /var/log/messages would also show messages dropped by the Linux kernel's UDP buffer. Right now, syslog-ng is configured with default settings, and no Linux kernel UDP buffer adjustments have been made.
Any assistance would be appreciated.
I appreciate everyone's input. By leveraging netstat -us and watching the UDP packet receive errors, and after some trial and error, I was able to find the settings that resolved the syslog drops in our environment. After the adjustments were made, I realized that we had been dropping approximately 40% of our syslog data at the Linux kernel.
Increased net.core.rmem_max to 64MB:
net.core.rmem_max = 67108864
Let syslog-ng know it has a larger UDP input buffer:
source s_network {
udp(ip(0.0.0.0) port(514) so_rcvbuf(67108864));
};
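Note that syslog-ng's so_rcvbuf() only gets what the kernel allows: on Linux, the socket(7) man page documents that an SO_RCVBUF request is capped at net.core.rmem_max, and that the value read back with getsockopt is doubled to cover bookkeeping overhead. The Python sketch below is not part of the original setup and assumes a Linux host with /proc mounted. It requests the same 64 MiB (64 * 1024 * 1024 = 67108864 bytes) on a throwaway UDP socket and reports what was actually granted, which is a quick way to confirm the sysctl change took effect.

```python
import socket
from pathlib import Path
from typing import Optional

# The 64 MiB figure used above: 64 * 1024 * 1024 = 67108864 bytes.
REQUESTED = 64 * 1024 * 1024


def read_rmem_max() -> Optional[int]:
    """Return net.core.rmem_max in bytes, or None if /proc is unavailable."""
    try:
        return int(Path("/proc/sys/net/core/rmem_max").read_text().strip())
    except (OSError, ValueError):
        return None


def effective_rcvbuf(requested: int) -> int:
    """Request `requested` bytes of receive buffer on a throwaway UDP socket
    and return the size the kernel reports back via getsockopt."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    rmem_max = read_rmem_max()
    granted = effective_rcvbuf(REQUESTED)
    # Linux caps the request at rmem_max and reports double the capped
    # value, so a fully applied 64 MiB setting reads back as 134217728.
    print(f"rmem_max={rmem_max} requested={REQUESTED} granted={granted}")
    if rmem_max is not None and rmem_max < REQUESTED:
        print("net.core.rmem_max is smaller than the request; raise it first")
```

If the granted value stays well below 2 * 67108864 after the change, the new rmem_max has not been applied yet.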
I wanted to update the ticket, hopefully as a reference for anyone else who might experience this issue. netstat -us showing UDP receive errors was a key tool, both for determining that we had an issue and for benchmarking to find the best buffer size.
Since this answer was accepted, our syslog collection has grown exponentially, and we again started to run into massive UDP receive errors. Tuning receive buffers did not resolve the issue this time. It turns out that disabling use_dns() greatly improved performance. It had not caused an issue until about 4 months ago, when we started rolling out Fortigate UTMs to all of our field offices. With use_dns() disabled, we are back to "near zero" UDP receive errors.
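Why a per-message DNS lookup can hurt this much is easiest to see with rough arithmetic. At the 50k messages/minute peak mentioned in the question, the receiver has on average 1.2 ms per message; if each reverse lookup blocks for longer than that, datagrams pile up in the socket buffer faster than they drain, and a bigger buffer only delays the drops. The Python sketch below assumes a single serial worker where every lookup blocks, and it ignores syslog-ng's DNS cache; the 5 ms lookup time is a hypothetical figure, not a measurement.

```python
# Assumption: one worker handles messages serially and each reverse DNS
# lookup blocks it; syslog-ng's DNS cache and threading are ignored.


def per_message_budget_ms(messages_per_minute: float) -> float:
    """Average time (ms) available per message to keep up with the input."""
    return 60_000 / messages_per_minute


def max_messages_per_minute(cost_ms: float) -> float:
    """Throughput ceiling (messages/minute) if every message costs cost_ms."""
    return 60_000 / cost_ms


if __name__ == "__main__":
    for rate in (25_000, 50_000):  # average and peak rates from the question
        print(f"{rate}/min -> {per_message_budget_ms(rate):.2f} ms per message")
    # Hypothetical 5 ms lookup: a serial receiver tops out at 12,000/min.
    print(f"{max_messages_per_minute(5):,.0f} messages/min")
```

With these assumptions the budget is 2.40 ms per message at 25k/min and 1.20 ms at 50k/min, so even a few milliseconds of lookup latency puts the receiver permanently behind.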
Just wanted to provide an update.
Thanks
If running RHEL7:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_...
"To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is used during the boot process:"
net.core.rmem_default=
net.core.wmem_default=
net.core.rmem_max=
net.core.wmem_max=
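The quote above leaves the values blank. Before picking values, it can help to see what the kernel is using right now; `sysctl -n net.core.rmem_max` does this one key at a time. The Python sketch below does the same for all four keys by reading /proc/sys, where each dotted key maps to a file path (net.core.rmem_max becomes /proc/sys/net/core/rmem_max). It assumes Linux, and the helper names are illustrative; a key that can't be read comes back as None.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

KEYS = (
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.core.rmem_max",
    "net.core.wmem_max",
)


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file, e.g. /proc/sys/net/core/rmem_max."""
    return Path(root, *key.split("."))


def read_sysctls(keys: Iterable[str] = KEYS,
                 root: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Current value of each key as a string, or None if it can't be read."""
    values: Dict[str, Optional[str]] = {}
    for key in keys:
        try:
            values[key] = sysctl_path(key, root).read_text().strip()
        except OSError:
            values[key] = None  # key absent, or not running on Linux
    return values


if __name__ == "__main__":
    # Printed in the same "key = value" form that /etc/sysctl.conf uses.
    for key, value in read_sysctls().items():
        print(f"{key} = {value}")
```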
Agreed that you'll probably get better support in a forum more dedicated to syslog-ng questions. That said, we had similar issues a while back (before upgrading our syslog-ng collector box). While it was still running on a dual-core Xeon with 4GB of RAM (32-bit CentOS 5.8), we had good results by changing the following settings in syslog-ng.conf:
options {
keep_hostname(yes);
flush_lines(200);
log_fetch_limit(200);
log_fifo_size(5000);
time_sleep(20);
stats_freq(120);
};
And then, for the UDP and TCP segments of the "source" blocks, the so_rcvbuf option needs to be included (otherwise it has a ridiculous default of "0"):
source s_remote {
tcp(ip(0.0.0.0) port(514) max-connections(1000) keep-alive(yes) so_rcvbuf(16777216));
udp(ip(0.0.0.0) port(514) so_rcvbuf(16777216));
};
I know the so_rcvbuf setting was a major factor in eliminating the packet drops we were seeing, and the other options (specifically log_fetch_limit, flush_lines and time_sleep) dramatically reduced syslog-ng's overall CPU usage. Also note that the latest syslog-ng has an option to run multi-threaded. We haven't tried it yet, but I'd expect it to improve performance.
In addition, if you're running a Linux kernel older than 2.6.18 (CentOS 4.x or 5.x; 6.x doesn't apply), some parameters in /etc/sysctl.conf need to be added or tweaked, followed by a reboot:
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
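The tcp_rmem and tcp_wmem settings each take three numbers: the minimum, default and maximum buffer size in bytes (see the tcp(7) man page). A small hypothetical helper like the Python sketch below can parse sysctl.conf-style lines and flag a triple that is malformed or not in ascending order before you reboot. It is a sanity check written for this thread, not a tool that ships with any distribution; it skips lines starting with # or ;, which sysctl.conf treats as comments.

```python
from typing import Dict, List


def parse_sysctl_conf(text: str) -> Dict[str, List[int]]:
    """Parse 'key = value ...' lines into lists of ints, skipping comments."""
    settings: Dict[str, List[int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        settings[key.strip()] = [int(v) for v in value.split()]
    return settings


def check_triples(settings: Dict[str, List[int]]) -> List[str]:
    """Return the tcp_rmem/tcp_wmem keys whose 'min default max' looks wrong."""
    problems = []
    for key in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
        vals = settings.get(key)
        if vals is None:
            continue
        if len(vals) != 3 or not vals[0] <= vals[1] <= vals[2]:
            problems.append(key)
    return problems


if __name__ == "__main__":
    conf = """\
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
"""
    print(check_triples(parse_sysctl_conf(conf)))  # []
```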
That should do it.
Prior to all this, you should check "netstat -us" and "netstat -ts" to get a sense of how bad the packet loss is or isn't. This way you can also gauge the improvement any of these tweaks might make.
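netstat -s reads its UDP counters from /proc/net/snmp, which stores each protocol as a line of field names followed by a line of values, both prefixed with the protocol name (for example `Udp: InDatagrams NoPorts InErrors ...`). Taking two snapshots of that file and diffing them gives the before/after comparison suggested above without having to clear any counters. The Python sketch below assumes Linux, assumes no reboot or counter wrap between snapshots, and uses illustrative function names.

```python
from pathlib import Path
from typing import Dict


def parse_snmp(text: str, section: str = "Udp") -> Dict[str, int]:
    """Pair the header and value lines for one protocol in /proc/net/snmp."""
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith(section + ":")]
    if len(rows) < 2:
        raise ValueError(f"section {section!r} not found")
    header, values = rows[0], rows[1]
    return dict(zip(header, (int(v) for v in values)))


def snapshot(section: str = "Udp") -> Dict[str, int]:
    """Read the live counters for one protocol (Linux only)."""
    return parse_snmp(Path("/proc/net/snmp").read_text(), section)


def delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counter increase between two snapshots of the same section."""
    return {key: after[key] - before.get(key, 0) for key in after}
```

For example, call snapshot(), wait 15 minutes, call snapshot() again and look at delta(before, after)["InErrors"]. On kernels that expose it, the RcvbufErrors field isolates datagrams dropped because the socket receive buffer was full.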
Hope it helps.
Cleared the netstat counters. After 15 minutes, here is the output from netstat -su:
[root@HO-SPLUNKFW1 ~]# netstat -su
IcmpMsg:
InType8: 549
OutType0: 549
OutType3: 6
Udp:
851624 packets received
6 packets to unknown port received.
541184 packet receive errors
834 packets sent
IpExt:
InMcastPkts: 16
OutMcastPkts: 17
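To put the numbers above in proportion: treating every receive error as a datagram that never reached syslog-ng, the loss rate is errors / (received + errors) = 541184 / (851624 + 541184), roughly 39%, close to the ~40% figure mentioned earlier in the thread. The Python sketch below parses pasted `netstat -su` output and does that arithmetic. Its parsing rules (section headers are a single word ending in a colon, UDP counter lines start with a number) are assumptions based on the output shown here, and netstat's wording varies between versions.

```python
import re
from typing import Dict

HEADER = re.compile(r"^\s*\w+:\s*$")          # e.g. "Udp:"
COUNTER = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")  # e.g. "851624 packets received"


def udp_counters(netstat_su: str) -> Dict[str, int]:
    """Collect '<number> <description>' lines from the Udp: section."""
    counters: Dict[str, int] = {}
    in_udp = False
    for line in netstat_su.splitlines():
        if HEADER.match(line):
            in_udp = line.strip() == "Udp:"
            continue
        match = COUNTER.match(line)
        if in_udp and match:
            counters[match.group(2).rstrip(".")] = int(match.group(1))
    return counters


def udp_loss_fraction(counters: Dict[str, int]) -> float:
    """Share of incoming datagrams that were counted as receive errors."""
    received = counters.get("packets received", 0)
    errors = counters.get("packet receive errors", 0)
    total = received + errors
    return errors / total if total else 0.0


# Excerpt of the output posted above.
SAMPLE = """\
Udp:
    851624 packets received
    6 packets to unknown port received.
    541184 packet receive errors
    834 packets sent
IpExt:
    InMcastPkts: 16
"""

if __name__ == "__main__":
    print(f"{udp_loss_fraction(udp_counters(SAMPLE)):.1%}")  # 38.9%
```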
This probably isn't the right forum for this type of question. I suggest you try a forum more oriented toward OS-level support than toward Splunk.