At one site we receive 5,000-20,000 EPS (events per second), all on udp/514, at each of 10 locations. For now we are required to handle all of this with a single server/indexer per location.
This week we made syslog-ng the listening service (instead of Splunk), since it seemed to work better with the other log recipients (see below). syslog-ng listens on udp/514 and then:
Splunk reads the file and seems to perform well. However, the system reports many UDP receive errors, which indicates that syslog-ng is not receiving everything. Tests with a traffic generator (loggen, bundled with syslog-ng) show that only a fraction of what is sent gets processed by syslog-ng. A tcpdump taken at the same time shows that EXACTLY the number of packets sent was received on the interface; about 1/3 were dropped by the kernel, and only about 1/3 to 1/2 of what was sent made it into the syslog-ng file.
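If you want to reproduce the "sent vs. actually delivered to the socket" comparison without loggen, here is a minimal Python sketch of the same idea on loopback. It is an assumption-laden stand-in, not a replacement for loggen: it binds an ephemeral port instead of udp/514 (so no root is needed), the function name `udp_loss_test` and the `<13>loss-test` payload are made up for illustration, and the receiver deliberately does not read until the whole burst has been sent, which simulates a reader that falls behind. It therefore measures how much the socket buffer can absorb, not how fast syslog-ng itself processes messages.

```python
import socket
from typing import Optional, Tuple


def udp_loss_test(count: int, payload_size: int = 200,
                  rcvbuf: Optional[int] = None) -> Tuple[int, int]:
    """Send `count` datagrams to a loopback receiver; return (sent, received)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if rcvbuf is not None:
        # On Linux the request is capped at net.core.rmem_max, then doubled
        # by the kernel to allow for bookkeeping overhead.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    # Ephemeral port instead of udp/514, so no root privileges are needed.
    rx.bind(("127.0.0.1", 0))
    port = rx.getsockname()[1]

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    header = b"<13>loss-test "  # hypothetical syslog-style prefix
    payload = header + b"x" * max(0, payload_size - len(header))
    sent = 0
    for _ in range(count):
        tx.sendto(payload, ("127.0.0.1", port))
        sent += 1
    tx.close()

    # The receiver only starts reading after the burst; anything that did
    # not fit in the receive buffer was dropped by the kernel.
    rx.settimeout(0.5)
    received = 0
    try:
        while True:
            rx.recv(65535)
            received += 1
    except socket.timeout:
        pass
    finally:
        rx.close()
    return sent, received
```

For example, comparing `udp_loss_test(200000)` with `udp_loss_test(200000, rcvbuf=16777216)` should show the larger buffer absorbing more of the burst; the exact counts depend on the kernel and machine.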
Applying the tuning tweaks below has reduced the number of drops substantially, but it has not eliminated them. Maybe that isn't possible in this situation, but I think it should be.
Udp:
    3436531 packets received
    1553 packets to unknown port received.
    8402 packet receive errors
    109202 packets sent
    RcvbufErrors: 8402
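These `netstat -su` figures come from the `Udp:` lines of `/proc/net/snmp` (InDatagrams, NoPorts, InErrors, OutDatagrams, RcvbufErrors). They are cumulative since boot and system-wide, so an absolute 8402 (roughly 0.24% of the datagrams delivered) says little on its own; what matters is how fast RcvbufErrors grows under load. That InErrors equals RcvbufErrors here suggests every receive error was a full socket buffer. Below is a small Python sketch, assuming a Linux host, that parses those counters and diffs two samples; the function names are hypothetical.

```python
from typing import Dict


def parse_udp_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the 'Udp:' header/value line pair from /proc/net/snmp text."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # "UdpLite:" lines do not match
    if len(rows) < 2:
        raise ValueError("no Udp: header/value pair found")
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def read_udp_counters(path: str = "/proc/net/snmp") -> Dict[str, int]:
    """Read the current kernel-wide UDP counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())


def udp_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-counter increase between two samples."""
    return {name: after[name] - before[name] for name in after if name in before}
```

Taking `read_udp_counters()` twice, 60 seconds apart (matching `stats_freq(60)`), and dividing `udp_delta(...)["RcvbufErrors"]` by 60 gives buffer-overflow drops per second, which is easier to line up against syslog-ng's own stats messages than the since-boot totals.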
options {
sync (5000);
time_reopen (10);
time_reap(5);
long_hostnames (off);
use_dns (no);
use_fqdn (no);
create_dirs (no);
keep_hostname (yes);
log_fifo_size (536870912);
stats_freq(60);
flush_lines(500);
flush_timeout(10000);
};
We have almost the same setup in our network. Here are the settings we finally ended up with, which seem to work pretty well. We still get errors from time to time (a fact of life with UDP), but it's a lot better than before.
[root@syslog151 etc]# uname -a
Linux syslog151.xxx.voxeo.net 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@syslog151 etc]#
[root@syslog151nti etc]# rpm -q -a |grep syslog
syslog-ng-2.0.9-1.el5
[root@syslog151nti etc]#
# sysctl.conf
# Change the socket read buffer sizes
net.core.rmem_max = 16777216
net.core.rmem_default = 8388608
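As a rough way to reason about these two values: `rmem_default` is the receive buffer a socket gets if the application does not set SO_RCVBUF itself, and `rmem_max` caps what an application may request. A bigger buffer does not raise sustained throughput; it only buys time to ride out bursts or short stalls in the reader (for example while it flushes to disk). The Python arithmetic sketch below assumes each queued datagram is charged about 1 KB against the buffer. That figure is a hypothetical placeholder: the kernel charges the whole packet allocation rather than just the syslog payload, and the real overhead depends on the kernel and NIC driver.

```python
def headroom_seconds(rcvbuf_bytes: int, eps: float, bytes_per_datagram: int) -> float:
    """Seconds a full-rate stream can be buffered while the reader is stalled."""
    return rcvbuf_bytes / (eps * bytes_per_datagram)


# Hypothetical per-datagram charge against the buffer; measure your own.
ASSUMED_BYTES_PER_DATAGRAM = 1024
RMEM_DEFAULT = 8388608  # net.core.rmem_default from the sysctl.conf above

for eps in (5000, 20000):
    secs = headroom_seconds(RMEM_DEFAULT, eps, ASSUMED_BYTES_PER_DATAGRAM)
    print(f"{RMEM_DEFAULT} bytes @ {eps} EPS -> {secs:.2f} s of headroom")
```

Under that assumption, an 8 MB buffer covers only about 0.4 s of a 20,000 EPS stream (about 1.6 s at 5,000 EPS), which is why occasional drops can persist even after tuning if the reader ever pauses for longer than that.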
syslog-ng.conf:
options {
flush_lines (1000);
stats_freq (300);
time_reap (30);
log_fifo_size (500000);
time_reopen (2);
use_dns (yes);
dns_cache (yes);
dns_cache_size(300);
dns_cache_expire(3600);
dns_cache_expire_failed(3600);
use_fqdn (yes);
create_dirs (yes);
keep_hostname (yes);
chain_hostnames (no);
perm(0444);
dir_perm(0555);
group("wheel");
dir_group("wheel");
};