UDP Drops on Linux

Contributor

At a site, we are doing 5000-20000 EPS (Events Per Second) all udp/514 at each of 10 locations. At this time, we are required to do this all with a single server/indexer in each location.

This week we made syslog-ng the listening service (instead of Splunk), since it seemed to work better with the other log recipients (see below). Syslog-ng listens on udp/514 and then:

  • Forwards 100% via udp/514 to another direct-connected logging system
  • Forwards a certain "match" of less than 1% to Netcool
  • Writes 100% to a file

Splunk reads the file and seems to be performing well; however, the system reports many UDP errors, indicating that syslog-ng is not receiving everything. Tests with a traffic generator (loggen, bundled with syslog-ng) show that only a fraction of what is sent gets processed by syslog-ng. A tcpdump run at the same time shows that the EXACT packet count that was sent arrived at the server, but about 1/3 was dropped by the kernel, and only about 1/3-1/2 of what was sent made it into the file syslog-ng writes.

Applying the tuning tweaks below has reduced the number of drops substantially; however, the drops have not been eliminated. Maybe that isn't possible in this situation, but I think it should be.

Questions

  • Besides a packet capture, is there anything in RHEL (built-in preferred) that provides a counter of UDP packets attempted and not just those that are received (see comments on netstat -su output below)?
  • Is it possible to receive all packets sent yet still get UDP errors?
  • What is a reasonable amount of syslog EPS to expect with this hardware?
  • Are there any other tweaks to the Kernel or Syslog-ng to make?

Environment

Indexer:

  • RHEL 5.4
  • 2 x 6-core processors (12 cores total)
  • 24GB Memory
  • 5TB SAS 15k disk

Data:

  • Syslog (udp/514) data from several direct-connected Palo Alto firewalls (same switch)
  • Messages are 250-350 bytes mostly, with some just over 400 and NONE over 500 bytes.

ERRORS

  • netstat -su shows output like
Udp:
    3436531 packets received
    1553 packets to unknown port received.
    8402 packet receive errors
    109202 packets sent
    RcvbufErrors: 8402
  • The "packets received" is how many were accepted by the listening application and NOT how many were attempted
  • The "packets to unknown port" is what came in for the application when the application was down or not listening (e.g. during a restart)
  • The "packet receive errors" clearly indicates an error; HOWEVER, it is not a one-for-one packets-to-errors count. You can see many more errors than packets sent, which suggests a single packet can generate multiple errors.
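For anyone wanting a built-in counter rather than a packet capture: netstat -su reads its numbers from /proc/net/snmp, so the raw counters can be read directly. Below is a minimal sketch, assuming the usual mapping ("packets received" = InDatagrams, "packets to unknown port received" = NoPorts, "packet receive errors" = InErrors). The function names are made up for illustration. The sum is only an estimate of what reached the UDP layer; it will not include anything dropped earlier (NIC ring, netdev backlog).

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict of ints."""
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    # First "Udp:" line holds the field names, the second holds the values.
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def udp_arrivals(counters):
    """Estimate datagrams that reached the UDP layer:
    delivered to an application + no listener on the port + receive errors."""
    return counters["InDatagrams"] + counters["NoPorts"] + counters["InErrors"]


if __name__ == "__main__":
    with open("/proc/net/snmp") as f:
        c = parse_udp_counters(f.read())
    total = udp_arrivals(c)
    print(c)
    print("arrived at UDP layer (estimate):", total)
    if total:
        print("loss fraction: %.4f" % (c["InErrors"] / total))
```

The counters are cumulative since boot, so for a loggen test it is probably more useful to take a snapshot before and after the run and compare the differences.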

Tuning Steps

  • These are similar to what was done; however, multiple values were tried so the numbers below are not exactly what is in production now:
  • We tried turning off the udp/514 forwarding to the other applications but we did not see a noticeable drop in errors
  • kernel
    • net.ipv4.udp_rmem_min = 131072
    • net.ipv4.udp_wmem_min = 131072
    • net.core.netdev_max_backlog=2000
    • net.core.rmem_max=67108864
  • syslog-ng
options {
        sync (5000);
        time_reopen (10);
        time_reap(5);
        long_hostnames (off);
        use_dns (no);
        use_fqdn (no);
        create_dirs (no);
        keep_hostname (yes);
        log_fifo_size (536870912);
        stats_freq(60);
        flush_lines(500);
        flush_timeout(10000);
};
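One thing worth checking with settings like these: as far as I understand it, net.core.rmem_max is only a ceiling on what an application may request with SO_RCVBUF, so raising it does not enlarge any socket on its own. The sketch below (the function name is made up) requests a receive buffer on a throwaway UDP socket and prints what the kernel reports back. On Linux the reported value is normally double the granted size, because the kernel reserves room for its own bookkeeping. I believe syslog-ng's udp() source has an so_rcvbuf() option to make the same request, but check the documentation for your version.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a UDP receive buffer size and return what the kernel reports."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()


if __name__ == "__main__":
    # Sizes taken from the sysctl values discussed in this thread.
    for want in (131072, 8388608, 67108864):
        print(want, "->", effective_rcvbuf(want))
```

If the reported number stops growing as the request grows, the request is being capped by rmem_max.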

Re: UDP Drops on Linux

Path Finder

We have almost the same setup in our network. Here are the settings we finally ended up with, which seem to work pretty well. We still get errors from time to time (a fact of life with UDP), but it's a lot better than before.

[root@syslog151 etc]# uname -a
 Linux syslog151.xxx.voxeo.net 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@syslog151 etc]#  

[root@syslog151nti etc]# rpm -q -a |grep syslog 
syslog-ng-2.0.9-1.el5
[root@syslog151nti etc]# 

# sysctl.conf
# Change the socket read buffer sizes
net.core.rmem_max = 16777216
net.core.rmem_default = 8388608


syslog-ng.conf:
options {
          flush_lines (1000);
          stats_freq (300);
          time_reap (30);
          log_fifo_size (500000);
          time_reopen (2);
          use_dns (yes);
          dns_cache (yes);
          dns_cache_size(300);
          dns_cache_expire(3600);
          dns_cache_expire_failed(3600);
          use_fqdn (yes);
          create_dirs (yes);
          keep_hostname (yes);
          chain_hostnames (no);
          perm(0444);
          dir_perm(0555);
          group("wheel");
          dir_group("wheel");
        };

Re: UDP Drops on Linux

Path Finder

btw - note that with these sysctl settings (net.core.rmem_default in particular) you are changing the default receive buffer size for every socket, meaning that if you have other apps that open a ton of sockets you can burn up your kernel memory VERY quickly (up to 8MB per socket).
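To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The function names and the socket count are made up for illustration. As far as I know the buffer size is a limit rather than a preallocation, so the first figure is a worst case that only applies if every socket's queue actually fills. The second figure ignores the per-packet overhead the kernel charges against the buffer unless you pass a guess for it, so real burst tolerance will be lower.

```python
def worst_case_buffer_bytes(socket_count, rmem_default):
    """Upper bound on receive-queue memory if every socket's buffer fills."""
    return socket_count * rmem_default


def burst_seconds(buffer_bytes, eps, bytes_per_msg, overhead_per_msg=0):
    """How long a stalled reader can be absorbed before the buffer overflows.

    overhead_per_msg is an assumed per-datagram kernel overhead in bytes.
    """
    return buffer_bytes / (eps * (bytes_per_msg + overhead_per_msg))


if __name__ == "__main__":
    rmem_default = 8388608  # value from the sysctl.conf above
    print("1000 sockets, worst case bytes:",
          worst_case_buffer_bytes(1000, rmem_default))
    print("seconds of 20k EPS at 350 bytes/msg:",
          round(burst_seconds(rmem_default, 20000, 350), 2))
```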

Re: UDP Drops on Linux

Path Finder

I would also say we are getting more in the range of 20k EPS in this setup, on much slower hardware: dual L5410s (2.3GHz), 8GB RAM, dual (RAID 1) 10k rpm WD Raptor drives.

Re: UDP Drops on Linux

Contributor

I didn't expect you could do that volume even with DNS resolution (I see you are using the cache, but still).
Thanks for this, I will check this out in the lab when I get back onsite tomorrow.

Re: UDP Drops on Linux

Splunk Employee

You probably don't want to hear this after putting in so much time, but drop syslog-NG and go with rsyslogd. SUSE has done this in their releases, and it is only a matter of time before rsyslogd becomes the de facto standard. Google can show you several performance comparisons that demonstrate rsyslogd outperforming syslog-NG in most relevant benchmarks.

Re: UDP Drops on Linux

Splunk Employee

seconded. rsyslog is good stuff.

Re: UDP Drops on Linux

SplunkTrust

6 years later, it seems like rsyslog has pretty much taken over (at least in the RPM-based distributions) as the default. But, are the performance differences still present? And, having used both I'm going to confess a lot of hate and rage toward rsyslog's configuration syntax. So, in 2016, what is recommended?

Re: UDP Drops on Linux

Engager

syslog-ng and rsyslog perform on par; one is better in some scenarios, the other is better in others. And UDP receive errors do happen if receive buffers are not adjusted. That's a fact, and it may not relate to the quality of the syslog server in question.

Re: UDP Drops on Linux

Engager

We've run into the same problems with our syslog server. Adding to the puzzle is that (a) we're seeing the same sort of packet loss with both rsyslog and syslog-ng, and (b) we DON'T see the problem using, e.g., "awk '{print}' /inet/udp/514/0/0".

On a quad-core 2.4GHz system, we're not able to sustain much more than 50 msgs/sec (using the "loggen" program from syslog-ng) without errors. Attempting the modest rate of 1000 msgs/sec, we're seeing on the order of 70% loss. That's crazy!

I'm going to take a look at the parameters suggested by @zxcgeek, but I'm astounded at such poor performance on a stock system.
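For anyone who wants to see the buffer effect in isolation from any syslog daemon, here is a minimal loopback sketch. The function name and defaults are made up, and this is not how loggen works. It sends a burst to a socket that nobody is reading yet, then drains it and counts what survived. Anything that did not fit in the receive buffer while the reader was "stalled" is simply gone, which is the same mechanism behind RcvbufErrors.

```python
import socket


def loopback_udp_count(n_messages, payload_size=300, rcvbuf=None):
    """Send n_messages datagrams to an unread local socket, then count arrivals."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if rcvbuf is not None:
            # Optional: request a specific receive buffer size for comparison.
            rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        rx.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        rx.settimeout(0.5)
        addr = rx.getsockname()
        payload = b"x" * payload_size
        # The whole burst is sent before the receiver reads anything.
        for _ in range(n_messages):
            tx.sendto(payload, addr)
        received = 0
        try:
            while received < n_messages:
                rx.recv(65535)
                received += 1
        except socket.timeout:
            pass  # nothing more queued: the remainder was dropped
        return received
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    for burst in (100, 1000, 10000):
        print(burst, "sent,", loopback_udp_count(burst), "received")
```

Running it with different rcvbuf values should show the surviving count rise with the buffer size, up to whatever net.core.rmem_max allows.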