Deployment Architecture

UDP Drops on Linux

sdwilkerson
Contributor

At a site, we are doing 5000-20000 EPS (Events Per Second) all udp/514 at each of 10 locations. At this time, we are required to do this all with a single server/indexer in each location.

This week we made syslog-ng as the listening service (instead of Splunk) since it seemed to work better regarding the other log recipients (see below). Syslog-ng listens on udp/514 and then:

  • Forwards 100% via udp/514 to another direct-connected logging system
  • Forwards a certain "match" of lt 1% to Netcool
  • Writes 100% to a file

Splunk reads the file and seems to be performing well; however, the system reports many UDP errors indicating syslog-ng is not receiving it all. Tests with a traffic-generator (loggen bundled with syslog-ng) show that a fraction of what is sent gets processed by syslog-ng. A tcpdump at the same time shows the EXACT packet count that was sent was received, about 1/3 dropped by the kernel and about 1/3-1/2 of what was sent is what makes it the syslog-ng file.

Applying the tuning tweaks below have reduced the number of drops substantially; however, they have not been eliminated. Maybe it isn't possible in this situation, but I think it should be.

Questions

  • Besides a packet capture, is there anything in RHEL (built-in preferred) that provides a counter of UDP packets attempted and not just those that are received (see comments on netstat -su output below)?
  • Is it possible to receive all packets sent yet still get UDP errors?
  • What is a reasonable amount of syslog EPS to expect with this hardware?
  • Are there any other tweaks to the Kernel or Syslog-ng to make?

Environment

Indexer:

  • RHEL 5.4
  • 2 6core processors – 12 cores
  • 24GB Memory
  • 5TB SAS 15k disk

Data:

  • Syslog (udp/514) from Palo Alto Firewall Data from several direct-connected firewalls (same switch)
  • Messages are 250-350 bytes mostly, with some just over 400 and NONE over 500 bytes.

ERRORS

  • netstat -su shows output like
Udp:
    3436531 packets received
    1553 packets to unknown port received.
    8402 packet receive errors
    109202 packets sent
    RcvbufErrors: 8402
  • The "packets received" is how many were accepted by the listening application and NOT how many were attempted
  • The "packets to unknown port" is what came in for the application when the application was down or not listening (e.g. during a restart)
  • The "packet receive errors" clearly indicates an error; HOWEVER, it is not a one-for-one packets-to-error. You can have many more errors than packets were sent, indicating it is possible for a single packet to generate multiple errors.

Tuning Steps

  • These are similar to what was done; however, multiple values were tried so the numbers below are not exactly what is in production now:
  • We tried turning off the udp/514 forwarding to the other applications but we did not see a noticeable drop in errors
  • kernel
    • net.ipv4.udp_rmem_min = 131072
    • net.ipv4.udp_wmem_min = 131072
    • net.core.netdev_max_backlog=2000
    • net.core.rmem_max=67108864
    syslog-ng
options {
        sync (5000);
        time_reopen (10);
        time_reap(5);
        long_hostnames (off);
        use_dns (no);
        use_fqdn (no);
        create_dirs (no);
        keep_hostname (yes);
        log_fifo_size (536870912);
        stats_freq(60);
        flush_lines(500);
        flush_timeout(10000);
};
1 Solution

zscgeek
Path Finder

We have almost the same setup in our network. Here is the settings we finally ended up with that seem to work pretty well. We still get errors from time to time (fact of life with UDP) but it's a lot better then before.

[root@syslog151 etc]# uname -a
 Linux syslog151.xxx.voxeo.net 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@syslog151 etc]#  

[root@syslog151nti etc]# rpm -q -a |grep syslog 
syslog-ng-2.0.9-1.el5
[root@syslog151nti etc]# 

# sysctl.conf
# Change the socket read buffer sizes
net.core.rmem_max = 16777216
net.core.rmem_default = 8388608


syslog-ng.conf:
options {
          flush_lines (1000);
          stats_freq (300);
          time_reap (30);
          log_fifo_size (500000);
          time_reopen (2);
          use_dns (yes);
          dns_cache (yes);
          dns_cache_size(300);
          dns_cache_expire(3600);
          dns_cache_expire_failed(3600);
          use_fqdn (yes);
          create_dirs (yes);
          keep_hostname (yes);
          chain_hostnames (no);
          perm(0444);
          dir_perm(0555);
          group("wheel");
          dir_group("wheel");
        };

View solution in original post

bazsi77
Engager

UDP is lossy, but can be tuned to a point to a "good enough" receipt rate. It is always best to use TCP if possible though.

In general you'll need to size the input buffers according to the input rate of messages. You have to pick a volume in bytes/second (as the kernel accounts memory use in bytes, rather than packets), this can be found out by measuring the actual traffic, or by checking the average message size and multiplying it with the desired message rate.

Once you have a number for your bytes/second rate, you'll have to pick a number how much time syslog-ng (or other syslog daemon) will spend on processing your messages (rather than receiving them). In normal cases, the round-trip-time between the reception of incoming messages is on the order of microseconds, sometimes a longer latency might happen. As described in one of the answers, this might be affected by the file system in use, or the processing applied to the incoming log stream.

I generally tend to plan for 3-5 seconds latencies, which means that syslog-ng will spend this time elsewhere, in the worst case.

Once these numbers are established we only need to make sure that our kernel based receive buffer can hold this amount of data.

In my example, let's say we work with 50k messages per second, 300 bytes on average and syslog-ng may suffer 5 second latencies. To offset this, we need a 75MB receive buffer. Today's servers dedicated to syslog processing should be able to allocate this amount of kernel memory to the syslog subsystem and remember, that this is the worst case and not a static allocation.

syslog-ng has a so-rcvbuf() option, but kernel limits also apply that prevent from increasing the receive buffer above a certain threshold. To increase the threshold you'll need to change 'sys.net.core.rmem_max' value:

# echo 75000000 > /proc/sys/net/core/rmem_max

And then use this option for your udp receiver in syslog-ng.conf:

source s_net {
  udp(so-rcvbuf(75000000));
};

If you want to be absolutely sure, you can check syslog-ng with strace to see if it really increased the socket buffer.

In general version 3.3 of syslog-ng (released early 2011, currently at 3.3.6), should have no issue processing 100k messages per second from a single source. When processing messages from a large number of clients, this number can also be scaled to 1M msg/sec.

That should be more than anyone needs, even though certainly this number depends on the configuration used for syslog-ng.

jpvriel
Engager

This response doesn't offer exact answers to the questions posed in the original post. We didn't use syslog-ng. Instead, it shares our approach and experience with overcoming UDP Packet Drops.

My understanding of the architecture when sending syslog messages directly to a Splunk UDP collector (source) is:

Kernal UDP mem limits -> Kernel UDP socket buffer size -> Splunk receive queue (in memory) + Splunk persistent queue (on disk) -> Splunk Indexer/forwarder

In our case, UDP drops occurred due to:

  • Kernel network buffers got filled. We ran Ubuntu Desktop as the base OS, so the default buffer values were tuned for Desktop use...
  • When Splunk creates the UDP socket, the default receive buffer is 1.5MB. So even if your OS kernel config allows for larger buffers, Splunk is setting a smaller UDP buffer for its socket.
  • Logging bursts causing Splunk's receiving queue for UDP sources to be filled if it's too small.
  • The Splunk indexer/forwarder fell behind due to system load, causing the UDP receiving queues to be filled.
  • Our collector and indexer were on the same system and complex searches caused load issues.

We resolved our UDP drops in 5 ways:

  1. Increasing Linux kernel network buffers (similar to posts above)
  2. Increasing splunk's queue size for udp sources in input.conf
  3. Enabling splunk's persistent queue functionality in input.conf
  4. Re-prioritising the Splunk collector process, which helped deal with load caused by searches
  5. Turning off unnecessary/redundant messages at source, or failing that, apply a filter on the source feed so that the indexer wasn't burdened with extra data.

(1) Increasing Linux kernel network buffers

Increase the UDP receive buffer size from 128K to 32MB

sysctl -w net.core.rmem_max=33554432

Increase other memory management options which moderate how much memory UDP buffers use. E.g. default on system with 2GB:

  • net.ipv4.udp_mem = 192576 256768 385152
  • net.ipv4.udp_rmem_min = 4096

    sysctl -w net.ipv4.udp_mem='262144 327680 393216'

Note, net.ipv4.udp_mem works in pages, so multiply values by PAGE_SIZE, where PAGE_SIZE = 4096 (4K). Max udp_mem is then 385152 * 4096 = 1,577,582,592

Increase the queue size for incoming packets. On Ubuntu, the default appeared to be 1000.

sysctl -w net.core.netdev_max_backlog=2000

As pre the posts above, make the change persistent using sysctl.conf

(2) Increasing Splunk's queue size for UDP sources & (3) Enabling splunk's persistent queue functionality

Example from inputs.conf

[udp://9000]
connection_host = ip
index = main
# _rcvbuf defaults to 1572864 (1.5MB)
_rcvbuf = 16777216
# queueSize = defaults to 500K
queueSize = 16MB
# persistentQueueSize defaults to 0 (no persistent queue)
persistentQueueSize = 128MB

(4) Re-prioritising the Splunk collector process

We increase the processing and IO priorities for the 'collector' process, so that searches would not impact data collection. Typically, a high volume Splunk deployment would not have this issue as a separate search head would be used.

Here's a little bash script I wrote to do the work:

# get PID of splunk collector process
splunkd_pid="$(head -n 1 /home/splunk/splunk/var/run/splunk/splunkd.pid)"
echo "Splunk collector PID is $splunkd_pid"

# show current processes and listening ports
echo "Splunk ports:"
netstat -nlp4 | grep splunk
echo

# re-nice process to higher priority
splunkd_was_nice="$(ps -p $splunkd_pid --no-headers -o nice)"
splunkd_was_nice="$(expr match "$splunkd_was_nice" '[ ]*\([-0-9]*\)')"
if renice -n -5 -p $splunkd_pid; then
  echo "Changed splunkd CPU priority from nice $splunkd_was_nice to -5 (meaner)"
else
  echo "ERROR: failed to renice process" 1>&2
  exit 1
fi

# re-nice IO priority
splunkd_io_was_nice="$(ionice -p $splunkd_pid)"
splunkd_io_was_nice_class="$(expr match "$splunkd_io_was_nice" '^\([a-zA-Z-]*\):')"
splunkd_io_was_nice_pri="$(expr match "$splunkd_io_was_nice" '.*: prio \([0-9]\)$')"
if ionice -c2 -n1 -p $splunkd_pid; then 
  echo "Changed splunkd IO class from $splunkd_io_was_nice_class to best-effort"
  echo "Changed splunkd IO priority from $splunkd_io_was_nice_pri to 1"
else
  echo "ERROR: failed to renice IO for process" 1>&2
  exit 1
fi

echo
echo "Splunk collector prioritisation complete"
echo

(5) Apply a filter to the source to remove unwanted input

In our case, we wanted to filter out specific info level messages, but not all info level messages - the administrator's were unable to selectively control which info level messages were sent. Our example is provided below.

Specify a transform to enable filtering in props.conf

 [udp:9000]
 TRANSFORMS-filter=filter_dp

Specify the filter in transforms.conf

[filter_dp]
REGEX=DataPower (\[[^\]]+\]){2}\[info].+?: rule \(
DEST_KEY = queue
FORMAT = nullQueue

Monitoring for UDP errors

We created a cron job to append the output of netstat to a file, every 5 minutes. netstat command:

 crontab -l
 # m h  dom mon dow   command
 */5 * * * * netstat -su >> /tmp/udp_status.txt

We then indexed /tmp/udp_status.txt and created a view to graph the errors.
The delta command came in handy to compare time intervals.

To timechart the number of UDP packets received:

 sourcetype=udp_stat | kv pairdelim="\n", kvdelim=": ", auto=f | rex "(?<p_rec>\w+)\spackets received" | delta p_rec as p_rec_delta | eval p_rec_delta=abs(p_rec_delta) | timechart span=5m sum(p_rec_delta)

To timechart the number of UDP packets dropped due buffer errors (every 5 minutes):

sourcetype=udp_stat | kv pairdelim="\n", kvdelim=": ", auto=f | rex "RcvbufErrors: (?<rec_buf_err>\d+)" | delta rec_buf_err as rec_buf_err_delta | eval rec_buf_err_delta=abs(rec_buf_err_delta) | timechart span=5m sum(rec_buf_err_delta)

To timechart the number of UDP packets discarded due to the application not taking them from the buffer:

udp_stat | kv pairdelim="\n", kvdelim=": ", auto=f | rex "(?<rec_app_err>\d+) packets to unknown port received." | delta rec_app_err as rec_app_err_delta | eval rec_app_err_delta=abs(rec_app_err_delta) | timechart span=5m sum(rec_app_err_delta)

To tabulate delta's for UDP stats:

sourcetype=udp_stat | kv pairdelim="\n", kvdelim=": ", auto=f | rex "(?<recieved>\w+)\spackets received" | rex "RcvbufErrors: (?<rec_buf_err>\d+)" | rex "(?<rec_app_err>\d+) packets to unknown port received."  | delta recieved as recieved_delta | delta rec_buf_err as rec_buf_err_delta | delta rec_app_err as rec_app_err_delta | eval recieved_delta=abs(recieved_delta) | eval rec_buf_err_delta=abs(rec_buf_err_delta) | eval rec_app_err_delta=abs(rec_app_err_delta) | eval rec_buf_err_percent=round((rec_buf_err_delta / recieved_delta * 100),2) | eval rec_app_err_percent=round((rec_app_err_delta / recieved_delta * 100),2) | table _time recieved recieved_delta rec_buf_err rec_buf_err_delta rec_buf_err_percent rec_app_err rec_app_err_delta rec_app_err_percent

Lowell
Super Champion

nextgen, I've updated the example searches above to resolve the regex issue.

0 Karma

eclypse
Engager

In the above answer from jpvriel, < and > got transposed into (ampersand)lt; and (ampersand)gt;. For example, here's what the first one should look like when you enter them in Splunk:

sourcetype=udp_stat | kv pairdelim="\n", kvdelim=": ", auto=f | rex "(?<p_rec>\w+)\spackets received" | delta p_rec as p_rec_delta | eval p_rec_delta=abs(p_rec_delta) | timechart span=5m sum(p_rec_delta)

nextgen
New Member

I get the following error .... any insights ? terminator seems to be there ?

Error in 'rex' command: Encountered the following error while compiling the regex '(?<p_rec>\w+)\spackets received': Regex: syntax error in subpattern name (missing terminator)

0 Karma

balbano
Contributor

I have been having the same issue.

One of 2 log servers has been dropping UDP packets like crazy. However the other one was fine.

While the changes that were mentioned in here did improve the situation, the drop rate was at a significant level where it was ridiculous.

After banging my head over it for why one was dropping and the other one wasnt dropping, I realized a key difference in the log servers: The working server was having the logs write to an ext3 partition and the server dropping logs was writing to an ext4 filesystem.

As a test, I moved the log destination to an ext3 filesystem with default settings and now its working fine.

Now the question, what are the appropriate ext4 settings for receiving syslog-ng / rsyslog data?

This is what I currently have setup (which is causing the UDP Droppage):

/dev/mapper/ld0p1 /$my_log_dest ext4 noatime,data=writeback,defaults,acl 1 2

I suspect its possibly my journaling option "data=writeback" but I'm not for certain.

Can someone give some insight on this?

Thanks.

Brian

piebob
Motivator

i recommend you ask this additional question separately. 🙂

harvard_lars
Engager

We've run into the same problems with our syslog server. Adding to the puzzle is that (a) we're seeing the same sort of packet loss with both rsyslog and syslog-ng, and (b) we DON'T see the problem using, e.g., "awk {print} /inet/udp/514/0/0".

On a quad-core 2.4Ghz system, we're not able to sustain much more than 50 msgs/sec (using the "loggen" program from syslog-ng) without errors. Attempting the modest rate of 1000 msgs/sec, we're seeing on the order of 70% loss. That's crazy!

I'm going to take a look at the parameters suggested by @zxcgeek, but I'm astounded at such poor performance on a stock system.

araitz
Splunk Employee
Splunk Employee

This may sound dumb, but check to make sure that speed and duplex are manually set at both the switch port and the network adapter. If that fails to reveal anything, I would still recommend investigating the link layer on up, perhaps through a packet capture.

araitz
Splunk Employee
Splunk Employee

You probably don't want to hear this after putting in so much time, but drop syslog-NG and go with rsyslogd. SUSE has done this in their releases, and it is only a matter of time before rsyslogd becomes the de facto standard. Google can show you several performance comparisons that demonstrate rsyslogd outperforming syslog-NG in most relevant benchmarks.

bazsi77
Engager

syslog-ng and rsyslog performs on par, one is better in some scenarios, the other is better on others. And udp receive errors do happen if receive buffers are not adjusted. That's a fact and may not relate to the quality of the syslog server in question.

jrodman
Splunk Employee
Splunk Employee

seconded. rsyslog is good stuff.

dwaddle
SplunkTrust
SplunkTrust

6 years later, it seems like rsyslog has pretty much taken over (at least in the RPM-based distributions) as the default. But, are the performance differences still present? And, having used both I'm going to confess a lot of hate and rage toward rsyslog's configuration syntax. So, in 2016, what is recommended?

zscgeek
Path Finder

We have almost the same setup in our network. Here is the settings we finally ended up with that seem to work pretty well. We still get errors from time to time (fact of life with UDP) but it's a lot better then before.

[root@syslog151 etc]# uname -a
 Linux syslog151.xxx.voxeo.net 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@syslog151 etc]#  

[root@syslog151nti etc]# rpm -q -a |grep syslog 
syslog-ng-2.0.9-1.el5
[root@syslog151nti etc]# 

# sysctl.conf
# Change the socket read buffer sizes
net.core.rmem_max = 16777216
net.core.rmem_default = 8388608


syslog-ng.conf:
options {
          flush_lines (1000);
          stats_freq (300);
          time_reap (30);
          log_fifo_size (500000);
          time_reopen (2);
          use_dns (yes);
          dns_cache (yes);
          dns_cache_size(300);
          dns_cache_expire(3600);
          dns_cache_expire_failed(3600);
          use_fqdn (yes);
          create_dirs (yes);
          keep_hostname (yes);
          chain_hostnames (no);
          perm(0444);
          dir_perm(0555);
          group("wheel");
          dir_group("wheel");
        };

View solution in original post

sdwilkerson
Contributor

I didn't expect you could do that volume even with DNS res (I see you are using cache, but still).
Thanks for this, I will check this out in the lab when I get back onsite tomorrow.

zscgeek
Path Finder

I would also say we are getting more in the range of 20k eps in this setup on much slower hardware. Dual L5410's (2.3Ghz), 8GB ram, dual (raid1) 10k rpm WD Raptor drives

zscgeek
Path Finder

btw - note that with these sysctl settings you are changing the default buffer sizes on sockets, meaning if you have other apps that open a ton of sockets you will burn up your kernel memory VERY quickly (8MB per hit).

.conf21 CFS Extended through 5/20!

Don't miss your chance
to share your Splunk
wisdom in-person or
virtually at .conf21!

Call for Speakers has
been extended through
Thursday, 5/20!