We are using Splunk to collect logs from a Java-based application. Our logging configuration is as follows:
The Java app uses a syslog appender configured to log to udp:localhost:514. A Splunk forwarder is installed on the app server with a udp:514 listener (overridden to the log4j sourcetype), and it forwards on to our Splunk indexer over standard forwarding (tcp:9997) across the network. The setup has worked well for us until now, since the unreliable UDP leg only crosses the local interface, so the chance of losing packets is minimal. But now that we have added some additional logging to the application, we are finding that events logged from it are getting truncated, apparently at the 64 KB mark. Here are the relevant Splunk configs:
inputs.conf
[udp://514]
connection_host = none
sourcetype = log4j
_rcvbuf = 3145728
props.conf
[default]
CHARSET = UTF-8
LINE_BREAKER_LOOKBEHIND = 1000
TRUNCATE = 100000
DATETIME_CONFIG = /etc/datetime.xml
ANNOTATE_PUNCT = True
HEADER_MODE =
MAX_DAYS_HENCE=2
MAX_DAYS_AGO=2000
MAX_DIFF_SECS_AGO=3600
MAX_DIFF_SECS_HENCE=604800
MAX_TIMESTAMP_LOOKAHEAD = 128
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
MAX_EVENTS = 7000
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
TRANSFORMS =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
LEARN_SOURCETYPE = false
maxDist = 100
[log4j]
MAX_EVENTS = 7000
SHOULD_LINEMERGE = true
TRUNCATE = 100000
Where should we be looking for the truncation?
Can you try logging to a TCP input? I imagine you are hitting the MTU for those UDP log packets. In our environment, we log to syslog-ng and then index those logs, taking the host off the segment on the path. The results are more predictable than using a Splunk input. Hope this helps.
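If you can get anything to send over TCP, the inputs.conf stanza is analogous to your UDP one - roughly something like this (the port 1514 here is just an example, pick whatever fits your environment):
[tcp://1514]
connection_host = none
sourcetype = log4j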
I tried with netcat and found that we were only getting 1024 bytes, so it definitely seems to be related to network buffers / units. ifconfig lo gives:
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
That 65536 exactly matches the size at which the logs are cut off, so I guess there are two approaches: 1 - override the localhost MTU, or 2 - configure the application to log these events to a file and add a file monitor to Splunk (a sketch of that stanza is below). The logging framework doesn't offer a TCP syslog appender, and its TCP logging option (the "socket appender") uses escape sequences which come through to Splunk as literal characters.
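For option 2, the input would presumably just be a file monitor along these lines (the path is hypothetical - it would be wherever we point the file appender):
[monitor:///var/log/myapp/app.log]
sourcetype = log4j
Keeping the log4j sourcetype means the existing props should still apply.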
Some other methods we have used to try to debug this: turning off the Splunk forwarder and running nc -l -u localhost 514 | tee manual_log.log to create a UDP listener that writes to a file, but netcat seems to hang when we get to the big logs. We have also tried saving an example of a truncated log entry to a file and piping it at the forwarder with nc -u localhost 514 < biglogentry, but that doesn't show up in the indexer.
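One more thing I may try is watching the loopback interface directly to confirm how much of each event actually makes it onto the wire, something like tcpdump -i lo -n -s 0 -X udp port 514 (standard tcpdump options; just a guess at a useful capture).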
A note on the config above: I have only just added the _rcvbuf parameter, to check whether this is a buffering issue - it isn't.
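(As I understand it the kernel also caps what a process can actually get via SO_RCVBUF, so it's probably worth checking sysctl net.core.rmem_max on the forwarder host - if that's lower than 3145728 the _rcvbuf request would be silently clamped.)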