truncation issue: identifying where it's happening.

brettcave
Builder

We are using Splunk to collect logs from a Java-based application. Our logging configuration is as follows:

The Java app uses a syslog appender configured to log to udp:localhost:514. A Splunk forwarder is installed on the app server with a udp:514 listener (overridden to a log4j sourcetype). The forwarder then sends to our Splunk indexer using standard forwarding (tcp:9997) across the network. This has worked well for us, since unreliable UDP is only used over a local interface, so the chance of losing packets is minimal.

That changed recently: we added some additional logging to our application and found that events logged from it are getting truncated. The truncation seems to happen at the 64k mark. Here are the relevant Splunk configs:

inputs.conf

[udp://514]
connection_host = none
sourcetype = log4j
_rcvbuf = 3145728

props.conf

[default]
CHARSET = UTF-8
LINE_BREAKER_LOOKBEHIND = 1000
TRUNCATE = 100000
DATETIME_CONFIG = /etc/datetime.xml
ANNOTATE_PUNCT = True
HEADER_MODE =
MAX_DAYS_HENCE=2
MAX_DAYS_AGO=2000
MAX_DIFF_SECS_AGO=3600
MAX_DIFF_SECS_HENCE=604800
MAX_TIMESTAMP_LOOKAHEAD = 128
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
MAX_EVENTS = 7000
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
TRANSFORMS =
SEGMENTATION          = indexing
SEGMENTATION-all      = full
SEGMENTATION-inner    = inner
SEGMENTATION-outer    = outer
SEGMENTATION-raw      = none
SEGMENTATION-standard = standard
LEARN_SOURCETYPE      = false
maxDist = 100

[log4j]
MAX_EVENTS = 7000
SHOULD_LINEMERGE = true
TRUNCATE = 100000

Where should we be looking for the truncation?

1 Solution

fervin
Path Finder

Can you try logging to a TCP input? I imagine you are hitting the MTU for those UDP log packets. In our environment, we log to syslog-ng and then index those logs, taking the host off the segment on the path. The results are more predictable than using a Splunk input. Hope this helps.
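
One way to check whether a datagram size limit is involved is to probe it directly over loopback. The UDP length field is 16 bits, so a single datagram cannot exceed 65,535 bytes including headers, which leaves at most 65,507 bytes of payload over IPv4. Below is a minimal Python sketch of such a probe. The helper name udp_roundtrip is made up for illustration, and it uses an OS-assigned port rather than 514 so it does not need root or collide with the forwarder.

```python
import socket


def udp_roundtrip(size: int, timeout: float = 2.0) -> bool:
    """Send one UDP datagram of `size` bytes over loopback.

    Returns True if the receiver got the whole payload, False if the
    send was rejected by the OS or nothing arrived before the timeout.
    (Hypothetical helper for illustration, not part of Splunk.)
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        try:
            sender.sendto(b"x" * size, receiver.getsockname())
        except OSError:
            # Typically "Message too long" when the payload cannot fit
            # in a single datagram.
            return False
        try:
            data, _addr = receiver.recvfrom(65535)
        except OSError:
            # Covers socket.timeout: the datagram never arrived.
            return False
        return len(data) == size
    finally:
        sender.close()
        receiver.close()
```

Calling it with a few sizes around the 64k mark, for example udp_roundtrip(1024), udp_roundtrip(65507) and udp_roundtrip(65508), should show where the OS stops accepting datagrams. The exact cut-off can be lower than 65,507 on some platforms, so treat the result as a measurement of your host rather than a given. If the application's events are larger than whatever passes here, no receive-side setting in Splunk can recover the missing bytes.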

brettcave
Builder

I tried with netcat and found that we were only getting 1024 bytes, so it definitely seems to be related to network buffers / transmission units. ifconfig lo gives lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536, which exactly matches the size of the truncated logs, so I guess there are 2 approaches:

1 - override the localhost MTU.
2 - configure the application to log these events to file and add a file monitor to Splunk.

The logging framework doesn't offer a TCP syslog appender, and the TCP logging option ("socket appender") uses escape sequences which come through to Splunk as literal characters.

brettcave
Builder

Some other methods we have used to try to debug this include turning off the Splunk forwarder and using nc -l -u localhost 514 | tee manual_log.log to create a UDP listener that logs to file, but netcat seems to hang when we get to the big logs. We have also tried saving an example of a truncated log to a file and piping it into the forwarder with nc -u localhost 514 < biglogentry, but that doesn't show up in the indexer.
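
As an alternative to netcat for this kind of test, a small listener with an explicit receive buffer makes it easier to see how many bytes each datagram actually carried. This is only a sketch: record_datagrams is a hypothetical helper, and the caller is assumed to bind the socket (port 514 needs root and requires the forwarder to be stopped, so an unprivileged port is simpler if the appender can be pointed at one).

```python
import socket
from typing import List


def record_datagrams(sock: socket.socket, count: int, log_path: str) -> List[int]:
    """Receive `count` UDP datagrams on an already-bound socket.

    Appends each payload to `log_path` (one per line) and returns the
    payload sizes in arrival order. (Hypothetical helper for debugging.)
    """
    sizes = []
    with open(log_path, "ab") as log:
        for _ in range(count):
            # 65535 is the largest value the 16-bit UDP length field can
            # hold, so no single datagram is cut short by this buffer.
            data, _addr = sock.recvfrom(65535)
            sizes.append(len(data))
            log.write(data + b"\n")
    return sizes
```

To use it, create a socket.socket(socket.AF_INET, socket.SOCK_DGRAM), bind it to ("127.0.0.1", port), and pass it in. If the sizes it reports for the big events are already capped, the truncation is happening on the sending side, before Splunk is involved.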

brettcave
Builder

A note on the above config: I have only just added the _rcvbuf parameter to rule out a buffering issue, and it made no difference.