Getting Data In

Why does my Splunk forwarder transmit incomplete events for my rsyslog server?

tboutry
Explorer

Hi folks,

I have a problem with Splunk forwarder on my centralize rsyslog server, exactly it's with the maillog events, I have few incomplete lines!
I know why but I don't know how to fix it.

A quick overview of my infrastructure: All Linux servers send logs to my rsyslog server using the syslog protocol, my configuration on the rsyslog server:

$DynaFileCacheSize 100

$EscapeControlCharactersOnReceive on

$template PerHostFacility,"/var/log/syslog/%fromhost%/%fromhost%-%$year%%$month%%$day%-%syslogfacility-text%.log"

*.*                     ?PerHostFacility

I can see the problem when I use the command tail -f on log file, data is transmitted by blocks but the last line is cut in half. The second half is sent with the next block of data.

$ tail -f /var/log/syslog/cpt-smtp03l-p/cpt-smtp03l-p.domaine.com-20170206-mail.log
[ ... OUTPUT... ]
Feb  6 14:44:19 cpt-smtp03l-p postfix/smtp[9188]: EBD7C601DB: to=<an_email_addr@consultantsbch.com>, relay=remote.groupecdl.ca[24.xx.xxx.74]:25, delay=2.9, delays=0.12/0/0.1
2/2.7, dsn=2.6.0, status=sent (250 2.6.0 <1567a59e-3f5f-4cc6-97f8-b772a7d087da@SERVEUR.cdl.local> Queued mail for delivery)
Feb  6 14:44:19 cpt-smtp03l-p postfix/smtp[9180]: 69F33601D8: to=<a_user_mail_adrr@consultantsbch[END OF FIRST RECEIVED].com>, relay=remote.groupecdl.ca[24.xx.xxx.74]:25, delay=3.4, delays=0.07/0/0.3/3.1, dsn=2.6.0, status=sent (250 2.6.0 <ae1861d1-43dc-491d-917e-9fb041d23dc8@SERVEUR.cdl.local> Queued mail for delivery)

So at the line was write at the first time :

- Feb  6 14:44:19 cpt-smtp03l-p postfix/smtp[9180]: 69F33601D8: to=, relay=remote.groupecdl.ca[24.xx.xxx.74]:25, delay=3.4, delays=0.07/0/0.3/3.1, dsn=2.6.0, status=sent (250 2.6.0  Queued mail for delivery)

but in Splunk I saw the incomplete line :

splunk-view

Here my configuration for the input definition :

[monitor:///var/log/syslog]
blacklist = .*\.(tar|zip|gz)$
disabled = false
host_segment = 4
ignoreOlderThan = 2d
index = ti-mail
initCrcLength = 512
#sourcetype = smtp-log
sourcetype = postfix_syslog
#multiline_event_extra_waittime = true
whitelist = ^.*smtp.*mail\.log.*$

I tried many things like LINE_MERGE and LINE_BREAKER but I'm actually lost, nothing worked. I don't need real time file tracking so if I can tell to Splunk don't read the last "incomplete" line it's good enough, it can read it at the second time ...

Do you have anyway idea? Do you need more data I forgot to provide?

Thanks for you help

1 Solution

jtacy
Builder

What version of rsyslog are you using and can you share more of its configuration? I can't imagine that it's expected behavior to stop writing out the buffer in the middle of a line. You might be able to use time_before_close in your input stanza to try to work around this but the default value of 3 seconds should really be fine.

time_before_close = <integer>
* Modification time delta required before the file monitor can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 3.

Source: http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Inputsconf

What might be happening is that it's taking longer than 3 seconds for rsyslog to get around to completing the line, and meanwhile Splunk has hit EOF, closed the file, and ended the event. I don't know what a realistic maximum value is before you start running into problems with breaking your output load balancing or having other problems, but I think the real fix might be on the rsyslog side; it would be great to get more details about the config.

View solution in original post

dmrzzz
Explorer

I just solved my own variation of this problem...

Key discovery: empirically, at least in v6.5.0, the helpful behavior of time_before_close applies only to the tailing processor and not to the batch reader. (see also https://answers.splunk.com/answers/109779/when-is-the-batchreader-used-and-when-is-the-tailingproces...)

During peak periods, one particular log file of mine grows so quickly that my UF would decide to read it using the batch reader, and each hour I saw hundreds of log messages from that file split into two Splunk events each, along with contemporaneous "INFO TailReader - Batch input finished reading file" messages in splunkd.log.

I don't want rsyslog working extra hard to write individual syslog messages to the filesystem atomically, so my solution was to stop using the batch reader:

[default]
# Set very high so we never use the batch reader.
min_batch_size_bytes = 1000000000000

[inputproc]
# Set higher than the number of simultaneously active log files to
# avoid splitting events.
max_fd = 10000
0 Karma

abeeber_2
Path Finder

I had the same problem and implemented the same configs in Splunk.

But I also found the following and will implement pieces of it in my rsyslog configs..

load required modules

module(load="imudp" threads="2"
timeRequery="8" batchSize="128")

module(load="imptcp" threads="3")

listeners
repeat blocks if more listeners are needed
alternatively, use array syntax:
port=["514","515",...]

input(type="imudp" port="514"
ruleset="writeRemoteData")
input(type="imptcp" port="10514"
ruleset="writeRemoteData")

now define our ruleset, which also includes
threading and queue parameters.
ruleset(name="writeRemoteData"
queue.type="fixedArray"
queue.size="250000"
queue.dequeueBatchSize="4096"
queue.workerThreads="4"
queue.workerThreadMinimumMessages="60000"

) {
action(type="omfile" file="/var/log/remote.log"
ioBufferSize="64k" flushOnTXEnd="off"
asyncWriting="on"
)
}

0 Karma

mydog8it
Builder

@abeeber_2 - What was the outcome of your rsyslog changes? I am having he same issue, at a 240 second delay my "line breaking" issues disappear, but the data is old at that point.

0 Karma

jtacy
Builder

What version of rsyslog are you using and can you share more of its configuration? I can't imagine that it's expected behavior to stop writing out the buffer in the middle of a line. You might be able to use time_before_close in your input stanza to try to work around this but the default value of 3 seconds should really be fine.

time_before_close = <integer>
* Modification time delta required before the file monitor can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 3.

Source: http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Inputsconf

What might be happening is that it's taking longer than 3 seconds for rsyslog to get around to completing the line, and meanwhile Splunk has hit EOF, closed the file, and ended the event. I don't know what a realistic maximum value is before you start running into problems with breaking your output load balancing or having other problems, but I think the real fix might be on the rsyslog side; it would be great to get more details about the config.

tboutry
Explorer

First thanks for your answer ,

Yesterday I found the option time_before_close I tried :

  • 60 seconds : same errors
  • 120 seconds : same errors , less often
  • 300 seconds : well that's not cool , but I don't need real-time logs for now.

But I'm ok with you I'm pretty sure the problem must be with rsyslog , see below my configuration :

$ cat rsyslog.conf | grep -v ^# | grep -v ^$
$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imklog   # provides kernel logging support (previously done by rklogd)
$ModLoad imudp
$UDPServerRun 514
$ModLoad imtcp
$InputTCPServerRun 514
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$ActionFileEnableSync on
$IncludeConfig /etc/rsyslog.d/*.conf
$FileOwner syslog
$FileGroup adm
$FileCreateMode 0644
$DirCreateMode 0755
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
$CreateDirs on
$DirOwner syslog
$DirGroup adm
$FailOnChownFailure off
$DynaFileCacheSize 100
$EscapeControlCharactersOnReceive on
$template PerHostFacility,"/var/log/syslog/%fromhost%/%fromhost%-%$year%%$month%%$day%-%syslogfacility-text%.log"
*.*                     ?PerHostFacility

And include files :

$ cat listen.conf  | grep -v ^# | grep -v ^$ 
$SystemLogSocketName /run/systemd/journal/syslog

I tried to keep it simple , but maybe I miss something

Thanks again for you HELP 😄

0 Karma

jtacy
Builder

Glad you found a solution! Unfortunately, I wasn't able to find a way to reproduce the incomplete lines on rsyslog 7.4.7 (shipped with RHEL7) with your configuration. I did notice that you have $ActionFileEnableSync on in the config, though. I would strongly recommend reviewing the use of this option since it will hammer your storage with write I/Os and limit the rate of events you can handle. Consider http://www.rsyslog.com/doc/master/compatibility/v3compatibility.html#output-file-syncing for more info. I couldn't get more than about 2,500 events per second with your config but was able to run 100,000 events per second (and probably much more) just by changing this option. Have fun!

tboutry
Explorer

Hi,

You Rock !!!
Time_before_close was a "good" work around but at 300 seconds it's a little bit heavy 😛 .
I change my rsyslog's configuration I switched to off ActionFileEnableSync off , now the log file is ok , lines are complete when rsyslog write the file .

THANKS a lot for you help !!!

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...