Is there a recommended configuration for syslog-ng log rotation and blacklist to prevent duplicate data?

mmartin0926
New Member

Hello All,

We have a Splunk server set up to monitor our Cisco WSA server using the "Cisco Web Security Advanced Reporting" add-on, which is currently the only source sending files to this Splunk server.

The Splunk server has filled to capacity, and the partition where we store its logs is at 100%. So it seems log rotation was never set up.

I read the info at this link below, but I now have a few questions regarding it.
----> http://docs.splunk.com/Documentation/Splunk/4.1.7/Admin/Howlogfilerotationishandled

Since Splunk does not have a built-in log rotation method, I assume we use the native Linux file rotation method on the server (syslog-ng, I believe?). Is that correct?

# splunk --version
Splunk 6.2.2 (build 255606)
#
# syslog-ng --version
syslog-ng 2.0.9
# cat /etc/*release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

I also read that you can either blacklist the compressed files produced by the log rotation or move the rotated files to a new directory to prevent duplicate data from being indexed.

blacklist = \.(gz|bz2|z|zip)$ 
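As a quick sanity check, the pattern above can be tried against file names like those a rotation scheme might leave behind. This is only a minimal Python sketch: the file names are made up for illustration, and Python's `re` module stands in for Splunk's own regex engine, which is an assumption that should hold for a pattern this simple.

```python
import re

# The pattern from the blacklist line above.
BLACKLIST = re.compile(r"\.(gz|bz2|z|zip)$")


def is_blacklisted(path: str) -> bool:
    """Return True if the path ends in one of the compressed-file extensions."""
    return BLACKLIST.search(path) is not None


# Hypothetical file names, chosen only for illustration.
candidates = [
    "/var/log/remote/host1/log.txt",
    "/var/log/remote/host1/log.txt.1",
    "/var/log/remote/host1/log.txt.2.gz",
    "/var/log/remote/host1/archive.zip",
]

for path in candidates:
    print(path, "skipped" if is_blacklisted(path) else "read")
```

Only the compressed files are skipped. An uncompressed rotated file such as log.txt.1 does not match the pattern, so whether it gets read depends on what the monitored path covers.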

Which config file do I add the blacklist option to?
Also, how should I configure syslog-ng for the log rotation? Is there a recommended configuration for this?

Thanks in Advance,
Matt

Richfez
SplunkTrust

Syslog-ng is a program that reads syslog from the network and writes it to disk. It just grabs what's sent to it on (by default) UDP/TCP 514 and creates a file on the local system with those contents.

Logrotate is a system utility that, according to its configuration, will do the rotation for you.

There is no configuration recommendation for syslog-ng or logrotate, because there's no one-size-fits-all strategy. Your retention of raw logs is generally driven by how far back you may need to go if you didn't notice that something broke, and it is configured mostly in logrotate.

That being said, if you have no other requirements I'd keep 7 days of logs and rotate them nightly. Here are what appear to be some nice samples or examples.

My syslog-ng is set up to write files at /var/log/remote/"hostname"/log.txt
My Splunk (a universal forwarder, in this case) is set to read only log.txt in that file monitor stanza, and to use the 4th path segment as the hostname.
Logrotate is set to rotate log.txt daily, creating log.txt.1, then gzip the older ones and delete them when they are 7 days old. So I have log.txt (monitored by Splunk), log.txt.1, log.txt.2.gz, log.txt.3.gz and so on. That way, if I happen to have a broken (or non-started) UF or input, I have 7 days of stuff on disk I can dig through to backfill the missing data in Splunk.

I can supply configs for most of this later if you need them, but finding samples on the internet that match what you need is usually pretty easy once you know what to look for, and hopefully I just gave you that information!

yannK
Splunk Employee

My 2 cents: if the log rotation is clean (I mean not using a copytruncate option, which may cause duplicates), then it's not a problem to have Splunk monitor both the live files and the rotated files, because Splunk has a mechanism that reads the first lines of a file to detect whether it's a new file or a rotated one.
The advantage is that if the file rotates before Splunk has had time to read the last events, it will be able to continue on the rotated one.

mmartin0926
New Member

Hey Rich, thanks for the reply!

OK cool, not sure why I was saying syslog-ng for the rotation... Sorry, it's been a while.

But thanks for the link to the examples and your explanation. It is very much appreciated. I think you gave me enough to go on to get this configured right, so thanks again!

-Matt

mmartin0926
New Member

One more question...

I read in the Splunk docs about how it handles log rotation, and it says it recognizes when a log was rotated, like /var/log/messages becoming .../messages1, and will not read the rolled file a second time. What if I use the logrotate option that appends a date instead of just a number (dateext)? Does Splunk recognize that as well?

I'm assuming that since it uses a CRC check to ID the files, it won't re-read a file just because of the date suffix, but I wanted to be sure.

Thanks Again,
Matt

Richfez
SplunkTrust

Yes, more or less correct. There's a handful of settings that control it (mainly how much of the beginning of the file is used to determine whether it's new or not), but you usually don't have to change those. (For reference, that's in inputs.conf, and probably the most used setting is initCrcLength = <integer>, but again, you shouldn't need to fiddle with that.)
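To make the idea concrete, here is a minimal Python sketch of identifying a file by a checksum of its first bytes instead of by its name. It is an illustration only, not Splunk's actual algorithm: `head_fingerprint` is a hypothetical helper, the 256-byte length is an assumption about the default, and the dated file name is just one form a dateext-style rotation might produce.

```python
import os
import tempfile
import zlib


def head_fingerprint(path, length=256):
    """CRC32 of the first `length` bytes of the file at `path`."""
    with open(path, "rb") as handle:
        return zlib.crc32(handle.read(length))


def demo():
    """Fingerprint a file, rename it the way a rotation would, then fingerprint it again."""
    with tempfile.TemporaryDirectory() as folder:
        live = os.path.join(folder, "log.txt")
        # Hypothetical dateext-style name for the rotated file.
        rotated = os.path.join(folder, "log.txt-20160502")
        with open(live, "w") as handle:
            handle.write("May  2 00:00:01 host1 example event\n" * 20)
        before = head_fingerprint(live)
        os.rename(live, rotated)
        after = head_fingerprint(rotated)
        return before, after


before, after = demo()
print(before == after)  # prints True: the name changed, the content did not
```

Because the fingerprint depends only on content, a numeric suffix and a date suffix look the same to this kind of check.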

In my case, I ONLY have Splunk looking for log.txt in those folders (because that's what syslog-ng is writing), so any other file (like log1.txt or log.2016-05-02.txt) won't be read anyway.

For instance,

[monitor:///var/log/remote/10.128.0.*/log.txt]
host_segment = 4
sourcetype = syslog
index = network

That folder (well, ONE of the couple that match that wildcard) has log.txt, log.txt.1, log.txt.2.gz, log.txt.3.gz and so on back to 8. They're all ignored except log.txt.
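The host_segment = 4 setting in the stanza above can be pictured as picking the 4th "/"-separated piece of the monitored path. A minimal Python sketch of that idea follows; `host_from_path` is a hypothetical helper, and the address 10.128.0.5 is made up to match the wildcard.

```python
def host_from_path(path: str, host_segment: int) -> str:
    """Return the Nth '/'-separated segment of a path, counting from 1."""
    segments = [part for part in path.split("/") if part]
    return segments[host_segment - 1]


# Segments: var (1), log (2), remote (3), 10.128.0.5 (4), log.txt (5)
print(host_from_path("/var/log/remote/10.128.0.5/log.txt", 4))  # prints 10.128.0.5
```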

There are tweaks I'm sure I could make, but it works fine. My main config for logrotate is just

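# Rotate every .txt file that syslog-ng writes under the per-host folders
/var/log/remote/*/*.txt
{
        # keep at most 8 rotated copies, and none older than 30 days
        rotate 8
        maxage 30
        daily
        missingok
        # gzip rotated files, but leave the newest one (.1) uncompressed
        compress
        delaycompress
        # invoke-rc.d is Debian-specific; use your distribution's reload command
        postrotate
                invoke-rc.d syslog-ng reload > /dev/null
        endscript
}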

So, does that explain how those two pieces fit together? (And no, I have no idea why I picked 8, not 7. Or even 3. Just picked one. 🙂 )
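For anyone wondering where the file names mentioned earlier come from, this small Python sketch models which names end up on disk for a given combination of rotate, compress and delaycompress. It only models the naming, not logrotate itself, and `rotated_names` is a hypothetical helper.

```python
def rotated_names(base, rotate, compress=True, delaycompress=True):
    """List the live file plus the rotated copies kept on disk, newest first."""
    names = [base]
    for n in range(1, rotate + 1):
        suffix = ""
        # With delaycompress, the newest rotated copy (.1) stays uncompressed.
        if compress and not (delaycompress and n == 1):
            suffix = ".gz"
        names.append(f"{base}.{n}{suffix}")
    return names


for name in rotated_names("log.txt", rotate=8):
    print(name)
```

With rotate=8 this lists log.txt, log.txt.1, then log.txt.2.gz through log.txt.8.gz, which matches the folder contents described above.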

Then on the syslog-ng side, there's a simple config to just write everything into folder/files when it comes in from the network.

source s_network_udp { udp( port(514)); };
source s_network_tcp { syslog( port(514) transport("tcp")); };
destination d_syslogs { file ("/var/log/remote/${HOST}/log.txt"); };
log {source(s_network_udp); source(s_network_tcp); destination(d_syslogs); };

Take a gander at that, it's fairly readable.
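As a rough illustration of what the ${HOST} macro in the destination does, here is a minimal Python sketch. Python's `string.Template` happens to use the same ${NAME} syntax, but this is only an analogy for syslog-ng's own macro expansion; `destination_for` is a hypothetical helper and the address is made up.

```python
from string import Template

# Same path template as in the destination line above.
DESTINATION = Template("/var/log/remote/${HOST}/log.txt")


def destination_for(host: str) -> str:
    """Fill in the sending host to get the file that host's messages land in."""
    return DESTINATION.substitute(HOST=host)


print(destination_for("10.128.0.5"))  # prints /var/log/remote/10.128.0.5/log.txt
```

Each sending host gets its own folder, which is what lets the Splunk monitor stanza take the host name from the path.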

Does that help?

mmartin0926
New Member

Yea, that's great, thanks for the explanations! That all makes sense...

And many thanks for the config examples, much appreciated!

-Matt
