
sourcetype "too_small" and log rotation on rsyslog server

grundsch
Communicator

I'm collecting all syslog messages from my datacenter on a central rsyslog server.
rsyslog splits the messages into the following directory structure:
/log/yyyy/mm/dd/host/service.log

The service name is extracted from the syslog message, so messages from the same daemon are grouped into one log file.
I have one monitor input watching the whole tree. The host is extracted from the path. The sourcetype is left as "automatic". The idea is that Splunk analyses every log file and works out whether it is a postfix/apache/snmp/cron/... log file.
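To make the layout concrete, here is a minimal sketch of how a path in this tree maps to a host and a service. It assumes the exact /log/yyyy/mm/dd/host/service.log layout described above; the function name and the sample host "web01" are hypothetical. In Splunk itself, picking the host from the 5th path segment is, as far as I know, what `host_segment = 5` in the monitor stanza of inputs.conf does. Note that every (host, service, day) combination becomes a separate source.

```python
from pathlib import PurePosixPath
from typing import Tuple


def host_and_service(path: str) -> Tuple[str, str]:
    """Split /log/yyyy/mm/dd/<host>/<service>.log into (host, service)."""
    parts = PurePosixPath(path).parts
    # parts looks like ('/', 'log', 'yyyy', 'mm', 'dd', '<host>', '<service>.log')
    if len(parts) != 7 or parts[:2] != ("/", "log"):
        raise ValueError(f"path does not match /log/yyyy/mm/dd/host/service.log: {path}")
    host = parts[5]                          # 5th path segment: log, yyyy, mm, dd, host
    service = PurePosixPath(parts[6]).stem   # drop the .log extension
    return host, service


print(host_and_service("/log/2010/03/01/web01/postfix.log"))  # ('web01', 'postfix')
```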

It works quite well, but all sourcetypes end up as xxx-too_small

(e.g. postfix-too_small, snmpd-too_small, ...)

I suspect that, because we start a new log file for every host, service and day, each new file contains only one or two events right after midnight. Splunk sees the new file, tries to work out what it is, gets it roughly right, but tags the sourcetype with "too_small" because the file has fewer than 100 events.
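The suspected behaviour can be written down as a toy model. This is a hypothetical sketch, not Splunk's actual classifier: the 100-event threshold comes from the paragraph above, and `effective_sourcetype` and its parameters are made-up names. It also encodes the usual way around the suffix, which, to my understanding, is to set an explicit sourcetype on the input so that automatic classification is skipped.

```python
from typing import Optional

# Threshold taken from the post: files with fewer than this many events
# get tagged "too_small". Treat it as an assumption, not Splunk's exact rule.
MIN_EVENTS = 100


def effective_sourcetype(guess: str, events_in_file: int,
                         configured: Optional[str] = None) -> str:
    """Toy model of how a monitored file ends up with its sourcetype name.

    guess          -- what automatic classification thinks the file is (e.g. "postfix")
    events_in_file -- how many events the file held when it was first examined
    configured     -- an explicit sourcetype from the input stanza, if any
    """
    if configured is not None:
        # An explicitly configured sourcetype skips classification entirely.
        return configured
    if events_in_file < MIN_EVENTS:
        return f"{guess}-too_small"
    return guess


# A per-host, per-service, per-day file right after midnight holds only a few events.
print(effective_sourcetype("postfix", 2))            # postfix-too_small
print(effective_sourcetype("postfix", 2, "syslog"))  # syslog
```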

My questions:

  • how can I suppress this "too_small"?
  • how are those of you with central syslog servers handling such a setup? (I suppose I'm not the only one indexing a central syslog server.) In particular, how do you handle the creation of new log files (i.e. new sources, from Splunk's point of view) that contain only a few events?

Many thanks in advance for any tips & tricks!

grundsch
Communicator

A couple of months later, I have learned some more.

  • The file split described above for the central syslog server proved to be a disaster for Splunk. It generated thousands of sourcetypes, because syslog produced thousands of different service names. This left the Splunk indexes completely fubar (every single search consumed all the CPU).

  • Fresh start: we now keep standard syslog messages in a separate tree (for archiving purposes) and dump everything else into one syslog file per host. These files are rotated regularly and discarded after two rotations (the data is in Splunk and in the separate archive).
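The "rotate, keep two, then discard" scheme can be sketched as follows. This is a minimal illustration under assumptions: the file name web01.log and the `.1`/`.2` suffixes are hypothetical, and in practice a tool such as logrotate (with its `rotate 2` directive) would do this. The sketch does not signal rsyslog to reopen its output file, which a real setup has to do after each rotation.

```python
from __future__ import annotations

import os
import tempfile


def rotate(path: str, keep: int = 2) -> None:
    """Rotate path -> path.1 -> path.2 ..., deleting anything beyond `keep` rotations."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)  # this rotation falls off the end: it is already in Splunk/archive
    # Shift path.(n) -> path.(n+1), starting from the oldest surviving copy.
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{n + 1}")
    if os.path.exists(path):
        os.rename(path, f"{path}.1")
    open(path, "w").close()  # fresh, empty current file


def demo(keep: int = 2) -> dict[str, str]:
    """Write one 'day' of data and rotate, three times; return the surviving files."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "web01.log")
        for day in ("day1", "day2", "day3"):
            with open(path, "a") as f:
                f.write(day)
            rotate(path, keep)
        result = {}
        for name in sorted(os.listdir(d)):
            with open(os.path.join(d, name)) as f:
                result[name] = f.read()
        return result


print(demo())  # {'web01.log': '', 'web01.log.1': 'day3', 'web01.log.2': 'day2'}
```

After three rotations the first day's data is gone from disk, which is acceptable here only because it already lives in Splunk and in the separate archive.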

This looks much better now. The sourcetype is fixed to syslog. Not as fun as automatic sourcetype detection, but hey, these really are syslog messages...

I've also just read the following blog entry: http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/ which explains how I could now extract a different sourcetype per event from this single syslog stream, and probably route them to different indexes as well...
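The per-event idea can be sketched outside Splunk. In Splunk this is, to my knowledge, done at index time with a `TRANSFORMS-` entry in props.conf pointing at a transforms.conf stanza that sets `REGEX`, `FORMAT = sourcetype::<name>` and `DEST_KEY = MetaData:Sourcetype`. The Python below only illustrates the logic: the regex assumes classic BSD-style syslog lines, and the program-to-sourcetype table is hypothetical.

```python
import re

# Hypothetical pattern for classic BSD-style syslog lines:
#   "Mar  1 00:00:01 web01 postfix/smtpd[1234]: connect from ..."
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d\d:\d\d:\d\d\s+(?P<host>\S+)\s+(?P<program>[^\s\[:]+)"
)

# Hypothetical mapping from program name (before any "/") to sourcetype.
PROGRAM_TO_SOURCETYPE = {
    "postfix": "postfix",
    "snmpd": "snmpd",
    "CRON": "cron",
    "sshd": "sshd",
}


def sourcetype_for(line: str, default: str = "syslog") -> str:
    """Pick a sourcetype for one event; anything unrecognised stays `default`."""
    m = SYSLOG_LINE.match(line)
    if m is None:
        return default
    program = m.group("program").split("/", 1)[0]  # "postfix/smtpd" -> "postfix"
    return PROGRAM_TO_SOURCETYPE.get(program, default)


for event in (
    "Mar  1 00:00:01 web01 postfix/smtpd[1234]: connect from unknown",
    "Mar  1 00:00:02 db01 CRON[42]: (root) CMD (run-parts /etc/cron.hourly)",
    "Mar  1 00:00:03 web01 kernel: eth0: link up",
):
    print(sourcetype_for(event))  # postfix, cron, syslog
```

Using a fixed lookup table with a `syslog` fallback keeps the number of sourcetypes bounded, which avoids the "thousands of sourcetypes" problem from the first attempt.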

Question: how expensive is it to run a regexp on every event during indexing?
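One rough way to get a feel for the per-event cost is to time a candidate regex against a representative line. This is only a sketch: Python's `re` is a different engine from the PCRE library Splunk uses, and it runs outside the indexing pipeline, so the absolute numbers will not transfer. Both patterns and the sample line are hypothetical; the comparison mainly shows how much anchoring and avoiding `.*` backtracking can matter on your own hardware.

```python
import re
import timeit

# Anchored pattern, similar to what a per-event sourcetype rule might use (hypothetical).
ANCHORED = re.compile(r"^\w{3}\s+\d{1,2}\s+\d\d:\d\d:\d\d\s+\S+\s+postfix/")
# Unanchored variant with .*: search() tries many start offsets and .* backtracks.
UNANCHORED = re.compile(r"postfix/\w+\[\d+\]: .*reject")

SAMPLE = "Mar  1 00:00:01 web01 postfix/smtpd[1234]: connect from unknown[10.0.0.1]"


def seconds_per_event(pattern: "re.Pattern[str]", line: str, n: int = 100_000) -> float:
    """Average wall-clock seconds for one pattern.search() over `line`."""
    total = timeit.timeit(lambda: pattern.search(line), number=n)
    return total / n


if __name__ == "__main__":
    for name, pat in (("anchored", ANCHORED), ("unanchored", UNANCHORED)):
        print(f"{name}: {seconds_per_event(pat, SAMPLE) * 1e6:.2f} us per event")
```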