sourcetype "too_small" and log rotation on rsyslog server


I'm collecting all syslog messages from my datacenter on a central rsyslog server.
rsyslog splits the messages into the following directory structure:

The service is extracted from the syslog message, so messages from the same daemon are grouped in one log file.
I have one monitor input that watches the whole tree. The host is extracted from the path, and the sourcetype is left as "automatic". The idea is that Splunk analyses every log file and works out whether it is a postfix, apache, snmp, cron, ... log file.

It works quite well, but all sourcetypes end up as xxx-too_small

(e.g. postfix-too_small, snmpd-too_small, ...)

I suspect that, because we start a new log file for every host, service and day, a new file contains only one or two events just after midnight. Splunk sees this new file, tries to work out what it is, gets it about right, but tags the sourcetype with "too_small" because there are fewer than 100 events.
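One way to check this suspicion is to count how many files in the tree are still short. The following is only a minimal sketch: the function name `find_small_files` is made up for illustration, and the 100-line threshold is simply the figure guessed above, not a confirmed Splunk setting.

```python
import os


def find_small_files(root, threshold=100):
    """Return (path, line_count) for every file under root that has
    fewer than `threshold` lines. The threshold is an assumption."""
    small = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read in binary mode so odd encodings in log lines do not matter.
            with open(path, "rb") as handle:
                line_count = sum(1 for _ in handle)
            if line_count < threshold:
                small.append((path, line_count))
    return sorted(small)
```

Running it against the root of the monitored tree shortly after midnight, and again later in the day, would show whether most files really are below the threshold when Splunk first sees them.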

My questions:

  • how can I suppress this "too_small"?
  • how are those of you with central syslog servers handling such a setup? (I suppose I'm not alone in indexing a central syslog server.) In particular, how do you handle the creation of new log files (i.e. new sources from Splunk's point of view) with only a few events in them?

Many thanks in advance for any tips & tricks!


A couple of months later, I have learned some more.

  • The file split described above for the central syslog server proved to be a disaster for Splunk. Somehow, it generated thousands of sourcetypes (because syslog generated thousands of different service names). This led to the Splunk indexes being completely fubar (any single search consumed all the CPU).

  • Fresh start: we now keep standard syslog messages in a separate tree (for archiving purposes) and dump everything else into one syslog file per host. These files are rotated regularly and discarded after two rotations (the data is in Splunk and in the separate archive).

This now looks much better. The sourcetype is fixed to syslog. Not as fun as automatic sourcetype detection, but hey, these really are syslog messages...

I've also just read a blog entry that explains how I could now assign a different sourcetype per event from this single syslog stream, and probably reroute the events to different indexes...
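To make the idea concrete, here is a rough sketch of the per-event logic in Python. It is not Splunk configuration and not taken from the blog entry; it only illustrates choosing a sourcetype from the syslog program tag. The rule list, the sourcetype names and the `classify` function are all hypothetical.

```python
import re

# Hypothetical rules: (compiled pattern, sourcetype name). Each pattern
# matches the syslog "program[pid]:" tag; adapt them to your own daemons.
RULES = [
    (re.compile(r"\spostfix/\w+\[\d+\]:"), "postfix_syslog"),
    (re.compile(r"\ssnmpd\[\d+\]:"), "snmpd"),
    (re.compile(r"\s(?:cron|CRON|crond)\[\d+\]:"), "cron"),
]

# Events that match no rule keep the fixed sourcetype of the input.
DEFAULT_SOURCETYPE = "syslog"


def classify(event):
    """Return the sourcetype of the first matching rule, or the default."""
    for pattern, sourcetype in RULES:
        if pattern.search(event):
            return sourcetype
    return DEFAULT_SOURCETYPE
```

For example, `classify("Jun 27 00:00:01 mail01 postfix/smtpd[1234]: connect from unknown[10.0.0.5]")` returns `"postfix_syslog"`, while an sshd line falls through to `"syslog"`. Keeping the rule list short and explicit avoids the explosion of sourcetypes described above.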

Question: how expensive is it to run a regexp on every event during indexing?
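A rough feel for the order of magnitude can be had from a micro-benchmark outside Splunk. The sketch below times one simple pattern against a few made-up syslog lines. Note the assumptions: Python's `re` module is not the regex engine Splunk runs at index time, the sample events and the pattern are invented, and `seconds_per_event` is a hypothetical helper, so the result is indicative only.

```python
import re
import timeit

# Assumed pattern: matches a postfix program tag such as " postfix/smtpd[1234]:".
PATTERN = re.compile(r"\spostfix/\w+\[\d+\]:")

# Made-up sample events; replace them with lines from your own logs.
SAMPLE_EVENTS = [
    "Jun 27 00:00:01 mail01 postfix/smtpd[1234]: connect from unknown[10.0.0.5]",
    "Jun 27 00:00:02 web01 CRON[4321]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jun 27 00:00:03 web01 sshd[999]: Accepted publickey for admin",
]


def seconds_per_event(pattern, events, repeat=10000):
    """Average wall-clock seconds spent on one pattern.search() call."""
    def run_once():
        for event in events:
            pattern.search(event)

    total = timeit.timeit(run_once, number=repeat)
    return total / (repeat * len(events))


if __name__ == "__main__":
    cost = seconds_per_event(PATTERN, SAMPLE_EVENTS)
    print(f"about {cost * 1e6:.2f} microseconds per event for this pattern")
```

Multiplying the per-event figure by the number of patterns and the daily event volume gives a crude upper bound to compare against the indexer's spare CPU. Patterns with heavy backtracking (for example leading `.*`) will cost noticeably more than an anchored or literal-heavy pattern like the one above.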
