One of the hosts sending syslog data is a Barracuda Web Filter. I would like to be able to map field names to the values in the space-delimited syslog entries that it generates. But, it looks like this is done in transforms.conf by sourcetype, and I don't want to apply my Barracuda-specific field mappings to every host that sends syslog data on UDP port 514.
Am I expected to define a special sourcetype for the Barracuda? If so, how do I assign the sourcetype via hostname (or some other identifying characteristic) instead of just by port number?
I tried creating etc/system/local/props.conf with the following contents, then restarting splunkd. It seems to have had no effect:
[host::barracuda-hostname.domain]
sourcetype = barracuda
Yes, there is a way to do this. The important caveat is that if you are using the "syslog" sourcetype, the host field is extracted from the message and forced at the same time you are trying to force the sourcetype. Splunk doesn't know about the new host value at that point, so your stanza needs to key on the original host, sourcetype, or source:
--props.conf--
[syslog] <-- important part: host=barracuda hasn't been set yet, so use the syslog sourcetype or the hostname of the forwarder
TRANSFORMS-force_st_for_barracuda = force_barracuda_st

--transforms.conf--
[force_barracuda_st]
DEST_KEY = MetaData:Sourcetype
REGEX = (barracuda-hostname\.domain|bar\.rac\.uda\.ip) <-- some unique string that only appears in Barracuda events
FORMAT = sourcetype::barracuda
As Felix mentioned, routing to different log files is a nice approach. There are many options here; it's all about finding the one that makes the most sense in your situation.
We run syslog-ng on our central Splunk indexer. We listen on a couple of different IP addresses (one for normal syslog traffic, the other for syslog events coming from Cisco network devices or our firewall). Receiving on two different IP addresses lets us keep the standard syslog port, and if volume someday goes up we can split the work onto separate boxes.

From there we use a bunch of syslog-ng rules to route the content into different logs. Some of this is done with simple syslog filtering logic, and some uses host filtering and regex matching. In the end, syslog-ng writes out basically one file per sourcetype. (I say "basically" because in some cases I found it helpful to split the log files by severity level, which then becomes part of the log name; I then set up a field extraction in Splunk for that, which is nice when you only want to look at the more serious events.)
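As a minimal sketch of the host-based splitting described above (the listener IP, hostname, and file paths are placeholder assumptions, not our actual config):

```
# syslog-ng.conf sketch -- assumed IP and paths, adjust for your environment.
source s_net { udp(ip(192.0.2.10) port(514)); };

# Route Barracuda events to their own file; everything else to a catch-all.
filter f_barracuda { host("barracuda-hostname.domain"); };

destination d_barracuda { file("/var/log/remote/barracuda.log"); };
destination d_other     { file("/var/log/remote/other.log"); };

log { source(s_net); filter(f_barracuda); destination(d_barracuda); flags(final); };
log { source(s_net); destination(d_other); };
```

With one file per stream, each monitored file can then get its own sourcetype in Splunk.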
BTW, have you tried setting up field extraction directly against your host?
[host::barracuda-hostname.domain]
EXTRACT-fields = ^\S+\s+(?<field1>\S+) ...
If this is the only kind of events that are coming from that host, then doing a search-time field extraction should be an efficient option.
[host::hostname] will only work if it references the host value as seen when the event first arrives in Splunk. If the sourcetype of the data is syslog, a built-in transform extracts the host field from the raw event data and overrides it, and that is the value you'll see in Splunk when searching. So it is important to know what the host value is before it is transformed. You can work around this by disabling that transform, or by temporarily using a sourcetype that does not have it.
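As a hedged sketch of the temporary-sourcetype approach (the input stanza and sourcetype name here are illustrative assumptions):

```
--inputs.conf--
[udp://514]
sourcetype = raw_syslog  <-- temporary sourcetype with no built-in host transform

--props.conf--
[raw_syslog]
<-- no host-extraction transform here, so searching these events shows the
    host value Splunk assigns before any syslog-based rewriting
```

Once you know the original host value, you can switch back to the syslog sourcetype and key your props.conf stanza on that host.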
I recommend writing the syslog messages to disk with syslogd or the Kiwi syslog daemon and then indexing the log files, instead of sending them straight to Splunk.
This way you can easily assign different extractions to the different syslog streams based on source rather than sourcetype. There are some answers that deal with setups like this on Windows: http://answers.splunk.com/questions/5111/best-way-to-write-syslog-to-a-file-on-windows
And a wiki entry about setting this up on Linux: http://www.splunk.com/wiki/Deploy:Best_Practice_For_Configuring_Syslog_Input
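Once the files are on disk, the per-stream inputs might look like this (the paths and sourcetype names are assumptions for illustration):

```
--inputs.conf--
[monitor:///var/log/remote/barracuda.log]
sourcetype = barracuda

[monitor:///var/log/remote/other.log]
sourcetype = syslog
```

Each monitor stanza gets its own sourcetype, so the Barracuda field extractions apply only to the Barracuda file and never touch the rest of the syslog traffic.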