I have an rsyslog server aggregating syslog streams from switches and firewalls. The rsyslog server writes log files to disk by source IP address. I'm attempting to read these log files using an installed universal forwarder, and tag them with their host names. For example:
[monitor:///data/logs/10.1.1.1.log]
disabled = false
host = fw1
index = firewalls
[monitor:///data/logs/10.1.2.1.log]
disabled = false
host = fw2
index = firewalls
These logs are sent to the indexers, and then searched using the dedicated search head. Manually assigning the "host" name is working for some, but not all log files. Is there something I might be missing?
For some syslog sourcetypes, Splunk automatically replaces the host name with the actual host value found in each event. This happens during the parsing phase and will override host settings from inputs.conf or props.conf (which set the host value at the inputs phase of processing).
Specifying a different sourcetype, one that does not automatically transform the data, will leave the host value as it was set in inputs.conf. Or, you can look up the original sourcetype in etc/system/default/props.conf, find the host transform it references in etc/system/default/transforms.conf, and then override the behavior with a corresponding stanza under etc/system/local. Parsing usually occurs on your indexer(s), so that is where you would need to look for (and change) these files. (If you are using a heavy forwarder, parsing happens on the heavy forwarder instead of the indexer.)
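As a rough sketch of the override option, something like the following could go on the instance that does the parsing. It assumes the affected events carry the built-in syslog sourcetype and that the default configuration attaches the host rewrite through a TRANSFORMS setting on that stanza. Check your own etc/system/default/props.conf first, since the attribute name in the local file has to match the default one exactly for the override to take effect.

```ini
# etc/system/local/props.conf on the indexer (or heavy forwarder)
# Assumption: the default [syslog] stanza contains "TRANSFORMS = syslog-host".
# An empty value here replaces that default, so the host set in inputs.conf is kept.
[syslog]
TRANSFORMS =
```

Note that this changes the behavior for every input using that sourcetype, and that index-time changes like this only apply to events indexed after the change (typically after a restart of the parsing instance).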
For many syslog inputs, users actually want the default behavior, as it handles the common case where a syslog input is aggregated from multiple hosts. But clearly, that is not always the case! From the comments, I see that you have found a solution, but I wanted to explain what is actually happening and why your solution worked.
Awesome, Iguinn. Thank you for your detailed response!
How are you assigning the sourcetype to these events? I would check the configuration (props.conf and transforms.conf) for these data inputs on the indexers to see whether a host override is happening there.
I wasn't. However, I am now, and that seemed to fix my problem. I'm using the following sourcetypes for my firewall feeds:
cisco_syslog
sophos_utm_syslog
vyatta_syslog
Now everything is coming through (very odd). Thank you for your input, somesoni2!
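For anyone landing here later, a monitor stanza with an explicit sourcetype might look roughly like this. Which sourcetype belongs to which firewall is an assumption made for illustration, so match each one to the device that actually writes the file.

```ini
# inputs.conf on the universal forwarder
# Assumption: fw1 is the Cisco device; the thread does not say which is which.
[monitor:///data/logs/10.1.1.1.log]
disabled = false
host = fw1
index = firewalls
sourcetype = cisco_syslog
```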