Hi,
I think the key point is that you're talking about two separate data input types, UDP and file monitor, so it's conceivable that they will be treated differently.
Each UDP message comes from one specific source (i.e. could be many devices but each device is unique) so the UDP monitor can easily set the host on the event in Splunk - it's where the UDP message came from, easy. The other reason it may work is that Splunk may well be seeing your UDP source and correctly assuming the sourcetype of "syslog" which it knows how to extract the host for (done at an event parsing level for you - see more info on this below). Or you may have specified sourcetype as syslog on the UDP input.
Once you change the input so that the messages are written to a shared file, Splunk does not necessarily know which host each message came from, because many different hosts are all logging to the one file.
Splunk does recognise certain formats as pretrained sourcetypes, syslog being one of them, i.e. it will stamp a sourcetype of "syslog" on syslog messages if they fit the expected format for syslog messages.
Once a sourcetype has been stamped on the event, the pre-canned props and transforms that Splunk ships with for that sourcetype will parse the event and try to stamp the correct host on it, using pattern matching defined for that sourcetype. Don't expect this to just work for all events and all types of data, though; often you will need to define this logic yourself.
For sourcetype "syslog", this parsing logic is provided by the props.conf and transforms.conf in etc/system/default. As you can see, the "syslog-host" transform is what sets the host:
Props:
[syslog]
pulldown_type = true
maxDist = 3
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 32
TRANSFORMS = syslog-host
REPORT-syslog = syslog-extractions
SHOULD_LINEMERGE = False
Transforms:
[syslog-host]
DEST_KEY = MetaData:Host
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s
FORMAT = host::$1
Probably what is happening here is that, because your events are being seen as Apache messages, they are recognised as something else and never get a sourcetype of "syslog" stamped on them. As such, the host isn't extracted, because whatever sourcetype is being set doesn't have pre-canned props and transforms defined for it. You will either need to define these yourself or find an app that provides them. Alternatively, you could just set "sourcetype = syslog" on your inputs.conf entry for the syslog file(s) you are monitoring. However, you might want your sourcetypes to be less general than this, i.e. have a range of syslog_<vendor> sourcetypes, so that you can search and perform field extractions on those sourcetypes later on in the Splunk UI.
A good way to set host is to try to do it at a source-wide level. This is more efficient than event-by-event parsing: if you have the same host value for every event in a file you are monitoring, there's no need to make Splunk parse each event. With syslog you can achieve this by configuring your syslog server to write to a different folder for each host, where the folder name reflects the name of the host, so that you end up with folders like the ones below.
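As a rough sketch of the syslog_<vendor> idea, something like the following should work. The file path and the sourcetype name "syslog_cisco" are made up for illustration, so substitute your own. The props.conf stanza simply reuses the shipped "syslog-host" transform shown above, and it needs to live wherever the data is parsed (indexer or heavy forwarder), e.g. in etc/system/local or an app's local directory.

```ini
# inputs.conf
# Hypothetical path and sourcetype name, for illustration only.
# Simplest alternative: use "sourcetype = syslog" here and skip props.conf.
[monitor:///var/log/remote/cisco.log]
sourcetype = syslog_cisco

# props.conf
# Attach the shipped syslog-host transform to the custom sourcetype
# so the host is still extracted from each event.
[syslog_cisco]
TRANSFORMS-sethost = syslog-host
SHOULD_LINEMERGE = False
```

If your events don't follow the usual syslog layout, the shipped regex may not match, in which case you would need your own transform with DEST_KEY = MetaData:Host and a regex that suits your data.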
/mnt/logs/192.168.1.1/syslog.log
/mnt/logs/192.168.1.2/syslog.log
/mnt/logs/192.168.1.3/syslog.log
In the inputs.conf where you specify the data collection you can tell it to look at a particular segment of the file name for the host.
E.g. the inputs.conf definition for the syslog file monitor would look something like this where the host is wildcarded by the "*":
[monitor:///mnt/logs/*/syslog.log]
host_segment = 3
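If you also want to set the sourcetype on the same input, a sketch might look like the one below. The sourcetype name "syslog_network" is invented for the example. One thing to watch, as far as I know: if you use "sourcetype = syslog" here, the shipped syslog-host transform still runs at parse time and will overwrite the folder-derived host whenever its regex matches, so a custom sourcetype without that transform is the safer bet if you want the folder name to win.

```ini
# inputs.conf
[monitor:///mnt/logs/*/syslog.log]
# Host is taken from the 3rd path segment:
# /mnt (1) /logs (2) /<host> (3) /syslog.log
host_segment = 3
# Hypothetical custom sourcetype with no host-rewriting transform attached
sourcetype = syslog_network
```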
Hope this helps!