I have Splunk crawling a /logs directory, which is where it receives most of its data (/logs is populated via syslog-ng). In inputs.conf, I set host_segment = 2 so that the hostname is taken from the second segment of the path. This has been working fine most of the time. Here is the inputs.conf stanza:
[monitor:///logs]
disabled = false
sourcetype = syslog
host_segment = 2
blacklist = \.(bz2|gz)$
But suddenly I'm noticing some strange hostnames on my indexer: "list_primary_nodes", "add_aam_node", "find_active_primary", "shut_down_vmap_proce", etc. I noticed that they're all coming from a series of new servers that have been sending logs to Splunk: ESXi hosts.
The log path is correct.
Here is an event:
09/14/11 19:31:56 [shut_down_vmap_proce] attempt to stop VMap_HOSTNAME failed.
So why is it not setting the host to HOSTNAME? Why is it instead setting it to this other value that it's capturing from this, albeit unusual-looking, syslog event?
It may be that a props.conf and/or transforms.conf stanza is resetting the host to something extracted from the event. Check the props.conf and transforms.conf files under $SPLUNK_HOME/etc/system/default. The regexes on my install look like they pull the syslog host from inside the square brackets in your event, which can be a valid place for syslog to put the hostname. Check the RFC for standard syslog output (with which ESXi apparently doesn't comply).
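For reference, on my install the relevant defaults look roughly like this — treat the exact regex as approximate, since it varies by Splunk version, and compare against your own copies under $SPLUNK_HOME/etc/system/default:

```ini
# $SPLUNK_HOME/etc/system/default/props.conf (excerpt)
[syslog]
TRANSFORMS = syslog-host

# $SPLUNK_HOME/etc/system/default/transforms.conf (excerpt)
[syslog-host]
DEST_KEY = MetaData:Host
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s
FORMAT = host::$1
```

Note the optional brackets (`\[?` ... `\]?`) around the capture group: that is exactly why a token like [shut_down_vmap_proce] sitting right after the timestamp gets picked up as the host.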
I don't think your host_segment setting actually does anything here, especially since the sourcetype is syslog, which gets a forced host anyway.
You may want to do something like having syslog-ng dump the ESXi hosts to /logs/esxi, then applying a props.conf stanza to that source (i.e. [source::/logs/esxi/...]) and setting the host with a regex that way.
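As a sketch, assuming syslog-ng writes the ESXi logs under /logs/esxi/&lt;hostname&gt;/... (the path layout and the stanza/class names here are illustrative, not taken from your setup), the local overrides could look like:

```ini
# $SPLUNK_HOME/etc/system/local/props.conf
[source::/logs/esxi/...]
TRANSFORMS-esxihost = esxi-host-from-path

# $SPLUNK_HOME/etc/system/local/transforms.conf
# Run the regex against the source path instead of the raw event,
# and set the host to the directory segment after /logs/esxi/.
[esxi-host-from-path]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Host
REGEX = /logs/esxi/([^/]+)/
FORMAT = host::$1
```

Since the default syslog-host transform will still fire for the syslog sourcetype, verify with a test event that your transform runs after it (transform ordering decides which host assignment wins), and restart or reload after editing the .conf files.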