After a restart, Splunk changed the way it indexes one of my syslog files.
I have three log servers writing to an NFS share. Each file is listed as a separate input, all three are configured identically, and all three were indexing exactly the same way until a restart of Splunk yesterday. The syslog servers have several servers reporting to them.
Each input is set to:
Set Host: constant value
Host field value: syslog1
Set Sourcetype: Automatic
Index: 581
Whitelist: blank
Blacklist: blank
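For reference, those UI settings correspond to roughly this inputs.conf stanza. The monitor path here is an assumption for illustration; the host and index values come from the settings above:

```ini
# inputs.conf -- path is hypothetical
[monitor:///mnt/nfs/logs/syslog1.log]
host = syslog1
index = 581
# no sourcetype line => "Automatic" sourcetyping
```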
The only difference is that the host field value is syslog1, syslog2, or syslog3. Until yesterday, the actual hostname written in the file was being properly parsed out and indexed. syslog1 and syslog2 are still parsing the actual hostname out of the log files, but out of the blue, syslog3 is reporting the host for every line in its file as syslog3.
With the configuration listed above, I would have expected all entries from all three sources to list the host field as syslog1, 2, or 3, but the automatic sourcetyping seemed to be extracting the proper hostname despite the input configuration. My concern is not whether the system should have been indexing the information this way all along, but why it suddenly changed with yesterday's restart.
Found the problem. Somehow when I restarted Splunk yesterday, the sourcetype on the syslog3 input switched from syslog to syslog-3, and it then began assigning syslog3 as the host name. It's almost like after the restart the input thought there was a custom data format. That would also explain the host change: the built-in syslog sourcetype performs its own host extraction from the event, while the auto-learned syslog-3 variant does not, so the input's constant host value was applied instead. I've since removed Automatic from the sourcetype and forced it to syslog. Works as expected now.
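The fix described above amounts to pinning the sourcetype in the input stanza, something like the following (the monitor path is again hypothetical):

```ini
# inputs.conf -- path is hypothetical
[monitor:///mnt/nfs/logs/syslog3.log]
host = syslog3
index = 581
sourcetype = syslog
```

With an explicit sourcetype, Splunk skips automatic sourcetype learning for this input, so a restart can no longer flip it to a learned variant like syslog-3.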
Yeah, if you know the sourcetype in advance, always force it manually. Or devise sourcetype assignment rules in props.conf based on the source field. That way you'll never get any unwanted surprises.
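The props.conf approach mentioned above could be sketched like this, using a source-based stanza (the path pattern is an assumption):

```ini
# props.conf -- path pattern is hypothetical
[source::/mnt/nfs/logs/syslog*.log]
sourcetype = syslog
```

This assigns the syslog sourcetype to anything matching the source path, regardless of what the individual input stanzas say, which keeps all three files consistent even if an input is reconfigured.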