I have a large set of files that I need to index, and I need to set the host field based on data contained in the files. The problem is that the host value only appears on the very first line of each file.
2013-10-10T13:03:29.513 (arbitrary event data ...)
2013-10-10T13:03:45.515 (arbitrary event data ...)
2013-10-10T13:03:51.213 (arbitrary event data ...)
2013-10-10T13:03:52.742 (arbitrary event data ...)
The first line does not describe the event fields at all; it's simply information about which server the logs came from, plus other miscellaneous information.
I've searched around but I can't seem to find any obvious way to apply the hostname on the first line to every event in the file. Is there any way to do this out of the box, or do I perhaps need to build a scripted input to make this work?
I wasn't able to test it, but what you found makes sense. You're asking Splunk to refer back to the top line of the file while it watches for changes at the bottom, and to do the same across other files. Splunk has surprised me in the past, but I don't think it can do this one.
If the files for each host were in different folders, you could set the host per input based on the path.
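For the per-folder approach, a monitor stanza in inputs.conf can take the host from a path segment. A sketch only — the directory layout, segment index, and sourcetype name here are assumptions, not from the question:

```ini
# inputs.conf -- assumes a hypothetical layout: /var/log/exports/<hostname>/app.log
[monitor:///var/log/exports/*/app.log]
# Take the host from the 4th path segment:
# /var(1)/log(2)/exports(3)/<hostname>(4)
host_segment = 4
sourcetype = my_app_logs
```

This only works if something upstream can sort the files into per-host directories before Splunk monitors them.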
Another way would be to create a sed script that runs on the files outside of Splunk, pulling the host name from the first line and adding it to all of the other lines.
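A minimal sketch of that preprocessing step, assuming the header's first whitespace-separated token is the hostname (the sample header format and file names here are made up for illustration):

```shell
# Create a tiny sample log with a hypothetical header line, to demonstrate.
printf 'webserver01 exported misc info\n2013-10-10T13:03:29.513 event one\n2013-10-10T13:03:45.515 event two\n' > app.log

# Pull the hostname out of the header line (first token, by assumption).
host=$(head -n 1 app.log | awk '{print $1}')

# Drop the header and prefix every remaining event with a host= field.
tail -n +2 app.log | sed "s/^/host=$host /" > app.log.rewritten
```

Once every event carries the host value, a standard per-event host override in props.conf/transforms.conf (like the one in the answer you referenced) can pick it up.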
The answer you've provided shows how to override the host when it's included in every event, but that's not the case in my situation. I'm aware of how to extract the host field from the first line; my issue is that I don't know how to apply that host value to all the other events contained in the file.
And I did test this solution just to be sure: the host is overridden for the first event/line in my log, but every other event still has the default host value.