You are very correct about your situation. There are two little-known and little-used Splunk configurations that I have used in such situations. You are implying that the host value can be found somewhere inside the file, hopefully on the first line. You are going to combine this: https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Assignmetadatatoeventsdynamically with the "unarchive_cmd" setting documented here: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf

Here is some unarchive_cmd code that we used to create a Semaphore event that summarizes the data (so that we can test the data found by search against what the semaphore event says should be there, and know FOR SURE whether our search has all the data from the file, or whether some of it is missing for some reason):

[source::....import]
unarchive_cmd = gawk 'BEGIN { min = "999999999999"; max = "0"; count = 0 } /./ { match($0, /"time":([0-9.]+)/, time); if (time[1]+0 > 0) { if (time[1]+0 < min+0) min = time[1]; if (time[1]+0 > max+0) max = time[1] } count++; print } END { "date +%s.000000" | getline date; close("date +%s.000000"); print "{\"time\":" date ",\"earliest\":" min ",\"latest\":" max ",\"NumberOfRecords\":" count ",\"SplunkIndexingStatusSemaphore\":\"Splunk Indexing Complete\"}" }'
sourcetype = preprocess-yourSourcetypeHere

What this does: when Splunk sees a file named "*.import", it passes the file to this gawk script, which calculates min(_time), max(_time), and count as it echoes each line of data back out for the UF to process. Then, at the very end, it emits one final JSON summary event. So we get each original line/event as-is/as-was, AND one extra, super-useful event.

Your use case is a bit different. You will need to buffer the events/rows/data until you get to the point where you can discern the host. Then you emit a line like this to stdout:

***SPLUNK*** host=YourHostValueHere

Then you replay your buffered queue and continue processing the rest of the file's rows/events, echoing out lines as-is.
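Here is a minimal sketch of that buffering pass, under a couple of assumptions that you will need to adapt: the script path is made up, and the regex assumes the host shows up in a row as a JSON field like "host":"web01" (point it at wherever the host actually lives in your files). The ***SPLUNK*** line is the dynamic-metadata directive from the first doc linked above, and it has to be emitted before the events that it tags.

[source::....import]
unarchive_cmd = gawk -f /opt/splunkforwarder/etc/apps/yourApp/bin/host_header.awk
sourcetype = preprocess-yourSourcetypeHere

And host_header.awk:

# Phase 1: buffer rows until the host can be discerned.
!found {
    buf[++n] = $0
    # ASSUMPTION: the host is carried in a JSON field; swap in your own pattern.
    if (match($0, /"host":"([^"]+)"/, h)) {
        print "***SPLUNK*** host=" h[1]          # directive first...
        for (i = 1; i <= n; i++) print buf[i]    # ...then replay the buffered queue
        found = 1
    }
    next
}
# Phase 2: host already emitted, so pass the remaining rows through as-is.
{ print }
# Safety net: if the host pattern never matched, still flush the buffer
# so that no data is silently dropped.
END { if (!found) for (i = 1; i <= n; i++) print buf[i] }

One caveat: the buffer lives entirely in memory, so if the host can appear arbitrarily deep inside very large files, keep an eye on the forwarder's memory; in your hoped-for "host is on the first line" case, the buffer only ever holds one row.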