Best practice for indexing files with headers (preamble_regex, field_header_regex, header_field_line_number)

I have been trying to understand when it is best practice to use PREAMBLE_REGEX, FIELD_HEADER_REGEX, and/or HEADER_FIELD_LINE_NUMBER when indexing files with headers. I couldn't find answers to the following questions in the documentation:

  1. Will one of these settings ever "override" another?
  2. If I use them all, in which order do they take priority (listed order, some other order)?
  3. Is it best to only use the minimum number of settings required, or should I always try to set all of them?
  4. If a file without actual events still contains the header, how do I avoid Splunk registering the header as a separate event?

For example, I'm trying to parse the following sample output from TZWorks:

usp - full ver: 0.52; Copyright (c) TZWorks LLC
License #-------------- is authenticated for business use and registered to --------------
run time: -------------- [UTC]; Host: -------------
"cmdline: C:\--------------\usp64.exe -csvl2t -partition C:"
note: When comparing timestamps from manual analysis use option [-show_other_times] to see full range of timestamps recovered


I set up the following lines in props.conf (among other settings):

PREAMBLE_REGEX = ^(usp|License|run|\"cmdline|\s*$)

These settings seem to work as long as the event files are consistent with the sample above. However, when no events are found, neither the header row ("date,time,timezone... etc.") nor the $sampledata is present, and Splunk indexes the first 5 lines as an actual event. Is there a better general approach that would also solve the case where the file contains no events?
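
One way to sanity-check the pattern outside Splunk is to run it against the sample preamble. This is only a rough sketch: it uses Python's `re` module rather than Splunk's regex engine, it assumes the `\"` in props.conf is just a literal double quote, and the helper name `unmatched_lines` is made up for illustration. It says nothing about how Splunk combines the three settings; it only shows which sample lines the PREAMBLE_REGEX pattern itself does not cover.

```python
import re

# PREAMBLE_REGEX from the post; \" in the .conf is assumed to mean a literal quote.
PREAMBLE = re.compile(r'^(usp|License|run|"cmdline|\s*$)')

# Preamble lines from the sample output (redacted values kept as dashes).
SAMPLE = [
    'usp - full ver: 0.52; Copyright (c) TZWorks LLC',
    'License #-------------- is authenticated for business use and registered to --------------',
    'run time: -------------- [UTC]; Host: -------------',
    '"cmdline: C:\\--------------\\usp64.exe -csvl2t -partition C:"',
    'note: When comparing timestamps from manual analysis use option '
    '[-show_other_times] to see full range of timestamps recovered',
    '',
]


def unmatched_lines(lines):
    """Return the lines that the preamble pattern does NOT match."""
    return [line for line in lines if not PREAMBLE.match(line)]


if __name__ == "__main__":
    for line in unmatched_lines(SAMPLE):
        print("not covered by PREAMBLE_REGEX:", line[:40])
```

With the sample above, the only line left over is the one starting with `note:`, so that line may be worth adding to the pattern before looking at the other two settings.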


The docs say the text matched by FIELD_HEADER_REGEX is not included in the header fields, so your current setting shouldn't work. That it does work tells me that setting is trumped by one of the other two.
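
To illustrate what "not included" means here, below is a small Python sketch of the documented idea that the header starts after the matched prefix. It is an emulation, not Splunk code: the `# Fields:` prefix, the pattern, and the function name `header_after_prefix` are all hypothetical, and only the field names date, time, and timezone come from the question.

```python
import re


def header_after_prefix(line, field_header_regex):
    """Return the text that follows the first match of the pattern, or None."""
    match = re.search(field_header_regex, line)
    if match is None:
        return None
    # The matched prefix itself is dropped; the header is what comes after it.
    return line[match.end():]


if __name__ == "__main__":
    # Hypothetical prefixed header line and pattern, for illustration only.
    header = header_after_prefix("# Fields: date,time,timezone", r"^# Fields:\s*")
    print(header.split(","))
```

If the pattern were written to match the header names themselves, this logic would strip them out, which is why a setting that appears to work anyway suggests one of the other two is doing the job.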

If this reply helps you, an upvote would be appreciated.