Getting Data In

Best practice for indexing files with headers (PREAMBLE_REGEX, FIELD_HEADER_REGEX, HEADER_FIELD_LINE_NUMBER)

threatanalyst
Engager

I have been trying to understand when it is best practice to use PREAMBLE_REGEX, FIELD_HEADER_REGEX, and/or HEADER_FIELD_LINE_NUMBER when indexing files with headers. I couldn't find answers to the following questions in the documentation:

  1. Will one setting ever "override" another?
  2. If I use them all, in which order do they take priority (listed order, some other order)?
  3. Is it best to only use the minimum number of settings required, or should I always try to set all of them?
  4. If a file without actual events still contains the header, how do I avoid Splunk registering the header as a separate event?

For example, I'm trying to parse the following sample output from TZWorks:

usp - full ver: 0.52; Copyright (c) TZWorks LLC
License #-------------- is authenticated for business use and registered to --------------
run time: -------------- [UTC]; Host: -------------
"cmdline: C:\--------------\usp64.exe -csvl2t -partition C:"
note: When comparing timestamps from manual analysis use option [-show_other_times] to see full range of timestamps recovered

date,time,timezone,MACB,source,sourcetype,type,user,host,short,desc,version,filename,inode,notes,format,extra
$sampledata...

I set up the following lines in props.conf (among other settings):

[usp]
PREAMBLE_REGEX = ^(usp|License|run|\"cmdline|\s*$)
FIELD_HEADER_REGEX = ^date
HEADER_FIELD_LINE_NUMBER = 7

These settings seem to work as long as the event files are consistent with the sample above. However, when no events are found, neither the header field ("date,time,timezone... etc.") nor the $sampledata exists, and Splunk interprets the first 5 lines as an actual event when indexing. Is there a better way to approach this in general that might also help solve my issue when the file does not contain events?
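One thing that can be checked outside Splunk is which of the sample's leading lines the PREAMBLE_REGEX above actually matches. The following is a minimal sketch using Python's `re` module. Splunk uses PCRE and applies the setting in its own way, so this only tests the pattern against the lines and does not reproduce Splunk's indexing behavior. The helper name `unmatched` is hypothetical.

```python
import re

# The PREAMBLE_REGEX value from the [usp] stanza above, copied verbatim.
PREAMBLE_REGEX = re.compile(r'^(usp|License|run|\"cmdline|\s*$)')

# The lines that precede the CSV header in the sample output.
preamble_lines = [
    'usp - full ver: 0.52; Copyright (c) TZWorks LLC',
    'License #-------------- is authenticated for business use and registered to --------------',
    'run time: -------------- [UTC]; Host: -------------',
    '"cmdline: C:\\--------------\\usp64.exe -csvl2t -partition C:"',
    'note: When comparing timestamps from manual analysis use option '
    '[-show_other_times] to see full range of timestamps recovered',
    '',
]

def unmatched(lines, pattern=PREAMBLE_REGEX):
    """Return the lines that the pattern does not match."""
    return [line for line in lines if not pattern.search(line)]

leftover = unmatched(preamble_lines)
for line in leftover:
    # Print only the start of each unmatched line to keep output short.
    print('not matched:', line[:40])
```

In this check the line starting with `note:` is the only one the pattern does not match, since the alternation has no branch for it. Whether that contributes to the header lines being indexed as an event is an assumption that would need testing in Splunk itself.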


richgalloway
SplunkTrust

The docs say the text matched by FIELD_HEADER_REGEX is not included in the header, so your current setting shouldn't work. That it does work tells me that setting is being overridden by one of the other two.
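To make the documented behavior concrete, here is a small Python sketch of what "the header starts after the pattern" would mean for the `^date` setting from the question. This is only an illustration of the docs' description, not Splunk's implementation, and the function name `header_after_pattern` is hypothetical.

```python
import re

# Header line from the sample file and the FIELD_HEADER_REGEX from the question.
header_line = ('date,time,timezone,MACB,source,sourcetype,type,user,host,'
               'short,desc,version,filename,inode,notes,format,extra')
FIELD_HEADER_REGEX = re.compile(r'^date')

def header_after_pattern(line, pattern):
    """Return the part of the line after the matched prefix, or None if no match."""
    match = pattern.search(line)
    if match is None:
        return None
    return line[match.end():]

remaining = header_after_pattern(header_line, FIELD_HEADER_REGEX)
fields = remaining.split(',')
print(fields[:3])
```

Under this reading, the matched text `date` is consumed as the prefix, so the first field name would come out empty and `date` would not appear among the fields. That is why the setting would not be expected to work on its own.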

---
If this reply helps you, Karma would be appreciated.