Getting Data In

How to drop log file headers before indexing?

gjanders
SplunkTrust

I've read the blog post "Dropping useless headers in Splunk 6" and tried using FIELD_HEADER_REGEX; I also tried the HEADER_FIELD_LINE_NUMBER trick, but that did not work as expected either.

The blog post says:

I stole some of this from the
Websphere App but added the
FIELD_HEADER_REGEX. This tells Splunk
to look for that last line of the
header from above:
************* End Display Current Environment *************
And start indexing events after that.
You could also use
HEADER_FIELD_LINE_NUMBER if your data
writes a consistent number of header
lines.
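For completeness, a minimal sketch of the HEADER_FIELD_LINE_NUMBER alternative mentioned above, assuming the header always spans a fixed number of lines (the count of 27 here is illustrative only; count the actual header lines in your own DNS debug log before setting it):

[MSAD:NT6:DNS]
# Illustrative only: treat the first 27 lines of each file as header
HEADER_FIELD_LINE_NUMBER = 27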

The default props.conf is:

[MSAD:NT6:DNS]
CHECK_FOR_HEADER = 0
REPORT_KV_for_microsoft_dns_web = KV_for_port,KV_for_Domain,KV_for_RecvdIP,KV_for_microsoftdns_action,KV_for_Record_type,KV_for_Record_Class
SHOULD_LINEMERGE = false

For the props.conf on the universal forwarder I added:

[MSAD:NT6:DNS]
#Drop the header lines from the file
FIELD_HEADER_REGEX=\s+16\s+Question\s+Name

That does something: it merges the header lines into a single event, so it looks like this:

Log file wrap at 27/06/2017 2:35:27 PM
Message logging key (for packets - other items use a subset of these fields):
    Field #  Information         Values
    -------  -----------         ------
       1     Date
       2     Time
       3     Thread ID
       4     Context
       5     Internal packet identifier
       6     UDP/TCP indicator
       7     Send/Receive indicator
       8     Remote IP
       9     Xid (hex)
      10     Query/Response      R = Response
                                 blank = Query
      11     Opcode              Q = Standard Query
                                 N = Notify
                                 U = Update
                                 ? = Unknown
      12     [ Flags (hex)
      13     Flags (char codes)  A = Authoritative Answer
                                 T = Truncated Response
                                 D = Recursion Desired
                                 R = Recursion Available
      14     ResponseCode ]
      15     Question Type
      16     Question Name

Which is an improvement over having 16 random events that relate to the header, but the header still has not been dropped.

On the indexing tier I tried:

[MSAD:NT6:DNS]
TRANSFORMS-t1 = eliminate-DNSHeaders

And:

[eliminate-DNSHeaders]
REGEX=(?m)^Log file wrap at
DEST_KEY = queue
FORMAT = nullQueue

I also tried without the ?m. Clearly I'm missing something, but I'm not sure how I should drop these header records; they are not useful...

If someone can let me know how to drop these records, that would be appreciated. Note that the last setting was placed on the indexers, not the universal forwarders.

1 Solution

gjanders
SplunkTrust

The answer was surprisingly simple: my regex matched the entire header line exactly, and that stopped the trick from working. What was required was:
FIELD_HEADER_REGEX=\s+16\s+Question\s+N

This way the regex matches within the line rather than the entire line, and the whole section up to that point is then dropped.
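Putting it together, the working UF-side stanza is just the corrected regex in props.conf (a sketch based on the stanzas already shown above):

[MSAD:NT6:DNS]
#Drop the header lines from the file; match within the last
#header line ("   16     Question Name") but not the whole line
FIELD_HEADER_REGEX=\s+16\s+Question\s+N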

EDIT: there are a few limitations I've found to this. The forwarder in use started logging time-parsing and line-breaking warnings after this setting was applied, so I assume it is attempting to parse the logs before forwarding them to the indexer.
Furthermore, the forwarder's CPU usage increased in order to process this setting, so I've fallen back to performing the work on the indexers rather than the universal forwarder, i.e. not using this trick.


gjanders
SplunkTrust

I couldn't get it to drop the headers as expected, so I've resorted to this for now:

props.conf

[MSAD:NT6:DNS]
TRANSFORMS-t1 = eliminate-dnsheaders

transforms.conf

[eliminate-dnsheaders]
REGEX = ^[^\d]
DEST_KEY = queue
FORMAT = nullQueue

However, I'd like to understand why I cannot drop the headers, so I've asked Splunk support for advice.


inventsekar
SplunkTrust

You want to filter out everything up to
16 Question Name
right? (This contradicts your REGEX=(?m)^Log file wrap at, right?)

Maybe, did you try
FIELD_HEADER_REGEX=16\s+Question\s+Name
or simply
FIELD_HEADER_REGEX=Question\s+Name
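For clarity, either suggestion would sit in the UF's props.conf under the same sourcetype stanza, e.g. (sketch):

[MSAD:NT6:DNS]
FIELD_HEADER_REGEX=Question\s+Name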


gjanders
SplunkTrust

you want to filter out until 16 Question Name right? (this contradicts with - REGEX=(?m)^Log file wrap at right)

Yes, I want to filter out the headers; the last header line is "16 Question Name".

maybe, did you try -
FIELD_HEADER_REGEX=16\s+Question\s+Name
or simply
FIELD_HEADER_REGEX=Question\s+Name

Before I had the FIELD_HEADER_REGEX working, each line such as:
Log file wrap at 27/06/2017 2:35:27 PM
Message logging key (for packets - other items use a subset of these fields):

came through as an individual event. Now I get a single 16-line event instead, so I'm fairly confident the FIELD_HEADER_REGEX is doing something. Note that the FIELD_HEADER_REGEX is on the UF.

Also FYI I tested the transforms.conf as:
REGEX = Log file wrap at

Still not working. I will now try adding the FIELD_HEADER_REGEX at the indexer level just in case it was supposed to go there (the documentation implies it belongs wherever the input is defined, but it's worth a try).
