
Strip header and trailer from web log files

beaumaris
Communicator

I have web log files that have both a header and a trailer line. The header looks like this:

Current-Time Time-to-Serve Client-IP <and more headers>

The trailer line looks like a comment:

#Number of transaction records: <NNNNNN>

where <NNNNNN> is the number of records in the file. I tried setting up the configuration files as follows:

props.conf
[my_sourcetype]
TRANSFORMS-tonull= strip_header,strip_footer

transforms.conf
[strip_header]
REGEX = Current-Time Time-to-Serve Client-IP
DEST_KEY = queue
FORMAT = nullQueue

[strip_footer]
REGEX = #Number of transaction records
DEST_KEY = queue
FORMAT = nullQueue
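To sanity-check the two patterns outside Splunk, here is a rough Python emulation of what the nullQueue routing is meant to do: any line that matches either REGEX is dropped and everything else is kept. It assumes Python's `re.search` behaves like Splunk's PCRE engine for these plain literal patterns, and the sample lines (column names after `Client-IP`, timestamps, record count) are made up for illustration.

```python
import re

# REGEX values copied from the [strip_header] and [strip_footer] stanzas.
NULL_QUEUE_PATTERNS = [
    re.compile(r"Current-Time Time-to-Serve Client-IP"),
    re.compile(r"#Number of transaction records"),
]


def route_to_index(lines):
    """Keep only the lines that match none of the nullQueue patterns.

    Hypothetical stand-in for index-time routing: it works on single lines
    and ignores Splunk's own line breaking and event merging.
    """
    return [
        line for line in lines
        if not any(p.search(line) for p in NULL_QUEUE_PATTERNS)
    ]


# Made-up sample file: header, two records, trailer.
sample = [
    "Current-Time Time-to-Serve Client-IP Method URL",
    "2022-06-27 10:00:01 12 10.0.0.1 GET /index.html",
    "2022-06-27 10:00:02 7 10.0.0.2 GET /about.html",
    "#Number of transaction records: 2",
]

kept = route_to_index(sample)
for line in kept:
    print(line)
```

Both patterns match their target lines here, which suggests the pattern text on its own is not what lets the trailer through.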

The indexer shows no sign of the header records, but the trailer records are still getting through and are causing errors when determining the unique event records. I'm not sure whether this is a syntax problem with listing multiple stanza names in the "TRANSFORMS-tonull" setting, or a problem with the REGEX in the [strip_footer] stanza itself (could '#' be a special character?). I'd appreciate any pointers on how best to solve this. I've seen postings about ignoring comments in IIS logs and wonder whether the best approach is to combine these into a single entry in transforms.conf. Thanks!
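On whether '#' is special: in PCRE-style regular expressions a bare `#` only starts a comment when extended mode (the `x` flag) is on; otherwise it is an ordinary literal character. Python's `re` follows the same rule through its `re.VERBOSE` flag, so this sketch uses Python as a stand-in for Splunk's regex engine (an assumption) to show the difference.

```python
import re

trailer = "#Number of transaction records: 123456"
pattern_text = r"#Number of transaction records"

# Default mode: '#' is an ordinary literal, so the whole phrase must match.
plain_match = re.search(pattern_text, trailer)

# Verbose mode: an unescaped '#' starts a comment that runs to end of line,
# so the pattern is effectively empty and "matches" zero characters.
verbose_match = re.search(pattern_text, trailer, re.VERBOSE)

print(plain_match.group(0))          # #Number of transaction records
print(repr(verbose_match.group(0)))  # ''
```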


rturk
Builder

I stumbled across this while looking into a similar issue of my own.

Can I suggest changing your REGEX expressions as follows:

transforms.conf
[strip_header]
REGEX = ^C
DEST_KEY = queue
FORMAT = nullQueue

[strip_footer]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue

This assumes that every event whose first character is a C is a header (which should be fairly safe, since data lines start with the time-stamp field), and that every line starting with a # is a comment or footer that can also be discarded.

I'm matching only a single character at the start of the line to keep the regular expression fast to evaluate.
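The anchored patterns can be tried the same way. This Python sketch applies `^C` and `^#` to a few made-up lines, assuming Python's `re` treats these patterns the way Splunk's PCRE engine does. It also shows a hypothetical combined form, `^[C#]`, as one way to express both cases in a single pattern; that combined form is an illustration only and has not been checked against a live Splunk instance.

```python
import re

# Single-character, start-anchored patterns from the suggested stanzas.
STRIP_HEADER = re.compile(r"^C")
STRIP_FOOTER = re.compile(r"^#")
# Hypothetical combined form: one character class covering both cases.
STRIP_BOTH = re.compile(r"^[C#]")

# Made-up lines; real data lines are assumed to start with a timestamp.
lines = [
    "Current-Time Time-to-Serve Client-IP Method URL",
    "2022-06-27 10:00:01 12 10.0.0.1 GET /index.html",
    "#Number of transaction records: 1",
]

# Drop any line matching either anchored pattern.
separate = [
    line for line in lines
    if not (STRIP_HEADER.search(line) or STRIP_FOOTER.search(line))
]
# Drop any line matching the combined pattern.
combined = [line for line in lines if not STRIP_BOTH.search(line)]

print(separate)
print(separate == combined)  # True
```

Note that any data line which happened to begin with a C would be dropped as well, which is the trade-off of matching a single character.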

Hope this helps someone 🙂


jrodman
Splunk Employee

I sure hope we don't treat a mid-line # in a regex string as a comment, but at the moment I don't know for sure. Perhaps you could try:

REGEX = \x23Number of transaction records

\x23 is just the hexadecimal escape for that character, i.e. #.
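As a quick check of the escape (again using Python's `re` as a stand-in for Splunk's PCRE engine, which is an assumption), `\x23` matches exactly the same text as a literal `#`:

```python
import re

trailer = "#Number of transaction records: 123456"

# \x23 is the hex escape for '#' (ASCII 0x23), so these patterns are equivalent.
escaped = re.compile(r"\x23Number of transaction records")
literal = re.compile(r"#Number of transaction records")

escaped_match = escaped.search(trailer)
literal_match = literal.search(trailer)

print(chr(0x23))                                       # #
print(escaped_match.group(0) == literal_match.group(0))  # True
```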
