Splunk Search

Problem with FIELD_HEADER_REGEX in TMG Logs

delink
Communicator

I am attempting to use the INDEXED_EXTRACTIONS = W3C configuration to pull logs from a Microsoft TMG server. I started with the isamonitor app that exists for ISA 2006 and built a new sourcetype on top of it for the TMG logs called tmgwebw3c (based on isawebw3c). The W3C log header looks like this; both the #Fields line and the data rows are tab-separated:

#Software: Microsoft Forefront Threat Management Gateway
#Version: 2.0
#Date: 2013-11-22 15:26:48
#Fields: c-ip   cs-username     c-agent date    time    s-computername  cs-referred     r-host  r-ip
    r-port  time-taken      sc-bytes        cs-bytes        cs-protocol     s-operation     cs-uri  cs-mime-type    s-object-source sc-status       rule    FilterInfo      cs-network      sc-network
      error-info      action  AuthenticationServer    NIS scan result NIS signature   ThreatName
      MalwareInspectionAction MalwareInspectionResult UrlCategory     MalwareInspectionContentDeliveryMethod  MalwareInspectionDuration       MalwareInspectionThreatLevel    internal-service-info   NIS application protocol        NAT address     UrlCategorizationReason SessionType     UrlDestHost     s-port  SoftBlockAction

Using the documentation at http://docs.splunk.com/Documentation/Splunk/6.0/Data/Extractfieldsfromfileheadersatindextime I built a sourcetype that looks as follows:

[tmgwebw3c]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = false
REPORT-tmgwebw3c = tmgwebw3c
TZ = GMT
INDEXED_EXTRACTIONS = W3C
FIELD_HEADER_REGEX = ^#Fields:
PREAMBLE_REGEX = ^#\w+: 
FIELD_DELIMITER = \t

Everything appears to work, except that the very first field is named "Fields_c_ip" rather than the expected "c_ip". According to the documentation, the portion of the line matched by FIELD_HEADER_REGEX should not be included in the header, but it seems to be included anyway.

I also tried removing PREAMBLE_REGEX in case the two settings were conflicting, but this did not solve the issue. Any assistance with this would be appreciated.
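
To make the symptom concrete, here is a small Python sketch of how a name like "Fields_c_ip" could arise when the "#Fields:" marker is left on the header line. It is not Splunk's implementation: the sanitizing rule (runs of non-alphanumeric characters become an underscore) and the helper names sanitize and field_names are assumptions chosen only to reproduce the observed field name.

```python
import re

# A shortened, tab-separated version of the header line from the log above.
HEADER = "#Fields: c-ip\tcs-username\tc-agent\tdate\ttime"

def sanitize(name):
    # Assumed rule: each run of non-alphanumeric characters becomes "_",
    # and leading/trailing underscores are trimmed.
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")

def field_names(header, strip_prefix):
    # With strip_prefix=True the "#Fields:" marker is dropped before
    # splitting, which is what the documentation describes.
    if strip_prefix:
        header = re.sub(r"^#Fields:\s*", "", header)
    return [sanitize(col) for col in header.split("\t")]

expected = field_names(HEADER, strip_prefix=True)
observed = field_names(HEADER, strip_prefix=False)
print(expected[0])
print(observed[0])
```

Only the first column differs between the two results, which matches what I see in the indexed data.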

--
Brian T Glenn
Hurricane Labs

1 Solution

delink
Communicator

It turns out this is going to be fixed in a release after 6.0.2. Nothing needs to change in the configuration itself.

0 Karma

ogdin
Splunk Employee

My props.conf settings that work with this:

[w3c_tab]
FIELD_DELIMITER=tab
FIELD_HEADER_REGEX=^#Fields:\s*(.*)
MISSING_VALUE_REGEX=-
TIME_FORMAT=%Y-%m-%d %H:%M:%S
TZ=GMT
TIMESTAMP_FIELDS=date,time

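Note: in my other reply in this thread I accidentally escaped the \s in FIELD_HEADER_REGEX (it appears there as \\s). The single-backslash form shown here is the intended one.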
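
As a quick illustration of why the extra backslash matters, here is a hedged Python sketch that runs both patterns against a shortened header line. Python's re module stands in for Splunk's regex engine here, which is an assumption; the point is only that \\s asks for a literal backslash followed by "s" rather than for whitespace.

```python
import re

# Shortened, tab-separated header line in the same shape as the TMG log.
header = "#Fields: c-ip\tcs-username\tc-agent"

# Intended pattern: \s* consumes the optional whitespace after the colon,
# and the capture group holds the actual field list.
single = re.compile(r"^#Fields:\s*(.*)")

# Accidentally escaped pattern: \\ is a literal backslash, then "s*".
double = re.compile(r"^#Fields:\\s*(.*)")

names = single.match(header).group(1).split("\t")
no_match = double.match(header)

print(names)
print(no_match)
```

With the escaped form the header line is not matched at all, because there is no literal backslash after "#Fields:".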

delink
Communicator

I will definitely give this a shot, but I will not have access to the environment until the end of next month, so I can't be sure just yet. Thanks!

0 Karma

ogdin
Splunk Employee

Hi Brian,

Did you try just:

INDEXED_EXTRACTIONS = W3C  

Without any other settings? This actually sets the following under the covers:

FIELD_DELIMITER = whitespace
FIELD_HEADER_REGEX = ^#Fields:\\s*(.*)
MISSING_VALUE_REGEX = -
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
TIMESTAMP_FIELDS = date,time

We did have some trouble with tabs versus spaces as delimiters in Internet Security and Acceleration (ISA) Server logs, and I'm wondering if we'll see the same problems here.
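
To show the kind of trouble I mean, here is a rough Python sketch using a few field names taken from the header in the question. It assumes that a whitespace delimiter splits on any run of spaces or tabs, the way Python's str.split() does with no argument; that is an assumption for illustration, not a statement about how Splunk tokenizes the line.

```python
# A slice of the TMG #Fields header; several names contain spaces.
header_fields = "action\tAuthenticationServer\tNIS scan result\tNIS signature\tThreatName"

# Tab delimiter: names with embedded spaces stay intact.
by_tab = header_fields.split("\t")

# Whitespace delimiter (assumed behaviour): embedded spaces split the names.
by_whitespace = header_fields.split()

print(len(by_tab), by_tab)
print(len(by_whitespace), by_whitespace)
```

In this sketch the five tab-separated columns turn into eight whitespace-separated tokens, so the header would no longer line up with the data rows.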
