Problem with FIELD_HEADER_REGEX in TMG Logs

delink
Communicator

I am attempting to use the INDEXED_EXTRACTIONS = W3C configuration to pull logs from a Microsoft TMG server. I started with the isamonitor app, which exists for ISA 2006, and built a new sourcetype on top of it for the TMG logs called tmgwebw3c (based on isawebw3c). The header of the W3C log looks as follows; the Fields line is tab-separated, as is the data itself:

#Software: Microsoft Forefront Threat Management Gateway
#Version: 2.0
#Date: 2013-11-22 15:26:48
#Fields: c-ip   cs-username     c-agent date    time    s-computername  cs-referred     r-host  r-ip    r-port  time-taken      sc-bytes        cs-bytes        cs-protocol     s-operation     cs-uri  cs-mime-type    s-object-source sc-status       rule    FilterInfo      cs-network      sc-network      error-info      action  AuthenticationServer    NIS scan result NIS signature   ThreatName      MalwareInspectionAction MalwareInspectionResult UrlCategory     MalwareInspectionContentDeliveryMethod  MalwareInspectionDuration       MalwareInspectionThreatLevel    internal-service-info   NIS application protocol        NAT address     UrlCategorizationReason SessionType     UrlDestHost     s-port  SoftBlockAction

Using the documentation at http://docs.splunk.com/Documentation/Splunk/6.0/Data/Extractfieldsfromfileheadersatindextime I built a sourcetype that looks as follows:

[tmgwebw3c]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = false
REPORT-tmgwebw3c = tmgwebw3c
TZ = GMT
INDEXED_EXTRACTIONS = W3C
FIELD_HEADER_REGEX = ^#Fields:
PREAMBLE_REGEX = ^#\w+: 
FIELD_DELIMITER = \t

Everything appears to be working well, except that the very first field is being named "Fields_c_ip" rather than the expected "c_ip". According to the documentation, the portion matched by FIELD_HEADER_REGEX should not be included in the header line, but it seems to be included anyway.

I also tried removing PREAMBLE_REGEX in case the two settings were conflicting, but that did not solve the issue. Any assistance with this would be appreciated.
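To make the "Fields_c_ip" symptom concrete, here is a small Python model of how a header line could turn into field names. It is a sketch under stated assumptions, not Splunk's code: the helpers `normalize` and `header_fields` are hypothetical, and the cleanup rule (runs of characters other than letters, digits and `_` collapse to a single `_`) was chosen because it reproduces the observed name. `strip_match=True` models the documented behavior, where the header starts after the match; `strip_match=False` models the behavior reported above.

```python
import re


def normalize(name: str) -> str:
    # Hypothetical cleanup rule: collapse runs of characters other than
    # letters, digits and "_" into one "_", then trim "_" from both ends.
    return re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")


def header_fields(line: str, header_regex: str, strip_match: bool) -> list:
    """Return normalized field names from a W3C "#Fields:" line.

    strip_match=True models the documented behavior (header starts after the
    match); strip_match=False models the reported behavior (prefix kept).
    """
    m = re.match(header_regex, line)
    if m is None:
        raise ValueError("line does not match header_regex")
    if m.groups():
        raw = m.group(1)          # a capture group selects the field list itself
    elif strip_match:
        raw = line[m.end():]      # drop the text matched by the regex
    else:
        raw = line                # keep the "#Fields:" prefix on the first field
    return [normalize(f) for f in raw.strip().split("\t")]


# Abbreviated header; assumes a space after "#Fields:" and tabs between fields.
LINE = "#Fields: c-ip\tcs-username\tc-agent"
print(header_fields(LINE, r"^#Fields:", strip_match=False))         # ['Fields_c_ip', 'cs_username', 'c_agent']
print(header_fields(LINE, r"^#Fields:", strip_match=True))          # ['c_ip', 'cs_username', 'c_agent']
print(header_fields(LINE, r"^#Fields:\s*(.*)", strip_match=False))  # ['c_ip', 'cs_username', 'c_agent']
```

In this model, a regex with a capture group such as `^#Fields:\s*(.*)` picks out only the field list, so the first name comes out clean whichever way the unmatched prefix is handled.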

--
Brian T Glenn
Hurricane Labs

1 Solution

delink
Communicator

Turns out this is going to be fixed post-6.0.2. Nothing to do in the configuration itself.

ogdin
Splunk Employee

My props.conf settings that work with this:

[w3c_tab]
FIELD_DELIMITER=tab
FIELD_HEADER_REGEX=^#Fields:\s*(.*)
MISSING_VALUE_REGEX=-
TIME_FORMAT=%Y-%m-%d %H:%M:%S
TZ=GMT
TIMESTAMP_FIELDS=date,time
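As a rough illustration of how the settings in this stanza fit together, the sketch below applies them by hand to one header line and one made-up data line. It is a hypothetical model, not Splunk's implementation: the function name `parse_event`, treating a value that is exactly `-` as missing, and joining `date` and `time` with a single space before parsing with TIME_FORMAT are all assumptions.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone

# Values copied from the [w3c_tab] stanza; parse_event is a hypothetical helper.
FIELD_HEADER_REGEX = re.compile(r"^#Fields:\s*(.*)")
FIELD_DELIMITER = "\t"                     # FIELD_DELIMITER=tab
MISSING_VALUE_REGEX = re.compile(r"-")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_FIELDS = ("date", "time")


def parse_event(header_line: str, data_line: str) -> tuple[datetime, dict[str, str | None]]:
    m = FIELD_HEADER_REGEX.match(header_line)
    if m is None:
        raise ValueError("header line does not match FIELD_HEADER_REGEX")
    names = m.group(1).split(FIELD_DELIMITER)
    values = data_line.split(FIELD_DELIMITER)
    if len(names) != len(values):
        raise ValueError(f"{len(names)} field names but {len(values)} values")

    # Assumption: a value matching MISSING_VALUE_REGEX in full counts as absent.
    event = {
        name: None if MISSING_VALUE_REGEX.fullmatch(value) else value
        for name, value in zip(names, values)
    }

    # Assumption: TIMESTAMP_FIELDS are joined with one space, parsed with
    # TIME_FORMAT, and interpreted as GMT (TZ=GMT).
    parts = [event.get(f) for f in TIMESTAMP_FIELDS]
    if any(p is None for p in parts):
        raise ValueError("missing date or time value")
    ts = datetime.strptime(" ".join(parts), TIME_FORMAT).replace(tzinfo=timezone.utc)
    return ts, event


header = "#Fields: c-ip\tcs-username\tdate\ttime"
ts, event = parse_event(header, "10.0.0.5\t-\t2013-11-22\t15:26:48")
print(ts.isoformat())   # 2013-11-22T15:26:48+00:00
print(event)            # {'c-ip': '10.0.0.5', 'cs-username': None, 'date': '2013-11-22', 'time': '15:26:48'}
```

Field names are left as they appear in the header here; any renaming (such as `c-ip` to `c_ip`) is outside this sketch.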

Note: I had accidentally escaped the \s in FIELD_HEADER_REGEX (written as \\s) in my other reply; the single backslash above is the correct form.

delink
Communicator

I will definitely give this a shot, but I will not have access to the environment until the end of next month, so I can't be sure just yet. Thanks!

ogdin
Splunk Employee

Hi Brian,

Did you try just:

INDEXED_EXTRACTIONS = W3C  

Without any other settings? This actually sets the following under the covers:

FIELD_DELIMITER = whitespace
FIELD_HEADER_REGEX = ^#Fields:\\s*(.*)
MISSING_VALUE_REGEX = -
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
TIMESTAMP_FIELDS = date,time

We did have some trouble with tabs and spaces in Internet Security and Acceleration (ISA) Server logs, and I'm wondering whether we'll see the same problems here.
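One plausible reason tabs versus spaces matters for TMG is that several of its field names contain spaces, such as `NIS scan result` and `NAT address`. The sketch below uses an abbreviated excerpt of the header from the question and compares splitting on tabs with splitting on any whitespace. The helper `split_fields` is hypothetical, and treating FIELD_DELIMITER = whitespace as "any run of spaces or tabs" is an assumption made for illustration.

```python
import re

# Abbreviated excerpt of the TMG header from the question, assuming tabs
# between fields (the forum rendering shows them as runs of spaces).
TMG_HEADER = "#Fields: c-ip\tcs-username\tNIS scan result\tNIS signature\tThreatName"


def split_fields(header_line: str, delimiter: str) -> list:
    """Split the field list after "#Fields:" on a delimiter.

    "whitespace" is modeled as any run of spaces or tabs (an assumption);
    any other value is used as a literal separator.
    """
    m = re.match(r"^#Fields:\s*(.*)", header_line)
    if m is None:
        raise ValueError("not a #Fields: line")
    body = m.group(1)
    return body.split() if delimiter == "whitespace" else body.split(delimiter)


by_tab = split_fields(TMG_HEADER, "\t")
by_ws = split_fields(TMG_HEADER, "whitespace")
print(len(by_tab), by_tab)   # 5 names
print(len(by_ws), by_ws)     # 8 names: "NIS scan result" is split into three
```

Under this model, a whitespace delimiter yields more header names than a tab-split data line has values, so names and values would likely stop lining up after the first multi-word field.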
