
Issue with Splunk extracting escaped fields/values

flle
Path Finder

Hi,

I use the CEFUtils app to do search-time field extractions of CEF-formatted events.
The problem is that Splunk also identifies and extracts key/value pairs where the = between key and value is escaped, and I can't figure out why. With the = escaped, Splunk should not treat the string as a key/value pair.

Example event:

CEF:0|M86 Security|SWG|10|M86 SWG Web Event|M86 SWG Web Event|1| act= m86swgblockreason= src=192.1.2.3 dst=10.235.156.133 suser=user1 requestMethod=GET app=HTTP m86swgresponsestatus=302 dvc=192.10.10.10 m86swgtransactionID=506C36148 m86swgtransactiontime=10/03/2012 14:56:52 m86swgtransactionsize=906 fileType= request=http://www.example.net/b/ss/2874979579761?AQB\=1&ndh\=1&t\=3/9/2012 14:56:52 3 -120&cc\=25 m86swgurlcategory=Web-based

transforms.conf for the CEF extraction:

[cefHeaders]
REGEX = CEF:(?<cef_cefVersion>\d+)\|(?<cef_vendor>[^|]*)\|(?<cef_product>[^|]*)\|(?<cef_version>[^|]*)\|(?<cef_signature>[^|]*)\|(?<cef_name>[^|]*)\|(?<cef_severity>[^|]*)

[cefKeys]
REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
REPEAT_MATCH = True
CLEAN_KEYS = 1

The general CEF key/value extraction works as expected: all headers and key/value pairs are extracted correctly (e.g. the value of the request field contains the complete URL). In addition, however, Splunk also extracts the fields AQB, ndh, t and cc from within the URL, even though the = is escaped.
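One way to confirm that the extra fields do not come from the transforms themselves is to run both regexes outside Splunk. This is a minimal Python sketch, not how Splunk executes transforms: it assumes that `re.finditer` (which resumes after the end of each match) behaves closely enough like `REPEAT_MATCH = True`, and that dropping empty values mirrors `KEEP_EMPTY_VALS = False` from the btool output. The named groups are rewritten from PCRE's `(?<name>...)` to Python's `(?P<name>...)`, and `_KEY_1`/`_VAL_1` become `key`/`val`; `extract` is a hypothetical helper name.

```python
import re

# The example event from the question; the raw strings keep the "\=" escapes intact.
EVENT = (
    r"CEF:0|M86 Security|SWG|10|M86 SWG Web Event|M86 SWG Web Event|1| "
    r"act= m86swgblockreason= src=192.1.2.3 dst=10.235.156.133 suser=user1 "
    r"requestMethod=GET app=HTTP m86swgresponsestatus=302 dvc=192.10.10.10 "
    r"m86swgtransactionID=506C36148 m86swgtransactiontime=10/03/2012 14:56:52 "
    r"m86swgtransactionsize=906 fileType= "
    r"request=http://www.example.net/b/ss/2874979579761?AQB\=1&ndh\=1"
    r"&t\=3/9/2012 14:56:52 3 -120&cc\=25 m86swgurlcategory=Web-based"
)

# [cefHeaders] REGEX, with (?<name>...) rewritten as Python's (?P<name>...).
CEF_HEADERS = re.compile(
    r"CEF:(?P<cef_cefVersion>\d+)\|(?P<cef_vendor>[^|]*)\|(?P<cef_product>[^|]*)"
    r"\|(?P<cef_version>[^|]*)\|(?P<cef_signature>[^|]*)\|(?P<cef_name>[^|]*)"
    r"\|(?P<cef_severity>[^|]*)"
)

# [cefKeys] REGEX; _KEY_1/_VAL_1 are renamed to key/val for Python.
CEF_KEYS = re.compile(
    r"(?:_+)?(?P<key>[\w.:\[\]]+)=(?P<val>.*?(?=(?:\s[\w.:\[\]]+=|$)))"
)


def extract(event: str, keep_empty_vals: bool = False) -> dict[str, str]:
    """Approximate REPORT-cefevents = cefHeaders,cefKeys on a single event."""
    fields: dict[str, str] = {}
    headers = CEF_HEADERS.search(event)
    if headers:
        fields.update(headers.groupdict())
    # finditer restarts after the end of each match, similar to REPEAT_MATCH = True.
    for m in CEF_KEYS.finditer(event):
        # Skip empty values, mirroring KEEP_EMPTY_VALS = False.
        if m.group("val") or keep_empty_vals:
            fields[m.group("key")] = m.group("val")
    return fields


fields = extract(EVENT)
print(fields["request"])
print([k for k in ("AQB", "ndh", "t", "cc") if k in fields])  # prints []
```

With the example event, `request` keeps the complete URL and none of AQB, ndh, t or cc show up, which matches the regex-tester result. That points to some other search-time extraction step as the source of the extra fields.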

Edit / additional info:

The cefutils app runs on the search head, and the events reach the indexer via syslog. For this input I set the sourcetype to CEF (in inputs.conf), and in props.conf I strip the syslog header with:

TRANSFORMS-m86sourcetype= syslog-header-stripper-ts-host

This way the plain CEF event is indexed, properly formatted for the CEF field extraction.
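For context, an index-time header strip like this rewrites the raw event so that only the text after the syslog header is stored. The following is a minimal Python sketch of that idea, assuming a BSD-style syslog prefix ("Mmm dd hh:mm:ss host "). The pattern, the helper name `strip_syslog_header` and the host name `swg01` are illustrative assumptions, not the contents of the transform Splunk ships.

```python
import re

# HYPOTHETICAL pattern for a BSD-style syslog prefix ("Mmm dd hh:mm:ss host ").
# The real [syslog-header-stripper-ts-host] regex ships in Splunk's default
# transforms.conf and may differ from this one.
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\S+\s+(?P<body>.*)$"
)


def strip_syslog_header(raw: str) -> str:
    """Return the message after the syslog prefix; other lines are returned unchanged."""
    m = SYSLOG_PREFIX.match(raw)
    return m.group("body") if m else raw


# "swg01" is a made-up host name used only for this example.
line = (
    "Oct  3 14:56:52 swg01 "
    "CEF:0|M86 Security|SWG|10|M86 SWG Web Event|M86 SWG Web Event|1| src=192.1.2.3"
)
print(strip_syslog_header(line))
# CEF:0|M86 Security|SWG|10|M86 SWG Web Event|M86 SWG Web Event|1| src=192.1.2.3
```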

Btool output for cefKeys (key/value Extraction from the CEF sourcetype) on the SH:

./splunk cmd btool --debug transforms list cefKeys
cefutils   [cefKeys]
system     CAN_OPTIMIZE = True
cefutils   CLEAN_KEYS = 1
system     DEFAULT_VALUE =
system     DEST_KEY =
system     FORMAT =
system     KEEP_EMPTY_VALS = False
system     LOOKAHEAD = 4096
system     MV_ADD = False
cefutils   REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
cefutils   REPEAT_MATCH = True
system     SOURCE_KEY = _raw
system     WRITE_META = False

Sourcetype CEF is referenced in [SPLUNK_HOME]/etc/system/local/props.conf. Btool:

system     [CEF]
system     ANNOTATE_PUNCT = True
system     BREAK_ONLY_BEFORE =
system     BREAK_ONLY_BEFORE_DATE = True
system     CHARSET = UTF-8
system     DATETIME_CONFIG = /etc/datetime.xml
system     HEADER_MODE =
system     LEARN_SOURCETYPE = true
system     LINE_BREAKER_LOOKBEHIND = 100
system     MAX_DAYS_AGO = 2000
system     MAX_DAYS_HENCE = 2
system     MAX_DIFF_SECS_AGO = 3600
system     MAX_DIFF_SECS_HENCE = 604800
system     MAX_EVENTS = 256
system     MAX_TIMESTAMP_LOOKAHEAD = 128
system     MUST_BREAK_AFTER =
system     MUST_NOT_BREAK_AFTER =
system     MUST_NOT_BREAK_BEFORE =
system     REPORT-cefevents = cefHeaders,cefKeys           <- transforms reference
system     SEGMENTATION = indexing
system     SEGMENTATION-all = full
system     SEGMENTATION-inner = inner
system     SEGMENTATION-outer = outer
system     SEGMENTATION-raw = none
system     SEGMENTATION-standard = standard
system     SHOULD_LINEMERGE = false
system     TRANSFORMS =
system     TRUNCATE = 10000
system     maxDist = 100

If I test the cefKeys regex in a regex tester against the example event above, everything looks fine: the escaped = signs within the request field are ignored. On the Splunk search head, however, they do get extracted.

Thanks & Regards

Flo


IgorB
Path Finder

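You may want to add "KV_MODE = none" to props.conf. By default (KV_MODE = auto) Splunk runs its automatic key/value extraction on every event in addition to your REPORT- transforms, and that automatic pass is the likely source of the AQB, ndh, t and cc fields. Put the setting in the stanza for your sourcetype ([CEF] in your case).
Example: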

[cefevents]
KV_MODE = none
TIME_PREFIX = \s(end|rt)\=
TIME_FORMAT = %10S%3n
MAX_TIMESTAMP_LOOKAHEAD = 350
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
REPORT-cefevents = cefHeaders,cefKeys
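To illustrate why switching the automatic pass off helps: at search time, the fields from REPORT-cefevents and the fields from automatic key/value extraction end up on the same event. This Python sketch is an illustration only. `NAIVE_AUTO_KV` is a deliberately simplified, hypothetical stand-in for KV_MODE = auto (not Splunk's actual algorithm) that reproduces the behavior described in the question; the merge order and the function names are also assumptions.

```python
import re

# Shortened excerpt of the question's event; raw strings keep the "\=" escapes.
EVENT = (
    r"src=192.1.2.3 "
    r"request=http://www.example.net/b/ss/2874979579761?AQB\=1&ndh\=1"
    r"&t\=3/9/2012 14:56:52 3 -120&cc\=25 m86swgurlcategory=Web-based"
)

# [cefKeys] REGEX from the question, in Python's (?P<name>...) syntax.
CEF_KEYS = re.compile(
    r"(?:_+)?(?P<key>[\w.:\[\]]+)=(?P<val>.*?(?=(?:\s[\w.:\[\]]+=|$)))"
)

# HYPOTHETICAL stand-in for an automatic key=value pass. This is NOT Splunk's
# KV_MODE = auto implementation: it finds word=value anywhere in the text,
# stops values at whitespace, "&" or "?", and (like the behaviour reported in
# the question) does not treat "\=" as an escaped equals sign.
NAIVE_AUTO_KV = re.compile(r"(\w+)\\?=([^\s&?]*)")


def report_fields(event: str) -> dict[str, str]:
    """Fields from the REPORT- transform (cefKeys only)."""
    return {m.group("key"): m.group("val") for m in CEF_KEYS.finditer(event)}


def search_time_fields(event: str, kv_mode: str = "auto") -> dict[str, str]:
    """REPORT- fields plus, unless kv_mode is "none", the naive automatic pass."""
    fields = report_fields(event)
    if kv_mode != "none":
        for key, val in NAIVE_AUTO_KV.findall(event):
            # Assumption for this sketch: values from the REPORT- transform win.
            fields.setdefault(key, val)
    return fields


print(sorted(search_time_fields(EVENT)))
# ['AQB', 'cc', 'm86swgurlcategory', 'ndh', 'request', 'src', 't']
print(sorted(search_time_fields(EVENT, kv_mode="none")))
# ['m86swgurlcategory', 'request', 'src']
```

With the automatic pass disabled, only the fields defined by the REPORT- transform remain, and `request` still carries the complete URL.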


lguinn2
Legend

I would use btool to see exactly what the settings are for the sourcetype:

splunk cmd btool transforms list --debug

splunk cmd btool props list --debug

Here is the documentation: Use btool to troubleshoot configurations

Once you see what the settings are, you might want to look up the meaning of the various items. I wonder if you have a DELIMS setting in a default transform somewhere - but you will be able to see this in your btool output. If you can't figure it out, you can copy the btool output (just for that sourcetype!) into your question, and we will be able to give more advice...

There are other things that might be involved, since btool only shows a static combination of the configuration files. But btool is the best place to start when you can't see why a configuration behaves the way it does.


flle
Path Finder

I had already looked at btool but could not find a reason for the behaviour; maybe I just missed it. I have updated my post with the output.
