Hi,
I use the CEFUtils app to do search-time field extractions of CEF-formatted events.
The problem is that Splunk also identifies and extracts key/value pairs where the = between key and value is escaped. I just can't figure out why this is the case. With the = escaped, Splunk should not identify the string as a key/value pair.
Example event:
CEF:0|M86 Security|SWG|10|M86 SWG Web Event|M86 SWG Web Event|1| act= m86swgblockreason= src=192.1.2.3 dst=10.235.156.133 suser=user1 requestMethod=GET app=HTTP m86swgresponsestatus=302 dvc=192.10.10.10 m86swgtransactionID=506C36148 m86swgtransactiontime=10/03/2012 14:56:52 m86swgtransactionsize=906 fileType= request=http://www.example.net/b/ss/2874979579761?AQB\=1&ndh\=1&t\=3/9/2012 14:56:52 3 -120&cc\=25 m86swgurlcategory=Web-based
transforms.conf for the CEF extraction:
[cefHeaders]
REGEX = CEF:(?<cef_cefVersion>\d+)\|(?<cef_vendor>[^|]*)\|(?<cef_product>[^|]*)\|(?<cef_version>[^|]*)\|(?<cef_signature>[^|]*)\|(?<cef_name>[^|]*)\|(?<cef_severity>[^|]*)
[cefKeys]
REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
REPEAT_MATCH = True
CLEAN_KEYS = 1
The general CEF key/value extraction works as expected. All headers and key/value pairs are correctly extracted (i.e. the value of the request field contains the complete URL), but Splunk additionally extracts the fields AQB, ndh, t and cc from within the URL although the = is escaped.
Edit / additional info:
The cefutils app is running on the Search Head (SH) and the events come in via syslog to the indexer. For this input (in inputs.conf) I set the sourcetype to CEF, and in props.conf I strip the syslog header with
TRANSFORMS-m86sourcetype= syslog-header-stripper-ts-host
So the plain CEF event gets indexed, properly formatted for the CEF field extraction.
Btool output for cefKeys (key/value Extraction from the CEF sourcetype) on the SH:
./splunk cmd btool --debug transforms list cefKeys
cefutils [cefKeys]
system CAN_OPTIMIZE = True
cefutils CLEAN_KEYS = 1
system DEFAULT_VALUE =
system DEST_KEY =
system FORMAT =
system KEEP_EMPTY_VALS = False
system LOOKAHEAD = 4096
system MV_ADD = False
cefutils REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
cefutils REPEAT_MATCH = True
system SOURCE_KEY = _raw
system WRITE_META = False
Sourcetype CEF is referenced in [SPLUNK_HOME]/etc/system/local/props.conf. Btool:
system [CEF]
system ANNOTATE_PUNCT = True
system BREAK_ONLY_BEFORE =
system BREAK_ONLY_BEFORE_DATE = True
system CHARSET = UTF-8
system DATETIME_CONFIG = /etc/datetime.xml
system HEADER_MODE =
system LEARN_SOURCETYPE = true
system LINE_BREAKER_LOOKBEHIND = 100
system MAX_DAYS_AGO = 2000
system MAX_DAYS_HENCE = 2
system MAX_DIFF_SECS_AGO = 3600
system MAX_DIFF_SECS_HENCE = 604800
system MAX_EVENTS = 256
system MAX_TIMESTAMP_LOOKAHEAD = 128
system MUST_BREAK_AFTER =
system MUST_NOT_BREAK_AFTER =
system MUST_NOT_BREAK_BEFORE =
system REPORT-cefevents = cefHeaders,cefKeys <- transforms reference
system SEGMENTATION = indexing
system SEGMENTATION-all = full
system SEGMENTATION-inner = inner
system SEGMENTATION-outer = outer
system SEGMENTATION-raw = none
system SEGMENTATION-standard = standard
system SHOULD_LINEMERGE = false
system TRANSFORMS =
system TRUNCATE = 10000
system maxDist = 100
If I test the cefKeys regex with a regex tester against the example event above, everything looks good: the escaped = characters within the request field are ignored. But on the Splunk Search Head they do get extracted.
Thanks & Regards
Flo
You may want to add "KV_MODE = none" to props.conf.
Example:
[cefevents]
KV_MODE = none
TIME_PREFIX = \s(end|rt)\=
TIME_FORMAT = %10S%3n
MAX_TIMESTAMP_LOOKAHEAD = 350
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
REPORT-cefevents = cefHeaders,cefKeys
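One way to double-check that the cefKeys transform itself is not what produces AQB, ndh, t and cc is to run the same pattern outside Splunk. The following is only a sketch under stated assumptions: it uses Python's re module, which is not the regex engine Splunk uses, and the only change to the pattern is the named-group syntax ((?P<name>...) instead of (?<name>...)). The event is the sample from the question. If the pattern alone never yields those four keys, the extra fields most likely come from a different extraction, such as the automatic key/value extraction that KV_MODE = none switches off.

```python
import re

# Sample event from the question, kept as raw strings so "\=" stays literal.
EVENT = (
    r"CEF:0|M86 Security|SWG|10|M86 SWG Web Event|M86 SWG Web Event|1| "
    r"act= m86swgblockreason= src=192.1.2.3 dst=10.235.156.133 suser=user1 "
    r"requestMethod=GET app=HTTP m86swgresponsestatus=302 dvc=192.10.10.10 "
    r"m86swgtransactionID=506C36148 m86swgtransactiontime=10/03/2012 14:56:52 "
    r"m86swgtransactionsize=906 fileType= "
    r"request=http://www.example.net/b/ss/2874979579761"
    r"?AQB\=1&ndh\=1&t\=3/9/2012 14:56:52 3 -120&cc\=25 "
    r"m86swgurlcategory=Web-based"
)

# The cefKeys REGEX from transforms.conf, ported to Python's named-group syntax.
CEF_KEYS = re.compile(
    r"(?:_+)?(?P<_KEY_1>[\w.:\[\]]+)=(?P<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))"
)

# finditer() plays the role of REPEAT_MATCH = True: keep matching until the end.
fields = {m.group("_KEY_1"): m.group("_VAL_1") for m in CEF_KEYS.finditer(EVENT)}

# Keys that Splunk reported in addition to the expected ones.
unexpected = [k for k in ("AQB", "ndh", "t", "cc") if k in fields]

print("request =", fields["request"])
print("unexpected keys from cefKeys alone:", unexpected)
```

In this sketch the backslash before each escaped = is not part of the key character class [\w.:\[\]], so the pattern cannot treat AQB\=1 as a key followed by =, and the whole URL stays inside the request value.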
I would use btool to see exactly what the settings are for the sourcetype:
splunk cmd btool transforms list --debug
splunk cmd btool props list --debug
Here is the documentation: Use btool to troubleshoot configurations
Once you see what the settings are, you might want to look up the meaning of the various items. I wonder if you have a DELIMS setting in a default transform somewhere - but you will be able to see this in your btool output. If you can't figure it out, you can copy the btool output (just for that sourcetype!) into your question, and we will be able to give more advice...
Other things might be involved, because btool only shows a static merge of the configuration files. But btool is the best place to start when you can't see why a configuration is behaving in a particular way.
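When the btool output is long, it can help to filter it for the settings in question. The sketch below is a hypothetical helper, not part of Splunk: it assumes each line has the shape "origin key = value" or "origin [stanza]", as in the abbreviated output pasted in the question (real --debug output prints a file path as the first column, and paths containing spaces would need different splitting). The function names parse_btool and find_settings are made up for illustration.

```python
def parse_btool(text):
    """Yield (origin, stanza, key, value) for each setting line.

    Assumes the first whitespace-separated column is the origin
    (app name or file path without spaces), as in the pasted output.
    """
    stanza = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        origin, _, rest = line.partition(" ")
        rest = rest.strip()
        if rest.startswith("[") and rest.endswith("]"):
            stanza = rest[1:-1]  # remember the current stanza name
            continue
        key, sep, value = rest.partition("=")  # split on the first "=" only
        if sep:
            yield origin, stanza, key.strip(), value.strip()


def find_settings(text, names):
    """Return every parsed setting whose key is in `names`."""
    return [row for row in parse_btool(text) if row[2] in names]


# A few lines taken from the props btool output in the question.
SAMPLE = """\
system [CEF]
system REPORT-cefevents = cefHeaders,cefKeys
system SHOULD_LINEMERGE = false
system TRUNCATE = 10000
"""

# An empty result means no file in the merged view sets these keys explicitly.
print(find_settings(SAMPLE, {"KV_MODE", "DELIMS"}))
print(find_settings(SAMPLE, {"TRUNCATE"}))
```

If KV_MODE does not show up for the sourcetype, nothing sets it explicitly there; the props.conf documentation lists auto as the default for that setting.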
I did look at btool already but could not find a reason for the behaviour. But maybe I just did not see it. I updated my post.