Unable to extract multi-word values from CEF fields

att35
Builder

Hi,

We have Imperva logs coming into Splunk as CEF via syslog. We ran into the issue of Splunk extracting only the first word of the multi-word values that CEF fields often carry.

I came across the app "CEF (Common Event Format) Extraction Utilities": http://apps.splunk.com/app/487/

I installed the app, but the extractions are still the same. I believe there must be a way to have the app's extractions applied in regular Splunk searches, but I am not sure what I am missing.

On the app main page, it mentions the following:

cefKeys - fixes multiword value extraction (by default Splunk would only extract key's values up to the first whitespace character)

This is exactly what we want, but I am not sure how to make it work.

In another Splunk answer (http://answers.splunk.com/answers/140326/cef-parsing-using-custom-field-labels-and-the-cefutils-app), it was suggested to set KV_MODE=cef. I did that in props.conf within the app, but that didn't change anything.

Am I supposed to copy the props.conf from the app into props.conf under /opt/splunk/etc/system/default?

It looks like I am missing something very basic here. Any help would be appreciated.

Thanks,

~Abhi


dshpritz
SplunkTrust

The "KV_MODE = cef" setting was a suggestion for a Splunk feature; it does not actually exist.

That same answer also points to a TA that I made for Imperva when sending CEF events:

http://apps.splunk.com/app/955/

Otherwise, you will need some regex magic. Here is a recipe I have used in the past.

First, let's break out the message. CEF specifies a header followed by key-value pairs, so let's use an EXTRACT to grab the header fields and capture the remaining key-value pairs as cef_message. Afterwards, we use a REPORT to parse the fields:
props.conf:

[my_cef_sourcetype]
KV_MODE = none
EXTRACT-cef-message = \sCEF:\d\|(?<vendor>[^\|]+)\|(?<product>[^\|]+)\|(?<version>[^\|]+)\|(?<signature_id>[^\|]+)\|(?<signature>[^\|]+)\|(?<vendor_severity>[^\|]+)\|(?<cef_message>.*)
REPORT-parse_cef = cef_auto_kv,cef_first,cef_last

Our transforms.conf will then hold some of the regex magic:
transforms.conf:

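# Pairs in the middle: a key preceded by whitespace, value runs up to the next " key="
[cef_auto_kv]
SOURCE_KEY = cef_message
REGEX = \s(\w+)=([^=]+)(?=\s+\w+=)
FORMAT = $1::$2

# First pair: anchored at the start of cef_message
[cef_first]
SOURCE_KEY = cef_message
REGEX = ^(\w+)=([^=]+)(?=\s+\w+=)
FORMAT = $1::$2

# Last pair: value runs to the end of cef_message
[cef_last]
SOURCE_KEY = cef_message
REGEX = (\w+)=([^=]+)$
FORMAT = $1::$2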

Note that this isn't a full-fledged solution; it's really more of a cookbook recipe. There will generally be cases where extra regexes are needed to complete some fields, for example when a value itself contains an equals sign. Also note that this will not pair the custom label fields (such as cs1Label) with their values (cs1), as I feel that would require more processing than regex is really capable of.
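
As one illustration of such an extra regex: the [^=]+ pattern in the transforms above stops at any equals sign, so a value such as a URL with a query string would not be extracted. The following is only a sketch, assuming the URL arrives in the standard CEF key request; the class name cef_request is made up for illustration, and the line is meant to be added to the existing sourcetype stanza in props.conf:

```ini
[my_cef_sourcetype]
# Hypothetical supplement (not part of the original recipe):
# capture "request" lazily up to the next " key=" pair or the end of the event,
# so "=" characters inside the value do not cut it short.
EXTRACT-cef_request = (?:^|\s)request=(?<request>.+?)(?=\s+\w+=|$)
```

The lazy match plus lookahead is the same idea as the transforms above, just without excluding "=" from the value. It will still break if the value contains whitespace followed by something that looks like key=, so treat it as a starting point rather than a finished rule.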

HTH,

Dave

sowings
Splunk Employee

Please don't copy anything into $SPLUNK_HOME/etc/system/default. It will be lost (relocated) upon upgrade, and you'll be confused by the change in behavior.

The named app (CEF Extraction Utilities) isn't a one-stop shop. It has rules for breaking apart CEF events into fields, but it doesn't know your base sourcetype. The props.conf shipped with the app says "for the 'cefevents' sourcetype, REPORT on cefHeaders and cefKeys". cefHeaders and cefKeys are the names of rules in transforms.conf that have (effectively) the same effect as David's rules above. Two ways to achieve the same thing.

How would you apply these rules from the app to your sourcetype? Add a "field extraction" (Settings -> Fields) for ("apply to") the sourcetype of your CEF-formatted data. Change the type to "uses transform" and enter "cefHeaders" (the name of the rule) in the "Extraction / Transform" text box. Rinse, repeat for "cefKeys" on your sourcetype, et voilà.
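
If you would rather edit configuration files than use the UI, the same binding can be sketched in props.conf. This assumes your sourcetype is called my_cef_sourcetype (a placeholder), that the app's cefHeaders and cefKeys transforms are visible from wherever this stanza lives (they may need to be shared globally, or the stanza can go in the CEF app's own local directory), and that the class name cef_utils, which is made up here, is not already in use:

```ini
# Place in a local directory, e.g. $SPLUNK_HOME/etc/apps/<your_app>/local/props.conf
# (<your_app> is a placeholder; do not use etc/system/default)
[my_cef_sourcetype]
# Run the app's two transforms at search time for this sourcetype
REPORT-cef_utils = cefHeaders, cefKeys
```

Keeping the stanza under a local directory means it survives upgrades, which is the point made above about staying out of default.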

dshpritz
SplunkTrust

The point about not putting things in default is an important one. Thanks SOwings!

att35
Builder

Hi Dave,

Thanks for the help. This works perfectly.

We have a distributed search setup. When we first tried this on the search head only, the results didn't change. We had to install and apply the steps above on both the indexer and the search head.

Now it's working great.

Thanks again,

Abhi
