Solved: Re: unable to extract multi word values from CEF f...

att35 · ‎08-27-2014

Hi,

We have Imperva logs coming into Splunk as CEF via syslog. We ran into the issue of Splunk extracting only the first word of the multi-word values that CEF fields often contain.

I came across the App "CEF (Common Event Format) Extraction Utilities". http://apps.splunk.com/app/487/

I installed the app, but the extractions are still the same. I believe there must be a way to make the app's features apply in regular Splunk searches, but I'm not sure what I am missing.

On the app main page, it mentions the following:

cefKeys - fixes multiword value extraction (by default Splunk would only extract key's values up to the first whitespace character)

This is exactly what we want, but I'm not sure how to make it work.

In another Splunk answer (http://answers.splunk.com/answers/140326/cef-parsing-using-custom-field-labels-and-the-cefutils-app), it was suggested to set KV_MODE=cef. I did that in props.conf within the app, but it didn't change anything.

Am I supposed to copy the props.conf from the app into the props.conf under /opt/splunk/etc/system/default?

Looks like I am missing something very basic here. Any help would be appreciated.

Thanks,

~Abhi

dshpritz · ‎08-27-2014

The "kv_mode = cef" setting was a suggestion for a Splunk feature; it does not actually exist.

That same answer also points to a TA that I made for Imperva when sending CEF events:

http://apps.splunk.com/app/955/

Otherwise, you will need some regex magic. Here is a recipe I have used in the past.

First, let's break out the message. CEF specifies a header followed by key-value pairs, so let's do an EXTRACT to grab that. Afterwards, we use a REPORT to get the fields parsed:
props.conf:

[my_cef_sourcetype]
KV_MODE = none
EXTRACT-cef-message = \sCEF:\d\|(?<vendor>[^\|]+)\|(?<product>[^\|]+)\|(?<version>[^\|]+)\|(?<signature_id>[^\|]+)\|(?<signature>[^\|]+)\|(?<vendor_severity>[^\|]+)\|(?<cef_message>.*)
REPORT-parse_cef = cef_auto_kv,cef_first,cef_last

Our transforms.conf will then hold some of the regex magic:
transforms.conf:

[cef_auto_kv]
SOURCE_KEY = cef_message
REGEX = \s(\w+)=([^=]+)(?=\s+\w+=)
FORMAT = $1::$2

[cef_first]
SOURCE_KEY = cef_message
REGEX = ^(\w+)=([^=]+)(?=\s+\w+=)
FORMAT = $1::$2

[cef_last]
SOURCE_KEY = cef_message
REGEX = (\w+)=([^=]+)$
FORMAT = $1::$2

Note that this isn't a full-fledged solution; it's really more of a cookbook recipe. There will generally be some cases where extra regexes are needed to complete some fields. Also note that this will not match the label fields with their values, as I feel that would require more processing than regex is really capable of.

HTH,

Dave


sowings · ‎08-29-2014

Please don't copy anything into $SPLUNK_HOME/etc/system/default. It will be lost (relocated) upon upgrade, and you'll be confused by the change in behavior.

The named app (CEF Extraction Utilities) isn't a one-stop shop. It has rules for breaking apart CEF events into fields, but it doesn't know your base sourcetype. The props.conf shipped with the app says "for the 'cefevents' sourcetype, REPORT on cefHeaders and cefKeys". The latter two are the names of rules in transforms.conf that do (effectively) the same thing as David's rules above. Two ways to achieve the same result.

How would you apply these rules from the app to your sourcetype? Add a "field extraction" (Settings -> Fields) for ("apply to") the sourcetype of your CEF-formatted data. Change the type to "uses transform" and enter "cefHeaders" (the name of the rule) in the "Extraction / Transform" text box. Rinse, repeat for "cefKeys" on your sourcetype, et voilà.
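For anyone who prefers editing config files over the UI, the same association can probably be sketched in a props.conf under a local directory (not default). This is only a sketch under assumptions: the stanza name my_cef_sourcetype is a placeholder for your own sourcetype, the class name after "REPORT-" is an arbitrary label I made up, and it assumes the app's cefHeaders and cefKeys transforms are visible from wherever this props.conf lives (same app, or shared globally).

```ini
# local/props.conf (sketch; the stanza name is a placeholder for your sourcetype)
[my_cef_sourcetype]
# Point the sourcetype at the transforms shipped with CEF Extraction Utilities.
# "cef_utils" is just an arbitrary label for this REPORT class.
REPORT-cef_utils = cefHeaders, cefKeys
```

This mirrors what the app's own props.conf does for the 'cefevents' sourcetype, just pointed at a different sourcetype.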

dshpritz · ‎08-29-2014

The point about not putting things in default is an important one. Thanks SOwings!

att35 · ‎08-28-2014

Hi Dave,

Thanks for the help. This works perfectly.

We have a distributed search setup. When we first tried this on the search head only, the results didn't change. We had to install and apply the steps above on both the indexer and the search head.

Now it's working great.

Thanks again,

Abhi
