
How to create a regex to extract key value pairs


I have a data feed in CEF format. Splunk picks up the key-value pairs automatically, except for values that contain whitespace. For instance, for "subject=my testing" in the sample log below, Splunk extracts only "my" as the value of "subject". I could create a custom regex, such as "src=(?P<src>[^\s]+)\sdst=(?P<dst>[^\s]+)\sspt=(?P<spt>[^\s]+)\ssubject=(?P<subject>.+)".

Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|worm successfully stopped|10|src= dst= spt=1232 subject=my testing

Is there an easy way to fix this issue without creating a custom regex? Thanks.



Hi there @splunkrocks2014

Try like this.

Add this to your props.conf, under the stanza for your sourcetype:

REPORT-cefxtractions = cefheaders,cefvaluekeys

Add this to your transforms.conf:

[cefheaders]
REGEX = CEF:\s?(?<cef_version>\d+)\|(?<cef_vendor>[^|]*)\|(?<cef_product>[^|]*)\|(?<cef_prodversion>[^|]*)\|(?<cef_ruleid>[^|]*)\|(?<cef_rulename>[^|]*)\|(?<cef_severity>[^|]*)

[cefvaluekeys]
REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
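To check what these two patterns capture before deploying them, here is a minimal Python sketch that runs equivalent regexes against the sample event from the question. It is an illustration only, not Splunk's own extraction logic: it assumes Python's `re` module behaves like Splunk's PCRE engine for these patterns, it uses Python's `(?P<name>...)` group syntax, and the names `HEADER_RE`, `KV_RE`, and `parse_cef` are hypothetical. The groups `key` and `val` stand in for Splunk's `_KEY_1` and `_VAL_1`, and `\s?` is used after `CEF:` because the sample event has no space there.

```python
import re

# Header pattern: the seven pipe-delimited CEF header fields.
HEADER_RE = re.compile(
    r"CEF:\s?(?P<cef_version>\d+)\|(?P<cef_vendor>[^|]*)\|(?P<cef_product>[^|]*)"
    r"\|(?P<cef_prodversion>[^|]*)\|(?P<cef_ruleid>[^|]*)"
    r"\|(?P<cef_rulename>[^|]*)\|(?P<cef_severity>[^|]*)"
)

# Key/value pattern: a value runs lazily until the next " key=" or end of line.
KV_RE = re.compile(r"(?:_+)?(?P<key>[\w.:\[\]]+)=(?P<val>.*?(?=(?:\s[\w.:\[\]]+=|$)))")


def parse_cef(event):
    """Return a dict of header fields plus extension key/value pairs."""
    header = HEADER_RE.search(event)
    if header is None:
        return {}
    fields = header.groupdict()
    # Scan only the extension, i.e. the text after the header's trailing pipe.
    extension = event[header.end():].lstrip("|")
    for match in KV_RE.finditer(extension):
        fields[match.group("key")] = match.group("val")
    return fields


sample = (
    "Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
    "worm successfully stopped|10|src= dst= spt=1232 subject=my testing"
)
fields = parse_cef(sample)
print(fields["cef_rulename"])  # worm successfully stopped
print(fields["subject"])       # my testing
```

One limitation of the lookahead approach: a value that itself contains something shaped like ` word=` is cut at that point, because the pattern cannot tell it apart from the start of the next key.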

Hope it helps.


Try this as a Field Extraction:


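The _KEY_1 and _VAL_1 capture-group names are special: in a transforms.conf REGEX, Splunk uses whatever _KEY_1 matches as the field name and whatever _VAL_1 matches as that field's value.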



If you can change the format of the log file to put quotes around the values, then Splunk's automatic key-value extraction will handle values that contain spaces.

If you can't change the format of the log, then probably not. In that case you will have to do it with a regular expression using custom field extraction. It would be helpful to know which of the fields might have a space within the value field. There aren't any keys with spaces in the names, are there?

In the case above, where it is the last field, the field extraction is quite easy. The regular expression would be something like:


because it comes at the end of the line. Others will be more difficult, but can be done.
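As a rough illustration of the last-field case, here is a small Python sketch. It assumes that `subject` is always the final key on the line, so everything after `subject=` up to the end of the line belongs to the value. The names `SUBJECT_RE` and `extract_subject` are hypothetical, and Python's `(?P<name>...)` syntax is used in place of the `(?<name>...)` form you would write in Splunk.

```python
import re

# Assumption: "subject" is always the last key in the event.
SUBJECT_RE = re.compile(r"\bsubject=(?P<subject>.*)$")


def extract_subject(event):
    """Return the text after 'subject=' up to end of line, or None if absent."""
    match = SUBJECT_RE.search(event)
    return match.group("subject") if match else None


sample = (
    "Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
    "worm successfully stopped|10|src= dst= spt=1232 subject=my testing"
)
print(extract_subject(sample))  # my testing
```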

If you can answer the above questions about the data, then a more definitive answer can be provided.



Hi cpetterborg, thank you very much for the quick response. There are two issues with the events we are collecting: 1) the source is unable to put double quotes around the values, and 2) the whitespace can appear in any value.



If your data is going to be delivered into the log in the same order, then you can go with a regex like the following:


But if the fields can't be relied upon to arrive in that order, or additional fields may be mixed in, then it becomes much more difficult, perhaps impossible. If you can depend on the order of the fields, though, the task is much simpler (as above).
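The fixed-order idea can be sketched in Python as follows. This is an assumption-laden illustration rather than the exact regex from this answer: it assumes the keys are always `src`, `dst`, `spt`, `subject`, in that order, with nothing in between, and it uses lazy `.*?` groups so that any value may be empty or contain spaces. The names `ORDERED_RE` and `extract_ordered` are hypothetical.

```python
import re

# Assumption: keys always appear as src, dst, spt, subject, in this order.
# Each lazy group stops at the whitespace that precedes the next known key.
ORDERED_RE = re.compile(
    r"src=(?P<src>.*?)\sdst=(?P<dst>.*?)\sspt=(?P<spt>.*?)\ssubject=(?P<subject>.*)$"
)


def extract_ordered(event):
    """Return a dict of the four fields, or None if the order does not match."""
    match = ORDERED_RE.search(event)
    return match.groupdict() if match else None


sample = (
    "Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
    "worm successfully stopped|10|src= dst= spt=1232 subject=my testing"
)
print(extract_ordered(sample))

# Whitespace inside a middle value is kept because the next key anchors it.
print(extract_ordered("src=10.0.0.1 dst=host a spt=1232 subject=my testing"))
```

If the keys arrive in a different order, the pattern simply fails to match, which is the fragility described above.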
