Splunk Search

How to create a regex to extract key value pairs

splunkrocks2014
Communicator

I have a data feed in CEF format. Splunk extracts the key-value pairs automatically, except for values that contain whitespace. For instance, for "subject=my testing" in the sample log below, Splunk extracts only "my" as the value of "subject". I can create a custom regex, such as "src=(?P<src>[^\s]+)\sdst=(?P<dst>[^\s]+)\sspt=(?P<spt>[^\s]+)\ssubject=(?P<subject>.+)".

Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing

Is there an easy way to fix this issue without creating a custom regex? Thanks.

alemarzu
Motivator

Hi there @splunkrocks2014

Try like this.

Add this to your props.conf, under the stanza for your sourcetype:

REPORT-cefxtractions = cefheaders,cefvaluekeys

Add this to your transforms.conf

[cefheaders]
REGEX = CEF:\s*(?<cef_version>\d+)\|(?<cef_vendor>[^|]*)\|(?<cef_product>[^|]*)\|(?<cef_prodversion>[^|]*)\|(?<cef_ruleid>[^|]*)\|(?<cef_rulename>[^|]*)\|(?<cef_severity>[^|]*)

[cefvaluekeys]
REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
REPEAT_MATCH = True
CLEAN_KEYS = 1
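
To check what these two regexes capture outside Splunk, here is a minimal Python sketch run against the sample event from the question. It is an approximation, not Splunk's own behaviour: Python's re module needs (?P<name>...) rather than (?<name>...), the group names key and val stand in for _KEY_1 and _VAL_1, and re.finditer stands in for REPEAT_MATCH. The names parse_header and extract_pairs are made up for this illustration. The header pattern allows optional whitespace after "CEF:" because the sample event has none.

```python
import re

# Python rendering of the [cefheaders] regex (whitespace after "CEF:" optional).
HEADER_PATTERN = re.compile(
    r"CEF:\s*(?P<cef_version>\d+)\|(?P<cef_vendor>[^|]*)\|(?P<cef_product>[^|]*)"
    r"\|(?P<cef_prodversion>[^|]*)\|(?P<cef_ruleid>[^|]*)"
    r"\|(?P<cef_rulename>[^|]*)\|(?P<cef_severity>[^|]*)"
)

# Python rendering of the [cefvaluekeys] regex; "key"/"val" stand in for
# Splunk's _KEY_1/_VAL_1 (assumption: both engines behave alike here).
KV_PATTERN = re.compile(
    r"(?:_+)?(?P<key>[\w.:\[\]]+)="
    r"(?P<val>.*?(?=(?:\s[\w.:\[\]]+=|$)))"
)

def parse_header(event: str) -> dict:
    """Return the pipe-delimited CEF header fields, or {} if absent."""
    match = HEADER_PATTERN.search(event)
    return match.groupdict() if match else {}

def extract_pairs(event: str) -> dict:
    """Collect every key=value pair, roughly as REPEAT_MATCH = True would."""
    return {m.group("key"): m.group("val") for m in KV_PATTERN.finditer(event)}

SAMPLE = (
    "Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
    "worm successfully stopped|10|"
    "src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing"
)

header = parse_header(SAMPLE)
pairs = extract_pairs(SAMPLE)
print(header)
print(pairs)
```

The value group is lazy and stops at a lookahead for the next " key=" or the end of the line, which is why "my testing" survives as a single value.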

Hope it helps.

woodcock
Esteemed Legend

Try this as a Field Extraction:

\b(c(?>6a|fp|n|s)\d+)Label=(?<_KEY_1>[^=]+)(?=\s+\w+=).*?\1=(?<_VAL_1>[^=]+)(?=\s+\w+=)

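The _KEY_1 and _VAL_1 capture group names are special: Splunk uses the text captured by _KEY_1 as the field name and the text captured by _VAL_1 as that field's value.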
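
As a rough illustration of how that pattern pairs a CEF custom label with its value, here is a small Python sketch. The event fragment is hypothetical (the sample log in the question has no cs1Label/cs1 fields), the groups are renamed key and val, and the atomic group (?>...) is swapped for a plain non-capturing group so the sketch also runs on Python versions older than 3.11.

```python
import re

# Python rendering of the label/value regex. (?>...) is replaced by (?:...)
# and _KEY_1/_VAL_1 are renamed key/val for this illustration.
LABEL_PATTERN = re.compile(
    r"\b(c(?:6a|fp|n|s)\d+)Label=(?P<key>[^=]+)(?=\s+\w+=)"
    r".*?\1=(?P<val>[^=]+)(?=\s+\w+=)"
)

# Hypothetical event fragment, not taken from the thread.
fragment = "src=10.0.0.1 cs1Label=Threat Name cs1=Worm A dst=2.1.2.2"

match = LABEL_PATTERN.search(fragment)
label_fields = {match.group("key"): match.group("val")} if match else {}
print(label_fields)
```

Both lookaheads require another key=value pair after the captured text, so in this sketch a label or value that is the last item on the line would not match.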

cpetterborg
SplunkTrust

If you can change the format of the log file to have quotes around the values, then it can be fixed automatically in Splunk.

If you can't change the format of the log, then probably not. In that case you will have to do it with a regular expression using custom field extraction. It would be helpful to know which of the fields might have a space within the value field. There aren't any keys with spaces in the names, are there?

In the case above, where the value containing spaces belongs to the last field, the field extraction is quite easy. The regular expression would be something like:

subject=(?P<subject>.*)$

because it comes at the end of the line. Others will be more difficult, but can be done.

If you can answer the above questions about the data, then a more definitive answer can be provided.

splunkrocks2014
Communicator

Hi cpetterborg, thank you very much for the quick response. There are two issues with the events we are collecting: 1) the source is unable to put double quotes around the values, and 2) whitespace can appear in any of the values.

cpetterborg
SplunkTrust

If your data is going to be delivered into the log in the same order, then you can go with a regex like the following:

src=(?P<src>.*?)\s+dst=(?P<dst>.*?)\s+spt=(?P<spt>.*?)\s+subject=(?P<subject>.*)$

But if you can't rely on the fields arriving in that order, with no additional fields mixed in, then it becomes much more difficult, perhaps not possible. If you can depend on the order of the fields, though, the task is much simpler (as above).
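
The dependence on field order can be seen in a quick Python sketch. The first string is the key=value part of the sample event from the question; the second is a hypothetical reordering of the same fields, made up to show the failure case.

```python
import re

# The fixed-order regex from above, unchanged apart from being compiled in Python.
ORDERED = re.compile(
    r"src=(?P<src>.*?)\s+dst=(?P<dst>.*?)\s+spt=(?P<spt>.*?)\s+subject=(?P<subject>.*)$"
)

in_order = "src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing"
# Hypothetical event with dst and src swapped.
reordered = "dst=2.1.2.2 src=10.0.0.1 spt=1232 subject=my testing"

ordered_match = ORDERED.search(in_order)
reordered_match = ORDERED.search(reordered)

print(ordered_match.groupdict() if ordered_match else None)
print(reordered_match)  # no match once the order changes
```

When the fields arrive in the expected order, every value is captured, including "my testing". Once two fields swap places, the pattern matches nothing at all, which is the limitation described above.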
