Can I extract a field with a regex-derived dynamic field name?

Builder

Hi. I have JSON-like events that come into my indexer like this:
{foo.field1: value,
foo.field2: value,
foo.field3: value}

I would like to extract field1, field2, and field3 as individual fields. The trouble is that their order is not fixed within the event. I can't use Splunk's default JSON extraction on this data for long and boring reasons, so I'm trying to handle it manually in props.conf. What I'd like to do is something like the following, where the field name itself comes from the regex match:


EXTRACT-foo = \{foo\.(\w+):\s*(?<\1>[^,\}]*),foo\.(\w+):\s*(?<\2>[^,\}]*),foo\.(\w+):\s*(?<\3>[^,\}]*)\}

Unfortunately this doesn't work, and aside from not knowing what Splunk considers to be capture groups in the extraction, I'm not even sure if this syntax is legal. Is there a way to solve this without sorting the JSON beforehand?

UPDATE: For anyone who doesn't feel like reading the comment chain, the $1::$2 format in the accepted answer doesn't just stop at the first match--it goes through the entire event and does pairwise extractions for everything it matches. Since all my field-value pairs have the same format, I don't have to make a regex to match the whole event--I just need to match one pair, and the extraction automatically finds all the pairs that match. I had tried to do a full-event match with the format $1::$2 $3::$4 $5::$6, which is supposed to work, but it didn't, and Splunk support never figured out why. Anyway, the $1::$2 format is simpler and is automatically extensible if I add fields to these events in future.
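
To illustrate why matching a single pair is enough, here is a minimal Python sketch that mimics the repeated pairwise extraction outside Splunk. It is only an analogy: Python's `re` module is not the PCRE engine Splunk uses, and the names `PAIR`, `FULL`, `extract_pairs`, and the sample events are made up for this example.

```python
import re

# Hypothetical patterns: one for a single pair, one for a whole three-pair event.
PAIR = re.compile(r"foo\.(\w+):\s*([^,}]*)")
FULL = re.compile(r"\{" + ",".join([PAIR.pattern] * 3) + r"\}")

def extract_pairs(event):
    """Collect every name/value pair the single-pair pattern finds."""
    return {m.group(1): m.group(2) for m in PAIR.finditer(event)}

three = "{foo.field2: b,foo.field1: a,foo.field3: c}"
four = "{foo.field2: b,foo.field1: a,foo.field3: c,foo.field4: d}"

print(extract_pairs(three))        # order in the event does not matter
print(extract_pairs(four))         # a new field is picked up with no regex change
print(FULL.search(four) is None)   # the whole-event pattern no longer matches
```

The single-pair pattern keeps working when a fourth field appears, while the whole-event pattern would have to be rewritten each time the event grows.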

1 Solution

SplunkTrust

Hi cphair,

try something like this in your transforms.conf:

REGEX  = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

This will create a field name from capturing group one and the value from capturing group two.

Hope this helps ...

cheers, MuS

Influencer

You can also do this using EXTRACT in props.conf. Here's an example of extracting the same field from four different places in the event:

EXTRACT-foo1 = (?i)(?<foo>[^,]+),\d+\.\d+\.\d+\.\d+,[a-f0-9]+(?:\-[^\-]*){4}
EXTRACT-foo2 = ACCESS-REQUEST,[^,]+,[^,]+,[^,]+,[^,]+,(?<foo>\w+)
EXTRACT-foo3 = ACCESS-ACCEPT,[^,]+,[^,]+,[^,]+,(?<foo>\w+)
EXTRACT-foo4 = (DHCP_REQUEST|DHCP_ACK),[^,]+,(?<foo>\w+)

Builder

Wouldn't that make it an index-time extraction and hurt performance? All I need is the search-time extraction.

Motivator

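No, this stays a search-time extraction as long as the transform is referenced from a REPORT- setting in props.conf (TRANSFORMS- is the index-time one). transforms.conf just has a few more tricks up its sleeve when it comes to field extractions. See the REGEX and FORMAT attributes in the transforms.conf documentation.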

Builder

Right, I forgot that transforms.conf extractions can run at search time. But neither of the syntaxes described in transforms.conf.spec is working. I tried FORMAT = $1::$2 $3::$4 $5::$6, and I tried the _KEY_X/_VAL_X approach. Do I need to run three separate stanzas? That seems inefficient too.

Motivator

Did you add the name of your transform to a REPORT class in props.conf for your sourcetype/source/host?

transforms.conf

[myTransform]
REGEX = \{foo\.(\w+):\s*(?<\1>[^,\}]*),foo\.(\w+):\s*(?<\2>[^,\}]*),foo\.(\w+):\s*(?<\3>[^,\}]*)\}
FORMAT = $1::$2 $3::$4 $5::$6

props.conf

[mySourceType]
REPORT-myUniqueClassName = myTransform

Builder

Yes, I did. It doesn't work.

Motivator

I didn't check your regex earlier, but I checked it now and there are some things that need to be addressed:

  1. "\" is an invalid character in a named capturing group
  2. Need to account for new lines.

I've modified the regex and it should now capture what you're looking for. I would use the $1::$2 notation for this in transforms.conf. I don't know if there are newline characters in your event data or if you just added them for readability here. If there are no newline characters in your events, just remove the \n's from the regex below:

\{foo\.(\w+):\s*([^,\}]*),\nfoo\.(\w+):\s*([^,\}]*),\nfoo\.(\w+):\s*([^,\}]*)\}
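
The first point can be checked outside Splunk. The sketch below uses Python's `re` module as a stand-in, so treat it as an analogy and not as proof of Splunk's behaviour: Splunk uses PCRE, and Python spells named groups `(?P<name>...)`. The helper `compiles` is made up for this example. In both engines a group name has to be a literal identifier, not a backreference such as `\1`.

```python
import re

def compiles(pattern):
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A literal group name is fine.
print(compiles(r"foo\.(\w+):\s*(?P<value>[^,}]*)"))
# A backreference used as a group name is rejected at compile time.
print(compiles(r"foo\.(\w+):\s*(?P<\1>[^,}]*)"))
```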

Builder

1) I was trying to indicate that the name of the capturing group should match the string directly in front of it. I already knew the regex I mentioned didn't work. Your proposed format doesn't work for me either, though.
2) Events are strictly single-line.

Motivator

Let's make this easier. I indexed the following sample data:

{foo.field1: value, foo.field2: value, foo.field3: value}

I don't know if it matches your data or not, but going by your descriptions it is close. With the configuration I'm about to provide, Splunk extracts field1 = value, field2 = value, and field3 = value. In transforms.conf, enter this:

[myTransform]
REGEX = foo\.([^,]+):\s+([^,\}]+)
FORMAT = $1::$2 

In props.conf, enter this:

[MySourcetype]
REPORT-myUniqueClassName = myTransform

Splunk will apply the field transform to the events as many times as there are matches for the supplied regex. In my test, it extracts the field names and values from the event and the event is now searchable by the extracted fields.
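
For anyone who wants to sanity-check the pattern before putting it in transforms.conf, here is a rough Python sketch that runs the same regex repeatedly over the sample event. It is not Splunk itself, so it only approximates what the $1::$2 transform does, and the helper name `apply_transform` is made up.

```python
import re

# Same pattern as the REGEX line in [myTransform].
REGEX = re.compile(r"foo\.([^,]+):\s+([^,\}]+)")

def apply_transform(event):
    """Emulate FORMAT = $1::$2: one (name, value) pair per regex match."""
    return [(m.group(1), m.group(2)) for m in REGEX.finditer(event)]

sample = "{foo.field1: value, foo.field2: value, foo.field3: value}"
for name, value in apply_transform(sample):
    print(f"{name} = {value}")
```

Note that `\s+` requires at least one whitespace character after the colon. Events written as `foo.field1:value` would need `\s*` instead.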

Motivator

Can this sort of thing be done in a rex in search?

Builder

@wrangler2x: Yes, but I wanted to make the fields easily available for other users without telling them to run a rex in the middle of their search.

Motivator

Could you give me an example of that? I tried to emulate that in search and was unsuccessful.
