Solved: Re: Field Transform not outputting any data

sidafydd · ‎05-21-2010

Hi,

I've created the following field extraction and field transform in their respective files - props.conf and transforms.conf:

[convert_mac_address_from_msdhcp]
FORMAT = client_mac_address::$1-$2-$3-$4-$5-$6
REGEX = ,(\w{2})(\w{2})(\w{2})(\w{2})(\w{2})(\w{2}),$

[msdhcp]
REPORT-client_mac_address = convert_mac_address_from_msdhcp

When doing a search, 'client_mac_address' appears but with a value of '$1-$2-$3-$4-$5-$6' and not, for example, '00-56-89-23-44-22' as I would expect. If I change the FORMAT line in the transform to the following:

FORMAT = client_mac_address::$1

I get the first two alphanumeric characters returned as expected, i.e. 07. So why does it break when I use multiple $n references?

Can anyone tell me why this is not working?

Lowell · ‎05-21-2010

You can't use multiple regex capture groups for a single field with Splunk's field-extraction.

I'm not sure that this limitation is clearly documented. If so, does anybody have a link?

For example, FORMAT = my_var::$1_$2 does not work because it references two groups. Likewise, you can't use something like FORMAT = my_var::a_constant_string-$1. With search-time field extraction you can reference only one group per field, and you can't add literal text to the value either. You can use multiple groups with an indexed field (which, as the name suggests, is handled at index time), but indexed fields have enough other downsides to make this less than ideal. I don't recommend using them without some serious consideration and a good understanding of their pros and cons.

Here are two discussions on this topic that I think you will find helpful:

transforming an ip -- This one is the most related to your question. Some of the alternate approaches mentioned could be adapted to work in your situation.
Do search-time fields have performance considerations? -- This has some helpful pros/cons about indexed fields vs extracted fields.

Here is another search-time workaround that you could try:

... | rex ",(?<client_mac_address>\w{12}),$" | rex mode=sed field=client_mac_address "s/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\1-\2-\3-\4-\5-\6/"

Another option would be to use a SEDCMD setting in props.conf to rewrite the raw event at index time.
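
As a rough sketch of the SEDCMD route, assuming the sourcetype is msdhcp and the MAC address is the last comma-terminated value in each event (as in the original REGEX), props.conf might look something like this. The class name format_mac is made up for illustration, and I haven't tested this against real DHCP logs.

```
# props.conf (sketch; the class name format_mac is hypothetical)
[msdhcp]
# Index time: rewrite the trailing 12-character MAC in the raw event with hyphens.
SEDCMD-format_mac = s/,(\w{2})(\w{2})(\w{2})(\w{2})(\w{2})(\w{2}),$/,\1-\2-\3-\4-\5-\6,/
# Search time: the hyphenated value now fits in a single capture group.
EXTRACT-client_mac_address = ,(?<client_mac_address>\w{2}(?:-\w{2}){5}),$
```

Keep in mind that SEDCMD changes the raw text that gets indexed, so it only affects newly indexed events, and the original unhyphenated value is no longer stored.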

Also, I notice that you're looking for commas before and after your MAC address. If you have a CSV-style file, then you can use the delimited field extraction options. Look for FIELDS and DELIMS in the transforms.conf docs.
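
For the delimiter-based approach, a minimal sketch might look like the following. The stanza name and the column list are assumptions (I'm guessing at a typical Microsoft DHCP log layout), so adjust FIELDS to match the actual column order in your file.

```
# transforms.conf (sketch; stanza name and field names are hypothetical)
[msdhcp_delimited]
DELIMS = ","
FIELDS = id, date, time, description, ip_address, host_name, client_mac_address

# props.conf
[msdhcp]
REPORT-msdhcp_fields = msdhcp_delimited
```

This only names the columns. client_mac_address would still come out as the raw 12-character value, so adding the hyphens would still need something like the rex mode=sed step.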

sidafydd · ‎05-24-2010

Thanks for the information. I had read http://www.splunk.com/base/Documentation/latest/Knowledge/Managefieldtransforms and had assumed that it was applicable to search-time extractions as well.

gkanapathy · ‎05-22-2010

Just to clarify, this restriction on using multiple capture groups in a single value applies to search-time extractions. You can use multiple groups in index-time transforms.

It's not well-documented, but it is a consequence of how search-time extractions work currently.
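
For reference, here is a sketch of what the index-time version could look like, reusing the REGEX and FORMAT from the original question. The stanza name with the _indexed suffix is made up, and the usual caveats about indexed fields apply (larger index, only newly indexed data is affected).

```
# transforms.conf (sketch; stanza name is hypothetical)
[convert_mac_address_from_msdhcp_indexed]
REGEX = ,(\w{2})(\w{2})(\w{2})(\w{2})(\w{2})(\w{2}),$
FORMAT = client_mac_address::$1-$2-$3-$4-$5-$6
WRITE_META = true

# props.conf
[msdhcp]
TRANSFORMS-client_mac_address = convert_mac_address_from_msdhcp_indexed

# fields.conf
[client_mac_address]
INDEXED = true
```

Using TRANSFORMS- rather than REPORT- in props.conf is what makes the transform run at index time, and WRITE_META = true writes the result as an indexed field.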
