Hi,
I've created the following field extraction and field transform in their respective files - props.conf and transforms.conf:
[convert_mac_address_from_msdhcp]
FORMAT = client_mac_address::$1-$2-$3-$4-$5-$6
REGEX = ,(\w{2})(\w{2})(\w{2})(\w{2})(\w{2})(\w{2}),$
[msdhcp]
REPORT-client_mac_address = convert_mac_address_from_msdhcp
When doing a search, 'client_mac_address' appears but with a value of '$1-$2-$3-$4-$5-$6' and not, for example, '00-56-89-23-44-22' as I would expect. If I change the FORMAT line in the transform to the following:
FORMAT = client_mac_address::$1
I get the first two alphanumeric characters returned as expected, i.e. 07. So why does it break when I use multiple $n references?
Can anyone tell me why this is not working?
You can't use multiple regex capture groups for a single field with Splunk's search-time field extraction.
I'm not sure that this limitation is clearly documented. If it is, does anybody have a link?
For example, FORMAT = my_var::$1_$2 does not work because it references two groups. You also can't use something like FORMAT = my_var::a_constant_string-$1. With search-time field extraction you can only reference one group per field, and you can't add literal text to the value either. You can use multiple groups with an indexed field (which, as the name suggests, is handled at index time), but indexed fields have other downsides that make this less than ideal, so I don't recommend using them without some serious consideration and a good understanding of their pros and cons.
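For reference, an indexed-field version of the original transform might look roughly like the sketch below. It reuses the msdhcp sourcetype and the regex from the question; the stanza name mac_indexed and the TRANSFORMS class name are made up for illustration. The settings would need to live on whichever instance parses the data, and they only affect events indexed after the change. Treat it as a starting point, not a tested configuration.

```
# transforms.conf
# Stanza name is hypothetical. WRITE_META = true writes the field into the index.
[mac_indexed]
REGEX = ,(\w{2})(\w{2})(\w{2})(\w{2})(\w{2})(\w{2}),$
FORMAT = client_mac_address::$1-$2-$3-$4-$5-$6
WRITE_META = true

# props.conf
# TRANSFORMS- (rather than REPORT-) makes this an index-time transform.
[msdhcp]
TRANSFORMS-client_mac_address = mac_indexed

# fields.conf
# Tells the search head that client_mac_address is an indexed field.
[client_mac_address]
INDEXED = true
```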
Here are two discussions on this topic that I think you will find helpful:
Here is another search-time workaround that you could try:
... | rex ",(?<client_mac_address>\w{12}),$" | rex mode=sed field=client_mac_address "s/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\1-\2-\3-\4-\5-\6/"
Another option would be to use a SEDCMD setting in props.conf to rewrite the value in the raw event at index time.
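A minimal sketch of the SEDCMD approach, assuming the MAC address is always the last comma-delimited value in the event, as the regex in the question implies. The class names format_mac and client_mac_address are arbitrary labels chosen for this example. SEDCMD rewrites the raw event text before it is indexed, so it only affects newly indexed data, and the original unhyphenated value is no longer stored.

```
# props.conf
[msdhcp]
# Index time: insert hyphens into the trailing 12-character MAC address in _raw.
SEDCMD-format_mac = s/,(\w{2})(\w{2})(\w{2})(\w{2})(\w{2})(\w{2}),$/,\1-\2-\3-\4-\5-\6,/
# Search time: extract the already hyphenated value with a single named group.
EXTRACT-client_mac_address = ,(?<client_mac_address>(?:\w{2}-){5}\w{2}),$
```

Because the hyphens are already in the raw event, the search-time extraction needs only one capture group, which sidesteps the limitation described above.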
Also, I notice that you're looking for commas before and after your MAC address. If you have a CSV-style file, then you can use the delimited field extraction options. Look for FIELDS and DELIMS in the transforms.conf docs.
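As a rough illustration of the delimiter-based approach: the stanza name msdhcp_csv_fields and every field name below are placeholders, since the actual column layout of the log is not shown in the question. Replace them with the real columns, in order.

```
# transforms.conf
# Hypothetical stanza; the field names are guesses, one per comma-separated column.
[msdhcp_csv_fields]
DELIMS = ","
FIELDS = "id", "date", "time", "description", "ip_address", "host_name", "client_mac_address"

# props.conf
[msdhcp]
REPORT-msdhcp_csv_fields = msdhcp_csv_fields
```

Note that this yields the MAC address exactly as it appears in the log (without hyphens), so it would still need the rex mode=sed step or a SEDCMD to reformat it.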
Thanks for the information. I had read http://www.splunk.com/base/Documentation/latest/Knowledge/Managefieldtransforms and had assumed that it was applicable to search-time extractions as well.
Just to clarify, this restriction on using multiple capture groups in a single field value applies to search-time extractions. You can use multiple groups in index-time transforms.
It's not well-documented, but it is a consequence of how search-time extractions work currently.