I am working with a custom data format going into Splunk 7.0, and I need to extract fields at index time. Search-time extraction is not an option for me, so please don't suggest it.
When doing indexed extractions in transforms.conf, I am trying to extract the host field along with many other values in a single transformation step. No other transformation steps are applied besides this one.
If I consolidate all of the extractions, my data shows up with a field called extracted_host instead of host. The transform has the following form (I left out the details of REGEX and the other fields because they are not important: all of them work as expected, and none of them are metadata/reserved fields):
[my-custom-metrics]
KEEP_EMPTY_VALS = true
REGEX = ^...
FORMAT = ... host::$3 ...
WRITE_META = true
Everything works fine if I use a second extraction for host with DEST_KEY = MetaData:Host. This writes the correct value into the host field and does not generate an extracted_host field:
[my-custom-metrics-host]
REGEX = ^...
FORMAT = host::$1
DEST_KEY = MetaData:Host
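For reference, both transforms have to be referenced from props.conf to run at index time. A minimal sketch of that wiring, assuming a hypothetical sourcetype stanza my_custom_metrics and a hypothetical class name metrics (neither is given in the question):

```ini
# props.conf (sourcetype stanza name and class name are hypothetical)
[my_custom_metrics]
# Transforms listed in a single TRANSFORMS- setting are applied in the order listed:
# first the consolidated indexed-field extraction, then the host override
TRANSFORMS-metrics = my-custom-metrics, my-custom-metrics-host
```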
Is there some explanation for why this would be the case? Is this documented anywhere? Does this prefixing of reserved/metadata fields hold true whenever WRITE_META = true is used?
Hi @rjthibod,
WRITE_META defaults to false, and it is required for all index-time field extractions except those where DEST_KEY = _meta.
In your first case, with WRITE_META = true, the values captured by your REGEX are written to the event metadata as indexed fields, and that is what creates the new field (extracted_host).
In your second case, DEST_KEY = MetaData:Host overrides the host value with what the REGEX extracted. The extracted value replaces the existing host value, so no new field is created.
Because WRITE_META is not set in that stanza, it defaults to false (WRITE_META = false).
As you asked, here is the relevant documentation:
Let me know if you have any further questions, or if this answers your question, please accept it as the answer.
Thank you. -Saurabh
Hi @rjthibod,
Have you looked at this documentation: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf ?
Especially this part:
REGEX and the FORMAT attribute:
* Name-capturing groups in the REGEX are extracted directly to fields.
This means that you do not need to specify the FORMAT attribute for
simple field extraction cases (see the description of FORMAT, below).
* If the REGEX extracts both the field name and its corresponding field
value, you can use the following special capturing groups if you want to
skip specifying the mapping in FORMAT:
_KEY_<string>, _VAL_<string>.
* For example, the following are equivalent:
* Using FORMAT:
* REGEX = ([a-z]+)=([a-z]+)
* FORMAT = $1::$2
* Without using FORMAT
* REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
* When using either of the above formats, in a search-time extraction,
the regex will continue to match against the source text, extracting
as many fields as can be identified in the source text.
So, are you specifying the field name in your REGEX?
@rjthibod Also, your first case is a custom field extraction at index time. Since you are setting WRITE_META = true, the custom field name will be the one provided in the REGEX or FORMAT attribute.
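A minimal sketch of such a custom index-time field, using a hypothetical field name metric_name and a hypothetical stanza name example-metric-name (the REGEX is left elided, as in the question). The field name is taken from FORMAT, and the fields.conf stanza marks the field as indexed for search:

```ini
# transforms.conf: the indexed field's name comes from FORMAT (names are hypothetical)
[example-metric-name]
REGEX = ^...
FORMAT = metric_name::$1
WRITE_META = true

# fields.conf: declare metric_name as an indexed field
[metric_name]
INDEXED = true
```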