How to extract host field with many other fields in a single transformation step (DEST_KEY versus WRITE_META)

rjthibod
Champion

I am playing with a custom format for data going into Splunk on Splunk 7.0, and I am trying to extract fields at index-time. I cannot use search-time extraction, so please don't ask.

When doing indexed extractions in transforms.conf, I am trying to extract the host field along with many other values in a single transformation step. There are no other transformation steps being applied besides this one.

If I try to consolidate all of the extractions, my data appears with a field called extracted_host instead of host. The transform has the following form (I left out the details of REGEX and the other fields because they are not important; all of them work as expected and none are metadata/reserved fields):

[my-custom-metrics]
KEEP_EMPTY_VALS = true
REGEX        = ^...
FORMAT       = ... host::$3 ...
WRITE_META   = true

Everything works fine if I use a second extraction for host and use DEST_KEY = MetaData:Host. This will write the correct value in the host field and not generate an extracted_host field.

[my-custom-metrics-host]
REGEX      = ^...
FORMAT     = host::$1
DEST_KEY   = MetaData:Host

Is there some explanation for why this would be the case? Is this documented anywhere? Does this prefixing on reserved/metadata fields hold true when using WRITE_META = true?


saurabh_tek11
Communicator

Hi @rjthibod,

WRITE_META defaults to false, and it is required for all index-time field extractions except those where DEST_KEY = _meta.
In your first case, with WRITE_META = true, it automatically writes your REGEX to metadata, which creates a new field (extracted_host).

In your second case, DEST_KEY = MetaData:Host overrides the host value with what the REGEX extracted. The extracted value is imposed on the existing host field, and no new field is created.

Since WRITE_META is not defined in that stanza, it defaults to false, i.e. WRITE_META = false.
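
As a rough sketch of how the two stanzas could be wired together (the sourcetype name my_custom_sourcetype and the class name "metrics" are hypothetical, and the elided REGEX and FORMAT values are left exactly as in the question), the host override stays in its own stanza and both transforms are listed in props.conf:

```ini
# props.conf
# The sourcetype name and the "metrics" class name are assumptions for illustration.
[my_custom_sourcetype]
TRANSFORMS-metrics = my-custom-metrics, my-custom-metrics-host

# transforms.conf
# Custom indexed fields: written to metadata via WRITE_META, with no host:: entry in FORMAT.
[my-custom-metrics]
KEEP_EMPTY_VALS = true
REGEX           = ^...
FORMAT          = ...
WRITE_META      = true

# Host override: DEST_KEY targets the reserved host key directly.
[my-custom-metrics-host]
REGEX    = ^...
FORMAT   = host::$1
DEST_KEY = MetaData:Host
```

This is only the arrangement already described in the question, written out in one place; it is not a way to do the host override inside the WRITE_META stanza.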

As you asked, here is the documentation about the same:

Let me know if you have any further questions. If this answers your question, please accept it as the answer.
Thank you. -Saurabh

harsmarvania57
SplunkTrust

Hi @rjthibod,

Have you looked at this documentation? http://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf

Especially this part:

REGEX and the FORMAT attribute:
  * Name-capturing groups in the REGEX are extracted directly to fields.
    This means that you do not need to specify the FORMAT attribute for
    simple field extraction cases (see the description of FORMAT, below).
  * If the REGEX extracts both the field name and its corresponding field
    value, you can use the following special capturing groups if you want to
    skip specifying the mapping in FORMAT:
      _KEY_<string>, _VAL_<string>.
  * For example, the following are equivalent:
    * Using FORMAT:
      * REGEX  = ([a-z]+)=([a-z]+)
      * FORMAT = $1::$2
    * Without using FORMAT
      * REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
    * When using either of the above formats, in a search-time extraction,
      the regex will continue to match against the source text, extracting
      as many fields as can be identified in the source text.

So, are you specifying the field name in your REGEX?

saurabh_tek11
Communicator

@rjthibod Also, your first case is a custom field extraction at index time. Since you are setting WRITE_META = true, the custom field name is the one provided in the REGEX or FORMAT attribute.
