In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Explorer

I'm noticing some weird behavior: a search requires me to inline some regexes in order to get the MR job to work.

Step 0: Create a field extraction in an app other than search

Here are the relevant contents of $HUNK_HOME/etc/apps/{non_searchapp_app}/local/props.conf:

[myvix_sourcetype]
EXTRACT-myField = ^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+)
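
As a quick sanity check on what this pattern should capture, here is a minimal Python sketch run outside Hunk. It assumes the message value looks like the sample shown in the results further down this thread, and it spells the named group as (?P<my_field>...) because Python's re module does not accept the (?<my_field>...) form used in props.conf. The variable names are mine, not anything Hunk defines.

```python
import re

# Same pattern as the EXTRACT rule, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(r"^(?:[^\|\n]*\|){6}(?P<my_field>[^\|]+)")

# Sample value modeled on the 'message' field shown later in this thread.
message = (
    "t.blah.X.blah.blah.blah - |x|xxx|xxx|xxxx|xxx-xxxx|"
    "my_field_value-A|xxxx|x|x|blah&blah&blah|xxx/xxx|x|x|"
)

match = PATTERN.search(message)
extracted = match.group("my_field") if match else None

# Skipping six "text|" groups lands on the seventh pipe-delimited token.
seventh_token = message.split("|")[6]

print(extracted)
print(seventh_token)
```

In other words, the rule is equivalent to "take the seventh pipe-delimited token of whatever text it is applied to", which is why the text it is applied to matters so much below.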

Step 1: Verify Field Extraction works

Example Search: (Smart Mode)

 index=myvix source=*events*
  • Indeed, on the left-hand side I see my_field is recognized, with events counted for each unique value of my_field
  • Hunk auto-field detection is indeed working

Step 2: Now check whether the field is being extracted by a search that uses it

Example Search: (Smart Mode)

 index=myvix source=*events* | table _time, my_field

I get the following results:

 _time                my_field
 2015-05-26 16:19:57     
 2015-05-26 16:19:57      
 ...

Known Workaround

Inline the rex and don't rely on the field extraction in props.conf.

 index=myvix source=*events* | rex field=message "^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+)" | table _time, my_field

results in the following:

 _time                  my_field
 2015-05-26 16:19:57    my_field_value-A
 2015-05-26 16:19:57    my_field_value-B

Interesting corollary:

Inlining the same regex against _raw (i.e. field=_raw) does not work!

 index=myvix source=*events* | rex field=_raw "^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+)" | table _time, my_field, _raw

results:

 _time                  my_field                  _raw
 2015-05-26 16:19:57                            {"header": {"time": 1432675197252, "threadId": "qtpXXXX", "requestMarker": "abadbeef42c8", "env": "production", "server": "some-prod-server", "service": "some-service"}}
 2015-05-26 16:19:57                            {"header": {"time": 1432675197253, "threadId": "qtpYYYY", "requestMarker": "8badbeef9139", "env": "production", "server": "some-otherprod-server", "service": "some-other-service"}}

Notice that _raw doesn't work because the 'message' field of the _raw avro record is not being included. Only the 'header' field is being included.

FWIW, the regex was generated using the "Event Action -> Extract Fields" UI from the main search view.


Interesting corollary++:

As one last attempt to self-service and figure this out, I added message to the table command, and it works! Go figure.

 index=myvix source=*events* | rex field=_raw "^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+)" | table _time, my_field, _raw, message

results:

 _time                  my_field            _raw                  message
 2015-05-26 16:19:57    my_field_value-A   {"header": {"time": 1432675197252, "threadId": "qtpXXXX", "requestMarker": "abadbeef42c8", "env": "production", "server": "some-prod-server", "service": "some-service"}, "message": "t.blah.X.blah.blah.blah - |x|xxx|xxx|xxxx|xxx-xxxx|my_field_value-A|xxxx|x|x|blah&blah&blah|xxx/xxx|x|x|"}   t.blah.X.blah.blah.blah - |x|xxx|xxx|xxxx|xxx-xxxx|my_field_value-A|xxxx|x|x|blah&blah&blah|xxx/xxx|x|x|
 2015-05-26 16:19:57    my_field_value-B   {"header": {"time": 1432675197253, "threadId": "qtpYYYY", "requestMarker": "8badbeef9139", "env": "production", "server": "some-otherprod-server", "service": "some-other-service"}, "message": "t.blah.X.blah.blah.blah - |x|xxx|xxx|xxxx|xxx-xxxx|my_field_value-B|xxxx|x|x|blah&blah&blah|xxx/xxx|x|x|"}   t.blah.X.blah.blah.blah - |x|xxx|xxx|xxxx|xxx-xxxx|my_field_value-B|xxxx|x|x|blah&blah&blah|xxx/xxx|x|x|

So it seems I have to tell Hunk ahead of time which "raw fields" to include, and then it will "auto extract"?
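
The two _raw outcomes above can be reproduced outside Hunk with a small Python sketch. It assumes _raw is simply the JSON rendering of whichever Avro fields the reader emitted (my reading of the results, not something I have confirmed), and it uses Python's (?P<name>...) group syntax in place of (?<name>...). The helper name extract_my_field is hypothetical.

```python
import json
import re

PATTERN = re.compile(r"^(?:[^\|\n]*\|){6}(?P<my_field>[^\|]+)")


def extract_my_field(text):
    """Return the captured my_field value, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group("my_field") if match else None


header = {"time": 1432675197252, "threadId": "qtpXXXX", "requestMarker": "abadbeef42c8"}
message = "t.blah.X.blah.blah.blah - |x|xxx|xxx|xxxx|xxx-xxxx|my_field_value-A|xxxx|x|x|"

# Assumption: _raw is the JSON form of only the fields the record reader emitted.
raw_header_only = json.dumps({"header": header})
raw_with_message = json.dumps({"header": header, "message": message})

print(extract_my_field(raw_header_only))   # header-only text contains no pipes
print(extract_my_field(raw_with_message))  # pipes from 'message' are present
```

With only the header present there are no pipe characters for the pattern to count, so nothing is captured; once message is part of _raw, the same pattern finds the value.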


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Splunk Employee

Do you have your HDFS directory in props.conf? For example:

[source::/user/hunk/data/England/...]
sourcetype = England
EXTRACT-myField = XYZ


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Explorer

In $HUNK_HOME/etc/system/local/props.conf (note: that's system/local, not apps/{non_searchapp_app}/local):

 [myvix_sourcetype]
 EVAL-_time = strptime('header.time', "%s%3N")
 TRUNCATE = 102400
 MAX_TIMESTAMP_LOOKAHEAD = 30

 [source::/user/hunkuser/data/...]
 sourcetype = myvix_sourcetype

In $HUNK_HOME/etc/apps/{non_searchapp_app}/local/props.conf:

 [myvix_sourcetype]
 EXTRACT-myField = ^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+)

Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Splunk Employee

I'd also recommend revising the time extraction rule based on this answer: eval-based timestamp extraction causes time-based partition pruning to be disabled.


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Explorer

@Ledion, thanks for pointing that out. I had actually read that answer but always focused on the RHS (i.e. the "%s%3N") and not the LHS (i.e. EXTRACT-_time vs EVAL-_time). I'll investigate and report back.


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Explorer

@Ledion, going with this:

 [myvix_sourcetype]
 #EVAL-_time = strptime('header.time', "%s%3N")
 #EXTRACT-_time = strptime('header.time', "%s%3N")
 TRUNCATE = 102400
 TIME_PREFIX = "time":[ ]
 TIME_FORMAT = %3N
 MAX_TIMESTAMP_LOOKAHEAD = 40

Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Splunk Employee

Two more things:
a) make sure to add header.time in the required fields for the vix
b) you'd need to fix TIME_FORMAT, probably need "%s%3N" (or maybe that's what you have and it doesn't render right here)
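
To illustrate point b), here is a minimal Python sketch of what a "%s%3N" style value encodes, on my understanding that %s is epoch seconds and %3N is three digits of subseconds. The helper name parse_epoch_millis is hypothetical, and this only mimics the arithmetic, not Splunk's timestamp processor.

```python
from datetime import datetime, timezone


def parse_epoch_millis(value):
    """Split a 13-digit epoch-milliseconds value into seconds and milliseconds."""
    seconds, millis = divmod(int(value), 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=millis * 1000)


# header.time taken from the first sample event in the question.
ts = parse_epoch_millis(1432675197252)
print(ts.isoformat())
```

For the sample header.time this gives 21:19:57.252 UTC on 2015-05-26, which lines up with the 16:19:57 shown in the results if the search head displays UTC-5 (that offset is a guess on my part). A format of %3N alone would have nothing to consume the leading ten digits of seconds.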


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Explorer

Yup.
- Already had header.time as required fields for the vix.
- Missed the %s... added it


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Splunk Employee

Ahh, the "corollary" and "corollary++" are actually very important to what you're experiencing. Basically, Hunk does not have any knowledge that the field is being extracted from the "message" field, so the Avro reader doesn't output it, and thus the extraction fails.

Why does it work when you run "index=myvix source=*events*"? Well, if you're not running a reporting search (stats, timechart, etc.), the search is effectively run in "verbose mode".

There are two ways to fix this:
a) if there are some fields that you always need, you can tell the record readers to always output them - check this answer for how to do that

b) you can tell the extractor that the field is actually being extracted from another field by modifying the extraction rule as follows:

 [myvix_sourcetype]
 EXTRACT-myField = ^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+) IN message

Unfortunately both methods require you to edit .conf files.
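
A toy model of why option b) helps, written as a Python sketch. It is a deliberate simplification and not Hunk's actual code: the helper names source_field and fields_to_read are hypothetical, and the assumption is that a rule without an "in <field>" suffix is treated as reading from _raw.

```python
import re


def source_field(extract_rule):
    """Return the field an EXTRACT rule reads from ('_raw' when no 'in <field>' suffix)."""
    match = re.search(r"\s+in\s+(\S+)\s*$", extract_rule, flags=re.IGNORECASE)
    return match.group(1) if match else "_raw"


def fields_to_read(search_fields, extractions):
    """Map each field a search references to the stored field a reader must emit."""
    needed = set()
    for name in search_fields:
        rule = extractions.get(name)
        # Extracted fields depend on their source field; others are read directly.
        needed.add(source_field(rule) if rule else name)
    return needed


REGEX = r"^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+)"
search_fields = ["_time", "my_field"]

without_hint = fields_to_read(search_fields, {"my_field": REGEX})
with_hint = fields_to_read(search_fields, {"my_field": REGEX + " IN message"})

print(sorted(without_hint))
print(sorted(with_hint))
```

In this model the reader is only asked for message once the rule names it, which matches the behavior described above.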


Re: In Hunk, app-specific field extraction is not picked up by map-reduce jobs

Explorer

Since the original developer used the UI to create the regex, it would be great if the UI could infer that message is required. As it stands, this severely limits what end users can do for "schema-on-read" use cases, since each field extraction requires a ticket for the admin to go in and edit.

I tried both approaches and both worked, as advertised.

Since this is specific to the {non_searchapp_app}, and since I only need Hunk to pull in that field when a search needs it, I went with b).

 [myvix_sourcetype]
 EXTRACT-myField = ^(?:[^\|\n]*\|){6}(?<my_field>[^\|]+) in message

It worked like a charm! Thanks @Ledion once again!