Indexed extraction not working properly for JSON on a Windows UF

salem34
Path Finder

Hi Ninjas

I have two different JSON logs. The first looks like this:

{"version":"1.1","host":"t800.skynet.com","short_message":"Msg TzjTJPqvUaGqaOpHXFdKXyXVaHiLpbTKfhePqbEtammLeaZVaaTb \r\n","full_message":"Msg TzjTJPqvUaGqaOpHXFdKXyXVaHiLpbTKfhePqbEtammLeaZVaaTb \r\n","timestamp":1484920098.408,"level":4,"_app":"skynet","_level_name":"WARN","_mdcKeybIwMa":"mdcValueNYUGJgYJaTFaWcdicara","_thread_name":"sample","_logger_name":"common.log.json.LogFileProducer","_env":"ut"}

and the other one looks like this:

{"timestamp":"2017-01-20 14:48:18.428","level":"DEBUG","thread":"sample","logger":"common.log.json.LogFileProducer","msg":"Msg TzjTJPqvUaGqaOpHXFdKXyXVaHiLpbTKfhePqbEtammLeaZVaaTb","mdc":{"mdcKeybIwMa":"mdcValueNYUGJgYJaTFaWcdicara"}}

Those files are located on a Windows machine where I have installed a Universal Forwarder (6.5.x) with the following settings:

props.conf:
[my_json_sourcetype]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = timestamp

And a proper inputs.conf, which simply points to the appropriate directory using monitor and applies the sourcetype "my_json_sourcetype".

Now the problem which is driving me nuts:
The events arrive on the indexer in correct JSON format, but searching them shows that not all fields are extracted: all the "mdc" key-value fields are ignored completely.

Now the fact which is driving me even more nuts:
Uploading the exact same log file with the Add Data feature directly on my Linux indexer, using the same sourcetype (I copied the props from the forwarder as described above), works perfectly and extracts all the fields, including the mdc key-value fields.

Neither the forwarder logs nor the indexer logs show anything in that case; everything seems to work "fine".

Any ideas why the forwarder handles the data differently and why not all the fields are extracted? And why does it work with the same settings on my indexer? Did I miss a setting concerning INDEXED_EXTRACTIONS on the UF?

Any help is appreciated, thx

Cheers
Salem

salem34
Path Finder

OK, it's getting stranger and stranger.
I just discovered the following:
Everything has been indexed properly (including the mdc fields), BUT I do not see them under the event, even though I can search for them?! If I look at the list of events from that sourcetype, I see all the extractions EXCEPT the mdc KV fields. If I search for one particular event only, it shows the mdc fields as it should?!

See here:
The particular event in the list of all events:

[screenshot]

No mdc KV fields here.

Now looking at just that particular event by adding the message (which is unique) as a search term:

[screenshot]

There we go, the mdc KV fields are here.

It's the same event, same source, same sourcetype. What am I missing here? Is some kind of limit causing all the headache?

mattymo
Splunk Employee

Yeah, at least in my example, the mdc is being extracted by the syntax highlighting...

If I search in verbose mode, it gets the mdc, and if I click on that value and add it to the search, it actually pipes it through spath. So you were right, my previous examples were search time, but they weren't KV_MODE, as I had that off...

It looks like the GUI auto-extracts JSON with spath in the search results. I tried turning syntax highlighting off... will try and find out more about the voodoo at play here...

When you click the mdc and 'add to search', is it piping through spath for you?

- MattyMo

salem34
Path Finder

All this looks quite tricky to me. I think the best (or most reliable) way is to go with KV_MODE=json and do it all at search time, as this looks good with my logs. To your question: I'm only able to click on an mdc value if I search for a particular event; otherwise I do not get the mdc fields at all. But yes, from that single event, clicking on the value of an mdc field adds it correctly.

mattymo
Splunk Employee

Right, but when you add it, does it add | spath, or does it add it as a KV pair?

when I add to search, my search becomes:

index=n00blab sourcetype="answers_json" host="n00b-noah-01"| spath "mdc.mdcKeybIwMa" | search "mdc.mdcKeybIwMa"=mdcValueNYUGJgYJaTFaWcdicara
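
For anyone wondering where the dotted name "mdc.mdcKeybIwMa" comes from: here is a minimal Python sketch of how a nested JSON object maps to dotted field paths. This is only an illustration, not Splunk's implementation, and `flatten` is a hypothetical helper name.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys (illustration only, not Splunk code)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, carrying the parent path along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Shortened version of the second sample event from this thread.
event = json.loads(
    '{"level":"DEBUG","thread":"sample",'
    '"mdc":{"mdcKeybIwMa":"mdcValueNYUGJgYJaTFaWcdicara"}}'
)
fields = flatten(event)
print(sorted(fields))
```

The nested key ends up addressable as `fields["mdc.mdcKeybIwMa"]`, which is the same path shape the search above passes to spath.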

- MattyMo

salem34
Path Finder

OK, sorry, I missed that. It becomes a KV pair as it should, so my search
index="skynet" sourcetype="custom_json_indexed" Msg GvuLnDhEXexLaaHmMHSaqQJyozYtFJAarNkkridVbsUHmcZagLNwPrsVvycpKoGsohXgnzyvAbrWLaZalFIasamdiPJwikfEBZraMpugIJaShabvEaidNRJakYjdeEIVKWMvqJDoAIJQcVhgKaISSVLYojjRaHNaSJcywvaaaYsaaassiVtmdBGWlqBEzGtHmqaaVk

becomes

index="skynet" sourcetype="custom_json_indexed" Msg GvuLnDhEXexLaaHmMHSaqQJyozYtFJAarNkkridVbsUHmcZagLNwPrsVvycpKoGsohXgnzyvAbrWLaZalFIasamdiPJwikfEBZraMpugIJaShabvEaidNRJakYjdeEIVKWMvqJDoAIJQcVhgKaISSVLYojjRaHNaSJcywvaaaYsaaassiVtmdBGWlqBEzGtHmqaaVk "mdc.mdcKeyaippQ"=mdcValueuvCvVphbipWgKdaagITQ

mattymo
Splunk Employee

So on the hunch that the underscores were your hurdle, I took your file and removed the underscores, then set up a file monitor on a Windows 10 UF running 6.5.2.

 {"version":"1.1","host":"t800.skynet.com","short_message":"Msg TzjTJPqvUaGqaOpHXFdKXyXVaHiLpbTKfhePqbEtammLeaZVaaTb \r\n","full_message":"Msg TzjTJPqvUaGqaOpHXFdKXyXVaHiLpbTKfhePqbEtammLeaZVaaTb \r\n","timestamp":1484920098.408,"level":4,"app":"skynet","level_name":"WARN","mdcKeybIwMa":"mdcValueNYUGJgYJaTFaWcdicara","thread_name":"sample","logger_name":"common.log.json.LogFileProducer","env":"ut"}
 {"timestamp":"2017-01-20 14:48:18.428","level":"DEBUG","thread":"sample","logger":"common.log.json.LogFileProducer","msg":"Msg TzjTJPqvUaGqaOpHXFdKXyXVaHiLpbTKfhePqbEtammLeaZVaaTb","mdc":{"mdcKeybIwMa":"mdcValueNYUGJgYJaTFaWcdicara"}}

The inputs.conf on the UF was:

[your input]
disabled=false
index=n00blab
sourcetype=_json

[screenshot]

And now all fields are parsed... So one option here is to pre-parse the logs, or reconfigure the logging source, to remove the leading underscores and allow index-time field creation.

Obviously easy for me to say with 2 events...
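
If pre-parsing is an option, the renaming step could look roughly like the Python sketch below. It is not something I ran against the real logs: it assumes one JSON object per line, and `strip_leading_underscores` is a made-up helper name. It only touches top-level keys, which is where the underscores sit in the first sample event.

```python
import json


def strip_leading_underscores(line):
    """Rename top-level keys such as "_app" to "app" in one JSON log line."""
    event = json.loads(line)
    cleaned = {}
    for key, value in event.items():
        # Keep the original key if it consists of underscores only.
        new_key = key.lstrip("_") or key
        if new_key in cleaned:
            # Refuse to silently overwrite a field that already exists.
            raise ValueError(f"key collision on {new_key!r}")
        cleaned[new_key] = value
    return json.dumps(cleaned, separators=(",", ":"))


# Shortened version of the first sample event from this thread.
sample = '{"level":4,"_app":"skynet","_level_name":"WARN","_env":"ut"}'
print(strip_leading_underscores(sample))
```

Raising on a collision is a deliberate choice in this sketch: if a log line already has "app" next to "_app", someone has to decide which value wins.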

I also got it to work by setting both INDEXED_EXTRACTIONS=json AND KV_MODE=json.

Bit of a hacky workaround, but it works...

[salem34json]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=json
KV_MODE=json
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
TIMESTAMP_FIELDS=timestamp
category=Custom
description=indexed_extracted json for eventbreak and timestamp, kv_mode json for kvpairs to workaround underscores
disabled=false
pulldown_type=true

I decided to keep INDEXED_EXTRACTIONS=json so that we can declare the timestamp field for timestamp extraction, which you need here because you have multiple time formats...
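
To make the "multiple time formats" point concrete, here is a small Python sketch that handles the two timestamp styles from the sample events: epoch seconds in the first, a date string in the second. It says nothing about how Splunk parses them internally; `parse_timestamp` is a hypothetical helper, and treating the epoch value as UTC is my assumption.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Return a datetime for either timestamp style seen in the two sample events."""
    if isinstance(value, (int, float)):
        # First event: epoch seconds with milliseconds, e.g. 1484920098.408.
        # Assumption: interpret the epoch value as UTC.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # Second event: "2017-01-20 14:48:18.428". The log carries no time zone,
    # so the result is left as a naive datetime here.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


epoch_style = parse_timestamp(1484920098.408)
string_style = parse_timestamp("2017-01-20 14:48:18.428")
print(epoch_style.isoformat())
print(string_style.isoformat())
```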

Then add a KV_MODE=json props stanza on your SH to pull out your key-value pairs, working around the leading underscores that are messing with those fields at index time...

[screenshot]

- MattyMo

salem34
Path Finder

First things first, I really appreciate your effort here! We'll have a beer at a conf or so ;-) However, a few points on your solution:

I don't want to use KV_MODE, as I want to benefit from the search performance of indexed extractions.
I might be wrong, but your first sample with the default _json sourcetype also enables KV_MODE=json on the indexer/search head. As there is no props on the forwarder, the first sample shows search-time extractions only.

I know that enabling KV_MODE in addition would help, but on the other hand it results in double extractions, one at index time and one at search time, which isn't quite sexy.

OK, let's forget the first event with all the _ characters. But as I understand it, your Windows forwarder also did not extract the mdc fields from the second, properly formatted JSON event without KV_MODE, correct?

If yes, why the hell? And what still gives me a headache is the fact that it works like a charm on my Linux indexer.

mattymo
Splunk Employee

so on your working example, if you search fast mode, do you see the json fields extracted?

- MattyMo

mattymo
Splunk Employee

Hmm, to start, maybe try removing the props from the UF and simply keep the inputs that explicitly set your JSON sourcetype? That should be similar to the test you did, in that you let the indexer do the extractions... and do it at search time?

If that gets you rocking, then we can chase down whether this is a Windows UF thing, or just a UF thing...

What version of Windows are you dealing with?

What is the input on the UF? Tailing a file?
EDIT: Ah, "inputs.conf which simply points to the appropriate directory using monitor and applying the sourcetype "my_json_sourcetype"".

Did you create your sourcetype by hand? Try running the data through the Add Data wizard and building on top of the stock _json sourcetype on your indexer, or using the app builder. Then apply that props to the UF?

Like you said, my 6.5.2 Linux indexer, using upload via browser from a Mac running Sierra (which is batch processing, IIRC), ate this no problem.

- MattyMo

salem34
Path Finder

Thx for your thoughts.
Removing the props from the forwarder will result in no extractions at all, as INDEXED_EXTRACTIONS has to be configured on the source machine. I know this from personal experience, and it should be in the docs like that as well.
It's my local Windows 10 machine which sends the sample data to my virtual Linux indexer.

I created the sourcetype using the preview in the Add Data section, so this should not be the problem. It also would not explain why the same sourcetype works on the indexer but not on the forwarder.
Could this be a Windows thing?

Another thing: your screenshot shows the mdc KV fields, yes, but where are app, env etc. from the first event?

It's kinda frustrating, as I'm running out of ideas.

mattymo
Splunk Employee

You're right, INDEXED_EXTRACTIONS should be on the UF if you are going to use index time, but if the props is not working, that is not the only option for JSON extraction (see spath or KV_MODE=json).

Anyhow, like you said, it should work, and I have a Win10 box, so I'll try it shortly.

As for the fields you expect, I think it may be due to the fields starting with an underscore... don't think that will fly... will get it working and share the results.

Don't let it get to ya, we'll get ya sorted... but you'll likely need search time.

- MattyMo

mattymo
Splunk Employee

Field name syntax restrictions
You can assign field names as follows:

Valid characters for field names are a-z, A-Z, 0-9, or _ .
Field names cannot begin with 0-9 or _ . Splunk reserves leading underscores for its internal variables.
Avoid assigning field names that match any of the default field names.
Do not assign field names that contain international characters.

http://docs.splunk.com/Documentation/Splunk/6.5.2/Data/Configureindex-timefieldextraction
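
As a quick way to check a log's keys against the first two rules quoted above, something like this Python sketch could be used. It encodes only the character rules as quoted (it does not check for clashes with default field names), and `is_valid_index_time_field` is a name I made up for illustration.

```python
import re

# Allowed characters: a-z, A-Z, 0-9 and underscore.
# The first character must not be a digit or an underscore.
FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_index_time_field(name):
    """Check a key against the quoted character rules only."""
    return FIELD_NAME.fullmatch(name) is not None


# Keys taken from the first sample event, plus one digit-leading example.
for name in ["app", "_app", "_level_name", "thread_name", "9lives"]:
    print(name, is_valid_index_time_field(name))
```

By these rules, keys like "_app", "_level_name" and "_env" from the first sample event fail the check, which fits the underscore hunch.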

- MattyMo

ebaileytu
Communicator

I am having almost the exact same issue. The Add Data feature extracts the data correctly, but when I ingest the file using an input, not all the fields are extracted. It seems to cut off at around 500 fields, with no error messages. It has been a frustrating issue.

mattymo
Splunk Employee

your issue sounds more like limits..

- MattyMo