
Which props go where when indexing json?

DEAD_BEEF
Builder

I have JSON log files that I need to pull into my Splunk instance. They have some trash data at the beginning and end that I plan on removing with SEDCMD. My end goal is to clean up the file using SEDCMD, index it properly (line breaking and timestamps), and auto-parse as much as possible.

The logs are on a system with a UF, which sends to the indexers. I'm getting very confused about INDEXED_EXTRACTIONS & KV_MODE. I thought that I would use INDEXED_EXTRACTIONS in props.conf on the UF, then everything else I need on my indexers, but the docs state that:

When you forward structured data to an indexer, it is not parsed when it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the indexer, which precludes any parsing of that data on the indexer...

This leads me to believe that if I use INDEXED_EXTRACTIONS on the UF, it won't apply any of the indexer props...so do I just use INDEXED_EXTRACTIONS on my indexers instead? Or does that only apply if I use one of the pretrained sourcetypes? Some answers I read said to use KV_MODE on the search heads? I'm pretty lost on this one.

I have this written up so far:

inputs.conf ON UF

[monitor://path_to_files]
index = my_json_index
sourcetype = my_custom_sourcetype

props.conf ON IDX

[my_custom_sourcetype]
disabled = false
INDEXED_EXTRACTIONS = JSON
KV_MODE = none
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_FORMAT = %FT%T.%3Q
TZ = UTC
SEDCMD-1_del_header = s/.*\"events\":\[//g
SEDCMD-2_clean_eof = s/(.*)\]\}/\1/g

tom_frotscher
Builder

Hi!

If you want to use INDEXED_EXTRACTIONS = JSON, you need to set it in props.conf on the UF. You do not need any other line-breaking settings (in fact, I think they will be ignored). But the file you want to read needs to be valid JSON! As far as I remember, it should be an array of JSON objects.

If you want to do the line breaking by hand, you need to do it on the indexers as usual.

If you set INDEXED_EXTRACTIONS = JSON on the UF, do not set KV_MODE=JSON on the SH. This will extract fields at index time AND at search time, which will give you fields with duplicated values.
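
As a rough sketch of that split, assuming the file is already valid JSON and reusing the sourcetype name from the question, the two props.conf stanzas might look something like this (untested, and not a complete config):

```
# props.conf on the UF: structured parsing happens here
[my_custom_sourcetype]
INDEXED_EXTRACTIONS = JSON

# props.conf on the search head: turn off search-time JSON extraction
# so the fields are not extracted a second time
[my_custom_sourcetype]
KV_MODE = none
```

The point of KV_MODE = none on the SH is only to avoid the duplicated values mentioned above; it does nothing for line breaking or timestamps.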

Greetings

Tom

DEAD_BEEF
Builder

Hi @tom_frotscher ! I think I understand it better now. If I use INDEXED_EXTRACTIONS on my UF, then that will override any props on my indexer. The problem is that my file is JSON format, but it has a non-standard header and footer that I will need to delete via SEDCMD before it's JSON "proper". The UF can't use transforms to clean that up.

Based on what you said, if I were to use INDEXED_EXTRACTIONS on my UF, it may not work because my data isn't JSON-proper (yet).

I believe the solution will be then to just do everything on the indexer (no INDEXED_EXTRACTIONS since I have my own line_breaker), then use KV_MODE=JSON on the SH. Does that solution make sense or am I off base on this?
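
To make that concrete, here is roughly what the indexer-only approach could look like. It is a sketch, not something verified against the real files: the line breaker, timestamp, and SEDCMD expressions are carried over from the first post, the timezone is written with TZ (the props.conf setting name, as far as I know), and the second SEDCMD uses unescaped parentheses on the assumption that SEDCMD takes Perl-style regex.

```
# props.conf on the indexers: manual line breaking, no INDEXED_EXTRACTIONS
[my_custom_sourcetype]
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_FORMAT = %FT%T.%3Q
TZ = UTC
# SEDCMD expressions taken from the question; not tested against real data
SEDCMD-1_del_header = s/.*\"events\":\[//g
SEDCMD-2_clean_eof = s/(.*)\]\}/\1/g

# props.conf on the search head: search-time JSON field extraction
[my_custom_sourcetype]
KV_MODE = json
```

One thing to keep in mind with this layout: SEDCMD is applied to each event after line breaking, so the header expression would only ever match on the first event of a file and the footer expression on the last.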

acharlieh
Influencer

For what you're doing here.... I don't know that I would use INDEXED_EXTRACTIONS, but instead use KV_MODE=json on the search head, and have the line breaker settings on the indexers, but I want to put together a sample of JSON logs wrapped in an array, wrapped in an object to try out and play with before giving an answer. My fear is that INDEXED_EXTRACTIONS uses its own linebreaker, and that may work against you.... but I honestly don't know.

First some of my favorite references about props settings:
