
Which props go where when indexing JSON?

DEAD_BEEF
Builder

I have JSON log files that I need to pull into my Splunk instance. They have some trash data at the beginning and end that I plan to remove with SEDCMD. My end goal is to clean up the file using SEDCMD, index it properly (line breaking and timestamps), and auto-parse as many fields as possible.

The logs are on a system with a UF that sends to the indexers. I'm getting very confused about INDEXED_EXTRACTIONS and KV_MODE. I thought I would set INDEXED_EXTRACTIONS in props.conf on the UF and put everything else I need on my indexers, but the docs state that:

When you forward structured data to an indexer, it is not parsed when it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the indexer, which precludes any parsing of that data on the indexer...

This leads me to believe that if I use INDEXED_EXTRACTIONS on the UF, it won't apply any of the indexer props...so do I just use INDEXED_EXTRACTIONS on my indexers instead? Or does that only apply if I use one of the pretrained sourcetypes? Some answers I read said to use KV_MODE on the search heads? I'm pretty lost on this one.

I have this written up so far:

inputs.conf ON UF

[monitor://path_to_files]
index = my_json_index
sourcetype = my_custom_sourcetype

props.conf ON IDX

[my_custom_sourcetype]
disabled = false
INDEXED_EXTRACTIONS = JSON
KV_MODE = none
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_FORMAT = %FT%T.%3Q
TZ = UTC
SEDCMD-1_del_header = s/.*\"events\":\[//g
SEDCMD-2_clean_eof = s/\(.*\)\]\}/\1/g

tom_frotscher
Builder

Hi!

If you want to use INDEXED_EXTRACTIONS = JSON, you need to set it in props.conf on the UF. You do not need any other line-breaking settings (in fact, I think they will be ignored). But the file you want to read needs to be in correct JSON syntax! As far as I remember, that means an array of JSON objects.

If you want to do the line breaking by hand, you need to do it on the indexers as usual.

If you set INDEXED_EXTRACTIONS = JSON on the UF, do not also set KV_MODE=JSON on the SH. Doing both extracts the fields at index time AND at search time, which gives you fields with duplicated values.
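
A minimal sketch of that split, assuming the file is already valid JSON when the UF reads it. The sourcetype name is reused from the question, and the eventTime field name is an assumption inferred from the LINE_BREAKER posted above, so treat this as a starting point and not a tested config:

```ini
# props.conf on the universal forwarder
# (structured data is parsed here, not on the indexers)
[my_custom_sourcetype]
INDEXED_EXTRACTIONS = JSON
# Assumption: the timestamp is in a top-level field named eventTime
TIMESTAMP_FIELDS = eventTime
TIME_FORMAT = %FT%T.%3Q

# props.conf on the search head
# (turn off search-time JSON extraction so fields are not extracted twice)
[my_custom_sourcetype]
KV_MODE = none
```

The two stanzas share a name but live in props.conf on different instances.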

Greetings

Tom

DEAD_BEEF
Builder

Hi @tom_frotscher ! I think I understand it better now. If I use INDEXED_EXTRACTIONS on my UF, then that will override any props on my indexer. The problem is that my file is JSON format, but it has a non-standard header and footer that I will need to delete via SEDCMD before it's JSON "proper". The UF can't use transforms to clean that up.

Based on what you said, if I were to use INDEXED_EXTRACTIONS on my UF, it may not work because my data isn't JSON-proper (yet).

I believe the solution, then, is to do everything on the indexer (no INDEXED_EXTRACTIONS, since I have my own LINE_BREAKER) and then use KV_MODE=JSON on the SH. Does that solution make sense, or am I off base on this?
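
Roughly, that plan might look like the sketch below. It reuses the settings from the first post and makes a few assumptions that are called out in the comments: the file is shaped like {...,"events":[{...},{...}]}, the time zone setting is spelled TZ in props.conf, and the footer cleanup is rewritten without backslash-escaped parentheses because SEDCMD regexes are Perl-compatible (where \( matches a literal parenthesis). It has not been tested against the real data:

```ini
# props.conf on the indexers (line breaking, timestamping, cleanup)
[my_custom_sourcetype]
SHOULD_LINEMERGE = false
TRUNCATE = 0
# The first capture group (the comma between objects) is discarded
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_FORMAT = %FT%T.%3Q
TZ = UTC
# Strip everything up to and including "events":[ from the first event
SEDCMD-1_del_header = s/.*\"events\":\[//
# Assumption: the footer is a closing ]} at the very end of the last event
SEDCMD-2_clean_eof = s/\]\}\s*$//

# props.conf on the search head (search-time JSON field extraction)
[my_custom_sourcetype]
KV_MODE = json
```

Line breaking runs before SEDCMD, so the header is still attached to the first event and the footer to the last one when the sed expressions are applied.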


acharlieh
Influencer

For what you're doing here, I don't know that I would use INDEXED_EXTRACTIONS. Instead, I would use KV_MODE=json on the search head and put the line breaker settings on the indexers. That said, I want to put together a sample of JSON logs wrapped in an array, wrapped in an object, to try out and play with before giving an answer. My fear is that INDEXED_EXTRACTIONS uses its own line breaker, and that may work against you, but I honestly don't know.

First some of my favorite references about props settings:
