Which props go where when indexing json?


I have JSON log files that I need to pull into my Splunk instance. They have some trash data at the beginning and end that I plan on removing with SEDCMD. My end goal is to clean up the file using SEDCMD, index it properly (line breaking and timestamps), and auto-parse as much as possible.

The logs are on a system with a UF, which sends to the indexers. I'm getting very confused about INDEXED_EXTRACTIONS and KV_MODE. I thought that I would use INDEXED_EXTRACTIONS in props.conf on the UF, then put everything else I need on my indexers, but the docs state that:

When you forward structured data to an indexer, it is not parsed when it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the indexer, which precludes any parsing of that data on the indexer...

This leads me to believe that if I use INDEXED_EXTRACTIONS on the UF, none of the indexer props will apply. Do I just use INDEXED_EXTRACTIONS on my indexers instead? Or does that only apply if I use one of the pretrained sourcetypes? Some answers I read said to use KV_MODE on the search heads. I'm pretty lost on this one.

I have this written up so far:

inputs.conf ON UF

index = my_json_index
sourcetype = my_custom_sourcetype

props.conf ON IDX

disabled = false
KV_MODE = none
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
SEDCMD-1_del_header = s/.*\"events\":\[//g
SEDCMD-2_clean_eof = s/\(.*\)\]\}/\1/g



If you want to use INDEXED_EXTRACTIONS = JSON, you need to set it in props.conf on the UF. You do not need any other line-breaking settings (in fact, I think they will be ignored). But the file you want to read needs to be valid JSON! As far as I remember, it should be an array of JSON objects.

If you want to do the line breaking by hand, you need to do it on the indexers as usual.

If you set INDEXED_EXTRACTIONS = JSON on the UF, do not set KV_MODE=JSON on the SH. This will extract fields at index time AND at search time, which will give you fields with duplicated values.
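
To make the index-time route concrete, here is a minimal sketch of how the two props.conf files might be split. It reuses the sourcetype name from the question and assumes the monitored file is already valid JSON. The TIMESTAMP_FIELDS line is an assumption based on the eventTime field name that appears in the question's regex, not something confirmed in this thread.

```ini
# props.conf on the UF (sketch; assumes the file is already valid JSON)
[my_custom_sourcetype]
INDEXED_EXTRACTIONS = json
# Assumption: eventTime is a top-level field in each event
TIMESTAMP_FIELDS = eventTime

# props.conf on the search head
[my_custom_sourcetype]
# Fields already exist at index time, so skip search-time extraction
KV_MODE = none
```

With this layout, LINE_BREAKER and similar settings on the indexers would not come into play for this sourcetype, because the structured data arrives already parsed.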




Hi @tom_frotscher ! I think I understand it better now. If I use INDEXED_EXTRACTIONS on my UF, then that will override any props on my indexer. The problem is that my file is JSON format, but it has a non-standard header and footer that I will need to delete via SEDCMD before it's JSON "proper". The UF can't use transforms to clean that up.

Based on what you said, if I were to use INDEXED_EXTRACTIONS on my UF, it may not work because my data isn't JSON-proper (yet).

I believe the solution, then, will be to do everything on the indexer (no INDEXED_EXTRACTIONS, since I have my own LINE_BREAKER) and use KV_MODE=JSON on the SH. Does that solution make sense, or am I off base on this?
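
As a rough sketch, the search-time route proposed here could be laid out as follows, reusing the settings from the question. SHOULD_LINEMERGE = false is an addition that does not appear in the original post. It is commonly paired with a custom LINE_BREAKER, but treat it as an assumption for this data.

```ini
# props.conf on the indexers: line breaking, timestamps, cleanup
[my_custom_sourcetype]
# Addition (not in the original post): rely on LINE_BREAKER alone
SHOULD_LINEMERGE = false
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
# The SEDCMD-* lines from the question also belong here. They run at
# index time, on each event, after line breaking.

# props.conf on the search head: search-time JSON extraction
[my_custom_sourcetype]
KV_MODE = json
```

Nothing is needed in props.conf on the UF for this variant. KV_MODE is a search-time setting, so it has an effect only on the search head.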



For what you're doing here, I don't know that I would use INDEXED_EXTRACTIONS. I would instead use KV_MODE=json on the search head and keep the line-breaker settings on the indexers. That said, I want to put together a sample of JSON logs wrapped in an array, wrapped in an object, to try out before giving a definitive answer. My fear is that INDEXED_EXTRACTIONS uses its own line breaker, which may work against you, but I honestly don't know.

First some of my favorite references about props settings:
