
JSON data selection and meta removal

pck_npluyaud
Engager

Hi.

The situation: a standalone Splunk instance (test platform) and a UF.
The data: a multi-level JSON stream.
The problem: because the data volume is large, we would like to reduce _raw to a single field. But all JSON fields are still saved as _meta.

We have succeeded in updating source, sourcetype and host from the JSON data.

But it is impossible to omit the _meta fields (they always appear on the search head).

IN:

{
"input":{
     "type":"log"},
"log":{
     "file":"c:\log.josn"},
"@metadata":{
     "beat":"filebeat",
     "version":"7.10.2"},
"message":"bla bla bla",
"fields":{
     "type":"bdc",
     "host":"VLCR03",
     "type2":"back"}
}

OUT:

_raw: "bla bla bla" <= OK
meta "input.***" <= to remove
meta "log.***" <= to remove
meta "@metadata.beat" <= to keep
meta "@metadata.version" <= to remove
meta "message" <= to remove
meta "fields.***" <= to remove

props.conf on the UF

SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
CHARSET = AUTO
KV_MODE = none
AUTO_KV_JSON = false
INDEXED_EXTRACTIONS = JSON
TRANSFORMS-x = set_host set_source set_sourcetype
TRANSFORMS-y = extract_message
TRANSFORMS-z = remove_metadata

transforms.conf on the UF

[extract_message]
SOURCE_KEY = field:message
REGEX = (.*)
FORMAT = $1
DEST_KEY = _raw

[set_host]
SOURCE_KEY = field:fields.host
REGEX = (.*)
FORMAT = host::$1
DEST_KEY = MetaData:Host

[set_source]
SOURCE_KEY = field:log.file
REGEX = (.*)
FORMAT = source::$1
DEST_KEY = MetaData:Source

[set_sourcetype]
SOURCE_KEY = fields:fields.type,fields.type2
REGEX = (.*)\s(.*)
FORMAT = sourcetype::$1:$2
DEST_KEY = MetaData:Sourcetype

[remove_message]
SOURCE_KEY = _meta:message
REGEX = (.*)
DEST_KEY = queue
FORMAT = nullQueue


richgalloway
SplunkTrust

These props.conf settings MUST go on the first full Splunk Enterprise instance (HF or indexer) that sees the data.  The UF will ignore all of them.  Perhaps this is why the original props.conf settings didn't work.

To make the comma optional, insert a ? after the comma in the SEDCMD.
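
To see what the optional comma changes, here is a small sketch in Python. It assumes Python's re module treats these particular patterns the same way as the regex engine behind SEDCMD, and the two one-line events are made up for illustration (they are not from the original data).

```python
import re

# Pattern from the SEDCMD above, with the comma required.
STRICT = r'"input":\{[\s\S]+?},'
# Same pattern with "?" after the comma, so the comma becomes optional.
RELAXED = r'"input":\{[\s\S]+?},?'

# Hypothetical events: "input" as the first key and as the last key.
FIRST = '{"input":{"type":"log"},"message":"bla bla bla"}'
LAST = '{"message":"bla bla bla","input":{"type":"log"}}'


def strip_input(raw, pattern):
    """Remove the first match only, like sed's s/regex// without the g flag."""
    return re.sub(pattern, "", raw, count=1)


if __name__ == "__main__":
    for label, event in (("first", FIRST), ("last", LAST)):
        print(label, "strict :", strip_input(event, STRICT))
        print(label, "relaxed:", strip_input(event, RELAXED))
```

When "input" is the last key, the strict pattern finds no "}," and leaves the event unchanged, while the relaxed one removes the object. Note that the comma in front of "input" is still left behind in that case.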

---
If this reply helps you, Karma would be appreciated.

pck_npluyaud
Engager

I put the rule in the props.conf on the UF, but the result is the same: the _meta "input.type" is not removed.

Another point: the SEDCMD should be general, because the JSON schema is not always the same (the comma is not always in the same place).

SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
CHARSET = AUTO
KV_MODE = none
AUTO_KV_JSON = false
INDEXED_EXTRACTIONS = JSON
TRANSFORMS-x = remove_events set_host set_source set_sourcetype
TRANSFORMS-y = extract_message
SEDCMD-noinput = s/"input":\{[\s\S]+?},//

richgalloway
SplunkTrust

The FORMAT = nullQueue setting is for entire events, not individual pieces of text.  To remove text, I recommend using SEDCMD.  I also recommend retaining keywords to make parsing easier at search time.

Try these settings in props.conf:

SEDCMD-noinput = s/"input":\{[\s\S]+?},//
SEDCMD-nolog = s/"log":\{[\s\S]+?},//
SEDCMD-nofields = s/"fields":\{[\s\S]+?},?//
SEDCMD-noversion = s/,\s+"version":"[\s\S]+?"//
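
As a rough check of what these four expressions do to the sample event from the question, here is a sketch in Python. It assumes Python's re module behaves like the SEDCMD regex engine for these patterns and that each rule is applied once to the raw text; the function name apply_sed_rules is only for illustration.

```python
import re

# Sample event from the question, kept verbatim as a raw string.
EVENT = r'''{
"input":{
     "type":"log"},
"log":{
     "file":"c:\log.josn"},
"@metadata":{
     "beat":"filebeat",
     "version":"7.10.2"},
"message":"bla bla bla",
"fields":{
     "type":"bdc",
     "host":"VLCR03",
     "type2":"back"}
}'''

# (class name, regex) pairs copied from the SEDCMD lines above.
SED_RULES = [
    ("noinput", r'"input":\{[\s\S]+?},'),
    ("nolog", r'"log":\{[\s\S]+?},'),
    ("nofields", r'"fields":\{[\s\S]+?},?'),
    ("noversion", r',\s+"version":"[\s\S]+?"'),
]


def apply_sed_rules(raw, rules=SED_RULES):
    """Apply each s/regex// rule once, like sed without the g flag."""
    for _name, pattern in rules:
        raw = re.sub(pattern, "", raw, count=1)
    return raw


if __name__ == "__main__":
    print(apply_sed_rules(EVENT))
```

On this sample, the "input", "log" and "fields" objects and the "version" entry are removed, and "@metadata.beat" and "message" remain. Because "fields" was the last key, the comma after "message" stays in place, so the remaining text is not strictly valid JSON.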
---
If this reply helps you, Karma would be appreciated.