Splunk Search

How do you go about formatting a nested JSON that was extracted using the spath command?

Path Finder

There are all kinds of questions (and not too many answers) about processing nested JSON, either at the source or in search. I have some nested JSON that the spath command can extract the fields from, but the display in the Search & Reporting app is still only one JSON level deep. For example:

{   [-] 
     log:    {"message":"looks like we got no XML document","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}

     stream:     stdout 
     time:   2018-10-17T20:49:01.841051338Z 
}

The spath command successfully extracts the fields in the "log" element, but I'd like to actually see the "log" properly formatted:

{
    "channel": "lumen",
    "context": {
        "account_id": 1234,
        "method": "GET",
        "path": "somepath",
        "status": 400,
        ...etc
    },
    "message": "looks like we got no XML document"
}

Any way to do this in a search?


SplunkTrust

Try the configuration below on the Indexer or Heavy Forwarder, whichever the data reaches first from the Universal Forwarder, and remove INDEXED_EXTRACTIONS = json on the Universal Forwarder:

props.conf

[yourSourcetype]
SHOULD_LINEMERGE=true
NO_BINARY_CHECK=true
SEDCMD-removeslash=s/(?:\\"|\\\\")/"/g
SEDCMD-removenewline=s/\\\\n//g
TIME_PREFIX="time":\s"
MAX_TIMESTAMP_LOOKAHEAD=30

In the above configuration I was not able to parse \\n into a new line, so I removed it with SEDCMD. You will therefore see one long string in the log.message field, with no line breaks, which might look ugly; otherwise, Splunk extracts all the fields you require based on the sample data below.

{ "log":     {\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE\","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}, "stream": "stdout", "time": "2018-10-17T20:49:01.841051338Z" }
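To see what the two SEDCMD rules are doing to the event, here is a rough Python sketch of the equivalent substitutions (the sample string and field values are made up for illustration, not taken from your data):

```python
import re
import json

# Hypothetical sample: the "log" payload as it arrives, with escaped
# quotes (\" and \\") that make it an invalid / stringified JSON blob
raw = r'{\"message\":\"looks like we got no XML document\",\"level\":200}'

# Rough equivalent of SEDCMD-removeslash = s/(?:\\"|\\\\")/"/g :
# collapse \" or \\" down to a plain double quote
step1 = re.sub(r'\\{1,2}"', '"', raw)

# Rough equivalent of SEDCMD-removenewline = s/\\\\n//g :
# strip literal \n escape sequences from the payload
step2 = re.sub(r'\\n', '', step1)

# After both substitutions the payload parses as ordinary JSON
parsed = json.loads(step2)
print(parsed["message"])
```

This is only a model of the sed behavior, but it shows why the cleaned event can then be parsed (and its fields extracted) as normal JSON.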

Path Finder

Great answer. I won't be able to test this for a while but I am going to reference it in my future configs.


SplunkTrust

I have converted my comment to an answer; if it works for you, you can accept it as the answer.


Path Finder

We do have that set AFAIK. However, a closer look at the raw entry:

{"log":"{\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE...

shows that "log" is actually a string and not a JSON object. You have to feed it into a formatter without the surrounding quotes.

That being said, appending "spath input=log" to the query will extract all the fields in the string "log"; it just won't pretty-print the results.
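The string-vs-object distinction is easy to demonstrate outside Splunk. A minimal Python sketch, using a made-up Docker-style event shaped like the raw entry above:

```python
import json

# Hypothetical event: the value of "log" is a *string* that happens to
# contain JSON, not a nested JSON object
event = ('{"log": "{\\"message\\":\\"looks like we got no XML document\\",'
         '\\"level\\":200}", "stream": "stdout"}')

outer = json.loads(event)
# One parse only gets you the string -- this is all spath sees by default
assert isinstance(outer["log"], str)

# The string has to be parsed (or fed to a formatter) a second time,
# which is what "spath input=log" does field-wise in a search
inner = json.loads(outer["log"])
print(json.dumps(inner, indent=4))
```

The second json.loads is the step the Search app's JSON renderer never performs on its own, which is why the "log" field displays as one flat string.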

I don't think "INDEXED_EXTRACTIONS = json" can account for this without some customization.


SplunkTrust
SplunkTrust

Hi @wsanderstii,

If you are ingesting this data into Splunk using the Splunk Universal Forwarder, can you please try the configuration below on your Universal Forwarder?

props.conf

[yourSourcetype]
INDEXED_EXTRACTIONS = json

Then restart the Splunk service on the Universal Forwarder.
