Splunk Search

How do you go about formatting a nested JSON that was extracted using the spath command?

Path Finder

There are all kinds of questions (and not too many answers) about processing nested JSON, either at the source or in search. I have some nested JSON that the spath command can extract the fields from, but the display in the Search & Reporting app is still only one JSON level deep. For example:

{   [-] 
     log:    {"message":"looks like we got no XML document","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}

     stream:     stdout 
     time:   2018-10-17T20:49:01.841051338Z 
}

The spath command successfully extracts the fields in the "log" element, but I'd like to actually see the "log" properly formatted:

{
    "channel": "lumen",
    "context": {
        "account_id": 1234,
        "method": "GET",
        "path": "somepath",
        "status": 400,
        ...etc
    },
    "message": "looks like we got no XML document"
}

Any way to do this in a search?


SplunkTrust

Try the configuration below on the Indexer or Heavy Forwarder, whichever the data reaches first from the Universal Forwarder, and remove INDEXED_EXTRACTIONS = json on the Universal Forwarder:

props.conf

[yourSourcetype]
SHOULD_LINEMERGE=true
NO_BINARY_CHECK=true
SEDCMD-removeslash=s/(?:\\"|\\\\")/"/g
SEDCMD-removenewline=s/\\\\n//g
TIME_PREFIX="time":\s"
MAX_TIMESTAMP_LOOKAHEAD=30

In the above configuration I was not able to parse \\n into a new line, so I removed it with SEDCMD. You will therefore see one long string in the log.message field, with no line breaks, which might look ugly; otherwise, Splunk extracts all the fields you require based on the sample data below.

{ "log":     {\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE\","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}, "stream": "stdout", "time": "2018-10-17T20:49:01.841051338Z" }
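To see what the two SEDCMD rules are doing to the event, here is a rough Python sketch of the equivalent substitutions (the sample string and field values are made up for illustration, not taken from your data):

```python
import re
import json

# Hypothetical sample: the "log" payload as it arrives, with escaped
# quotes (\" and \\") that make it an invalid / stringified JSON blob
raw = r'{\"message\":\"looks like we got no XML document\",\"level\":200}'

# Rough equivalent of SEDCMD-removeslash = s/(?:\\"|\\\\")/"/g :
# collapse \" or \\" down to a plain double quote
step1 = re.sub(r'\\{1,2}"', '"', raw)

# Rough equivalent of SEDCMD-removenewline = s/\\\\n//g :
# strip literal \n escape sequences from the payload
step2 = re.sub(r'\\n', '', step1)

# After both substitutions the payload parses as ordinary JSON
parsed = json.loads(step2)
print(parsed["message"])
```

This is only a model of the sed behavior, but it shows why the cleaned event can then be parsed (and its fields extracted) as normal JSON.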

Path Finder

Great answer. I won't be able to test this for a while but I am going to reference it in my future configs.


SplunkTrust

I have converted my comment to an answer; if it works for you, you can accept it as the answer.


Path Finder

We do have that set AFAIK. However, a closer look at the raw entry:

{"log":"{\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE...

shows that "log" is actually a string and not a JSON object. You have to feed it into a formatter without the surrounding quotes.

That being said, appending "spath input=log" to the query will extract all the fields in the string "log"; it just won't pretty-print the results.
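The string-vs-object distinction is easy to demonstrate outside Splunk. A minimal Python sketch, using a made-up Docker-style event shaped like the raw entry above:

```python
import json

# Hypothetical event: the value of "log" is a *string* that happens to
# contain JSON, not a nested JSON object
event = ('{"log": "{\\"message\\":\\"looks like we got no XML document\\",'
         '\\"level\\":200}", "stream": "stdout"}')

outer = json.loads(event)
# One parse only gets you the string -- this is all spath sees by default
assert isinstance(outer["log"], str)

# The string has to be parsed (or fed to a formatter) a second time,
# which is what "spath input=log" does field-wise in a search
inner = json.loads(outer["log"])
print(json.dumps(inner, indent=4))
```

The second json.loads is the step the Search app's JSON renderer never performs on its own, which is why the "log" field displays as one flat string.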

I don't think "INDEXED_EXTRACTIONS = json" can account for this without some customization.


SplunkTrust
SplunkTrust

Hi @wsanderstii,

If you are ingesting this data into Splunk using the Splunk Universal Forwarder, can you please try the configuration below on your Universal Forwarder?

props.conf

[yourSourcetype]
INDEXED_EXTRACTIONS = json

Then restart the Splunk service on the Universal Forwarder.
