Splunk Search

How do you go about formatting a nested JSON that was extracted using the spath command?

wsanderstii
Path Finder

There are all kinds of questions (and not too many answers) about processing nested JSON, either at the source or in a search. I have some nested JSON that the spath command can extract the fields from, but the display in the Search & Reporting app is still only one JSON level deep. For example:

{   [-] 
     log:    {"message":"looks like we got no XML document","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}

     stream:     stdout 
     time:   2018-10-17T20:49:01.841051338Z 
}

The spath command successfully extracts the fields in the "log" element, but I'd like to actually see the "log" properly formatted:

{
    "channel": "lumen",
    "context": {
        "account_id": 1234,
        "method": "GET",
        "path": "somepath",
        "status": 400,
        ...etc
    },
    "message": "looks like we got no XML document"
}

Any way to do this in a search?


harsmarvania57
Ultra Champion

Try the configuration below on the Indexer or Heavy Forwarder, whichever receives the data first from the Universal Forwarder, and remove INDEXED_EXTRACTIONS = json on the Universal Forwarder.

props.conf

[yourSourcetype]
SHOULD_LINEMERGE=true
NO_BINARY_CHECK=true
SEDCMD-removeslash=s/(?:\\"|\\\\")/"/g
SEDCMD-removenewline=s/\\\\n//g
TIME_PREFIX="time":\s"
MAX_TIMESTAMP_LOOKAHEAD=30

In the above configuration I was not able to parse \\n into an actual newline, so I removed it using SEDCMD. As a result you will see one long string in the log.message field without line breaks, which might look ugly; otherwise Splunk extracts all the fields you require, based on the sample data below.

{ "log":     {\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE\","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}, "stream": "stdout", "time": "2018-10-17T20:49:01.841051338Z" }
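As a rough illustration (hypothetical input, not taken from the sample above), the removeslash SEDCMD turns escaped quotes back into plain quotes, and removenewline deletes the literal \\n sequences:

before: {\"message\":\"no XML document\",\"level\":200}
after:  {"message":"no XML document","level":200}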

wsanderstii
Path Finder

Great answer. I won't be able to test this for a while but I am going to reference it in my future configs.


harsmarvania57
Ultra Champion

I have converted my comment to an answer; if it works for you, you can accept it as the answer.


wsanderstii
Path Finder

We do have that set AFAIK. However, a closer look at the raw entry:

{"log":"{\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE...

shows that "log" is actually a string, not a JSON object. You would have to feed it into a formatter without the surrounding quotes.

That being said, appending "spath input=log" to the query will extract all the fields in the string "log"; it just won't pretty-print the results.
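For example, something like this (just a sketch; the index and sourcetype names are placeholders):

index=your_index sourcetype=yourSourcetype
| spath input=log
| table message context.status context.path

With "spath input=log", the extracted fields are named by their path inside the log string (message, context.status, and so on), not prefixed with "log.".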

I don't think INDEXED_EXTRACTIONS = json can account for this without some customization.


harsmarvania57
Ultra Champion

Hi @wsanderstii,

If you are ingesting this data into Splunk using a Splunk Universal Forwarder, could you please try the configuration below on your Universal Forwarder?

props.conf

[yourSourcetype]
INDEXED_EXTRACTIONS = json

Then restart the Splunk service on the Universal Forwarder.
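On a typical installation that would be something like (the path may differ on your host):

$SPLUNK_HOME/bin/splunk restart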
