Solved: Find event with invalid JSON

tjago11 · 06-27-2018

Trying to find a consistent way of finding events that contain invalid JSON. We've run into all sorts of issues with escape characters and charset mismatches, so I'm looking for a way to identify any and all events with bad JSON.

There must be something in Splunk that does this because when I have an invalid event, Splunk can't do the fancy pants syntax highlighting. Hoping that there is some way to tap into whatever logic they are using.
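Outside Splunk, "is this event valid JSON?" comes down to whether a strict parser accepts the whole text. Here is a minimal Python sketch of that check. It assumes the raw events can be exported as a list of strings. `find_invalid_json` and `_reject_constant` are hypothetical helpers, and this says nothing about how Splunk's syntax highlighter actually decides:

```python
import json


def _reject_constant(name: str) -> None:
    # Python's json module accepts NaN, Infinity and -Infinity by default;
    # strict JSON does not, so treat them as errors.
    raise ValueError(f"non-standard JSON constant: {name}")


def find_invalid_json(events: list[str]) -> list[int]:
    """Return the positions of events that are not one complete, valid JSON document."""
    bad = []
    for i, raw in enumerate(events):
        try:
            # strict=True (the default) rejects raw control characters inside strings,
            # and any text after the document raises an "Extra data" error.
            json.loads(raw, parse_constant=_reject_constant)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad.append(i)
    return bad


events = [
    '{"foo":"bar"}',
    '{"foo":"bar"',           # truncated
    '{"a": NaN}',             # non-standard constant
    '[1, 2]',
    '{"s": "tab\there"}',     # literal tab character inside a string
    '{"q": "say \\" more"}',  # escaped quote: valid
]
print(find_invalid_json(events))  # [1, 2, 4]
```

The truncated event, the `NaN` and the raw tab character are rejected, while the escaped quote in the last event is accepted.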

Found a regex solution that is way over my head. It works in regex101, but I can't get it working in SPL:
https://regex101.com/r/No8Xnc/1/

Fixed the easy stuff, like escaping quotes and removing comments, and got this query running:

| makeresults
| eval basicJson="{\"foo\":\"bar\"}"
| rex field=basicJson "(?x-i)
(?(DEFINE)
(?<ws>[\r\n\t\x20]*)
(?<str>\"(?:\\[rntbf\\\/] | [[:xdigit:]]{4} | [^\\\"[:cntrl:]])*\")
(?<bool>true|false)
(?<nil>nil)
(?<num>-?\d+(?:\.\d+)?)
(?<elem>(?:(?&str)|(?&bool)|(?&nil)|(?&num))(?&ws))
(?<comma>,(?&ws))
)
\[ (?&ws)
(?:
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)(?:(?&elem)|[\[\{]))(?&comma))
)*
\]

|
\{ (?&ws)
(?:
    (?&str) (?&ws) 
    : (?&ws)
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)[\"\[\{])(?&comma))
)*
\}"

No more failures, but it comes back with a bunch of empty columns :sigh:

Any help is appreciated. Thanks.
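For anyone who, like the poster, finds the rex hard to read: the `(?(DEFINE)...)` block only names building blocks (`ws`, `str`, `bool`, `nil`, `num`, `elem`, `comma`). The body after it says a document is an array or an object whose items are either a simple `elem` or, through `(?R)`, another whole array or object. Below is a hand-written recursive-descent sketch of that grammar in Python. It is not a port of the regex, and it makes several assumptions:

- It uses JSON's literal `null` where the DEFINE block has `nil`.
- It accepts `\uXXXX` escapes. The `[[:xdigit:]]{4}` alternative looks like it was meant for those, but it has no `\\u` in front of it.
- Like `num` in the rex, it rejects exponents such as `1e5`.
- It requires a comma between items, whereas the conditional comma in the rex appears to let items through without one.

All function names are hypothetical.

```python
import re

# Token patterns mirroring the (?(DEFINE) ...) block of the rex.
WS = re.compile(r"[\r\n\t ]*")
STR = re.compile(r'"(?:\\[rntbf\\/"]|\\u[0-9A-Fa-f]{4}|[^\\"\x00-\x1f])*"')
LIT = re.compile(r"true|false|null")        # bool, plus JSON's null (the rex says nil)
NUM = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")  # same shape as the rex: no exponents


def _skip_ws(s: str, i: int) -> int:
    return WS.match(s, i).end()


def _value(s: str, i: int) -> int:
    """Match one value at s[i:]; return the index just past it, or -1."""
    i = _skip_ws(s, i)
    if i < len(s) and s[i] in "[{":
        return _container(s, i)  # the (?R) branch: a nested array/object
    for pattern in (STR, LIT, NUM):  # the (?&elem) branch
        m = pattern.match(s, i)
        if m:
            return m.end()
    return -1


def _container(s: str, i: int) -> int:
    """Match an array or object starting at s[i] ('[' or '{'); return end index or -1."""
    is_object = s[i] == "{"
    close = "}" if is_object else "]"
    i = _skip_ws(s, i + 1)
    if i < len(s) and s[i] == close:
        return i + 1  # empty [] or {}
    while True:
        if is_object:  # "key" ws : ws before each value
            m = STR.match(s, i)
            if not m:
                return -1
            i = _skip_ws(s, m.end())
            if i >= len(s) or s[i] != ":":
                return -1
            i += 1
        i = _value(s, i)
        if i < 0:
            return -1
        i = _skip_ws(s, i)
        if i < len(s) and s[i] == ",":
            i = _skip_ws(s, i + 1)  # a comma must be followed by another item
        elif i < len(s) and s[i] == close:
            return i + 1
        else:
            return -1


def is_valid_json(s: str) -> bool:
    """True if the whole string is one JSON array or object (the rex's two top-level branches)."""
    i = _skip_ws(s, 0)
    if i >= len(s) or s[i] not in "[{":
        return False
    end = _container(s, i)
    return end >= 0 and _skip_ws(s, end) == len(s)
```

For example, `is_valid_json('{"foo":"bar"}')` is `True`, while `is_valid_json('{"a": 1,}')` is `False` because of the trailing comma.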

cpetterborg · 06-27-2018

Often, if a JSON event contains invalid data, field extraction doesn't work, and a field you expected simply isn't there. If there is a field that should always be present, you can search for events that lack it. Something like:

... | search NOT fieldname=*

Would that work for you?
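The reasoning behind `NOT fieldname=*` is that an event whose JSON fails to parse gets no extracted fields, so an expected field is missing. Here is a rough Python simulation of that idea. `extract_fields` is a hypothetical stand-in for automatic JSON extraction that returns no fields when parsing fails. `user` and `action` are made-up field names:

```python
import json


def extract_fields(raw: str) -> dict:
    """Hypothetical stand-in for automatic JSON field extraction:
    the top-level keys of a JSON object, or no fields at all if parsing fails."""
    try:
        obj = json.loads(raw)
    except ValueError:
        return {}
    return obj if isinstance(obj, dict) else {}


def missing_field(events: list[str], fieldname: str) -> list[str]:
    """Rough equivalent of `... | search NOT fieldname=*`."""
    return [raw for raw in events if fieldname not in extract_fields(raw)]


events = [
    '{"user": "a", "action": "login"}',
    '{"user": "b", "action": ',  # broken JSON: nothing gets extracted
    '{"action": "logout"}',      # valid JSON that simply lacks "user"
]
print(missing_field(events, "user"))
```

The broken second event is caught. However, the third event is also returned: it is valid JSON that just lacks `user`. That is why this approach needs a field every event is guaranteed to have.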

tjago11 · 06-29-2018

Looks like I was using an incompatible stanza. My sandbox server runs 6.6.4, so it uses an older stanza:
[rex]
recursion_limit = 999999
match_limit = 999999

That got a bit further, but now I'm getting a generic failure:

Job terminated unexpectedly

The search.log doesn't have an error in it, just a couple of unrelated warnings about lookup tables. I checked the _internal and _audit logs for the SID and the error text, and found nothing there either.

Looks like I'm butting up against a framework limit on the system, which is disappointing when regex101 shows that it should work just fine. :sadpanda:

tjago11 · 06-29-2018

Found the edge of the limit by slimming down to just the field with a bunch of escaped quotes in it. This will fail:

| makeresults
| eval quotedJSON = "{  \"property25\": \"<?xml version=\\\"1.0\\\" encoding=\\\"utf-16\\\"?><Error ServerName=\\\"xxxxxx\\\" DateTime=\\\"6/28/2018 1:49:33 PM\\\"><Message Description=\\\"Object reference not set to an instance of an object.\\\" Number=\\\"\\\" /><Methods><Method Dll=\\\"PPUI.Widgets, Version=0.18.1.1652, Culture=neutral, PublicKeyToken=__SECURITYMASK__\\\" Class=\\\"classname.xPSLayoutWidget\\\" Name=\\\"ProcessInstruction_DisplayFieldLabel\\\" LineNumber=\\\"42\\\"><Details><Detail Name=\\\"DisplayFieldLabel\\\" Description=\\\"Object reference not set to an instance of an object.\\\" /><Detail Name=\\\"Inner XML\\\" Description=\\\"\\\" /><Detail Name=\\\"fieldName\\\" Description=\\\"\\\" /><Detail Name=\\\"labelType\\\" Description=\\\"\\\" /><Detail Name=\\\"suppressColon\\\" Description=\\\"\\\" /><Detail Name=\\\"lengthAttr\\\" Description=\\\"\\\" /><Detail Name=\\\"mainLabel\\\" Description=\\\"Is Unknown\\\" /><Detail Name=\\\"subLabel\\\" Description=\\\"Is Unknown\\\" /></Details></Method><Method Dll=\\\"PPUI.Widgets, Version=0.18.1.1652, Culture=neutral, PublicKeyToken=__SECURITYMASK__\\\" Class=\\\"classname.Widgets.xPSLayoutWidget\\\" \"
}"
| rex field=quotedJSON
"(?x-i)
(?(DEFINE)
(?<ws>[\r\n\t\x20]*)
(?<str>\"(?:\\[rntbf\\\/] | \\\\\" | [[:xdigit:]]{4} | [^\\\"[:cntrl:]])*\")
(?<bool>true|false)
(?<nil>nil)
(?<num>-?\d+(?:\.\d+)?)
(?<elem>(?:(?&str)|(?&bool)|(?&nil)|(?&num))(?&ws))
(?<comma>,(?&ws))
)
(?<extractedJSON>
\[ (?&ws)
(?:
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)(?:(?&elem)|[\[\{]))(?&comma))
)*
\]
|
\{ (?&ws)
(?:
    (?&str) (?&ws) 
    : (?&ws)
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)[\"\[\{])(?&comma))
)*
\}
)"
| eval rawSameAsExtracted = if(quotedJSON=extractedJSON, "true", "false")
| table quotedJSON, extractedJSON, rawSameAsExtracted

But if you remove the last quoted element, it works:

Class=\\\"classname.Widgets.xPSLayoutWidget\\\"

Honestly, I don't think this should fail. In a regex101 test, the same object completes in just 4ms, so I'm thinking the limits.conf settings aren't taking effect...

tjago11 · 06-28-2018

Okay, got an updated rex that is working well. Had an issue with matching on \" but that has been solved.

| makeresults
| eval quotedJSON = "{\"foo\":\"bar\", \"works\":\"say, maybe \r \b \\\" \"}"
| rex field=quotedJSON
"(?x-i)
(?(DEFINE)
(?<ws>[\r\n\t\x20]*)
(?<str>\"(?:\\[rntbf\\\/] | \\\\\" | [[:xdigit:]]{4} | [^\\\"[:cntrl:]])*\")
(?<bool>true|false)
(?<nil>nil)
(?<num>-?\d+(?:\.\d+)?)
(?<elem>(?:(?&str)|(?&bool)|(?&nil)|(?&num))(?&ws))
(?<comma>,(?&ws))
)
(?<extractedJSON>
\[ (?&ws)
(?:
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)(?:(?&elem)|[\[\{]))(?&comma))
)*
\]
|
\{ (?&ws)
(?:
    (?&str) (?&ws)
    : (?&ws)
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)[\"\[\{])(?&comma))
)*
\}
)"
| eval rawSameAsExtracted = if(quotedJSON=extractedJSON, "true", "false")
| table quotedJSON, extractedJSON, rawSameAsExtracted

Still puking on the PCRE limits. Trying to see if I can get access to a sandbox to fiddle with those settings in limits.conf.

tjago11 · 06-29-2018

Got access to a sandbox server and added this stanza in the limits.conf:
[rex]
depth_limit = 9999999
match_limit = 9999999

Couldn't find anything in the documentation that specifies the real limits. Tried zero (0), and that didn't work at all. Also tried 100000 and 999999. The error on the sandbox server is slightly different from the one we get in the cloud:

has exceeded the configured recursion_limit, consider raising the value in limits.conf

I also tried adding a "recursion_limit" line to limits.conf. That gave a generic error:

search failed unexpectedly

The search.log file didn't have anything helpful in it...coming up empty.

tjago11 · 06-27-2018

True, but unfortunately I don't have this:

consistent field that should be there

I did figure out why the extraction didn't work: I was missing a top-level capture group. I added this right after the DEFINE group:

(?<extractedJSON>

Now I am getting the correct JSON and can do a simple comparison at the end:

| eval rawSameAsExtracted = if(_raw=extractedJSON, "true", "false")
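The comparison works because `rex` captures the first stretch of text that matches the pattern. In a broken event, that stretch can be a valid fragment rather than the whole thing. Only when the capture equals the entire event is the event known to be good.

Here is a Python sketch of the same idea. It uses `json.JSONDecoder.raw_decode` as a stand-in for the rex: this parses one JSON value at the start of a string and returns the position where it stopped. Unlike `rex`, it only looks at the start of the string. The helper names are hypothetical:

```python
import json
from typing import Optional

_DECODER = json.JSONDecoder()


def extract_json_prefix(raw: str) -> Optional[str]:
    """Return the leading JSON value of `raw` as text (like extractedJSON), or None."""
    try:
        _, end = _DECODER.raw_decode(raw)  # end = index just past the parsed value
    except json.JSONDecodeError:
        return None
    return raw[:end]


def raw_same_as_extracted(raw: str) -> bool:
    """Mirror of: eval rawSameAsExtracted = if(_raw=extractedJSON, "true", "false")"""
    return extract_json_prefix(raw) == raw


print(extract_json_prefix('{"foo":"bar"} trailing junk'))  # {"foo":"bar"}
print(raw_same_as_extracted('{"foo":"bar"} trailing junk'))  # False
```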

Working wonderfully...except it isn't. Now I'm getting an error for very large JSON objects:

exceeded the PCRE recursion limit

Found another answer with a similar issue: https://answers.splunk.com/answers/581183/is-my-rex-right-rex-has-exceeded-configured-match.html

But that one was using lookbacks. This monstrosity... I actually don't even understand it. 😉

Ideas??

cpetterborg · 06-27-2018

Something in your regular expression requires too much backtracking. What regular expression do you have going on? It could be a rex command, or a regex in props.conf, transforms.conf, or somewhere else. If you have one that could be causing this error, post it here.
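Here is one way to make "too much backtracking" concrete for this particular rex. Inside `str`, both `[[:xdigit:]]{4}` and the single-character class `[^\\\"[:cntrl:]]` can consume hex digits. That means a run of hex digits can be split between the two alternatives in many ways. When something later in the pattern fails, a backtracking engine may retry those splits.

The Python sketch below only counts the splits for a run of `n` hex digits. It uses the recurrence ways(n) = ways(n-1) + ways(n-4) and cross-checks it by brute force. It is an illustration, not a measurement. Whether this is what exhausts the limit on this data is an assumption the thread doesn't test:

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def ways(n: int) -> int:
    """Count the ways (?:X{4}|X)* can tile a run of n hex digits,
    i.e. how many different splits a backtracking engine could try."""
    if n < 0:
        return 0
    if n == 0:
        return 1  # the empty split
    # The last token is either one digit or a block of four.
    return ways(n - 1) + ways(n - 4)


def ways_brute(n: int) -> int:
    """Brute-force cross-check: count token sequences of lengths 1 or 4 summing to n."""
    count = 0
    for k in range(n + 1):  # k = number of tokens
        for lengths in product((1, 4), repeat=k):
            if sum(lengths) == n:
                count += 1
    return count


print([ways(n) for n in (4, 8, 12, 20)])  # [2, 7, 26, 345]
```

Separate runs in one string multiply. For example, a GUID such as `00000000-1111-2222-3333-444444444444` from the sample data gives 7 × 2 × 2 × 2 × 26 = 1456 possible splits of its hex runs.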

tjago11 · 06-28-2018

The rex is in the original question and it's a beast. It uses a lot of recursion which is necessary for hierarchical object structures so I'm not sure there is anything I can do there. Likely means I'll have to up the PCRE limit but would love a set of skilled eyes on it. Thanks.

cpetterborg · 06-28-2018

I'd need much more than eval basicJson="{\"foo\":\"bar\"}" for data to check. Something real. Anonymized, but real. That error comes from too much backtracking, and with the data above there isn't any easy way to find the problem.

tjago11 · 06-29-2018

Okay, the data is too big to paste in this system but I did store it as a regex101 example:
https://regex101.com/r/No8Xnc/2

The real issue is "property25", which in the real world is an application stack trace formatted as XML. This data has a ton of double quotes and other escape characters, and it is really long. If I remove that field, the regex works just fine.

If you pull the example into a query you'll have to find/replace single quotes with \" and then replace \" with \\". This sample without property25 works just fine:

| makeresults
| eval quotedJSON = "{
  \"InstrumentationLogDateTime\": \"2018-06-28T13:49:33.7895781-04:00\",
  \"property1\": \"00000000-1111-2222-3333-444444444444\",
  \"property2\": \"00000000-1111-2222-3333-444444444444\",
  \"property3\": \"my application\",
  \"property4\": \"00000000-1111-2222-3333-444444444444\",
  \"property5\": \"V0300\",
  \"property6\": \"999918\",
  \"property7\": \"system\",
  \"property8\": \"Info\",
  \"property9\": \"123456\",
  \"property10\": \"xx\",
  \"property11\": \"xx\",
  \"property12\": \"page name\",
  \"property13\": \"class name\",
  \"property14\": \"method name\",
  \"property15\": \"123456\",
  \"property16\": \"123456\",
  \"property17\": 42,
  \"property18\": \"xx\",
  \"property19\": \"xx\",
  \"property20\": \"123456\",
  \"property21\": \"trans name\",
  \"property22\": \"42\",
  \"property23\": \"2018-06-28T13:49:28\",
  \"property24\": \"Object reference not set to an instance of an object.\",
  \"CustomFields\": {
    \"property26\": \"\",
    \"property27\": \"xx\",
    \"property28\": \"anotherbiglongvalue\",
    \"property29\": \"xx\",
    \"property31\": \"\",
    \"property32\": \"123456\",
    \"property33\": \"anotherbiglongvalue.anotherbiglongvalue.anotherbiglongvalue\",
    \"property34\": \"anotherbiglongvalue,anotherbiglongvalue,anotherbiglongvalue\",
    \"property35\": \"anotherbiglongvalueanotherbiglongvalueanotherbiglongvalue\",
    \"property36\": \"this is a big ol thing\",
    \"property37\": \"this is a big ol thing\",
    \"property38\": \"this is a big ol thing\",
    \"property39\": \"2018-06-28T13:49:33.0000000-04:00\",
    \"property40\": \"2018-06-28T13:32:24.0000000-04:00\"
  }
}"
| rex field=quotedJSON
"(?x-i)
(?(DEFINE)
(?<ws>[\r\n\t\x20]*)
(?<str>\"(?:\\[rntbf\\\/] | \\\\\" | [[:xdigit:]]{4} | [^\\\"[:cntrl:]])*\")
(?<bool>true|false)
(?<nil>nil)
(?<num>-?\d+(?:\.\d+)?)
(?<elem>(?:(?&str)|(?&bool)|(?&nil)|(?&num))(?&ws))
(?<comma>,(?&ws))
)
(?<extractedJSON>
\[ (?&ws)
(?:
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)(?:(?&elem)|[\[\{]))(?&comma))
)*
\]
|
\{ (?&ws)
(?:
    (?&str) (?&ws) 
    : (?&ws)
    (?:
        (?&elem) | (?R)(?&ws)
    )
    (?(?=(?&comma)[\"\[\{])(?&comma))
)*
\}
)"
| eval rawSameAsExtracted = if(quotedJSON=extractedJSON, "true", "false")
| table quotedJSON, extractedJSON, rawSameAsExtracted
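The find/replace described above is ordinary string escaping. Here is a Python sketch of it. It assumes SPL's quoted-string rules, where `\\` stands for one backslash and `\"` for a double quote; this matches how the samples in this thread are written. `to_spl_string_literal` is a hypothetical helper. Backslashes have to be escaped before quotes. Otherwise, the backslashes added for the quotes would be doubled as well:

```python
def to_spl_string_literal(text: str) -> str:
    """Wrap text in double quotes for use in an SPL eval, e.g. | eval quotedJSON = <literal>."""
    escaped = text.replace("\\", "\\\\")   # every backslash -> \\ (must come first)
    escaped = escaped.replace('"', '\\"')  # every double quote -> \"
    return '"' + escaped + '"'


# A JSON value that itself contains escaped quotes, as property25 does.
raw_json = '{"property25": "<Error ServerName=\\"xxxxxx\\" />"}'
print(to_spl_string_literal(raw_json))
```

The printed literal uses the same `\\\"` pattern as the property25 sample earlier in the thread.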

cpetterborg · 06-29-2018

You're going to find it odd that this is all it takes, but WITH YOUR EXAMPLE DATA I was able to avoid the error and still get what appears to me to be a good result, with a single-character change to the rex:

"(?x-i)
 (?(DEFINE)
 (?<ws>[\r\n\t\x20]*)
 (?<str>\"(?:\\[rntbf\\\/] | \\\\\" | [[:xdigit:]]{4} | [^\\\"[:cntrl:]])*?\")
 (?<bool>true|false)
 (?<nil>nil)
 (?<num>-?\d+(?:\.\d+)?)
 (?<elem>(?:(?&str)|(?&bool)|(?&nil)|(?&num))(?&ws))
 (?<comma>,(?&ws))
 )
 (?<extractedJSON>
 \[ (?&ws)
 (?:
     (?:
         (?&elem) | (?R)(?&ws)
     )
     (?(?=(?&comma)(?:(?&elem)|[\[\{]))(?&comma))
 )*
 \]
 |
 \{ (?&ws)
 (?:
     (?&str) (?&ws) 
     : (?&ws)
     (?:
         (?&elem) | (?R)(?&ws)
     )
     (?(?=(?&comma)[\"\[\{])(?&comma))
 )*
 \}
 )"

Fourth line, added a question mark after the asterisk.

Try this on your whole data set and see if it fixes the problem. If it doesn't, then I would suggest looking at changing depth_limit in limits.conf.
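Why would one `?` avoid the error without changing the result? In the transliteration below, nothing in the `str` body can consume an unescaped `"`. The last class excludes quotes and backslashes, and a backslash always starts a two-character escape. So a string has exactly one possible closing quote. Greedy `*` and lazy `*?` therefore stop at the same place and differ only in the order the engine tries things, which is presumably what changes how much recursion PCRE needs here.

The Python check below covers only the "same result" half. Python's `re` stands in for PCRE, the SPL-level escaping is dropped, and the check says nothing about performance:

```python
import re

# Python transliteration of the rex's `str` group (SPL-level escaping removed):
#   \\[rntbf\\/]          backslash escapes
#   \\"                   escaped quote
#   [0-9A-Fa-f]{4}        the [[:xdigit:]]{4} alternative
#   [^\\"\x00-\x1f\x7f]   anything except backslash, quote, or a control character
BODY = r'(?:\\[rntbf\\/]|\\"|[0-9A-Fa-f]{4}|[^\\"\x00-\x1f\x7f])'
STR_GREEDY = re.compile('"' + BODY + '*"')  # original: ...)*\"
STR_LAZY = re.compile('"' + BODY + '*?"')   # fixed:    ...)*?\"


def matched(pattern: re.Pattern, s: str):
    """Return the text matched at the start of s, or None."""
    m = pattern.match(s)
    return m.group() if m else None


samples = [
    '"bar", "x"',
    r'"say \" more" tail',
    '"deadbeef"',
    r'"a\\" b"',
    '"unterminated',
]
for s in samples:
    print(repr(matched(STR_GREEDY, s)), repr(matched(STR_LAZY, s)))
```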

tjago11 · 07-16-2018

One thing I found out the hard way: in my sample I was using the makeresults command to simulate data and test the changes to limits.conf. Rolled the change to production and it is failing... because the rex command is running against the event data on the Indexer, not on the Search Head. :angry:

I'd recommend adding the limits.conf change to both sides to keep them in sync, because depending on the query, the rex might run on the Search Head or on the Indexer. Thanks.

tjago11 · 06-29-2018

YES!!!!

That's a winner, thank you so much. Well done, sir! Post that as an answer so I can accept it.

cpetterborg · 06-29-2018

I converted this whole comment thread so that the complete context makes up the answer. Glad it works for you.
