JSON ingest with weird characters or binary

quahfamili · ‎11-18-2018

Hi all,

I was trying to ingest some JSON files, but the JSON seems to contain some weird characters or binary data, and parsing failed.

Example of JSON:

{
"abc": "weird_characters"
}

I got this error: ERROR JsonLineBreaker - JSON Stream ID: xxxxxxxxxxxxxxxxxxxxxx had parsing error: Unexpected character while parsing backslash escape: 'x'

I have experimented with a lot of props.conf settings, including setting binary to false. I suspect this has something to do with encoding.

How do I solve this?

Thanks in advance

quahfamili · ‎03-26-2019

Hi all,

I checked, and the weird_characters are Chinese characters. I have set the encoding to UTF-8. I even tried modifying my data to "abc": "\weird_characters", but to no avail. I still cannot parse the data.

Need help

nickhills · ‎03-26-2019

If it's Chinese, have you tried UTF-16?

If my comment helps, please give it a thumbs up!

acharlieh · ‎11-18-2018

Does the JSON string (assuming you have the correct CHARSET in props.conf) actually contain \x? If so, you may have invalid JSON. Check out the grammar on https://json.org. The only characters that can follow a backslash in a string are slash, backslash, double quote, b, f, n, r, t, or u (when immediately followed by 4 hex digits).
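To check a file against that escape rule outside of Splunk, something like the following Python sketch may help. It is only an illustration: the helper name find_invalid_escapes and the sample strings are made up for this example, and it scans the whole text rather than only string values (a backslash outside a string is invalid JSON anyway).

```python
import json
import re

# Matches either a legal JSON escape (captured in group 1) or a lone backslash.
# Legal escapes per the json.org grammar: \" \\ \/ \b \f \n \r \t and \uXXXX.
_ESCAPE = re.compile(r'\\(u[0-9a-fA-F]{4}|["\\/bfnrt])|\\')


def find_invalid_escapes(text):
    """Return the offsets of backslashes that do not start a legal JSON escape.

    Hypothetical helper for illustration; not part of Splunk or the json module.
    """
    return [m.start() for m in _ESCAPE.finditer(text) if m.group(1) is None]


# A literal backslash followed by 'x' is not a legal JSON escape.
bad = r'{"abc": "caf\xe9"}'
# \uXXXX escapes and raw (unescaped) non-ASCII characters are both legal.
good_escaped = r'{"abc": "\u4e2d\u6587"}'
good_raw = '{"abc": "中文"}'

for sample in (bad, good_escaped, good_raw):
    offsets = find_invalid_escapes(sample)
    try:
        json.loads(sample)
        status = "parses"
    except json.JSONDecodeError as exc:
        status = f"fails: {exc.msg}"
    print(offsets, status)
```

If the list of offsets is empty but Splunk still reports a backslash escape error, the data on disk probably differs from what you see in your editor, which points back to an encoding problem.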

quahfamili · ‎11-18-2018

Hi,

I manually removed the weird_characters and the JSON file could then be ingested. However, these characters are inside the double quotes.

@acharlieh The file does not actually contain \x. However, I thought that, due to the encoding of these weird_characters, Splunk might have interpreted them as \x. I have set CHARSET to UTF-8 and the files continue to produce the same error.

Can anyone help?

acharlieh · ‎11-19-2018

Where did you set the CHARSET? Just to double-check: this is on the forwarder or other node performing ingestion, yes? (This is an ingestion-time setting.) And did you restart the forwarder before trying to ingest one of these files again?

Is the source system actually producing the whole file as UTF-8 encoded JSON? How do you know?

Have you looked at your input in a good hex editor? If you're on Mac, I like HexFiend, but there are many other good ones out there. The goal of this exercise is to know the actual bytes that are being ingested, and to determine for certain what encoding is in place. A good editor will let you try interpreting the bytes as a few different encodings and see what comes out when you do so. Using that output, and possibly a site like https://fileformat.info/info/unicode/, you can figure out what these "weird" characters really are and reason about them.
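If a hex editor is not handy, the same exercise can be roughly approximated with a few lines of Python. This is a minimal sketch under assumptions: the function names hex_dump and try_encodings and the list of candidate encodings are chosen for illustration only, and a successful decode does not prove that an encoding is the right one, only that the bytes are legal in it.

```python
def hex_dump(data, width=16):
    """Format bytes as offset, hex values, and printable ASCII, one row per `width` bytes."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3}} {ascii_part}")
    return "\n".join(rows)


def try_encodings(data, candidates=("utf-8", "utf-16-le", "utf-16-be", "gb18030", "big5")):
    """Decode the bytes with each candidate; map encoding name to text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Stand-in for the first bytes of a real input file, e.g.:
#     with open("sample.json", "rb") as fh:   # "sample.json" is a placeholder path
#         sample = fh.read(256)
sample = '{"abc": "中文"}'.encode("gbk")

print(hex_dump(sample))
for name, text in try_encodings(sample).items():
    print(f"{name:10} -> {text!r}")
```

Comparing the decoded results side by side usually makes it obvious which encoding produces readable text, and that is the value to try for CHARSET.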

ddrillic · ‎11-18-2018

It would be most useful to show us these weird_characters.

quahfamili · ‎11-18-2018

@ddrillic I cannot paste them here. They look like characters that are forced UTF or something.

@Kosanam How do I check the CHARSET?

FrankVl · ‎11-19-2018

Then take a screenshot and upload it somewhere to share it with us. Without understanding what "weird characters" you're seeing, it is a bit like shooting in the dark.

Kosanam · ‎11-18-2018

You can edit it in props.conf, or, when you add the sample file to set the sourcetype, check the advanced settings.

quahfamili · ‎11-18-2018

It is not set. I thought it was automatically set to UTF-8 if not defined, but the documentation lists ASCII as the default.

Kosanam · ‎11-18-2018

Check the CHARSET value; maybe adjust it to UCS-2LE.
