Getting Data In

JSON ingest with weird characters or binary

quahfamili
Path Finder

Hi all,

I was trying to ingest some JSON files, but the JSON seems to contain some weird characters or binary data, and parsing failed.

Example of JSON:

{
"abc": "weird_characters"
}

I got this error : ERROR JsonLineBreaker - JSON Stream ID: xxxxxxxxxxxxxxxxxxxxxx had parsing error: Unexpected character while parsing backslash escape: 'x'

I have experimented a lot with props.conf, including setting binary to false. I suspect this has something to do with encoding.

How do I solve this?

Thanks in advance


quahfamili
Path Finder

Hi all,

I checked: the weird_characters are Chinese characters. I have set the encoding to UTF-8. I even tried modifying my data to "abc": "\weird_characters". However, to no avail; I still cannot parse the data.

Need help


nickhills
Ultra Champion

If it's Chinese, have you tried with UTF-16?

If my comment helps, please give it a thumbs up!

acharlieh
Influencer

Does the JSON string (assuming you have the correct CHARSET in props.conf) actually contain \x? If so, you may have invalid JSON. Check out the grammar on https://json.org — the only characters that can follow a backslash in a string are slash, backslash, double quote, b, f, n, r, t, or u (when immediately followed by 4 hex digits).
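To see that rule in action, here is a minimal sketch using Python's standard json module (not Splunk itself) showing which escapes a strict JSON parser accepts; the sample strings are illustrative, not taken from the poster's file:

```python
import json

# \uXXXX is the only way to escape arbitrary characters; Chinese text
# is legal JSON either raw (in the file's declared encoding) or \u-escaped.
print(json.loads(r'{"abc": "\u4e2d\u6587"}'))  # {'abc': '中文'}

# A backslash followed by 'x' is not in the JSON grammar, so a strict
# parser rejects it, much like Splunk's JsonLineBreaker error above.
try:
    json.loads(r'{"abc": "\x4e"}')
except json.JSONDecodeError as err:
    print("rejected:", err)
```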


quahfamili
Path Finder

Hi,

I manually removed the weird_characters and the JSON file can be ingested. However, these characters are housed inside the double quotes.

@acharlieh The file does not actually contain \x. However, I thought that due to the encoding of these weird_characters, Splunk might have recognized them as \x. I have set CHARSET to UTF-8 and the files continue to produce the same error.

Can anyone help?


acharlieh
Influencer

Where did you set the CHARSET? Just to double-check: this is on the forwarder or other node performing ingestion, yes? (It being an ingestion-time setting.) And did you restart the forwarder before trying to ingest one of these files again?

Is the source system actually producing the whole file as UTF-8 encoded JSON? How do you know?

Have you looked at your input in a good hex editor? If you're on a Mac, I like HexFiend, but there are many other good ones out there. The goal of this exercise is to know the actual bytes being ingested and to determine for certain what encoding is actually in place. A good editor will let you try interpreting the bytes as a few different encodings and see what you get. Using that output, and possibly a site like https://fileformat.info/info/unicode/, you can figure out what these "weird" characters actually are and reason about them.
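If you don't have a hex editor handy, a rough equivalent can be scripted. This is a hypothetical helper (the function name, file path, and candidate encoding list are my own choices, not from this thread) that dumps the leading bytes of a file and shows how each candidate encoding would interpret them:

```python
def probe_encodings(path, nbytes=64):
    """Print the raw leading bytes of a file, then attempt to decode
    them with several candidate encodings to see which ones fit."""
    with open(path, "rb") as f:
        raw = f.read(nbytes)
    print(raw.hex(" "))  # the actual bytes on disk, space-separated
    for enc in ("utf-8", "utf-16", "utf-16-le", "gbk", "big5"):
        try:
            print(f"{enc:10s} -> {raw.decode(enc)!r}")
        except UnicodeDecodeError as exc:
            print(f"{enc:10s} -> decode error: {exc.reason}")
```

The encoding that decodes cleanly into readable Chinese (rather than mojibake or errors) is a strong candidate for the CHARSET value.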


ddrillic
Ultra Champion

It would be most useful to show us these weird_characters.


quahfamili
Path Finder

@ddrillic I cannot paste it over. It looks like characters that were forced into UTF or something.

@Kosanam How do I check the CHARSET?


FrankVl
Ultra Champion

Then take a screenshot and upload it somewhere to share it with us. Without understanding what "weird characters" you're seeing, this is a bit of shooting in the dark.


Kosanam
New Member

You can edit it in props.conf, or, when you add the sample file to set the sourcetype, check the advanced settings.


quahfamili
Path Finder

It is not set. I thought it would automatically be UTF-8 if not defined, but the documentation lists ASCII as the default.


Kosanam
New Member

Check the CHARSET value; maybe adjust it to UCS-2LE.
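For reference, a props.conf stanza along those lines might look like this (the sourcetype name is illustrative, and the right CHARSET depends on what the bytes actually are — confirm with a hex editor first):

```ini
# props.conf on the forwarder / first ingesting node
[my_json_sourcetype]
CHARSET = UTF-8
# or, if the file turns out to be little-endian UTF-16:
# CHARSET = UCS-2LE
```

Remember to restart the forwarder after changing props.conf so the new setting takes effect.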
