Getting Data In

Splunk didn't ingest all JSON objects

a101755
Explorer

I have a JSON file with 23,904 objects in it. They all look like this:

{
  "1.Entry": "1.Data",
  ...
  "44.Entry": "44.Data"
},


... 23,902 similar entries ...

{
  "1.Entry": "1.Data",
  ...
  "44.Entry": "44.Data"
}

But forwarding the JSON file led to a count of 22,256 events (i.e., 22,256 JSON objects).
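To cross-check that count outside of Splunk, a jq one-liner can count the objects in the file (a sketch, assuming the file is one top-level JSON array; "entry2group.json" stands in for the real filename):

# Count the elements of the top-level array
jq 'length' entry2group.json

# If the file were newline-delimited JSON instead, count the objects like this:
jq -c '.' entry2group.json | wc -l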

My props.conf

[json_test]
DATETIME_CONFIG =
TIMESTAMP_FIELDS = test.sys_created_on
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Structured
description = test json
disabled = false
pulldown_type = true


So the problem is not that a single event is truncated, but that the JSON file as a whole is.



a101755
Explorer

Hello again,

my last entry

"i've parsed my InputFile (json-parser) and before one of the missing event there is an error, like unexpected non-white-space sign.

So i think, it is not a problem of splunk!

" was a wrong result. I've made a mistake in my investigation.

So I tried the program jq (on Ubuntu Linux) to validate the whole JSON file.

Surprise: there are no errors in the JSON file. I checked the file in the forwarder directory.

So I guess there is a character in the data that Splunk "misunderstands" and that breaks the JSON structure.
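For reference, a check along these lines can both validate the file and hunt for such characters (a sketch; "entry2group.json" is a placeholder, and the grep pattern flags ASCII control characters, apart from tab/newline/carriage return, that line-oriented parsers often trip over):

# jq parses the whole file and prints nothing on success;
# a nonzero exit code means the JSON is invalid
jq empty entry2group.json && echo "valid JSON"

# Scan for stray control characters; -a forces text mode even if NUL bytes appear
grep -anP '[\x00-\x08\x0B\x0C\x0E-\x1F]' entry2group.json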


PickleRick
SplunkTrust

There is something not right about this. If your events are indeed formed this way (as multiline entries) and your LINE_BREAKER is set to ([\r\n]+), there is no way they are being ingested whole.

Tell us more about how you are ingesting it (and if you're reading a file with a forwarder, show us the relevant inputs.conf and props.conf stanzas from the forwarder).

a101755
Explorer

Thank you for your questions, @PickleRick.

I'm using the forwarding mechanism.

Here are the stanzas from the forwarder:

inputs.conf

[monitor:///daten/datasources/data/mg_test/entry2group/*.json]
disabled = false
index = mg_test
sourcetype = json_test
crcSalt = <SOURCE>
whitelist = .*\d{8}_Q\d_entry_entry2group\.v\d\.(\d\d\.){2}json$

[json_test]
DATETIME_CONFIG =
TIMESTAMP_FIELDS = test.sys_created_on
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Structured
description = test json
disabled = false
pulldown_type = true

I copied this props.conf stanza from my first attempt to upload (via Splunk Web).

Here is the stanza from ../etc/system/local/props.conf

[test_json]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
TIMESTAMP_FIELDS = test.sys_created_on
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
disabled = false
pulldown_type = true

Further investigation shows you are on the right track!

I found the following event in _internal:

08-25-2024 19:31:28.338 +0200 ERROR JsonLineBreaker [1737739 structuredparsing] - JSON StreamId:1586716756715697390 had parsing error:Unexpected character while looking for value: ',' - data_source="daten/datasources/data/mg_test/entry2group/20240825_Q2_entry_entry2group.v0.03.01.json]", data_host="socmg_local_fw", data_sourcetype="json_test"


So in the next step I will isolate one of the lost events (objects) to check whether there are special characters in the data.
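One way to go about that isolation (a sketch: the splunkd.log path is the forwarder's default, "entry2group.json" is a placeholder, and the index 65 is only an example, since jq counts array elements from 0):

# On the forwarder: collect all JSON parsing errors splunkd has logged
grep JsonLineBreaker "$SPLUNK_HOME/var/log/splunk/splunkd.log"

# Pull a single object out of the file for inspection, e.g. the 66th one
jq '.[65]' entry2group.json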


a101755
Explorer

I've parsed my input file with a JSON parser, and before one of the missing events there is an error, like an unexpected non-whitespace character.

So I think it is not a problem with Splunk!



a101755
Explorer

Further investigation:

I shortened the JSON objects from 44 lines to 43.

{
  "1.Entry": "1.Data",
  ...
  "43.Entry": "43.Data"
},


... 48,186 similar entries ...

{
  "1.Entry": "1.Data",
  ...
  "43.Entry": "43.Data"
}

But forwarding the JSON file led to a count of 45,352 events (i.e., 45,352 JSON objects) instead of the expected 48,188.

That's a little bit 'loco', I think.


manjunathmeti
Champion

Hi @a101755,

Try adding the configs below to the monitor stanza in inputs.conf:

crcSalt = <SOURCE>
initCrcLength = 2048

a101755
Explorer

Thank you, @manjunathmeti.

But it doesn't work; the result is the same as before.

I think your advice helps when Splunk skips a whole file because the input is not salted and/or the file's first characters don't differ from those of a previously imported file.

Further investigation:

I exported the items from Splunk (CSV) and compared the original file with the export.

I can't see any pattern in which objects are imported and which are not. A pattern would be, for example, only the first 22,256 objects being imported and the rest dropped.

Instead I see that object 66 is not imported, nor objects 104, 108, 113, and so on.

I think there is a limit on importing JSON objects. But what is it?
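One way to find out exactly which objects were dropped (a sketch with made-up names: it assumes each object carries a unique key, here called "sys_id", that the original file is one top-level array, and that the export's first CSV column holds that key):

# Unique key of every object in the original file
jq -r '.[].sys_id' entry2group.json | sort > original.txt

# Unique key of every exported event (skip the CSV header line)
tail -n +2 export.csv | cut -d, -f1 | sort > exported.txt

# Keys present in the original but missing from the export
comm -23 original.txt exported.txt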
