Getting Data In

Splunk didn't ingest all JSON objects

a101755
Explorer

I have a JSON file with 23,904 objects in it. They are all like:

{
  "1.Entry": "1.Data",
  ...
  "44.Entry": "44.Data"
},


... 23,902 similar entries ...

{
  "1.Entry": "1.Data",
  ...
  "44.Entry": "44.Data"
}

But forwarding the JSON file led to a count of 22,256 events (representing 22,256 JSON objects).

My props.conf

[json_test]
DATETIME_CONFIG =
TIMESTAMP_FIELDS = test.sys_created_on
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Structured
description = test json
disabled = false
pulldown_type = true

 

So the problem is not that a single event is truncated, but that the JSON file is.
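As a cross-check of the expected event count, the objects in the source file can be counted independently of Splunk. A minimal sketch, assuming the file is one top-level JSON array (the filename is hypothetical):

import json

# Hypothetical filename, for illustration only.
with open("entry2group.json", encoding="utf-8") as f:
    data = json.load(f)  # assumes one top-level JSON array

print(len(data))  # expected: 23904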

 


a101755
Explorer

Hello again,

my last entry

"I've parsed my input file (JSON parser) and before one of the missing events there is an error, like an unexpected non-whitespace character.

So I think it is not a problem of Splunk!"

was a wrong result. I made a mistake in my investigation.

So I tried the program jq (Ubuntu Linux) to validate the whole JSON file.

Surprise: there is no error in the JSON file. I checked the JSON file in the forwarder directory.

So I guess there is a character in the data that Splunk "misunderstands" and that breaks the JSON structure.
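One way to hunt for such a character is to scan the raw bytes for control characters a parser might choke on. A minimal sketch (the filename is hypothetical):

# Report the first control byte (other than \t, \n, \r) in the file.
# The filename is hypothetical.
with open("entry2group.json", "rb") as f:
    data = f.read()

allowed = {0x09, 0x0A, 0x0D}  # tab, newline, carriage return
for offset, byte in enumerate(data):
    if byte < 0x20 and byte not in allowed:
        print(f"control byte 0x{byte:02x} at offset {offset}")
        print(data[max(0, offset - 40):offset + 40])
        break
else:
    print("no unexpected control bytes found")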


PickleRick
SplunkTrust

There is something not right about this. If your events are indeed formed this way (as multiline entries) and your LINE_BREAKER is set to ([\r\n]+), there is no way they are ingested whole.

Tell us more about how you are ingesting it (and if you're reading a file with a forwarder, show us the relevant inputs.conf stanza and props.conf stanza from the forwarder).
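If the producing side can be changed, one way to make multiline objects line up with LINE_BREAKER = ([\r\n]+) is to rewrite the file as one compact JSON object per line (NDJSON). A minimal sketch, assuming the source file is a single JSON array (the filenames are hypothetical):

import json

SRC = "entry2group.json"    # hypothetical source file (one JSON array)
DST = "entry2group.ndjson"  # hypothetical output file

with open(SRC, encoding="utf-8") as f:
    objects = json.load(f)

with open(DST, "w", encoding="utf-8") as f:
    for obj in objects:
        # One compact object per line, so the line breaker yields
        # exactly one event per object.
        f.write(json.dumps(obj, ensure_ascii=False) + "\n")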

a101755
Explorer

Thank you for your questions, @PickleRick.

I'm using the forwarding mechanism.

Here are the stanzas from the forwarder:

inputs.conf

[monitor:///daten/datasources/data/mg_test/entry2group/*.json]
disabled = false
index = mg_test
sourcetype = json_test
crcSalt = <SOURCE>
whitelist = .*\d{8}_Q\d_entry_entry2group\.v\d\.(\d\d\.){2}json$
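As a quick sanity check, the whitelist pattern can be tested against the filename that appears in the error log further down; a small sketch:

import re

# Whitelist pattern from the monitor stanza above.
pattern = re.compile(r".*\d{8}_Q\d_entry_entry2group\.v\d\.(\d\d\.){2}json$")

# Filename taken from the JsonLineBreaker error message below.
name = "20240825_Q2_entry_entry2group.v0.03.01.json"

print(bool(pattern.search(name)))  # True: the file is picked up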

props.conf

[json_test]
DATETIME_CONFIG =
TIMESTAMP_FIELDS = test.sys_created_on
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Structured
description = test json
disabled = false
pulldown_type = true

I copied this props.conf from my first attempt at uploading (via Splunk Web).

Here is the stanza from ../etc/system/local/props.conf

[test_json]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
TIMESTAMP_FIELDS = test.sys_created_on
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
disabled = false
pulldown_type = true

Further investigation shows you are on the right track!

I found the following event in _internal:

08-25-2024 19:31:28.338 +0200 ERROR JsonLineBreaker [1737739 structuredparsing] - JSON StreamId:1586716756715697390 had parsing error:Unexpected character while looking for value: ',' - data_source="daten/datasources/data/mg_test/entry2group/20240825_Q2_entry_entry2group.v0.03.01.json]", data_host="socmg_local_fw", data_sourcetype="json_test"

 

So in the next step I will isolate one of the lost events (objects) to see if there are special characters in the data.
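One way to pinpoint what JsonLineBreaker stumbles over is to stream-decode the file and print the exact offset where decoding fails. A minimal sketch, assuming the file is a sequence of comma-separated objects as shown above (the filename is hypothetical):

import json

SRC = "20240825_Q2_entry_entry2group.v0.03.01.json"  # hypothetical path

decoder = json.JSONDecoder()
with open(SRC, encoding="utf-8") as f:
    text = f.read()

pos, count = 0, 0
while pos < len(text):
    # Skip whitespace and the commas separating the objects.
    while pos < len(text) and text[pos] in " \t\r\n,":
        pos += 1
    if pos >= len(text):
        break
    try:
        _, pos = decoder.raw_decode(text, pos)
        count += 1
    except json.JSONDecodeError as e:
        print(f"parse error at offset {e.pos}: {e.msg}")
        print(text[max(0, e.pos - 80):e.pos + 80])
        break

print(f"{count} objects decoded")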


a101755
Explorer

I've parsed my input file (JSON parser) and before one of the missing events there is an error, like an unexpected non-whitespace character.

So I think it is not a problem of Splunk!

 


a101755
Explorer

Further investigation:

I shortened the JSON objects from 44 to 43 lines.

{
  "1.Entry": "1.Data",
  ...
  "43.Entry": "43.Data"
},


... 48,186 similar entries ...

{
  "1.Entry": "1.Data",
  ...
  "43.Entry": "43.Data"
}

But forwarding the JSON file led to a count of 45,352 events (representing 45,352 JSON objects) instead of 48,188 objects.

That's a little bit "loco", I think.


manjunathmeti
Champion

Hi @a101755,

Try adding the configs below to the input monitor in inputs.conf.

crcSalt = <SOURCE>
initCrcLength = 2048
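For illustration, the monitor stanza posted earlier would then look like this (crcSalt was already set there; initCrcLength = 2048 is the addition):

[monitor:///daten/datasources/data/mg_test/entry2group/*.json]
disabled = false
index = mg_test
sourcetype = json_test
crcSalt = <SOURCE>
initCrcLength = 2048
whitelist = .*\d{8}_Q\d_entry_entry2group\.v\d\.(\d\d\.){2}json$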

a101755
Explorer

Thank you, @manjunathmeti.

But it doesn't work. The result is the same as before.

I think your advice helps when Splunk doesn't import a whole file because it is not salted and/or its first characters don't differ from another file imported before.

Further investigation:

I exported the items from Splunk (CSV) and compared the original file with the export.

I can't see any pattern in which objects are imported and which are not. A pattern would be, for example, that only the first 22,256 objects were imported.

Instead I see that object 66 is not imported, and objects 104, 108, 113, and so on are also missing.

I think there is a limit on importing JSON objects. But which one is it?
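To make that comparison systematic, the exported CSV can be diffed against the source file on a unique field. A minimal sketch (the filenames and the key field sys_id are hypothetical):

import csv
import json

SOURCE = "entry2group.json"   # hypothetical: original file, one JSON array
EXPORT = "splunk_export.csv"  # hypothetical: CSV exported from Splunk
KEY = "sys_id"                # hypothetical: a field unique per object

with open(SOURCE, encoding="utf-8") as f:
    source_keys = [obj[KEY] for obj in json.load(f)]

with open(EXPORT, newline="", encoding="utf-8") as f:
    exported_keys = {row[KEY] for row in csv.DictReader(f)}

missing = [(i + 1, k) for i, k in enumerate(source_keys) if k not in exported_keys]
print(f"{len(missing)} of {len(source_keys)} objects missing")
for position, key in missing[:20]:
    print(position, key)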
