Getting Data In

Ingesting a large list of JSONs as separate events?

mitag
Contributor

Have a list of JSONs that needs to be ingested as separate events (a separate event for each "id"):

[
{"id":"1","fileName":"267663776.mpg","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 12:56:32 PM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbfd0a7ba724bdbd000a006500810058","progress":"100|00000174cbfd0a7ba724bdbd000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbfd0a7ba724bdbd000a006500810058","startTime":"Sep 26, 2020 12:56:33 PM","completionTime":"Sep 26, 2020 1:45:20 PM","checker":"bcc@9000"},
{"id":"2","fileName":"267664759.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 12:36:51 PM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbeb047f5ab7565f000a006500810058","progress":"100|00000174cbeb047f5ab7565f000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbeb047f5ab7565f000a006500810058","startTime":"Sep 26, 2020 12:36:52 PM","completionTime":"Sep 26, 2020 1:16:00 PM","checker":"bcc@9000"},
{"id":"3","fileName":"267660544.mpg","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 11:52:22 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbc24d2c370e7c19000a006500810058","progress":"100|00000174cbc24d2c370e7c19000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbc24d2c370e7c19000a006500810058","startTime":"Sep 26, 2020 11:52:23 AM","completionTime":"Sep 26, 2020 12:16:40 PM","checker":"bcc@9000"},
{"id":"4","fileName":"267703040.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 10:58:49 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cb9144a36b0312c5000a006500810058","progress":"100|00000174cb9144a36b0312c5000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cb9144a36b0312c5000a006500810058","startTime":"Sep 26, 2020 10:58:52 AM","completionTime":"Sep 26, 2020 11:52:08 AM","checker":"bcc@9000"},

...

{"id":"4999","fileName":"267686238-73abc3c1-359e-4468-8355-d4e8da927661.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 10:12:06 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cb668100c2e5c765000a006500810058","progress":"100|00000174cb668100c2e5c765000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cb668100c2e5c765000a006500810058","startTime":"Sep 26, 2020 10:12:08 AM","completionTime":"Sep 26, 2020 10:37:55 AM","checker":"bcc@9000"}
]

The list may contain thousands of entries (events); each JSON could be spread over multiple lines and be nested - i.e. the example above isn't the only kind of JSON list we have to ingest.

What is the best practice to ingest this?

P.S. A more general question is, how does one ingest the following file format, with field extractions?

 

[
{"optional_timestamp": "2020-09-26 15:16", "field1": "value1"},
{"optional_timestamp": "2020-09-26 15:17", "field1": "value2"}
]

 

...assuming the file may contain thousands of events?

Thanks!

P.P.S. Fairly certain I've seen an answered question about this - but now I can't find it... Apologies for the duplicate...

1 Solution

mitag
Contributor

Splunk will ingest this data type natively as long as it passes JSON validation. (Some tweaking may be needed, such as specifying the field name of the timestamp.)

In my case, the JSON contained errors, did not pass JSON validation and thus could not be ingested by Splunk.
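
One minimal props.conf starting point for this kind of file (the sourcetype name is a placeholder; TIMESTAMP_FIELDS / TIME_FORMAT assume the "scheduledAt" format in the sample above - adjust for your data):

[json_array_report]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TIMESTAMP_FIELDS = scheduledAt
TIME_FORMAT = %b %d, %Y %I:%M:%S %p

Note that INDEXED_EXTRACTIONS is applied where the file is first read (e.g. on a universal forwarder monitoring the file), so the stanza needs to be in props.conf there as well.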


vegerlandecs
Explorer

@mitag 

Because your file starts with square brackets, Splunk is probably treating it as a single event. If you force it to understand that the line break is a newline character followed by '{', it will ignore the '[]' at the first level.

The way to do this is to create a sourcetype with the right settings in props.conf. Field extraction is done at search time, so you need this config on your forwarders, indexers, and search heads.

 

[your_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\n\r]+)\{
TRUNCATE = 10000
TIME_PREFIX = \"optional_timestamp\"\s*:\s*\"
TIME_FORMAT = %Y-%m-%d %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 16
KV_MODE = json

 

Note that you will need to adjust these options for your different log types/formats.

See the line-breaking config test here:
https://regex101.com/r/66ufQK/1
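
To see where that LINE_BREAKER would split the sample, here is a rough Python emulation of the breaking rule (Splunk discards the text matched by the first capture group and starts the next event right after it) - purely an illustration, not how Splunk reads props.conf:

import re

raw = '[\n{"id":"1","field1":"value1"},\n{"id":"2","field1":"value2"}\n]'

# LINE_BREAKER = ([\n\r]+)\{ : the captured newlines are consumed as the
# event boundary; the next event starts at the '{' that follows them.
breaker = re.compile(r'([\n\r]+)\{')

events, start = [], 0
for m in breaker.finditer(raw):
    events.append(raw[start:m.start()])  # event ends before the captured newlines
    start = m.end(1)                     # next event starts right after them, at '{'
events.append(raw[start:])

for e in events:
    print(repr(e))
# -> '['  '{"id":"1","field1":"value1"},'  '{"id":"2","field1":"value2"}\n]'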

mitag
Contributor

I included the line breaks around the square brackets for readability - but they're not a given. The entire file may contain no line breaks at all, yet contain boatloads of events - or have boatloads of line breaks and events.

The only known part is this format: [{JSON1},{JSON2},...{JSON31459}] - with an unknown amount of whitespace within and between the JSONs, including around the square brackets.
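
If the line breaks really can't be relied on, one workaround is to pre-process the file into newline-delimited JSON before Splunk ever sees it. A minimal Python sketch, assuming the file is valid JSON and fits in memory (file names are placeholders):

import json

# Minimal sketch: turn a single top-level JSON array (possibly all on one
# line) into newline-delimited JSON, one object per line / per event.
# Assumes the input is valid JSON and small enough to load into memory;
# the file names are placeholders.
with open("baton_report.json") as src, open("baton_report.ndjson", "w") as dst:
    for record in json.load(src):              # parse the whole array at once
        dst.write(json.dumps(record) + "\n")   # write one object per line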
