Getting Data In

Ingesting a large list of JSONs as separate events?

mitag
Contributor

I have a list of JSONs that needs to be ingested as separate events (a separate event for each "id"):

[
{"id":"1","fileName":"267663776.mpg","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 12:56:32 PM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbfd0a7ba724bdbd000a006500810058","progress":"100|00000174cbfd0a7ba724bdbd000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbfd0a7ba724bdbd000a006500810058","startTime":"Sep 26, 2020 12:56:33 PM","completionTime":"Sep 26, 2020 1:45:20 PM","checker":"bcc@9000"},
{"id":"2","fileName":"267664759.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 12:36:51 PM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbeb047f5ab7565f000a006500810058","progress":"100|00000174cbeb047f5ab7565f000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbeb047f5ab7565f000a006500810058","startTime":"Sep 26, 2020 12:36:52 PM","completionTime":"Sep 26, 2020 1:16:00 PM","checker":"bcc@9000"},
{"id":"3","fileName":"267660544.mpg","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 11:52:22 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbc24d2c370e7c19000a006500810058","progress":"100|00000174cbc24d2c370e7c19000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbc24d2c370e7c19000a006500810058","startTime":"Sep 26, 2020 11:52:23 AM","completionTime":"Sep 26, 2020 12:16:40 PM","checker":"bcc@9000"},
{"id":"4","fileName":"267703040.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 10:58:49 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cb9144a36b0312c5000a006500810058","progress":"100|00000174cb9144a36b0312c5000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cb9144a36b0312c5000a006500810058","startTime":"Sep 26, 2020 10:58:52 AM","completionTime":"Sep 26, 2020 11:52:08 AM","checker":"bcc@9000"},

...

{"id":"4999","fileName":"267686238-73abc3c1-359e-4468-8355-d4e8da927661.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 10:12:06 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cb668100c2e5c765000a006500810058","progress":"100|00000174cb668100c2e5c765000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cb668100c2e5c765000a006500810058","startTime":"Sep 26, 2020 10:12:08 AM","completionTime":"Sep 26, 2020 10:37:55 AM","checker":"bcc@9000"}
]

The list may contain thousands of entries (events); each JSON could be spread over multiple lines and be nested - i.e., the above example isn't the only type of JSON list we have to ingest.

What is the best practice to ingest this?

P.S. A more general question is, how does one ingest the following file format, with field extractions?

 

[
{"optional_timestamp": "2020-09-26 15:16", "field1": "value1"},
{"optional_timestamp": "2020-09-26 15:17", "field1": "value2"}
]

 

...assuming the file may contain thousands of events?

Thanks!

P.P.S. Fairly certain I've seen an answered question about this - but now I can't find it... Apologies for the duplicate...

1 Solution

mitag
Contributor

Splunk will ingest this data type natively as long as it passes JSON validation. (Some tweaking may be needed, such as specifying the field name of the timestamp.)

In my case, the JSON contained errors, did not pass JSON validation, and thus could not be ingested by Splunk.
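Not part of the original answer, but for anyone hitting the same failure: a minimal Python sketch to check whether a file parses as valid JSON before pointing Splunk at it (the function name and paths are illustrative, not anything Splunk-specific):

```python
import json

def validate_json_file(path):
    """Return (True, event_count) if the file parses as a JSON array,
    else (False, a message locating the first parse error)."""
    try:
        with open(path, encoding="utf-8") as f:
            events = json.load(f)
        return True, len(events)
    except json.JSONDecodeError as e:
        # line/column of the first error helps locate the broken entry
        return False, f"line {e.lineno}, column {e.colno}: {e.msg}"
```

The line/column in the error message makes it much faster to find the broken entry in a file with thousands of events.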


vegerlandecs
Explorer

@mitag 

Because your file starts with square brackets, Splunk is probably treating it as a single event. If you force it to break events on a newline character followed by '{', it will ignore the '[' and ']' at the first level.

The way to do this is to create a sourcetype with the right props.conf settings. Field extraction is done at search time, so you need this config on forwarders, indexers, and search heads as appropriate.

 

[your_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\n\r]+)\{
TRUNCATE = 10000
TIME_PREFIX = \"optional_timestamp\"\s*:\s*\"
TIME_FORMAT = %Y-%m-%d %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 16
KV_MODE = json

 

Note that you will need to adjust the options for the different log types/formats.

See the line-breaking config test here:
https://regex101.com/r/66ufQK/1
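To see what that LINE_BREAKER does, here is a rough Python approximation (an illustration only, not how Splunk runs internally): Splunk consumes the text matched by the capture group as the event boundary, so a split with a lookahead for '{' mimics it.

```python
import re

def split_events(raw):
    r"""Approximate LINE_BREAKER = ([\n\r]+)\{ :
    break before each '{' that follows a run of newlines,
    discarding the newlines themselves (the capture group)."""
    return [chunk for chunk in re.split(r"[\n\r]+(?=\{)", raw) if chunk.strip()]

sample = '[\n{"id":"1","field1":"value1"},\n{"id":"2","field1":"value2"}\n]'
events = split_events(sample)
# The leading "[" ends up as its own tiny chunk; each later chunk
# starts with "{", which is what makes the JSON events line up.
```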

mitag
Contributor

I included the line breaks around the square brackets for readability - but they're not actually a given. The entire file may contain no line breaks at all yet hold boatloads of events - or have boatloads of line breaks and events.

The only known part is this format: [{JSON1},{JSON2},...{JSON31459}] with an unknown amount of whitespace within and between the JSONs, including around the square brackets.
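One option that isn't in this thread, since a newline-based LINE_BREAKER can't help when there are no newlines: preprocess the file into one JSON object per line (NDJSON) before Splunk sees it. The whitespace layout doesn't matter to a JSON parser, so a sketch like this (function and file names are placeholders) normalizes any of the layouts described above:

```python
import json

def to_ndjson(in_path, out_path):
    """Parse the whole array [{...},{...}] - however it is whitespaced -
    and rewrite it as one compact JSON object per line."""
    with open(in_path, encoding="utf-8") as f:
        events = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event, separators=(",", ":")) + "\n")
```

After this, a trivial newline LINE_BREAKER (or Splunk's built-in _json-style handling) works regardless of how the original file was formatted.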
