Getting Data In

Ingesting a large list of JSONs as separate events?

mitag
Contributor

I have a list of JSONs that need to be ingested as separate events (a separate event for each "id"):

[
{"id":"1","fileName":"267663776.mpg","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 12:56:32 PM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbfd0a7ba724bdbd000a006500810058","progress":"100|00000174cbfd0a7ba724bdbd000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbfd0a7ba724bdbd000a006500810058","startTime":"Sep 26, 2020 12:56:33 PM","completionTime":"Sep 26, 2020 1:45:20 PM","checker":"bcc@9000"},
{"id":"2","fileName":"267664759.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 12:36:51 PM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbeb047f5ab7565f000a006500810058","progress":"100|00000174cbeb047f5ab7565f000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbeb047f5ab7565f000a006500810058","startTime":"Sep 26, 2020 12:36:52 PM","completionTime":"Sep 26, 2020 1:16:00 PM","checker":"bcc@9000"},
{"id":"3","fileName":"267660544.mpg","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 11:52:22 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cbc24d2c370e7c19000a006500810058","progress":"100|00000174cbc24d2c370e7c19000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cbc24d2c370e7c19000a006500810058","startTime":"Sep 26, 2020 11:52:23 AM","completionTime":"Sep 26, 2020 12:16:40 PM","checker":"bcc@9000"},
{"id":"4","fileName":"267703040.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 10:58:49 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cb9144a36b0312c5000a006500810058","progress":"100|00000174cb9144a36b0312c5000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cb9144a36b0312c5000a006500810058","startTime":"Sep 26, 2020 10:58:52 AM","completionTime":"Sep 26, 2020 11:52:08 AM","checker":"bcc@9000"},

...

{"id":"4999","fileName":"267686238-73abc3c1-359e-4468-8355-d4e8da927661.ts","testPlan":"QC - TS Files (Partner A)","priority":"Normal","scheduledAt":"Sep 26, 2020 10:12:06 AM","status":"Finished","result":"Failure","correct":"correction completed|00000174cb668100c2e5c765000a006500810058","progress":"100|00000174cb668100c2e5c765000a006500810058","openInBaton":"https://bvm:443/Baton/@@home.html#Tasks/Report/00000174cb668100c2e5c765000a006500810058","startTime":"Sep 26, 2020 10:12:08 AM","completionTime":"Sep 26, 2020 10:37:55 AM","checker":"bcc@9000"}
]

The list may contain thousands of entries (events); each JSON could be spread over multiple lines and may be nested - i.e. the above example isn't the only type of JSON list we have to ingest.

What is the best practice to ingest this?

P.S. A more general question is, how does one ingest the following file format, with field extractions?


[
{"optional_timestamp": "2020-09-26 15:16", "field1": "value1"},
{"optional_timestamp": "2020-09-26 15:17", "field1": "value2"}
]


...assuming the file may contain thousands of events?

Thanks!

P.P.S. Fairly certain I've seen an answered question about this - but now I can't find it... Apologies for the duplicate...

1 Solution

mitag
Contributor

Splunk will ingest this data type natively as long as it passes JSON validation. (Some tweaking may be needed, such as specifying the field name of the timestamp.)

In my case, the JSON contained errors, did not pass JSON validation, and thus could not be ingested by Splunk.
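For reference, here is a minimal sketch of that kind of tweak in props.conf - telling Splunk's structured-data parser which JSON field holds the timestamp. The sourcetype name is made up, and it assumes the "optional_timestamp" field from the P.S. example above; adjust it to your data:

[json_array_example]
# let the structured-data parser extract the JSON fields at index time
INDEXED_EXTRACTIONS = json
# take the event timestamp from a named field instead of a regex
TIMESTAMP_FIELDS = optional_timestamp
TIME_FORMAT = %Y-%m-%d %H:%M
# the fields are already indexed, so skip search-time JSON extraction
KV_MODE = none

Structured-data parsing happens on whatever instance first reads the file (the forwarder, if one is used), so the stanza goes into props.conf there; the KV_MODE setting belongs on the search head. Running the file through a JSON validator first (e.g. python -m json.tool file.json) catches the kind of errors that blocked ingestion here.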


vegerlandecs
Explorer

@mitag 

Because your file starts with square brackets, Splunk is probably treating it as a single event. If you configure the line breaking so that an event break is a newline character followed by '{', it will effectively ignore the '[' and ']' at the first level.

The way to do this is to create a sourcetype with the right configs. The line-breaking and timestamp settings apply at index time, while the field extraction (KV_MODE) is done at search time, so you need this config on your forwarders and indexers as well as your search heads.


[your_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\n\r]+)\{
TRUNCATE = 10000
TIME_PREFIX = \"optional_timestamp\"\s*:\s*\"
TIME_FORMAT = %Y-%m-%d %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 16
KV_MODE = json


Note that you will need to adjust these options for your different log types/formats.

See the line-breaking config test here:
https://regex101.com/r/66ufQK/1

mitag
Contributor

I included the line breaks around the square brackets for readability - but they're actually not a given. The entire file may contain no line breaks at all - yet contain boatloads of events - or it may have boatloads of line breaks and events.

The only known part is this format: [{JSON1},{JSON2},...{JSON31459}] with an unknown amount of whitespace within and between the JSONs, including around the square brackets.
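One way to adapt the line breaking for that case might be to break on the separators between the JSON objects rather than on newlines, and strip the enclosing brackets with SEDCMD. A rough, untested sketch - the stanza name and the SEDCMD labels are arbitrary, and it assumes "scheduledAt" from the first sample above as the event timestamp:

[json_array_one_line]
SHOULD_LINEMERGE = false
# break between '}' and '{', discarding the comma and any whitespace between objects
LINE_BREAKER = \}(\s*,\s*)\{
# strip the leading '[' from the first event and the trailing ']' from the last
SEDCMD-strip_open_bracket = s/^\s*\[\s*//
SEDCMD-strip_close_bracket = s/\s*\]\s*$//
TRUNCATE = 10000
TIME_PREFIX = \"scheduledAt\"\s*:\s*\"
TIME_FORMAT = %b %d, %Y %I:%M:%S %p
MAX_TIMESTAMP_LOOKAHEAD = 30
KV_MODE = json

As with the config above, the LINE_BREAKER regex would need testing against real data (e.g. on regex101) before relying on it.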
