How to configure Splunk to parse and index JSON data

berryk
New Member

I have a custom-crafted JSON file that holds a mix of data types. I'm new to Splunk administration, so bear with me.

This is valid JSON. As far as I understand, I need to define a new line-break definition with a regex to help Splunk parse and index this data correctly, with all fields. I minified the file and uploaded it after verifying that my regex actually matches.

Can you suggest a good regex definition? Below is a snippet from the file I want to parse; it should contain 2 events:

{"data":[{"serial":[0],"_score":null,"_type":"winevtx","_index":"xxx","_id":"xxx","_source":{"process_id":48,"message":"","provider_guid":"xxx","log_name":"Security","source_name":"Microsoft-Windows-Security-Auditing","event_data":{"TicketOptions":"xxx","TargetUserName":"xxx","ServiceName":"krbtgt","IpAddress":"::ffff:","TargetDomainName":"xxx","IpPort":"53782","TicketEncryptionType":"0x12","LogonGuid":"xxx","TransmittedServices":"-","Status":"0x0","ServiceSid":"xxx"},"beat":{"name":"xxx","version":"5.2.2","hostname":"xxx"},"thread_id":1016,"@version":"1","@metadata":{"index_local_timestamp":"2019-07-20T06:27:21.23323","hostname":"xxxDC","index_utc_timestamp":"2019-07-20T06:27:21.23323","timezone":"UTC+0000"},"opcode":"Info","@timestamp":"2019-07-20T06:25:33.801Z","tags":["beats_input_codec_plain_applied"],"type":"wineventlog","computer_name":"xxx","event_id":4769,"record_number":"198","level":"Information","keywords":["Audit Success"],"host":"xxx","task":"Kerberos Service Ticket Operations"}},{"serial":[1],"_score":null,"_type":"winevtx","_index":"xxx-xxx","_id":"==","_source":{"event_data":{"SubjectDomainName":"-","LogonType":"3","LogonGuid":"{xxx}","SubjectUserSid":"S-1-0-0","LogonProcessName":"Kerberos","TargetDomainName":"xxx","AuthenticationPackageName":"Kerberos","ProcessName":"-","SubjectLogonId":"0x0","TargetUserName":"xxx","ProcessId":"0x0","TargetLogonId":"","IpAddress":"::1","LmPackageName":"-","ImpersonationLevel":"%%1833","IpPort":"0","SubjectUserName":"-","TargetUserSid":"S-1-5-18","KeyLength":"0","TransmittedServices":"-"},"provider_guid":"{xxx}","beat":{"name":"xxx","version":"5.2.2","hostname":"xxx"},"@metadata":{"index_local_timestamp":"2019-07-20T06:34:21.23323","hostname":"xxx","index_utc_timestamp":"2019-07-20T06:34:21.23323","timezone":"UTC+0000"},"opcode":"Info","@timestamp":"2019-07-20T06:33:40.262Z","thread_id":52,"event_id":4624,"record_number":"123","level":"Information","log_name":"Security","source_name":"Microsoft-Windows-Security-Auditing","@version":"1","process_id":48,"host":"xxx","type":"wineventlog","computer_name":"xxx","version":1,"tags":["beats_input_codec_plain_applied"],"keywords":["Audit Success"],"task":"Logon","message":""}}]}
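
For anyone who wants to sanity-check the structure outside Splunk, here is a minimal Python sketch. It uses a trimmed stand-in for the snippet (same nesting, most fields dropped; the trimming is an assumption for readability, not the real file) and checks that the two events are the two elements of the top-level "data" array.

```python
import json

# Trimmed stand-in for the snippet above: same nesting, most fields dropped.
sample = (
    '{"data":['
    '{"serial":[0],"_source":{"event_id":4769,"@timestamp":"2019-07-20T06:25:33.801Z"}},'
    '{"serial":[1],"_source":{"event_id":4624,"@timestamp":"2019-07-20T06:33:40.262Z"}}'
    ']}'
)

doc = json.loads(sample)   # raises ValueError if the text is not valid JSON
events = doc["data"]       # each element of "data" is one intended event
event_ids = [e["_source"]["event_id"] for e in events]
print(len(events), event_ids)
```

The whole file is a single JSON object, so a tool that treats one JSON document as one record will see one record here, not two.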

Berry

0 Karma

jhomerlopez
Explorer

You can use the sourcetype below (or the default pretrained "json" sourcetype).

[data_json]
pulldown_type = true
INDEXED_EXTRACTIONS = json
KV_MODE = none
category = Structured

0 Karma

berryk
New Member

Thanks mate.
I tried the default json sourcetype with no success. It seems something else is needed to help Splunk digest it. I believe I need to configure the line breaker, but I'm not sure what the value should be. Any ideas?

0 Karma

to4kawa
Ultra Champion
| makeresults
| eval _raw="{\"data\":[{\"serial\":[0],\"_score\":null,\"_type\":\"winevtx\",\"_index\":\"xxx\",\"_id\":\"xxx\",\"_source\":{\"process_id\":48,\"message\":\"\",\"provider_guid\":\"xxx\",\"log_name\":\"Security\",\"source_name\":\"Microsoft-Windows-Security-Auditing\",\"event_data\":{\"TicketOptions\":\"xxx\",\"TargetUserName\":\"xxx\",\"ServiceName\":\"krbtgt\",\"IpAddress\":\"::ffff:\",\"TargetDomainName\":\"xxx\",\"IpPort\":\"53782\",\"TicketEncryptionType\":\"0x12\",\"LogonGuid\":\"xxx\",\"TransmittedServices\":\"-\",\"Status\":\"0x0\",\"ServiceSid\":\"xxx\"},\"beat\":{\"name\":\"xxx\",\"version\":\"5.2.2\",\"hostname\":\"xxx\"},\"thread_id\":1016,\"@version\":\"1\",\"@metadata\":{\"index_local_timestamp\":\"2019-07-20T06:27:21.23323\",\"hostname\":\"xxxDC\",\"index_utc_timestamp\":\"2019-07-20T06:27:21.23323\",\"timezone\":\"UTC+0000\"},\"opcode\":\"Info\",\"@timestamp\":\"2019-07-20T06:25:33.801Z\",\"tags\":[\"beats_input_codec_plain_applied\"],\"type\":\"wineventlog\",\"computer_name\":\"xxx\",\"event_id\":4769,\"record_number\":\"198\",\"level\":\"Information\",\"keywords\":[\"Audit Success\"],\"host\":\"xxx\",\"task\":\"Kerberos Service Ticket Operations\"}},{\"serial\":[1],\"_score\":null,\"_type\":\"winevtx\",\"_index\":\"xxx-xxx\",\"_id\":\"==\",\"_source\":{\"event_data\":{\"SubjectDomainName\":\"-\",\"LogonType\":\"3\",\"LogonGuid\":\"{xxx}\",\"SubjectUserSid\":\"S-1-0-0\",\"LogonProcessName\":\"Kerberos\",\"TargetDomainName\":\"xxx\",\"AuthenticationPackageName\":\"Kerberos\",\"ProcessName\":\"-\",\"SubjectLogonId\":\"0x0\",\"TargetUserName\":\"xxx\",\"ProcessId\":\"0x0\",\"TargetLogonId\":\"\",\"IpAddress\":\"::1\",\"LmPackageName\":\"-\",\"ImpersonationLevel\":\"%%1833\",\"IpPort\":\"0\",\"SubjectUserName\":\"-\",\"TargetUserSid\":\"S-1-5-18\",\"KeyLength\":\"0\",\"TransmittedServices\":\"-\"},\"provider_guid\":\"{xxx}\",\"beat\":{\"name\":\"xxx\",\"version\":\"5.2.2\",\"hostname\":\"xxx\"},\"@metadata\":{\"index_local_timestamp\":\"2019-07-20T06:34:21.23323\",\"hostname\":\"xxx\",\"index_utc_timestamp\":\"2019-07-20T06:34:21.23323\",\"timezone\":\"UTC+0000\"},\"opcode\":\"Info\",\"@timestamp\":\"2019-07-20T06:33:40.262Z\",\"thread_id\":52,\"event_id\":4624,\"record_number\":\"123\",\"level\":\"Information\",\"log_name\":\"Security\",\"source_name\":\"Microsoft-Windows-Security-Auditing\",\"@version\":\"1\",\"process_id\":48,\"host\":\"xxx\",\"type\":\"wineventlog\",\"computer_name\":\"xxx\",\"version\":1,\"tags\":[\"beats_input_codec_plain_applied\"],\"keywords\":[\"Audit Success\"],\"task\":\"Logon\",\"message\":\"\"}}]}"
| spath
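
To see what this kind of search-time extraction yields when both events sit in a single _raw, here is a rough Python sketch. It is not Splunk code: the function name spath_like_fields is made up for illustration, and it only mimics the path naming that spath uses as far as I know (dots between keys, "{}" for each array level). The input is a trimmed stand-in for the sample, not the full record.

```python
import json

def spath_like_fields(value, path=""):
    """Yield (field_name, scalar) pairs, marking each array level with '{}'."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from spath_like_fields(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for child in value:
            yield from spath_like_fields(child, path + "{}")
    else:
        yield path, value

# Trimmed stand-in for the sample: same nesting, most fields dropped.
raw = (
    '{"data":['
    '{"serial":[0],"_source":{"event_id":4769,"tags":["beats_input_codec_plain_applied"]}},'
    '{"serial":[1],"_source":{"event_id":4624,"tags":["beats_input_codec_plain_applied"]}}'
    ']}'
)

fields = {}
for name, val in spath_like_fields(json.loads(raw)):
    fields.setdefault(name, []).append(val)  # a repeated name collects several values

for name, values in fields.items():
    print(name, values)
```

Every field name starts with data{} and carries one value per event, which suggests that the two events are mixed together in one record rather than separated.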

props.conf

0 Karma

berryk
New Member

Thanks for that, but I'm looking for index-time parsing. I've imported the JSON and now need to parse it somehow so that Splunk digests it correctly.
What I attached here is just a small snippet as an example.

0 Karma

to4kawa
Ultra Champion

props.conf

0 Karma

berryk
New Member

Just edited my question; I hope it's clearer now.

0 Karma

to4kawa
Ultra Champion
KV_MODE = json

Your question has been corrected and spath works fine; basically, this setting works.
If you modify a .conf file, you must restart Splunk.

0 Karma

berryk
New Member

I'll try to be more precise: I know that I need to configure props.conf (or the sourcetype during data import), but I'm not sure how. What is the right regex syntax? In the example above there are 2 distinct events. When I chose json as the sourcetype, the data was not shown as expected (not all fields were parsed), probably because of the "serial" array. I read a few questions, but none matched my case.
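
One way to try out a candidate break regex before touching props.conf is to test it offline. The Python sketch below is only that, a sketch: the three patterns are hypothetical candidates (not taken from this thread), and the input is a trimmed stand-in for the sample. It treats the first capture group as the text to throw away between events, which is how I understand LINE_BREAKER to use its first capture group. Stripping the {"data":[ wrapper and the closing ]} is done here in plain Python; in Splunk that would need separate handling, which is not shown or tested here.

```python
import json
import re

# Trimmed stand-in for the sample: same nesting, most fields dropped.
raw = (
    '{"data":['
    '{"serial":[0],"_source":{"event_id":4769,"task":"Kerberos Service Ticket Operations"}},'
    '{"serial":[1],"_source":{"event_id":4624,"task":"Logon"}}'
    ']}'
)

# Hypothetical candidate patterns (assumptions for illustration).
BREAK = re.compile(r'\}(,)\{"serial"')  # group 1 is the separator to discard
PREFIX = re.compile(r'^\{"data":\[')    # wrapper before the first event
SUFFIX = re.compile(r'\]\}\s*$')        # wrapper after the last event

def split_events(text):
    body = SUFFIX.sub("", PREFIX.sub("", text))
    events, start = [], 0
    for m in BREAK.finditer(body):
        events.append(body[start:m.start(1)])  # text before the group ends the previous event
        start = m.end(1)                       # text after the group starts the next event
    events.append(body[start:])
    return events

events = split_events(raw)
parsed = [json.loads(e) for e in events]  # each piece must be valid JSON on its own
print(len(events), [p["_source"]["event_id"] for p in parsed])
```

If every piece parses as JSON on its own, the pattern is at least a reasonable starting point to try against the full file.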

Berry

0 Karma