Getting Data In

json parsing

zubairsp
Explorer

Hello everyone, 

need your support to parse below sample json, i want is 

1. Only the fields from "activity_type" till "user_email"

2. Remove first lines before "activity_type"and last lines after "user_email"

3. Line should break at "activity_type"

4. TIME_PREFIX=event_time

 

i added below but doesn't work removing the lines and TIME_PREFIX 
[ sample_json ]
BREAK_ONLY_BEFORE=\"activity_type":\s.+,
CHARSET=UTF-8
SHOULD_LINEMERGE=true
disabled=false
LINE_BREAKER=([\r\n]+)
TIME_PREFIX=event_time
SEDCMD-remove=s/^\{/g

 

Sample data:

{
"status": 0,
"message": "Request completed successfully",
"data": [
{
"activity_type": "login",
"associated_items": null,
"changed_values": null,
"event_time": 1733907370512,
"id": "XcDutJMBNXQ_Xwfn2wgV",
"ip_address": "x.x.x.x",
"is_impersonated_user": false,
"item": {
"user_email": "xyz@example.com"
},
"message": "User xyz@example.com logged in",
"object_id": 0,
"object_name": "",
"object_type": "session",
"source": "",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1.1 Safari/605.1.15",
"user_email": "xyz@example.com"
},
{
"activity_type": "export",
"associated_items": null,
"changed_values": null,
"event_time": 1732634960475,
"id": "bd0XaZMBH5U9RA7biWrq",
"ip_address": "",
"is_impersonated_user": false,
"item": null,
"message": "Incident Detail Report generated successfully",
"object_id": 0,
"object_name": "",
"object_type": "breach incident",
"source": "",
"user_agent": "",
"user_email": ""
},
{
"activity_type": "logout",
"associated_items": null,
"changed_values": null,
"event_time": 1732625563087,
"id": "jaGHaJMB-qVJqBPy_3IG",
"ip_address": "87.200.106.98",
"is_impersonated_user": false,
"item": {
"user_email": "xyz@example.com"
},
"message": "User xyz@example.com logged out",
"object_id": 0,
"object_name": "",
"object_type": "session",
"source": "",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
"user_email": "xyz@example.com"
}
],
"count": 33830,
"meta_info": {
"total_rows": 33830,
"row_count": 200,
"pagination": {
"pagination_id": ""
}
}
}

Labels (1)
0 Karma
1 Solution

DanielHou
Engager

Hello Zubair,

I tested this on the sample data that you put and it seems to work. Give it a shot and tell me if it works for you 🙂 

[json_test]
SHOULD_LINEMERGE=false
LINE_BREAKER=([,\r\n]+){
CHARSET=AUTO
TIME_PREFIX="event_time"\:\s
MAX_TIMESTAMP_LOOKAHEAD=13
SEDCMD-removestart=s/^{[\s\S]*?\s*\[//
SEDCMD-removeend=s/],\r\n"count[\s\S]*\r\n}//
kv_mode=json

View solution in original post

DanielHou
Engager

Hello Zubair,

I tested this on the sample data that you put and it seems to work. Give it a shot and tell me if it works for you 🙂 

[json_test]
SHOULD_LINEMERGE=false
LINE_BREAKER=([,\r\n]+){
CHARSET=AUTO
TIME_PREFIX="event_time"\:\s
MAX_TIMESTAMP_LOOKAHEAD=13
SEDCMD-removestart=s/^{[\s\S]*?\s*\[//
SEDCMD-removeend=s/],\r\n"count[\s\S]*\r\n}//
kv_mode=json

zubairsp
Explorer

Hello Daniel,

A very big thank you 🙂 dear, I truly appreciate the time and effort you put in to resolve it. The settings worked like a charm..

0 Karma

zubairsp
Explorer

I am able to parse timestamp and line break at "activity_type" using below settings, however facing challenge in removing the first lines and last lines and also i am not able to extract field/values i used TRANSFORMS still didn't work.

First lines:

{
"status": 0,
"message": "Request completed successfully",
"data": [
{

last lines:

"count": 33830,
"meta_info": {
"total_rows": 33830,
"row_count": 200,
"pagination": {
"pagination_id": ""
}
}
}

Current props.conf and transforms.conf

Props.

[sample_test]
BREAK_ONLY_BEFORE = \"activity_type":\s.+,
DATETIME_CONFIG =
LINE_BREAKER = \"activity_type":\s.+,
MAX_TIMESTAMP_LOOKAHEAD = 16
NO_BINARY_CHECK = true
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
TIME_PREFIX = event_time
TZ = Europe/Istanbul
category = Custom
disabled = false
pulldown_type = true
TRANSFORMS-extraction = extract_field_value
BREAK_ONLY_BEFORE_DATE =
SHOULD_LINEMERGE = true

Transforms.

[extract_field_value]
REGEX = "([^"]+)":\s*"([^"]+)"
FORMAT = $1::$2

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Firstly, if this "works", it must be by mistake. LINE_BREAKER must contain a capturing groups to find the breaker.

Secondly, don't use SHOULD_LINEMERGE=true. Unless you know why you shouldn't do it.

Thirdly, TIME_PREFIX should as closely match the prefix as possible so Splunk doesn't have to guess.

Fourthly, TRANSFORMS defines index-time extractions.

You could try to approach it with  line breaker similar to yours and then trimming it with SEDCMD but it is a bad idea as a whole. Don't process structured data this way. Are you absolutely sure that your json structures will _always_ be rendered starting with this field? And they will always end with that another field? If so, then why are you using structured data?

Process your data with external tool before ingesting and split it properly using json-based logic, not plain regexes.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

I don't think you can do this at ingest time (but happy to be proved wrong!), but you can parse and split out the elements of the data collection into separate events using spath and mvexpand

| spath data{} output=data
| mvexpand data
| table data
0 Karma
Get Updates on the Splunk Community!

Get Inspired! We’ve Got Validation that Your Hard Work is Paying Off

We love our Splunk Community and want you to feel inspired by all your hard work! Eric Fusilero, our VP of ...

What's New in Splunk Enterprise 9.4: Features to Power Your Digital Resilience

Hey Splunky People! We are excited to share the latest updates in Splunk Enterprise 9.4. In this release we ...

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...