Hello everyone,
need your support to parse below sample json, i want is
1. Only the fields from "activity_type" till "user_email"
2. Remove first lines before "activity_type"and last lines after "user_email"
3. Line should break at "activity_type"
4. TIME_PREFIX=event_time
i added below but doesn't work removing the lines and TIME_PREFIX
[ sample_json ]
BREAK_ONLY_BEFORE=\"activity_type":\s.+,
CHARSET=UTF-8
SHOULD_LINEMERGE=true
disabled=false
LINE_BREAKER=([\r\n]+)
TIME_PREFIX=event_time
SEDCMD-remove=s/^\{/g
Sample data:
{
"status": 0,
"message": "Request completed successfully",
"data": [
{
"activity_type": "login",
"associated_items": null,
"changed_values": null,
"event_time": 1733907370512,
"id": "XcDutJMBNXQ_Xwfn2wgV",
"ip_address": "x.x.x.x",
"is_impersonated_user": false,
"item": {
"user_email": "xyz@example.com"
},
"message": "User xyz@example.com logged in",
"object_id": 0,
"object_name": "",
"object_type": "session",
"source": "",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1.1 Safari/605.1.15",
"user_email": "xyz@example.com"
},
{
"activity_type": "export",
"associated_items": null,
"changed_values": null,
"event_time": 1732634960475,
"id": "bd0XaZMBH5U9RA7biWrq",
"ip_address": "",
"is_impersonated_user": false,
"item": null,
"message": "Incident Detail Report generated successfully",
"object_id": 0,
"object_name": "",
"object_type": "breach incident",
"source": "",
"user_agent": "",
"user_email": ""
},
{
"activity_type": "logout",
"associated_items": null,
"changed_values": null,
"event_time": 1732625563087,
"id": "jaGHaJMB-qVJqBPy_3IG",
"ip_address": "87.200.106.98",
"is_impersonated_user": false,
"item": {
"user_email": "xyz@example.com"
},
"message": "User xyz@example.com logged out",
"object_id": 0,
"object_name": "",
"object_type": "session",
"source": "",
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
"user_email": "xyz@example.com"
}
],
"count": 33830,
"meta_info": {
"total_rows": 33830,
"row_count": 200,
"pagination": {
"pagination_id": ""
}
}
}
Hello Zubair,
I tested this on the sample data that you put and it seems to work. Give it a shot and tell me if it works for you 🙂
[json_test]
SHOULD_LINEMERGE=false
LINE_BREAKER=([,\r\n]+){
CHARSET=AUTO
TIME_PREFIX="event_time"\:\s
MAX_TIMESTAMP_LOOKAHEAD=13
SEDCMD-removestart=s/^{[\s\S]*?\s*\[//
SEDCMD-removeend=s/],\r\n"count[\s\S]*\r\n}//
kv_mode=json
Hello Zubair,
I tested this on the sample data that you put and it seems to work. Give it a shot and tell me if it works for you 🙂
[json_test]
SHOULD_LINEMERGE=false
LINE_BREAKER=([,\r\n]+){
CHARSET=AUTO
TIME_PREFIX="event_time"\:\s
MAX_TIMESTAMP_LOOKAHEAD=13
SEDCMD-removestart=s/^{[\s\S]*?\s*\[//
SEDCMD-removeend=s/],\r\n"count[\s\S]*\r\n}//
kv_mode=json
Hello Daniel,
A very big thank you 🙂 dear, I truly appreciate the time and effort you put in to resolve it. The settings worked like a charm..
I am able to parse timestamp and line break at "activity_type" using below settings, however facing challenge in removing the first lines and last lines and also i am not able to extract field/values i used TRANSFORMS still didn't work.
First lines:
{
"status": 0,
"message": "Request completed successfully",
"data": [
{
last lines:
"count": 33830,
"meta_info": {
"total_rows": 33830,
"row_count": 200,
"pagination": {
"pagination_id": ""
}
}
}
Current props.conf and transforms.conf
Props.
[sample_test]
BREAK_ONLY_BEFORE = \"activity_type":\s.+,
DATETIME_CONFIG =
LINE_BREAKER = \"activity_type":\s.+,
MAX_TIMESTAMP_LOOKAHEAD = 16
NO_BINARY_CHECK = true
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
TIME_PREFIX = event_time
TZ = Europe/Istanbul
category = Custom
disabled = false
pulldown_type = true
TRANSFORMS-extraction = extract_field_value
BREAK_ONLY_BEFORE_DATE =
SHOULD_LINEMERGE = true
Transforms.
[extract_field_value]
REGEX = "([^"]+)":\s*"([^"]+)"
FORMAT = $1::$2
Firstly, if this "works", it must be by mistake. LINE_BREAKER must contain a capturing groups to find the breaker.
Secondly, don't use SHOULD_LINEMERGE=true. Unless you know why you shouldn't do it.
Thirdly, TIME_PREFIX should as closely match the prefix as possible so Splunk doesn't have to guess.
Fourthly, TRANSFORMS defines index-time extractions.
You could try to approach it with line breaker similar to yours and then trimming it with SEDCMD but it is a bad idea as a whole. Don't process structured data this way. Are you absolutely sure that your json structures will _always_ be rendered starting with this field? And they will always end with that another field? If so, then why are you using structured data?
Process your data with external tool before ingesting and split it properly using json-based logic, not plain regexes.
I don't think you can do this at ingest time (but happy to be proved wrong!), but you can parse and split out the elements of the data collection into separate events using spath and mvexpand
| spath data{} output=data
| mvexpand data
| table data