How to split this JSON into multiple events?

blzaxe
New Member

Hello

I'm trying to split a JSON file from the Facebook Graph API into multiple events using props.conf.

Here is a JSON sample:

{
"about": "http://www.appledaily.com.tw",
"posts": {
"data": [
{
"message": "first post message",
"created_time": "2016-11-01T11:20:01+0000",
"id": "232633627068_10155237456442069",
"likes": {
"data": [
{
"id": "125823837756509",
"name": "XXX"
},
{
"id": "125547431150532",
"name": "OOO"
}
],
"paging": {
"cursors": {
"before": "MTI1ODIzODM3NzU2NTA5",
"after": "Nzk0NDQzNDAzOTEyNjc3"
}
}
}
},
{
"message": "other messages",
"created_time": "2016-11-01T11:10:00+0000",
"id": "232633627068_10155237171047069",
"likes": {
"data": [
{
"id": "434788333331603",
"name": "AA"
},
{
"id": "1485443865001594",
"name": "BB"
}
],
"paging": {
"cursors": {
"before": "NDM0Nzg4MzMzMzMxNjAz",
"after": "NjA4NDc4NTY5MjU5ODQ1"
}
}
}
}
],
"paging": {
"previous": "https://graph.facebook.com/v2.8/232633627068/posts?limit=10&fields=likes.limit%2810000%29,message,cr...",
"next": "https://graph.facebook.com/v2.8/232633627068/posts?limit=10&fields=likes.limit%2810000%29,message,cr..."
}
},
"id": "232633627068"
}

This is my props.conf setting:

[_json]
INDEXED_EXTRACTIONS = json
KV_MODE = JSON
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^{
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true
disabled = false
TZ = UTC

What should the props.conf look like to split such a file into multiple events?
Or should I ingest the file as-is and then use spath to split the events?
Thank you for your suggestions.


bmacias84
Champion

Hello @blzaxe,

The best way would be to preprocess the data with a modular input or some kind of script. If that's not an option, you are going to need index-time transforms with some additional props. I am guessing the data you want to split into multiple events is everything contained within:

{
"about": "http://www.appledaily.com.tw",
"posts": {
"data": [

I am also assuming it is a single-line event rather than pretty-printed. I will let you figure that out, but for this example I am going to assume your event is a single line like this: {"about": "http://www.appledaily.com.tw","posts": {"data": [

Step one: create transforms to strip out the outer JSON body.

[removeOuterBody1]
# Regex captures the beginning of the outer envelope/message container.
# The lazy [^\n]+? stops at the first "data": [ (the posts array), not at a nested likes array.
REGEX = ^({[^\n]+?data\":\s\[)([^\n]+)
FORMAT = $2
DEST_KEY = _raw

[removeOuterBody2]
# Regex captures the end of the outer envelope/message container.
REGEX = ([^\n]+)(\}\}\])$
FORMAT = $1
DEST_KEY = _raw
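
A capture regex like the one in removeOuterBody1 can be checked offline before it goes into transforms.conf. The sketch below is only an illustration: it uses Python's re module rather than Splunk's PCRE engine (the two should agree on constructs this simple), and the one-line sample is a hypothetical, shortened payload. It compares a greedy `[^\n]+` prefix with a lazy `[^\n]+?` one, since the payload contains more than one `"data": [`.

```python
import re

# Hypothetical single-line sample shaped like the Graph API payload in the question.
sample = (
    '{"about": "x","posts": {"data": ['
    '{"message": "first","likes": {"data": [{"id": "1"}]}},'
    '{"message": "second","likes": {"data": [{"id": "2"}]}}'
    ']},"id": "9"}'
)

# Greedy prefix: [^\n]+ takes as much as it can, so group 1 ends at the LAST '"data": ['.
greedy = re.compile(r'^(\{[^\n]+data":\s\[)([^\n]+)')
# Lazy prefix: [^\n]+? takes as little as it can, so group 1 ends at the FIRST '"data": ['.
lazy = re.compile(r'^(\{[^\n]+?data":\s\[)([^\n]+)')

# Group 2 is what FORMAT = $2 would write back to _raw.
greedy_rest = greedy.match(sample).group(2)
lazy_rest = lazy.match(sample).group(2)

print("greedy keeps:", greedy_rest)
print("lazy keeps:  ", lazy_rest)
```

On this sample the greedy form keeps only the tail of the last likes array, while the lazy form keeps both posts. The same kind of check can be repeated for the trailing-envelope regex against your real data.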

Now you need to apply these in your props.conf.

[CustomSourcetype]
TRANSFORMS-cleanMsg = removeOuterBody1, removeOuterBody2
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE =  ,\{"message":
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true 
disabled = false
TZ = UTC

The unfortunate problem is that each broken-out event will still contain a leading comma, which makes it invalid JSON. You could clean this up by doing all of this pre-parsing on a heavy forwarder (HF) and then using another transform on the indexers to strip the comma at the beginning of each event.
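
If preprocessing with a script is an option, the comma problem does not come up, because each post is re-serialized as its own JSON object. A minimal sketch in Python, assuming the whole Graph API response fits in memory; `split_posts` and the added `page_id` field are hypothetical names chosen for illustration, not part of Splunk or the Graph API.

```python
import json


def split_posts(raw_text):
    """Return one compact JSON string per entry in posts.data."""
    doc = json.loads(raw_text)
    posts = doc.get("posts", {}).get("data", [])
    events = []
    for post in posts:
        event = dict(post)
        # Assumption: copy the page-level id onto each event so posts stay traceable.
        event["page_id"] = doc.get("id")
        # json.dumps emits a single line by default, so each event is one line.
        events.append(json.dumps(event, ensure_ascii=False))
    return events


# Trimmed-down sample based on the payload in the question.
SAMPLE = """
{
  "about": "http://www.appledaily.com.tw",
  "posts": {
    "data": [
      {"message": "first post message",
       "created_time": "2016-11-01T11:20:01+0000",
       "id": "232633627068_10155237456442069"},
      {"message": "other messages",
       "created_time": "2016-11-01T11:10:00+0000",
       "id": "232633627068_10155237171047069"}
    ]
  },
  "id": "232633627068"
}
"""

events = split_posts(SAMPLE)
for line in events:
    print(line)
```

Writing these lines to a file that Splunk monitors should let each line be treated as its own event, for example with SHOULD_LINEMERGE = false, and `created_time` stays available in every event for timestamp extraction.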


blzaxe
New Member

Excuse me, I put transforms.conf in \etc\apps\app_names\local.
Why doesn't it work?
