Hello @blzaxe,
The best way would be to preprocess the data with a modular input or some kind of script. If that's not an option, you are going to need to use index-time transforms with some additional props. I am guessing the data you want to split into multiple events is everything contained within:
{
"about": "http://www.appledaily.com.tw",
"posts": {
"data": [
I am also assuming it's a single-line event, although it may be pretty-printed. I'll let you figure that out, but for this example I am going to assume your event is a single line like this: {"about": "http://www.appledaily.com.tw","posts": {"data": [
Step one: create transforms to strip out the outer JSON body.
[removeOuterBody1]
# regex captures the beginning of the outer envelope/message container
REGEX = ^({[^\n]+data\":\s\[)([^\n]+)
FORMAT = $2
DEST_KEY = _raw
[removeOuterBody2]
# regex captures the end of the envelope/message container
REGEX = ([^\n]+)(\}\}\])$
FORMAT = $1
DEST_KEY = _raw
Now you need to reference these transforms in your props.conf.
[CustomSourcetype]
TRANSFORMS-cleanMsg = removeOuterBody1, removeOuterBody2
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ,\{"message":
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true
disabled = false
TZ = UTC
The unfortunate problem is that each broken event will still begin with a comma, which makes it invalid JSON. You could clean this up if you did all this pre-parsing on a HF and then used another transform to strip the comma at the beginning of the event on the indexers.
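If you can go the preprocessing route I mentioned at the top, here is a minimal sketch of what such a script might look like. It assumes the whole payload is valid JSON and that the events you want are the elements of posts.data. The function name split_posts and the sample values are made up for illustration; only the about, posts, data, message and created_time keys come from your data.

```python
import json


def split_posts(payload: str):
    """Yield each element of posts.data as a compact, single-line JSON string.

    Assumes the payload has the shape {"about": ..., "posts": {"data": [...]}}.
    A missing "posts" or "data" key simply yields nothing.
    """
    doc = json.loads(payload)
    for post in doc.get("posts", {}).get("data", []):
        # One post per line, so each line can be indexed as its own event.
        yield json.dumps(post, ensure_ascii=False, separators=(",", ":"))


# Hypothetical sample payload; the message and created_time values are invented.
sample = (
    '{"about": "http://www.appledaily.com.tw","posts": {"data": ['
    '{"message": "first", "created_time": "2016-01-01T00:00:00+0000"},'
    '{"message": "second", "created_time": "2016-01-02T00:00:00+0000"}'
    "]}}"
)

for line in split_posts(sample):
    print(line)
```

Each printed line is a complete JSON object with no leading comma, so if you feed that output to Splunk instead of the raw payload you would not need the envelope-stripping transforms at all.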