Hi,
Below is a sample of the JSON input I am getting from a REST API:
{
IPRequestLog: [
{
access_key: test
id: 0ac03844-a374-4237-9172-a7af9122bed2
ip_address: 192.168.1.245
requested_on: 2015-07-28 06:47:48
source_ip: 49.248.183.29
}
{
access_key: test
id: 7b1f5f38-77d1-453e-8a9e-e33f206474ff
ip_address: 192.168.1.240
requested_on: 2015-07-28 06:47:54
source_ip: 49.248.183.29
}
{
access_key: test
id: 83c6724b-2017-42fa-9cba-5c256d8d502e
ip_address: 192.168.1.249
requested_on: 2015-07-28 06:47:51
source_ip: 49.248.183.29
}
]
}
Currently, all the values within the array are merged into a single event, and the first timestamp value is recognised as the event time. I tried adding the following to props.conf:
[source::source_name]
TIME_PREFIX = requested_on":"
MAX_TIMESTAMP_LOOKAHEAD = 1000
BREAK_ONLY_BEFORE_DATE = false
MUST_BREAK_AFTER = },{
Does anybody know how to split the array into separate events, each with its own timestamp (in this case requested_on)?
You can do this by adjusting the line-breaking settings and stripping out the parts that are not needed. For instance, assume your JSON string looks like this:
{
"IPRequestLog": [
{
"access_key": "test",
"id": "0ac03844-a374-4237-9172-a7af9122bed2",
"ip_address": "192.168.1.245",
"requested_on": "2015-07-28 06:47:48",
"source_ip": "49.248.183.29"
},
{
"access_key": "test",
"id": "0ac03844-a374-4237-9172-e33f206474ff",
"ip_address": "192.168.1.245",
"requested_on": "2015-07-28 06:47:54",
"source_ip": "49.248.183.29"
},
{
"access_key": "test",
"id": "0ac03844-a374-4237-9172-5c256d8d502e",
"ip_address": "192.168.1.245",
"requested_on": "2015-07-28 06:47:51",
"source_ip": "49.248.183.29"
}
]
}
You can clean this up with this basic recipe:
# props.conf
[answers-1438103671]
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = (\{|\[\s+{)
MUST_BREAK_AFTER = (\}|\}\s+\])
SEDCMD-remove_header = s/(\{\s+.+?\[)//g
SEDCMD-remove_trailing_commas = s/\},/}/g
SEDCMD-remove_footer = s/\]\s+\}//g
TIME_PREFIX = \"requested_on\":\s+\"
Assume that your sourcetype is answers-1438103671. Each element of the array should then be indexed as its own event, timestamped from its requested_on value.
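To sanity-check the TIME_PREFIX pattern from the recipe above outside Splunk, a rough Python sketch like the following can help. It is only an approximation: Python's re module is close to, but not the same as, the regex engine Splunk uses, and the names extract_timestamp and TIME_PREFIX_LOOSE are made up for this illustration. It shows that the \s+ in the recipe expects whitespace after the colon, so it matches pretty-printed JSON but not minified JSON, while a \s* variant matches both.

```python
import re
from datetime import datetime

# Same pattern as the TIME_PREFIX in the recipe above.
TIME_PREFIX = r'"requested_on":\s+"'
# Hypothetical looser variant (not from the thread) that also accepts minified JSON.
TIME_PREFIX_LOOSE = r'"requested_on":\s*"'


def extract_timestamp(event, prefix):
    """Return the datetime that directly follows `prefix` in `event`, or None."""
    match = re.search(prefix + r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})', event)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


PRETTY = '{\n"access_key": "test",\n"requested_on": "2015-07-28 06:47:54",\n"source_ip": "49.248.183.29"\n}'
MINIFIED = '{"access_key":"test","requested_on":"2015-07-28 06:47:54","source_ip":"49.248.183.29"}'

print(extract_timestamp(PRETTY, TIME_PREFIX))          # 2015-07-28 06:47:54
print(extract_timestamp(MINIFIED, TIME_PREFIX))        # None
print(extract_timestamp(MINIFIED, TIME_PREFIX_LOOSE))  # 2015-07-28 06:47:54
```

If the feed may arrive either pretty-printed or minified, the looser \s* form is probably the safer choice for TIME_PREFIX.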
@emiller42
Can this be done on the universal forwarder side?
If I had to parse something like this coming from an API, I would probably write a modular input. That way you can use your language of choice to query the REST endpoint, pull the JSON, split it into individual events, and send them to Splunk.
This is pretty advanced and requires some dev chops, but works very well. Trying to do this via conf files is likely going to be brittle.
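As a minimal sketch of the "manipulate it into individual events" step, assuming the payload has already been fetched from the REST endpoint: the function names split_events and emit are hypothetical, and the modular input scaffolding itself (scheme, configuration, checkpointing) is left out entirely. The script just writes one JSON object per line to stdout, which is the kind of output a scripted or modular input could hand to Splunk.

```python
import json
import sys

# Sample payload copied from the question (minified form).
SAMPLE = (
    '{"IPRequestLog":['
    '{"access_key":"test","id":"0ac03844-a374-4237-9172-a7af9122bed2",'
    '"ip_address":"192.168.1.245","requested_on":"2015-07-28 06:47:48","source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"7b1f5f38-77d1-453e-8a9e-e33f206474ff",'
    '"ip_address":"192.168.1.240","requested_on":"2015-07-28 06:47:54","source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"83c6724b-2017-42fa-9cba-5c256d8d502e",'
    '"ip_address":"192.168.1.249","requested_on":"2015-07-28 06:47:51","source_ip":"49.248.183.29"}'
    ']}'
)


def split_events(payload, array_key="IPRequestLog"):
    """Yield one JSON string per element of the array stored under `array_key`."""
    document = json.loads(payload)
    for record in document.get(array_key, []):
        yield json.dumps(record)


def emit(payload, out=sys.stdout):
    """Write each array element as its own line, so each line becomes one event."""
    for event in split_events(payload):
        out.write(event + "\n")


if __name__ == "__main__":
    emit(SAMPLE)
```

Because each event is already a complete JSON object on its own line, the sourcetype on the Splunk side should only need a timestamp setting for requested_on rather than custom line breaking.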
EDIT: I had a go at parsing this and came up with a working example. It appears to be similar to the other answer, although I prefer using LINE_BREAKER when possible. It breaks only on newline characters or on commas that are not immediately preceded by a quote (that is, the commas between events), and it strips the outer portions of the JSON where found.
NOTE: This assumes your JSON is actually coming in minified.
{"IPRequestLog":[{"access_key":"test","id":"0ac03844-a374-4237-9172-a7af9122bed2","ip_address":"192.168.1.245","requested_on":"2015-07-28 06:47:48","source_ip":"49.248.183.29"},{"access_key":"test","id":"7b1f5f38-77d1-453e-8a9e-e33f206474ff","ip_address":"192.168.1.240","requested_on":"2015-07-28 06:47:54","source_ip":"49.248.183.29"},{"access_key":"test","id":"83c6724b-2017-42fa-9cba-5c256d8d502e","ip_address":"192.168.1.249","requested_on":"2015-07-28 06:47:51","source_ip":"49.248.183.29"}]}
props.conf:
[json_split]
SHOULD_LINEMERGE=false
LINE_BREAKER=((?<!"),|[\r\n]+)
SEDCMD-remove_prefix=s/{"IPRequestLog":\[//g
SEDCMD-remove_suffix=s/\]}//g
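To see what these settings do to the minified sample, here is a rough Python simulation. It is an approximation under two assumptions: that the text matched by the first capture group of LINE_BREAKER is discarded at each break, and that the SEDCMD substitutions are applied to each event after breaking. The simulate function and constant names are invented for this sketch and are not part of Splunk.

```python
import json
import re

# Minified sample from the post above.
RAW = (
    '{"IPRequestLog":['
    '{"access_key":"test","id":"0ac03844-a374-4237-9172-a7af9122bed2",'
    '"ip_address":"192.168.1.245","requested_on":"2015-07-28 06:47:48","source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"7b1f5f38-77d1-453e-8a9e-e33f206474ff",'
    '"ip_address":"192.168.1.240","requested_on":"2015-07-28 06:47:54","source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"83c6724b-2017-42fa-9cba-5c256d8d502e",'
    '"ip_address":"192.168.1.249","requested_on":"2015-07-28 06:47:51","source_ip":"49.248.183.29"}'
    ']}'
)

# Same pattern as LINE_BREAKER, written non-capturing so re.split drops the delimiter.
LINE_BREAKER = r'(?:(?<!"),|[\r\n]+)'
SEDCMDS = [
    (r'\{"IPRequestLog":\[', ''),  # SEDCMD-remove_prefix
    (r'\]\}', ''),                 # SEDCMD-remove_suffix
]


def simulate(raw):
    """Break `raw` into events, then apply each sed-style substitution per event."""
    events = []
    for chunk in re.split(LINE_BREAKER, raw):
        for pattern, replacement in SEDCMDS:
            chunk = re.sub(pattern, replacement, chunk)
        if chunk:  # skip chunks that end up empty
            events.append(chunk)
    return events


EVENTS = simulate(RAW)
for event in EVENTS:
    # Every surviving event should be a standalone JSON object.
    print(json.loads(event)["requested_on"], event)
```

The commas between fields all follow a closing quote, so only the commas between the objects trigger a break. A value containing a comma after a non-quote character (inside a nested array of numbers, for example) would break this approach.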
Hi,
Thanks for the solution, it works as expected. The only extra events we get are the starting and ending braces of the JSON.
How do we overcome this?
Thanks
Shahid
I don't know if you're aware of it but you've just dug up a thread that's almost 7 years old.
Also, your problem is indeed similar, but different.
So you'll have the best chance of receiving help if you start a new thread (you can post a link to this thread for reference if you did try something based on the solution presented here).
And paste your event sample in a preformatted or code block so it stays properly indented; it's much more readable that way.
I am in need of this exact solution, except it appears to stop after the first match in a JSON string. How do I prevent that? For example, if you look at the original post, there are multiple "test" entries in that single JSON string. I need to break these out into three records. Your response works, but only on the first record. Any ideas?
Just to clarify, did you want to do this at index time?
I'm told you can have Splunk parse JSON quite easily. I haven't tried it myself, but it's worth researching.
Yes, I want to do this at index time.
You would then need to use SEDCMD in your props.conf to manipulate the data before the JSON transformation is done. Some reading material:
http://docs.splunk.com/Documentation/Splunk/6.2.4/Data/Anonymizedatausingconfigurationfiles
http://answers.splunk.com/answers/210096/how-to-configure-sedcmd-in-propsconf.html