Getting Data In

How to split a json array into multiple events with separate timestamps?

p_gurav
Champion

Hi,
Below is a sample of the JSON input I am getting from a REST API:

{ [-] 
    IPRequestLog: [ [-] 
     { [-] 
        access_key:  test 
        id:  0ac03844-a374-4237-9172-a7af9122bed2 
        ip_address:  192.168.1.245 
        requested_on:  2015-07-28 06:47:48 
        source_ip:  49.248.183.29 
     } 
     { [-] 
        access_key:  test 
        id:  7b1f5f38-77d1-453e-8a9e-e33f206474ff 
        ip_address:  192.168.1.240 
        requested_on:  2015-07-28 06:47:54 
        source_ip:  49.248.183.29 
     } 
     { [-] 
        access_key:  test 
        id:  83c6724b-2017-42fa-9cba-5c256d8d502e 
        ip_address:  192.168.1.249 
        requested_on:  2015-07-28 06:47:51 
        source_ip:  49.248.183.29 
     } 
   ] 
}

Currently, all the values in the array are combined into a single event, and the first timestamp value is used as the event time. I tried adding the following to props.conf:

[source::source_name]
TIME_PREFIX = requested_on":"
MAX_TIMESTAMP_LOOKAHEAD = 1000
BREAK_ONLY_BEFORE_DATE = false
MUST_BREAK_AFTER = },{

Does anybody know how to split the array into separate events, each with its own timestamp (in this case requested_on)?

1 Solution

Gilberto_Castil
Splunk Employee

You can do this by manipulating the line breakers and cleaning up the parts that are not needed. For instance, assume your JSON string looks like this:

{
    "IPRequestLog": [
        {
            "access_key": "test",
            "id": "0ac03844-a374-4237-9172-a7af9122bed2",
            "ip_address": "192.168.1.245",
            "requested_on": "2015-07-28 06:47:48",
            "source_ip": "49.248.183.29"
        },
        {
            "access_key": "test",
            "id": "0ac03844-a374-4237-9172-e33f206474ff",
            "ip_address": "192.168.1.245",
            "requested_on": "2015-07-28 06:47:54",
            "source_ip": "49.248.183.29"
        },
        {
            "access_key": "test",
            "id": "0ac03844-a374-4237-9172-5c256d8d502e",
            "ip_address": "192.168.1.245",
            "requested_on": "2015-07-28 06:47:51",
            "source_ip": "49.248.183.29"
        }
    ]
}

You can clean this up with this basic recipe:

# props.conf
[answers-1438103671]
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = (\{|\[\s+{)
MUST_BREAK_AFTER = (\}|\}\s+\])
SEDCMD-remove_header = s/(\{\s+.+?\[)//g
SEDCMD-remove_trailing_commas = s/\},/}/g
SEDCMD-remove_footer = s/\]\s+\}//g
TIME_PREFIX = \"requested_on\":\s+\"

Assume that your sourcetype is answers-1438103671. Your results should look like this:

[Screenshot of the resulting events]
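
To illustrate what the TIME_PREFIX line in the recipe above is for, here is a minimal Python sketch. It is not Splunk code: Python's re module stands in for Splunk's regex engine, and the `fmt` and `width` parameters are assumptions made for this example (Splunk detects the timestamp format itself unless TIME_FORMAT is set). The idea is only that the regex marks where the timestamp starts within each event.

```python
import re
from datetime import datetime

# Same pattern as TIME_PREFIX in the recipe.
TIME_PREFIX = re.compile(r'\"requested_on\":\s+\"')

# One pretty-printed array element, as in the sample above.
EVENT = '''{
    "access_key": "test",
    "id": "0ac03844-a374-4237-9172-a7af9122bed2",
    "ip_address": "192.168.1.245",
    "requested_on": "2015-07-28 06:47:48",
    "source_ip": "49.248.183.29"
}'''


def extract_timestamp(event, fmt="%Y-%m-%d %H:%M:%S", width=19):
    """Return the datetime found right after the prefix, or None.

    fmt and width are assumptions for this sketch only.
    """
    match = TIME_PREFIX.search(event)
    if match is None:
        return None
    candidate = event[match.end():match.end() + width]
    return datetime.strptime(candidate, fmt)


print(extract_timestamp(EVENT))
```

Because the prefix is matched per event, each array element gets its own requested_on value once the events have been split.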


vj5
New Member

@emiller42
Can this be done on the universal forwarder side?

0 Karma

emiller42
Motivator

If I had to parse something like this coming from an API, I would probably write a modular input. That way you can use your language of choice to query the REST endpoint, pull the JSON, split it into individual events, and send them to Splunk.

This is pretty advanced and requires some dev chops, but it works very well. Trying to do this via conf files is likely to be brittle.
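
As a rough sketch of the splitting step only (not a full modular input, and not using any Splunk SDK), something like the following Python could sit between the REST call and whatever hands events to Splunk. The function names, the default key names, and the choice to treat requested_on as UTC are all assumptions made for this example.

```python
import json
from datetime import datetime, timezone

# Sample payload from the question, in minified form.
SAMPLE = (
    '{"IPRequestLog":['
    '{"access_key":"test","id":"0ac03844-a374-4237-9172-a7af9122bed2",'
    '"ip_address":"192.168.1.245","requested_on":"2015-07-28 06:47:48",'
    '"source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"7b1f5f38-77d1-453e-8a9e-e33f206474ff",'
    '"ip_address":"192.168.1.240","requested_on":"2015-07-28 06:47:54",'
    '"source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"83c6724b-2017-42fa-9cba-5c256d8d502e",'
    '"ip_address":"192.168.1.249","requested_on":"2015-07-28 06:47:51",'
    '"source_ip":"49.248.183.29"}'
    ']}'
)


def to_epoch(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a requested_on string; treating it as UTC is an assumption."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def split_events(payload, array_key="IPRequestLog", time_key="requested_on"):
    """Return (epoch_seconds, json_line) pairs, one per array element."""
    events = []
    for record in json.loads(payload)[array_key]:
        events.append((to_epoch(record[time_key]), json.dumps(record)))
    return events


# Print one event per line, prefixed with its own timestamp.
for epoch, line in split_events(SAMPLE):
    print(int(epoch), line)
```

Since each element is emitted as a single-line JSON object with its timestamp already known, no array-aware line breaking is needed downstream.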

Relevant Documentation

EDIT: I had a try at parsing this and came up with a working example. It appears to be similar to the answer below, although I prefer using a LINE_BREAKER when possible. It breaks only on newline characters or on commas that are not preceded by a double quote (that is, the commas between events), and it strips the outer portions of the JSON where found.

NOTE: This assumes your JSON is actually coming in minified.

{"IPRequestLog":[{"access_key":"test","id":"0ac03844-a374-4237-9172-a7af9122bed2","ip_address":"192.168.1.245","requested_on":"2015-07-28 06:47:48","source_ip":"49.248.183.29"},{"access_key":"test","id":"7b1f5f38-77d1-453e-8a9e-e33f206474ff","ip_address":"192.168.1.240","requested_on":"2015-07-28 06:47:54","source_ip":"49.248.183.29"},{"access_key":"test","id":"83c6724b-2017-42fa-9cba-5c256d8d502e","ip_address":"192.168.1.249","requested_on":"2015-07-28 06:47:51","source_ip":"49.248.183.29"}]}

props.conf:

[json_split]
SHOULD_LINEMERGE=false
LINE_BREAKER=((?<!"),|[\r\n]+)
SEDCMD-remove_prefix=s/{"IPRequestLog":\[//g
SEDCMD-remove_suffix=s/\]}//g

[Screenshot of the resulting events]
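
To see what the two regex steps in the json_split stanza above do to the minified sample, here is a small Python simulation. It is only an illustration: Python's re module stands in for Splunk's regex engine, the helper names are made up for this sketch, and the leading brace in the prefix pattern is escaped for safety. It assumes, as Splunk does, that the text matched by the first capture group of LINE_BREAKER is discarded and that the SEDCMD substitutions are then applied to each event.

```python
import json
import re

RAW = (
    '{"IPRequestLog":['
    '{"access_key":"test","id":"0ac03844-a374-4237-9172-a7af9122bed2",'
    '"ip_address":"192.168.1.245","requested_on":"2015-07-28 06:47:48",'
    '"source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"7b1f5f38-77d1-453e-8a9e-e33f206474ff",'
    '"ip_address":"192.168.1.240","requested_on":"2015-07-28 06:47:54",'
    '"source_ip":"49.248.183.29"},'
    '{"access_key":"test","id":"83c6724b-2017-42fa-9cba-5c256d8d502e",'
    '"ip_address":"192.168.1.249","requested_on":"2015-07-28 06:47:51",'
    '"source_ip":"49.248.183.29"}'
    ']}'
)

# Same pattern as LINE_BREAKER in the stanza.
LINE_BREAKER = re.compile(r'((?<!"),|[\r\n]+)')


def break_lines(raw):
    """Split raw text wherever group 1 matches, dropping the matched text."""
    pieces, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        pieces.append(raw[start:match.start(1)])
        start = match.end(1)
    pieces.append(raw[start:])
    return [piece for piece in pieces if piece]


def apply_sedcmds(event):
    """Mimic SEDCMD-remove_prefix and SEDCMD-remove_suffix."""
    event = re.sub(r'\{"IPRequestLog":\[', '', event)
    return re.sub(r'\]}', '', event)


events = [apply_sedcmds(piece) for piece in break_lines(RAW)]
for event in events:
    print(event)
```

The commas between fields are all preceded by a double quote here, so only the commas between array elements cause a break. A numeric or boolean value followed by a comma would also trigger a break, which is one way this approach can be brittle.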

shahid285
Path Finder

Hi,
Thanks for the solution; it works as expected. The only extra thing we get is events containing the starting and ending braces of the JSON.
How do we overcome this?

Thanks
Shahid

0 Karma

AnilPujar
Path Finder
Can somebody please help me with the props? The data below needs to be broken into 3 events:
 
{
"retrRecResp":[
{
"keyFields":[
{
"key":"Domain",
"value": "login"
},
{
"key":"Env",
"value": "Prod"
}
],
"payload" : {
"payloadDataObject":{},
"timestamp":"Wed Jan 20 21:42:28 UTC 2021"
},
"consumerId":"Splunk",
"entityState": "Default"
},
{
"keyFields":[
{
"key":"Domain",
"value": "login"
},
{
"key":"Env",
"value": "SIT"
}
],
"payload" : {
"payloadDataObject":{},
"timestamp":"Wed Jan 20 21:42:28 UTC 2021"
},
"consumerId":"Splunk",
"entityState": "Default"
},
{
"keyFields":[
{
"key":"Domain",
"value": "login"
},
{
"key":"Env",
"value": "uat"
}
],
"payload" : {
"payloadDataObject":{},
"timestamp":"Wed Feb 20 21:42:28 UTC 2021"
},
"consumerId":"Splunk",
"entityState": "Default"
}
]
}
0 Karma

PickleRick
SplunkTrust

I don't know if you're aware of it but you've just dug up a thread that's almost 7 years old.

Also, your problem is indeed similar, but different.

So you'll have the best chance of receiving help if you start a new thread (you can post a link to this thread for reference if you tried something based on the solution presented here).

And paste your event sample in a preformatted or code block so it stays properly indented; it's much more readable that way.

0 Karma

pir8radio
Path Finder

I need this exact solution, but it appears to stop after the first match in a JSON string. How do I prevent that? For example, in the original post there are multiple "test" values in that single JSON string, and I need to break them out into three records. Your response works, but only on the first record. Any ideas?

0 Karma

pwmcity
Path Finder

Just to clarify, do you want to do this at index time?
I'm told you can have Splunk parse JSON quite easily. I haven't tried it, but it's worth researching.

0 Karma

p_gurav
Champion

Yes, I want to do this at index time.

0 Karma

somesoni2
Revered Legend

You would then need to use SEDCMD in your props.conf to manipulate the data before the JSON transformation is done. See some reading material here:
http://docs.splunk.com/Documentation/Splunk/6.2.4/Data/Anonymizedatausingconfigurationfiles
http://answers.splunk.com/answers/210096/how-to-configure-sedcmd-in-propsconf.html

0 Karma