
Parsing out raw JSON with LINE_BREAKER

Contributor

Hi,

I am having difficulty parsing some raw JSON data. Each day, Splunk has to call an API and pull back the previous day's data. Splunk can connect and retrieve the data without any issues; it's just the parsing that is causing me headaches.

A sample of the raw data is below. The extract contains thousands of events for each day; the sample shows two.

{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12","propertyData":{"SDPABCSALES2_20150112_001":{"clearingCode":"12345678","creationAction":"CustRequestedQuote","instrumentDescription":"Product 23","instrumentId":"123456","marketer":"","notional":"10000000","productType":"ABC","settlementDate":"20150112","side":"RECEIVE","state":"Done","tradeDate":"2015-01-12","tradeId":"SDPABCSALES2_20150112_001","type":"RFQ","updateTimeStamp":"2015-01-12 09:03:48","user":"tester2","userData":"Tester 31","userText":"TRADX - ANY","value":"1.7800000000000002"},"XABCA_A3o_0":{"instrumentDescription":"product 38","instrumentId":"12131654","killTime":"","marketer":"","markets":"US, UK, AUS","notional":"55000000","notionalFill":"0","productType":"ABC","side":"Sell","state":"Error","timeInForce":"FAS","tradeDate":"2015-01-12","tradeId":"XABCA_A3o_0","type":"Limit order","updateTimeStamp":"2015-01-12 23:10:20","user":"tester3","userRole":"client","value":"0.78"}},"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}

The closest I have got is with the props.conf and transforms.conf below, although I know this will not give exactly the desired result.

props.conf

[pockdbapi2]
SHOULD_LINEMERGE = false
LINE_BREAKER = (\},)
REPORT-all = pockdbapi3tr
TRUNCATE = 0
MAX_EVENTS = 500000
TIME_PREFIX = ("updateTimeStamp":)
TIME_FORMAT = %Y-%m-%d %H:%M:%S

transforms.conf

[pockdbapi3tr]
DELIMS = ",", ":"

This results in the raw events below.

The events are all split correctly apart from the first event. The timestamp is correct for each new event and all fields extract correctly.

{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12","propertyData":{"SDPABCSALES2_20150112_001":{"clearingCode":"12345678","creationAction":"CustRequestedQuote","instrumentDescription":"Product 23","instrumentId":"123456","marketer":"","notional":"10000000","productType":"ABC","settlementDate":"20150112","side":"RECEIVE","state":"Done","tradeDate":"2015-01-12","tradeId":"SDPABCSALES2_20150112_001","type":"RFQ","updateTimeStamp":"2015-01-12 09:03:48","user":"tester2","userData":"Tester 31","userText":"TRADX - ANY","value":"1.7800000000000002"

"XABCA_A3o_0":{"instrumentDescription":"product 38","instrumentId":"12131654","killTime":"","marketer":"","markets":"US, UK, AUS","notional":"55000000","notionalFill":"0","productType":"ABC","side":"Sell","state":"Error","timeInForce":"FAS","tradeDate":"2015-01-12","tradeId":"XABCA_A3o_0","type":"Limit order","updateTimeStamp":"2015-01-12 23:10:20","user":"tester3","userRole":"client","value":"0.78"}

"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}
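The split above can be reproduced outside Splunk as a quick sanity check. This is a minimal Python sketch, assuming Splunk's documented LINE_BREAKER behaviour of discarding the text matched by the first capturing group; `re.split` with a non-capturing pattern approximates that. `RAW` is a trimmed copy of the sample payload (most trade fields removed), and `split_like_line_breaker` is a hypothetical helper.

```python
import re

# Trimmed copy of the sample payload from the question (most trade fields removed).
RAW = (
    '{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12",'
    '"propertyData":{"SDPABCSALES2_20150112_001":{"tradeId":"SDPABCSALES2_20150112_001",'
    '"updateTimeStamp":"2015-01-12 09:03:48","value":"1.7800000000000002"},'
    '"XABCA_A3o_0":{"tradeId":"XABCA_A3o_0",'
    '"updateTimeStamp":"2015-01-12 23:10:20","value":"0.78"}},'
    '"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}'
)

# LINE_BREAKER = (\},) discards the matched "}," between events, so splitting on
# the same pattern without keeping the separator approximates what Splunk does.
BREAKER = re.compile(r"\},")


def split_like_line_breaker(raw):
    """Return the chunks that would become separate events (an approximation)."""
    return BREAKER.split(raw)


if __name__ == "__main__":
    for number, event in enumerate(split_like_line_breaker(RAW), start=1):
        print(f"event {number}: {event}")
```

The first chunk still carries the `{"source":...,"propertyData":{` envelope and the last one is the `_links` trailer, which matches the events shown above: a character-level breaker has no way to tell the envelope apart from the trade records.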

All events apart from the first event parse as I would like.

Can anyone advise how I would split the following out of the first event?

{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12","propertyData":{

Do I need to move away from LINE_BREAKER to get the desired result?

Thanks,

Dan


Contributor

After attending a brilliant presentation by Damien Dallimore at the London Splunk user group, I now understand the possibilities of using either the Response Handler in the REST add-on or the Protocol Data Inputs add-on.


Ultra Champion

Example of a custom handler that you can put in responsehandlers.py and how to wire it up.

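import json

class MyCustomHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        if response_type == "json":
            output = json.loads(raw_response_output)
            # iterating over a dict yields its keys (here, the trade IDs)
            for event in output["propertyData"]:
                # print_xml_stream is assumed to be available in responsehandlers.py
                print_xml_stream(json.dumps(event))
        else:
            # non-JSON responses are passed through unchanged
            print_xml_stream(raw_response_output)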


Contributor

Thanks Damien. I'll try it out today.


Contributor

Hi Damien,

I get the following for each event:

"XABCA_A3o_0"

rather than the full event:

"XABCA_A3o_0":{"instrumentDescription":"product 38","instrumentId":"12131654","killTime":"","marketer":"","markets":"US, UK, AUS","notional":"55000000","notionalFill":"0","productType":"ABC","side":"Sell","state":"Error","timeInForce":"FAS","tradeDate":"2015-01-12","tradeId":"XABCA_A3o_0","type":"Limit order","updateTimeStamp":"2015-01-12 23:10:20","user":"tester3","userRole":"client","value":"0.78"}

but I'm on the right track, so it shouldn't take long.

Thanks again!
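The result described in this reply comes from iterating directly over the `propertyData` dictionary, which in Python yields only its keys (the trade IDs). A minimal sketch of one fix iterates over `.values()` instead, so each trade record is emitted as its own JSON event. To keep it runnable on its own, `print_xml_stream` below is a hypothetical stand-in that just prints, `split_records` is a hypothetical helper, and the handler's call signature is copied from the example above rather than checked against the add-on.

```python
import json

# Trimmed copy of the sample payload from the question (most trade fields removed).
SAMPLE = (
    '{"source":"ABC-trade-tracker","propertyData":{'
    '"SDPABCSALES2_20150112_001":{"tradeId":"SDPABCSALES2_20150112_001","value":"1.7800000000000002"},'
    '"XABCA_A3o_0":{"tradeId":"XABCA_A3o_0","value":"0.78"}},'
    '"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}'
)


def print_xml_stream(s):
    # Hypothetical stand-in so this sketch runs outside Splunk; inside
    # responsehandlers.py, use the print_xml_stream the handler above relies on.
    print(s)


def split_records(raw_json):
    """Hypothetical helper: return one JSON string per entry in propertyData."""
    payload = json.loads(raw_json)
    # .values() yields the trade records themselves; iterating the dict
    # directly would yield only the keys (the trade IDs).
    return [json.dumps(record) for record in payload["propertyData"].values()]


class MyCustomHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        if response_type == "json":
            for event in split_records(raw_response_output):
                print_xml_stream(event)
        else:
            # Non-JSON responses are passed through unchanged.
            print_xml_stream(raw_response_output)


if __name__ == "__main__":
    # Only the raw output and response type matter here, so dummies fill the rest.
    MyCustomHandler()(None, SAMPLE, "json", {}, "https://api-test.test.net/ABC/2015-01-12")
```

Each record in the sample already carries its own tradeId, so dropping the dictionary key loses nothing. If envelope fields such as source or identifier are wanted on every event, they could be copied into each record before it is dumped.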


New Member

Hey Daniel, did you get this working? I am trying to do something similar.



SplunkTrust

Do not use LINE_BREAKER, as it will break your events up in the middle of a JSON string, which throws everything off. If each line is a single JSON string, you don't need LINE_BREAKER at all.

You shouldn't need the REPORT-all and transforms.conf at all.

If your JSON strings are multi-line events, SHOULD_LINEMERGE should be set to true.

Set KV_MODE to json. This is what makes it all much easier.
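For a feel of what search-time JSON extraction produces from a whole payload, the sketch below flattens nested objects into dotted paths. It assumes KV_MODE = json names nested fields by their dotted path, the way spath does; it is an approximation only (arrays and Splunk's other extraction rules are not modelled), `flatten` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the question's payload.

```python
import json

# Trimmed copy of the sample payload from the question (one trade, two fields).
SAMPLE = (
    '{"source":"ABC-trade-tracker","identifier":"2015-01-12",'
    '"propertyData":{"SDPABCSALES2_20150112_001":{"tradeDate":"2015-01-12","value":"1.7800000000000002"}},'
    '"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}'
)


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into {"a.b.c": value} pairs (arrays not handled)."""
    fields = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            fields.update(flatten(value, path))
        else:
            fields[path] = value
    return fields


if __name__ == "__main__":
    for name, value in flatten(json.loads(SAMPLE)).items():
        print(f"{name} = {value}")
```

Under that assumption, the trade IDs become part of the field names (for example propertyData.SDPABCSALES2_20150112_001.value), so trades in the same payload would not share common field names.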

This is what I would do in props.conf for files that are PURE JSON data:

[pockdbapi2]
KV_MODE = json
NO_BINARY_CHECK = 1
TRUNCATE = 0
SHOULD_LINEMERGE = true
TIME_PREFIX = "updateTimeStamp":"
MAX_TIMESTAMP_LOOKAHEAD = 2048
MAX_EVENTS = 1

Your timestamp should come out naturally, because Splunk is smart enough to detect this format, but you can add a TIME_FORMAT if you like.
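As a rough illustration of how TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD and a TIME_FORMAT such as `%Y-%m-%d %H:%M:%S` work together, this sketch finds the text right after the prefix `"updateTimeStamp":"` and parses it with the same strptime-style format. It approximates Splunk's behaviour rather than reproducing it; `extract_timestamp` is a hypothetical helper, and slicing 19 characters is an assumption that only holds for this fixed-width format.

```python
import re
from datetime import datetime

# Same prefix as TIME_PREFIX in the props.conf above.
TIME_PREFIX = re.compile(r'"updateTimeStamp":"')
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def extract_timestamp(event, lookahead=2048):
    """Return the first timestamp following TIME_PREFIX, or None if the prefix is absent."""
    match = TIME_PREFIX.search(event)
    if match is None:
        return None
    # Only look a bounded distance past the prefix, like MAX_TIMESTAMP_LOOKAHEAD.
    window = event[match.end():match.end() + lookahead]
    # "YYYY-MM-DD HH:MM:SS" is exactly 19 characters wide.
    return datetime.strptime(window[:19], TIME_FORMAT)


if __name__ == "__main__":
    event = ('"XABCA_A3o_0":{"tradeId":"XABCA_A3o_0",'
             '"updateTimeStamp":"2015-01-12 23:10:20","value":"0.78"}')
    print(extract_timestamp(event))
```

Because the sketch stops at the first match, a whole day's payload treated as one event would only pick up the first updateTimeStamp in it.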

Contributor

Thanks for your comment. The JSON I get isn't perfect, and I had issues getting exactly what I want using the above method. I've decided to finally jump in and start writing response handlers in Python.
