
Parsing out raw JSON with LINE BREAKER

DanielFordWA
Contributor

Hi,

I am having difficulty parsing some raw JSON data. Each day Splunk is required to hit an API and pull back the previous day's data. Splunk can connect and pull the data back without any issues; it's just the parsing that is causing me headaches.

A sample of the raw data is below. There are thousands of events for each day in the extract; the sample contains two of them.

{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12","propertyData":{"SDPABCSALES2_20150112_001":{"clearingCode":"12345678","creationAction":"CustRequestedQuote","instrumentDescription":"Product 23","instrumentId":"123456","marketer":"","notional":"10000000","productType":"ABC","settlementDate":"20150112","side":"RECEIVE","state":"Done","tradeDate":"2015-01-12","tradeId":"SDPABCSALES2_20150112_001","type":"RFQ","updateTimeStamp":"2015-01-12 09:03:48","user":"tester2","userData":"Tester 31","userText":"TRADX - ANY","value":"1.7800000000000002"},"XABCA_A3o_0":{"instrumentDescription":"product 38","instrumentId":"12131654","killTime":"","marketer":"","markets":"US, UK, AUS","notional":"55000000","notionalFill":"0","productType":"ABC","side":"Sell","state":"Error","timeInForce":"FAS","tradeDate":"2015-01-12","tradeId":"XABCA_A3o_0","type":"Limit order","updateTimeStamp":"2015-01-12 23:10:20","user":"tester3","userRole":"client","value":"0.78"}},"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}

The closest I have got is the props.conf and transforms.conf configuration below, which I know will not give the desired result.

props.conf

[pockdbapi2]
SHOULD_LINEMERGE = false
LINE_BREAKER = (\},)
REPORT-all = pockdbapi3tr
TRUNCATE = 0
MAX_EVENTS = 500000
TIME_PREFIX = ("updateTimeStamp":)
TIME_FORMAT = %Y-%m-%d %H:%M:%S

transforms.conf

[pockdbapi3tr]
DELIMS = ",", ":"

This results in the raw events below.

The events are all split correctly apart from the first event. The timestamp is correct for each new event, and all fields are extracted correctly.

{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12","propertyData":{"SDPABCSALES2_20150112_001":{"clearingCode":"12345678","creationAction":"CustRequestedQuote","instrumentDescription":"Product 23","instrumentId":"123456","marketer":"","notional":"10000000","productType":"ABC","settlementDate":"20150112","side":"RECEIVE","state":"Done","tradeDate":"2015-01-12","tradeId":"SDPABCSALES2_20150112_001","type":"RFQ","updateTimeStamp":"2015-01-12 09:03:48","user":"tester2","userData":"Tester 31","userText":"TRADX - ANY","value":"1.7800000000000002"

"XABCA_A3o_0":{"instrumentDescription":"product 38","instrumentId":"12131654","killTime":"","marketer":"","markets":"US, UK, AUS","notional":"55000000","notionalFill":"0","productType":"ABC","side":"Sell","state":"Error","timeInForce":"FAS","tradeDate":"2015-01-12","tradeId":"XABCA_A3o_0","type":"Limit order","updateTimeStamp":"2015-01-12 23:10:20","user":"tester3","userRole":"client","value":"0.78"}

"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}

All events apart from the first event parse as I would like.
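The split can be reproduced outside Splunk to see why the first event keeps the header. The sketch below is only a rough Python approximation: it assumes Splunk ends an event at each match of LINE_BREAKER and discards the matched text, and it uses the sample payload from above with most fields of each trade record trimmed. simulate_line_breaker is a hypothetical name used for illustration.

```python
import re

# Sample payload from the question, with most fields of each trade record trimmed.
RAW = (
    '{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12",'
    '"propertyData":{"SDPABCSALES2_20150112_001":{"state":"Done",'
    '"updateTimeStamp":"2015-01-12 09:03:48","value":"1.7800000000000002"},'
    '"XABCA_A3o_0":{"state":"Error","updateTimeStamp":"2015-01-12 23:10:20","value":"0.78"}},'
    '"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}'
)


def simulate_line_breaker(raw, pattern=r"\},"):
    """Rough stand-in for LINE_BREAKER = (\\},): cut at each match and drop the match."""
    # re.split drops the matched text when the pattern has no capturing group,
    # approximating Splunk discarding the text captured by LINE_BREAKER.
    return [chunk for chunk in re.split(pattern, raw) if chunk]


if __name__ == "__main__":
    for number, event in enumerate(simulate_line_breaker(RAW), start=1):
        print(number, event)
```

The first chunk still carries the envelope fields (source, identifierType, identifier and the opening of propertyData) because nothing before the first "}," is treated as a boundary, which matches the first raw event shown above. The trailing _links object becomes its own chunk for the same reason.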

Please can anyone advise how I would split the following out of the first event?

{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12","propertyData":{

Do I need to move away from LINE_BREAKER to get the desired result?

Thanks,

Dan

Accepted solution

DanielFordWA
Contributor

After attending a brilliant presentation by Damien Dallimore at the Splunk User Group in London, I now understand the options available: either a custom Response Handler in the REST add-on or the Protocol Data Inputs add-on.


Damien_Dallimor
Ultra Champion

Here is an example of a custom handler that you can put in responsehandlers.py:

class MyCustomHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        # json and print_xml_stream are assumed to be available in responsehandlers.py
        if response_type == "json":
            output = json.loads(raw_response_output)
            for event in output["propertyData"]:
                print_xml_stream(json.dumps(event))
        else:
            # non-JSON responses are passed through unchanged
            print_xml_stream(raw_response_output)


DanielFordWA
Contributor

Thanks Damien. I'll try it out today.


DanielFordWA
Contributor

Hi Damien,

I get the following for each event:

"XABCA_A3o_0"

rather than the full event:

"XABCA_A3o_0":{"instrumentDescription":"product 38","instrumentId":"12131654","killTime":"","marketer":"","markets":"US, UK, AUS","notional":"55000000","notionalFill":"0","productType":"ABC","side":"Sell","state":"Error","timeInForce":"FAS","tradeDate":"2015-01-12","tradeId":"XABCA_A3o_0","type":"Limit order","updateTimeStamp":"2015-01-12 23:10:20","user":"tester3","userRole":"client","value":"0.78"}

But I'm on the right track, so it shouldn't take long.
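A likely reason is that output["propertyData"] is a dict keyed by trade ID, and looping over a Python dict yields only its keys. The sketch below is written for Python 3; split_property_data is a hypothetical helper, and the sample is the question's payload with most record fields trimmed. It iterates with .items() instead and re-wraps each record under its trade ID, giving a valid JSON object of the form {"XABCA_A3o_0": {...}} per event.

```python
import json

# Sample payload from the question, with most fields of each trade record trimmed.
RAW = (
    '{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12",'
    '"propertyData":{"SDPABCSALES2_20150112_001":{"state":"Done",'
    '"updateTimeStamp":"2015-01-12 09:03:48","value":"1.7800000000000002"},'
    '"XABCA_A3o_0":{"state":"Error","updateTimeStamp":"2015-01-12 23:10:20","value":"0.78"}},'
    '"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}'
)


def split_property_data(raw_response_output):
    """Hypothetical helper: one JSON string per propertyData entry, keyed by trade ID."""
    output = json.loads(raw_response_output)
    events = []
    # Iterating the dict directly (for event in output["propertyData"]) yields only
    # the keys, i.e. the trade IDs; .items() returns each ID with its nested record.
    for trade_id, trade in output["propertyData"].items():
        events.append(json.dumps({trade_id: trade}))
    return events


if __name__ == "__main__":
    for event in split_property_data(RAW):
        # Inside MyCustomHandler.__call__ this would be print_xml_stream(event).
        print(event)
```

Since each record already contains its own tradeId field, dumping just the record (json.dumps(trade)) would also work if the wrapping key is not needed.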

Thanks again!


rolltidega
New Member

Hey Daniel, did you get this working? I am trying to do something similar.


cpetterborg
SplunkTrust

Do not use LINE_BREAKER, as it will break your events up mid JSON string, which throws everything off. If each line is a single JSON string, then you don't need LINE_BREAKER at all.

You shouldn't need the REPORT-all and transforms.conf at all.

If your JSON strings are multi-line events, then SHOULD_LINEMERGE should be set to true.

Set KV_MODE to json. This is what makes it all much easier.

This is what I would do in props.conf for files that are PURE JSON data:

[pockdbapi2]
KV_MODE = json
NO_BINARY_CHECK = 1
TRUNCATE = 0
SHOULD_LINEMERGE = true
TIME_PREFIX = "updateTimeStamp":"
MAX_TIMESTAMP_LOOKAHEAD = 2048
MAX_EVENTS = 1

Splunk should recognize this timestamp format automatically, but you can set TIME_FORMAT explicitly if you like.
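With KV_MODE = json and a whole API response indexed as one event, automatic extraction names fields by their JSON path, so each trade ID becomes part of the field name. The sketch below roughly approximates that dotted naming in Python to show which field names to expect from the sample (trimmed as before). It is a simplification: flatten is a hypothetical helper, and it treats lists as leaf values, whereas Splunk names array elements with {}.

```python
import json

# Sample payload from the question, with most fields of each trade record trimmed.
RAW = (
    '{"source":"ABC-trade-tracker","identifierType":"ABC","identifier":"2015-01-12",'
    '"propertyData":{"SDPABCSALES2_20150112_001":{"state":"Done",'
    '"updateTimeStamp":"2015-01-12 09:03:48","value":"1.7800000000000002"},'
    '"XABCA_A3o_0":{"state":"Error","updateTimeStamp":"2015-01-12 23:10:20","value":"0.78"}},'
    '"_links":{"self":{"href":"https://api-test.test.net/ABC/2015-01-12"}}}'
)


def flatten(node, prefix=""):
    """Map each leaf value to a dotted path such as propertyData.<tradeId>.state."""
    fields = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            fields.update(flatten(value, path))
    else:
        # Strings, numbers and lists are stored as-is (no arrays in the sample).
        fields[prefix] = node
    return fields


if __name__ == "__main__":
    for name, value in flatten(json.loads(RAW)).items():
        print(f"{name} = {value}")
```

Because the trade IDs end up inside the field names (propertyData.XABCA_A3o_0.state and so on), comparing one field such as state across trades is awkward, which is one reason splitting each propertyData entry into its own event can be easier to work with.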

DanielFordWA
Contributor

Thanks for your comment. The JSON I get isn't perfect, and I had issues getting exactly what I wanted using the above method. I've finally decided to jump in and start writing response handlers in Python.
