Getting Data In

How to SEDCMD nested json calculated as string?

Aatom
Explorer

Hey Splunkers!

We have a large json event that has a Body Message, and BodyJson Message, a little redundant but this is what was provided. The immediate issue is the BodyJson.Message doesnt auto extract the JSON fields and it appears to be due to the doublequote before/after the curly brace in the Message object, and also the backslash escaping the doublequotes in the KV pairs. If I remove them from the upload the data extracts completely, but I havent found a good sedcmd to modify just this section of the event, without breaking the rest of the event. Please Help!

"Message": "{\"version\":\"0\",\"id\":\"5d3f\"...]}}" (need to sedcmd)

_raw (obfuscated)

{"MessageId": "eff1", "ReceiptHandle": "gw6", "MD5OfBody": "41a8a", "Body": "{\n \"Type\" : \"Notification\",\n \"MessageId\" : \"dafe\",\n \"TopicArn\" : \"arn:aws:sns:us-east\",\n \"Message\" : \"{\\\"version\\\":\\\"0\\\"}\",\n \"Timestamp\" : \"2021-01-26T04:30:22.756Z\",\n \"SignatureVersion\" : \"1\",\n \"Signature\" : \"Eqaf90pc\",\n \"SigningCertURL\" : \"https://sns.us-east-1.amazonaws.com\",\n \"UnsubscribeURL\" : \"https://sns.us-east-1.amazonaws.com\"\n}", "Attributes": {"SenderId": "AID", "ApproximateFirstReceiveTimestamp": "1611635422813", "ApproximateReceiveCount": "1", "SentTimestamp": "1611635422812"}, "BodyJson": {"Type": "Notification", "MessageId": "dafe", "TopicArn": "arn:aws:sns", "Message": "{\"version\":\"0\",\"id\":\"5d3f\",\"detail-type\":\"Findings\",\"source\":\"aws\",\"account\":\"54\",\"time\":\"2021-01-26T04:30:22Z\",\"region\":\"us-east-1\",\"resources\":[\"arn:aws\"]}}", "Timestamp": "2021-01-26T04:30:22.756Z", "SignatureVersion": "1", "Signature": "Eqaf90pcXJtL425k7", "SigningCertURL": "https://sns.us-east-1.amazonaws.com", "UnsubscribeURL": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east"}}

SEDCMD-test1 = s/ "Message": "{/ "Message": {/g

SEDCMD-test2 = s/}", "Timestamp/}, "Timestamp/g

SEDCMD-test3 = s/\\//g

 

Labels (1)
Tags (3)
0 Karma

vmulkowsky
Loves-to-Learn

Hello, did you ever find a resolution to this problem? I'm thinking about switching from SNS/SQS to a S3-based SQS solution instead, so that I do not have to attempt to extract the Message body from the SNS Notification envelope.  I'm thinking about sending the SNS event to a Lambda function that reads the Message body and writes it to S3 as a properly formated JSON structure (instead of escaped string).  And then have S3 trigger an SNS event to an SQS queue, and point Splunk to that SQS queue.  So basically, switching from SQS to S3-based SQS.

0 Karma

to4kawa
Ultra Champion

 

 

index=_internal | head 1 | eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message output=message
| spath input=message

 

 

How about this?

 

 

index=_internal | head 1 | fields _raw| eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message 
| rex field=BodyJson.Message max_match=0 "\"(?<key>\S+?)\":\[?\"(?<value>\S+?)\"\]?"
| eval _raw=mvzip(key,value,"=")| extract

 

Since you are able to extract it with regex, why don't you just write-meta in transforms.conf without SEDCMD?

Aatom
Explorer

@to4kawa  Thank you for the response! I know the community appreciates all your contributions!

That is an option, but this feed has upwards of 1400 dynamic fields across several data sources that aggregate into SecurityHub and then are pooled from AWS SQS, some fields are even in nested json lists. That would be alot of fields to index, but it may be our only realistic option.

Any additional ideas to fix the parsing behind the scenes? Currently the user has been provided a macro with spath and lots of renames.

 

Thanks!!

0 Karma

to4kawa
Ultra Champion

transforms.conf.spec

# Indexed field:

[netscreen-error]
REGEX =  device_id=\[w+\](?<err_code>[^:]+)
FORMAT = err_code::$1
WRITE_META = true
SOURCE_KEY = <string>

I'm sure you can get by with these things. 

0 Karma
Get Updates on the Splunk Community!

Monitoring Postgres with OpenTelemetry

Behind every business-critical application, you’ll find databases. These behind-the-scenes stores power ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...