Re: How to SEDCMD nested json calculated as string...

Aatom · ‎01-26-2021

Hey Splunkers!

We have a large json event that has a Body Message, and BodyJson Message, a little redundant but this is what was provided. The immediate issue is the BodyJson.Message doesnt auto extract the JSON fields and it appears to be due to the doublequote before/after the curly brace in the Message object, and also the backslash escaping the doublequotes in the KV pairs. If I remove them from the upload the data extracts completely, but I havent found a good sedcmd to modify just this section of the event, without breaking the rest of the event. Please Help!

"Message": "{\"version\":\"0\",\"id\":\"5d3f\"...]}}" (need to sedcmd)

_raw (obfuscated)

{"MessageId": "eff1", "ReceiptHandle": "gw6", "MD5OfBody": "41a8a", "Body": "{\n \"Type\" : \"Notification\",\n \"MessageId\" : \"dafe\",\n \"TopicArn\" : \"arn:aws:sns:us-east\",\n \"Message\" : \"{\\\"version\\\":\\\"0\\\"}\",\n \"Timestamp\" : \"2021-01-26T04:30:22.756Z\",\n \"SignatureVersion\" : \"1\",\n \"Signature\" : \"Eqaf90pc\",\n \"SigningCertURL\" : \"https://sns.us-east-1.amazonaws.com\",\n \"UnsubscribeURL\" : \"https://sns.us-east-1.amazonaws.com\"\n}", "Attributes": {"SenderId": "AID", "ApproximateFirstReceiveTimestamp": "1611635422813", "ApproximateReceiveCount": "1", "SentTimestamp": "1611635422812"}, "BodyJson": {"Type": "Notification", "MessageId": "dafe", "TopicArn": "arn:aws:sns", "Message": "{\"version\":\"0\",\"id\":\"5d3f\",\"detail-type\":\"Findings\",\"source\":\"aws\",\"account\":\"54\",\"time\":\"2021-01-26T04:30:22Z\",\"region\":\"us-east-1\",\"resources\":[\"arn:aws\"]}}", "Timestamp": "2021-01-26T04:30:22.756Z", "SignatureVersion": "1", "Signature": "Eqaf90pcXJtL425k7", "SigningCertURL": "https://sns.us-east-1.amazonaws.com", "UnsubscribeURL": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east"}}

SEDCMD-test1 = s/ "Message": "{/ "Message": {/g

SEDCMD-test2 = s/}", "Timestamp/}, "Timestamp/g

SEDCMD-test3 = s/\\//g

vmulkowsky · ‎05-28-2021

Hello, did you ever find a resolution to this problem? I'm thinking about switching from SNS/SQS to a S3-based SQS solution instead, so that I do not have to attempt to extract the Message body from the SNS Notification envelope. I'm thinking about sending the SNS event to a Lambda function that reads the Message body and writes it to S3 as a properly formated JSON structure (instead of escaped string). And then have S3 trigger an SNS event to an SQS queue, and point Splunk to that SQS queue. So basically, switching from SQS to S3-based SQS.

to4kawa · ‎01-27-2021

index=_internal | head 1 | eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message output=message
| spath input=message

How about this?

index=_internal | head 1 | fields _raw| eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message 
| rex field=BodyJson.Message max_match=0 "\"(?<key>\S+?)\":\[?\"(?<value>\S+?)\"\]?"
| eval _raw=mvzip(key,value,"=")| extract

Since you are able to extract it with regex, why don't you just write-meta in transforms.conf without SEDCMD?

Aatom · ‎01-27-2021

@to4kawa Thank you for the response! I know the community appreciates all your contributions!

That is an option, but this feed has upwards of 1400 dynamic fields across several data sources that aggregate into SecurityHub and then are pooled from AWS SQS, some fields are even in nested json lists. That would be alot of fields to index, but it may be our only realistic option.

Any additional ideas to fix the parsing behind the scenes? Currently the user has been provided a macro with spath and lots of renames.

Thanks!!

to4kawa · ‎01-27-2021

transforms.conf.spec

# Indexed field:

[netscreen-error]
REGEX =  device_id=\[w+\](?<err_code>[^:]+)
FORMAT = err_code::$1
WRITE_META = true

SOURCE_KEY = <string>

I'm sure you can get by with these things.

How to SEDCMD nested json calculated as string?

JSON

Aligning Observability Costs with Business Value: Practical Strategies

Mastering Data Pipelines: Unlocking Value with Splunk

Splunk Up Your Game: Why It's Time to Embrace Python 3.9+ and OpenSSL 3.0

Are you a member of the Splunk Community?

How to SEDCMD nested json calculated as string?

JSON

Aligning Observability Costs with Business Value: Practical Strategies

Mastering Data Pipelines: Unlocking Value with Splunk

Splunk Up Your Game: Why It's Time to Embrace Python 3.9+ and OpenSSL 3.0