Getting Data In

How to SEDCMD nested json calculated as string?

Aatom
Explorer

Hey Splunkers!

We have a large json event that has a Body Message, and BodyJson Message, a little redundant but this is what was provided. The immediate issue is the BodyJson.Message doesnt auto extract the JSON fields and it appears to be due to the doublequote before/after the curly brace in the Message object, and also the backslash escaping the doublequotes in the KV pairs. If I remove them from the upload the data extracts completely, but I havent found a good sedcmd to modify just this section of the event, without breaking the rest of the event. Please Help!

"Message": "{\"version\":\"0\",\"id\":\"5d3f\"...]}}" (need to sedcmd)

_raw (obfuscated)

{"MessageId": "eff1", "ReceiptHandle": "gw6", "MD5OfBody": "41a8a", "Body": "{\n \"Type\" : \"Notification\",\n \"MessageId\" : \"dafe\",\n \"TopicArn\" : \"arn:aws:sns:us-east\",\n \"Message\" : \"{\\\"version\\\":\\\"0\\\"}\",\n \"Timestamp\" : \"2021-01-26T04:30:22.756Z\",\n \"SignatureVersion\" : \"1\",\n \"Signature\" : \"Eqaf90pc\",\n \"SigningCertURL\" : \"https://sns.us-east-1.amazonaws.com\",\n \"UnsubscribeURL\" : \"https://sns.us-east-1.amazonaws.com\"\n}", "Attributes": {"SenderId": "AID", "ApproximateFirstReceiveTimestamp": "1611635422813", "ApproximateReceiveCount": "1", "SentTimestamp": "1611635422812"}, "BodyJson": {"Type": "Notification", "MessageId": "dafe", "TopicArn": "arn:aws:sns", "Message": "{\"version\":\"0\",\"id\":\"5d3f\",\"detail-type\":\"Findings\",\"source\":\"aws\",\"account\":\"54\",\"time\":\"2021-01-26T04:30:22Z\",\"region\":\"us-east-1\",\"resources\":[\"arn:aws\"]}}", "Timestamp": "2021-01-26T04:30:22.756Z", "SignatureVersion": "1", "Signature": "Eqaf90pcXJtL425k7", "SigningCertURL": "https://sns.us-east-1.amazonaws.com", "UnsubscribeURL": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east"}}

SEDCMD-test1 = s/ "Message": "{/ "Message": {/g

SEDCMD-test2 = s/}", "Timestamp/}, "Timestamp/g

SEDCMD-test3 = s/\\//g

 

Labels (1)
Tags (3)
0 Karma

vmulkowsky
Loves-to-Learn

Hello, did you ever find a resolution to this problem? I'm thinking about switching from SNS/SQS to a S3-based SQS solution instead, so that I do not have to attempt to extract the Message body from the SNS Notification envelope.  I'm thinking about sending the SNS event to a Lambda function that reads the Message body and writes it to S3 as a properly formated JSON structure (instead of escaped string).  And then have S3 trigger an SNS event to an SQS queue, and point Splunk to that SQS queue.  So basically, switching from SQS to S3-based SQS.

0 Karma

to4kawa
Ultra Champion

 

 

index=_internal | head 1 | eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message output=message
| spath input=message

 

 

How about this?

 

 

index=_internal | head 1 | fields _raw| eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message 
| rex field=BodyJson.Message max_match=0 "\"(?<key>\S+?)\":\[?\"(?<value>\S+?)\"\]?"
| eval _raw=mvzip(key,value,"=")| extract

 

Since you are able to extract it with regex, why don't you just write-meta in transforms.conf without SEDCMD?

Aatom
Explorer

@to4kawa  Thank you for the response! I know the community appreciates all your contributions!

That is an option, but this feed has upwards of 1400 dynamic fields across several data sources that aggregate into SecurityHub and then are pooled from AWS SQS, some fields are even in nested json lists. That would be alot of fields to index, but it may be our only realistic option.

Any additional ideas to fix the parsing behind the scenes? Currently the user has been provided a macro with spath and lots of renames.

 

Thanks!!

0 Karma

to4kawa
Ultra Champion

transforms.conf.spec

# Indexed field:

[netscreen-error]
REGEX =  device_id=\[w+\](?<err_code>[^:]+)
FORMAT = err_code::$1
WRITE_META = true
SOURCE_KEY = <string>

I'm sure you can get by with these things. 

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...