Hey Splunkers!
We have a large json event that has a Body Message, and BodyJson Message, a little redundant but this is what was provided. The immediate issue is the BodyJson.Message doesnt auto extract the JSON fields and it appears to be due to the doublequote before/after the curly brace in the Message object, and also the backslash escaping the doublequotes in the KV pairs. If I remove them from the upload the data extracts completely, but I havent found a good sedcmd to modify just this section of the event, without breaking the rest of the event. Please Help!
"Message": "{\"version\":\"0\",\"id\":\"5d3f\"...]}}" (need to sedcmd)
_raw (obfuscated)
{"MessageId": "eff1", "ReceiptHandle": "gw6", "MD5OfBody": "41a8a", "Body": "{\n \"Type\" : \"Notification\",\n \"MessageId\" : \"dafe\",\n \"TopicArn\" : \"arn:aws:sns:us-east\",\n \"Message\" : \"{\\\"version\\\":\\\"0\\\"}\",\n \"Timestamp\" : \"2021-01-26T04:30:22.756Z\",\n \"SignatureVersion\" : \"1\",\n \"Signature\" : \"Eqaf90pc\",\n \"SigningCertURL\" : \"https://sns.us-east-1.amazonaws.com\",\n \"UnsubscribeURL\" : \"https://sns.us-east-1.amazonaws.com\"\n}", "Attributes": {"SenderId": "AID", "ApproximateFirstReceiveTimestamp": "1611635422813", "ApproximateReceiveCount": "1", "SentTimestamp": "1611635422812"}, "BodyJson": {"Type": "Notification", "MessageId": "dafe", "TopicArn": "arn:aws:sns", "Message": "{\"version\":\"0\",\"id\":\"5d3f\",\"detail-type\":\"Findings\",\"source\":\"aws\",\"account\":\"54\",\"time\":\"2021-01-26T04:30:22Z\",\"region\":\"us-east-1\",\"resources\":[\"arn:aws\"]}}", "Timestamp": "2021-01-26T04:30:22.756Z", "SignatureVersion": "1", "Signature": "Eqaf90pcXJtL425k7", "SigningCertURL": "https://sns.us-east-1.amazonaws.com", "UnsubscribeURL": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east"}}
SEDCMD-test1 = s/ "Message": "{/ "Message": {/g
SEDCMD-test2 = s/}", "Timestamp/}, "Timestamp/g
SEDCMD-test3 = s/\\//g
Hello, did you ever find a resolution to this problem? I'm thinking about switching from SNS/SQS to a S3-based SQS solution instead, so that I do not have to attempt to extract the Message body from the SNS Notification envelope. I'm thinking about sending the SNS event to a Lambda function that reads the Message body and writes it to S3 as a properly formated JSON structure (instead of escaped string). And then have S3 trigger an SNS event to an SQS queue, and point Splunk to that SQS queue. So basically, switching from SQS to S3-based SQS.
index=_internal | head 1 | eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message output=message
| spath input=message
How about this?
index=_internal | head 1 | fields _raw| eval _raw="{\"MessageId\":\"eff1\",\"ReceiptHandle\":\"gw6\",\"MD5OfBody\":\"41a8a\",\"Body\":\"{\\n \\\"Type\\\" : \\\"Notification\\\",\\n \\\"MessageId\\\" : \\\"dafe\\\",\\n \\\"TopicArn\\\" : \\\"arn:aws:sns:us-east\\\",\\n \\\"Message\\\" : \\\"{\\\\\\\"version\\\\\\\":\\\\\\\"0\\\\\\\"}\\\",\\n \\\"Timestamp\\\" : \\\"2021-01-26T04:30:22.756Z\\\",\\n \\\"SignatureVersion\\\" : \\\"1\\\",\\n \\\"Signature\\\" : \\\"Eqaf90pc\\\",\\n \\\"SigningCertURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\",\\n \\\"UnsubscribeURL\\\" : \\\"https://sns.us-east-1.amazonaws.com\\\"\\n}\",\"Attributes\":{\"SenderId\":\"AID\",\"ApproximateFirstReceiveTimestamp\":\"1611635422813\",\"ApproximateReceiveCount\":\"1\",\"SentTimestamp\":\"1611635422812\"},\"BodyJson\":{\"Type\":\"Notification\",\"MessageId\":\"dafe\",\"TopicArn\":\"arn:aws:sns\",\"Message\":\"{\\\"version\\\":\\\"0\\\",\\\"id\\\":\\\"5d3f\\\",\\\"detail-type\\\":\\\"Findings\\\",\\\"source\\\":\\\"aws\\\",\\\"account\\\":\\\"54\\\",\\\"time\\\":\\\"2021-01-26T04:30:22Z\\\",\\\"region\\\":\\\"us-east-1\\\",\\\"resources\\\":[\\\"arn:aws\\\"]}}\",\"Timestamp\":\"2021-01-26T04:30:22.756Z\",\"SignatureVersion\":\"1\",\"Signature\":\"Eqaf90pcXJtL425k7\",\"SigningCertURL\":\"https://sns.us-east-1.amazonaws.com\",\"UnsubscribeURL\":\"https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east\"}}"
| spath BodyJson.Message
| rex field=BodyJson.Message max_match=0 "\"(?<key>\S+?)\":\[?\"(?<value>\S+?)\"\]?"
| eval _raw=mvzip(key,value,"=")| extract
Since you are able to extract it with regex, why don't you just write-meta in transforms.conf without SEDCMD?
@to4kawa Thank you for the response! I know the community appreciates all your contributions!
That is an option, but this feed has upwards of 1400 dynamic fields across several data sources that aggregate into SecurityHub and then are pooled from AWS SQS, some fields are even in nested json lists. That would be alot of fields to index, but it may be our only realistic option.
Any additional ideas to fix the parsing behind the scenes? Currently the user has been provided a macro with spath and lots of renames.
Thanks!!
# Indexed field: [netscreen-error] REGEX = device_id=\[w+\](?<err_code>[^:]+) FORMAT = err_code::$1 WRITE_META = true
SOURCE_KEY = <string>
I'm sure you can get by with these things.