Getting Data In

Ingesting and 'Transforming' AWS SQS Messages

y2000maxima
Engager

Hello,

      We are trying to ingest JSON based messages from an AWS SQS topic.    When ingesting the messages we are finding extra added json around the actual Message we are trying to ingest.  The extra JSON is automatically added in by AWS SQS.  The actual Message we want to ingest has the xpath of  "?BodyJson?Message".    Can we configure the Splunk TA to pull the SQS Messages off the topic but apply some type of xpath or transform to only ingest the Message (?BodyJson?Message).     See screenshot below.  While pulling the message off the SQS topic we only want the message in the green rectangle.   but its buried in all the other json....

2023-04-07_12-20-57.jpg

Actual JSON to whole message above in screenshot.

{
"MessageId": 23411111111444,
"ReceiptHandle": "y",
"MD5OfBody": 23411333333333111111444,
"Body": "{\n \"Type\" : \"Notification\",\n \"MessageId\" : \"xxxxxxx-xxx-xxxxxx\",\n \"TopicArn\" : \"arn:topic123\",\n \"Message\" : \"{\\\"timestamp\\\": \\\"1680882420000\\\", \\\"metric_name:test\\\": \\\"0\\\", \\\"aggregation\\\": \\\"avg\\\", \\\"resolution\\\": \\\"1m\\\", \\\"unit\\\": \\\"Percent\\\", \\\"entity.id\\\": \\\"SERVICE-12345\\\", \\\"entity.name\\\": \\\"test\\\", \\\"source.name\\\": \\\"testsource\\\"}\",\n \"Timestamp\" : \"2023-04-07T15:56:02.509Z\",\n \"SignatureVersion\" : \"1\",\n \"Signature\" : \"23423423423\",\n \"SigningCertURL\" : \"https://sns.u234234234234234234\",\n \"UnsubscribeURL\" : \"https://sns.23423423423423423423\"\n}",
"Attributes": {
"SenderId": "xxxxxxxxxxxxxxx",
"ApproximateFirstReceiveTimestamp": "1680882978026",
"ApproximateReceiveCount": "1",
"SentTimestamp": "1680882962536"
},
"BodyJson": {
"Type": "Notification",
"MessageId": "xxxxxxxxxxxxxxxxx",
"TopicArn": "arn:aws:sns:us-east-1:996142040734:APP-4498-dev-PerfEngDynatraceAPIClient-DynatraceMetricsSNSTopic-qFolXGcy2Ufh",
"Message": "{\"timestamp\": \"1680882420000\", \"metric_name:test\": \"0\", \"aggregation\": \"avg\", \"resolution\": \"1m\", \"unit\": \"Percent\", \"entity.id\": \"SERVICE-12345\", \"entity.name\": \"test\", \"source.name\": \"testsource\"}",
"Timestamp": "2023-04-07T15:56:02.509Z",
"SignatureVersion": "1",
"Signature": 23423423423,
"SigningCertURL": "https://sns.u234234234234234234",
"UnsubscribeURL": "https://sns.23423423423423423423"
}
}
Labels (1)
0 Karma

y2000maxima
Engager

Thank you very much tscroggins.  We will try your suggestion out.   Hopefully being on Splunk SaaS won't prevent our ability to do this. 

Thank you

0 Karma

tscroggins
Influencer

You'll likely need to contact Splunk support to implement INGEST_EVAL. If the schema and field order of the outer and inner JSON never change, you can also use a combination of SEDCMD and a regular transform:

# props.conf

[aws:sqs]
MAX_TIMESTAMP_LOOKAHEAD = 13
SEDCMD-unescape = s/\\//g
TIME_FORMAT = %s%3Q
TIME_PREFIX = "Message"\s*:\s*"\{\\"timestamp\\"\s*:\s*\\"
TRANSFORMS-copy_message_to_raw = copy_message_to_raw

# transforms.conf

[copy_message_to_raw]
DEST_KEY = _raw
FORMAT = $1
REGEX = "Message"\s*:\s*"([^}]+\})

Everything above can be implemented through the user interface.

Note that the SEDCMD regular expression will aggressively remove all backslashes. In my test environment (Splunk Enterprise 9.0.4.1), typical solutions for stripping backslashes end up adding backslashes. E.g.:

\" => s/\x5C"/"/g => \\"

Splunk's treatment of backslashes in SEDCMD and SPL regular expression commands has always been finicky. Strict adherence to C-style escape sequences in SPL strings and no special handling in SEDCMD would be preferred, but I think they're doing their best to balance the user experience.

0 Karma

y2000maxima
Engager

Thank you again! 

0 Karma

tscroggins
Influencer

Hi,

You can do this pretty easily with an INGEST_EVAL transform:

# props.conf

[aws:sqs]
TRANSFORMS-copy_bodyjson_message_to_raw = copy_bodyjson_message_to_raw

# transforms.conf

[copy_bodyjson_message_to_raw]
INGEST_EVAL = _raw=json_extract(_raw, "BodyJson.Message"), _time=strptime(json_extract(_raw, "timestamp"), "%s%3Q")

The example transform also extracts the timestamp from the inner JSON message.

If you have other events with sourcetype = aws:sqs, you can use a source stanza in props.conf instead of a source type stanza and reference the SQS input by name:

[source::<your_sqs_source_name>]
TRANSFORMS-copy_bodyjson_message_to_raw = copy_bodyjson_message_to_raw

If you need to retain the original event, you can clone the event into a new source type and modify _raw on the cloned event:

# props.conf

[source::<your_sqs_source_name>]
TRANSFORMS-clone_service_metric = clone_service_metric

[aws:sqs:service_metric]
TRANSFORMS-copy_bodyjson_message_to_raw = copy_bodyjson_message_to_raw

# transforms.conf

[clone_service_metric]
REGEX = .
CLONE_SOURCETYPE = aws:sqs:service_metric

[copy_bodyjson_message_to_raw]
INGEST_EVAL = _raw=json_extract(_raw, "BodyJson.Message"), _time=strptime(json_extract(_raw, "timestamp"), "%s%3Q")

 

Get Updates on the Splunk Community!

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

The University of Nevada, Las Vegas (UNLV) is another premier research institution helping to shape the next ...

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...