Why do the CloudWatch logs contain XML tags?

We have logs from all of our production servers being pushed to CloudWatch, and we're evaluating Splunk as a better way to search those logs.

We were able to set up the AWS add-on with an AWS account that can access CloudWatch logs without issue, and for the most part we are getting the data we want. The only problem is that the XML coming back from CloudWatch seems to be indexed raw, so it looks like this (for example):

Event #1:

<event><time>1448816732.44</time><source>us-east-1:production:SERVER1</source><sourcetype>aws:cloudwatchlogs</sourcetype><index>default</index><data><![CDATA[First line of log output

Events #2-1000 (in this example) contain each line of log output. This output is indexed exactly the way we want.

Event #1001:

]]></data></event><event><time>1448816734.44</time><source>us-east-1:production:SERVER2</source><sourcetype>aws:cloudwatchlogs</sourcetype><index>default</index><data><![CDATA[First line of next log event
></data>

So it looks like CloudWatch wrapper XML surrounds several lines of log output, and the first line of each batch is indexed together with the closing tags of the previous CloudWatch "event" and the opening tags of the new one. Since we have multiple servers, this sometimes leads to events from different servers being classified under the wrong source type.
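To illustrate the split we expected, here is a minimal Python sketch of how the wrapper could be separated from the log lines. It assumes the fragments above join into well-formed XML; the `<stream>` root element, the sample text, and the `split_events` helper are hypothetical and are not taken from the add-on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the fragments quoted above. The <stream>
# root is an assumption, added so the text parses as one XML document.
SAMPLE = """<stream>
<event><time>1448816732.44</time><source>us-east-1:production:SERVER1</source><sourcetype>aws:cloudwatchlogs</sourcetype><index>default</index><data><![CDATA[First line of log output
Second line of log output]]></data></event>
<event><time>1448816734.44</time><source>us-east-1:production:SERVER2</source><sourcetype>aws:cloudwatchlogs</sourcetype><index>default</index><data><![CDATA[First line of next log event]]></data></event>
</stream>"""


def split_events(xml_text):
    """Return one dict per <event>, with the wrapper fields kept as
    metadata and the CDATA payload split into individual log lines."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter("event"):
        events.append({
            "time": float(event.findtext("time")),
            "source": event.findtext("source"),
            "sourcetype": event.findtext("sourcetype"),
            "index": event.findtext("index"),
            # The parser returns CDATA content as ordinary text.
            "lines": (event.findtext("data") or "").splitlines(),
        })
    return events


if __name__ == "__main__":
    for ev in split_events(SAMPLE):
        for line in ev["lines"]:
            print(ev["source"], "|", line)
```

With this split, every log line stays attached to the source of the event it arrived in, and none of the wrapper tags end up in the indexed text.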

Is there something we can adjust to fix this, or is this a bug or limitation in the add-on for CloudWatch?

Splunk Employee

Are you using the CloudWatch Logs input, or just an S3 input? You should be seeing a format like this:

sourcetype=aws:cloudwatchlogs:vpcflow

2 000000000000 eni-00000000 43.97.128.12 83.165.209.134 41151 59573 75 2 9730 1450112760 1450112760 ACCEPT OK
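
For reference, the sample record above appears to follow the default (version 2) VPC Flow Logs layout of 14 space-separated fields. A minimal Python sketch for checking that an indexed event has that shape follows; the `parse_flow_record` helper and the underscore field names are illustrative and are not part of the add-on.

```python
# Field order of the default (version 2) VPC Flow Logs record.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {
    "version", "srcport", "dstport", "protocol",
    "packets", "bytes", "start", "end",
}


def parse_flow_record(line):
    """Split one flow log line into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(parts))
        )
    record = dict(zip(FIELDS, parts))
    for name in INT_FIELDS:
        # Records with no traffic data use "-" as a placeholder.
        if record[name] != "-":
            record[name] = int(record[name])
    return record


if __name__ == "__main__":
    sample = ("2 000000000000 eni-00000000 43.97.128.12 83.165.209.134 "
              "41151 59573 75 2 9730 1450112760 1450112760 ACCEPT OK")
    print(parse_flow_record(sample))
```

An event that still carries wrapper tags such as `<event>` or `<data>` would fail the field-count check, which makes it easy to spot records that were not broken correctly.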