I have been trying to troubleshoot an issue with aws_cloudtrail.py. Most of the time, the AWS message queue is processed correctly and the number of events indexed in Splunk matches the number reported in aws_cloudtrail.log. However, when there is more than 1 file (*.json.gz) to process in the queue, it appears that only the first file is actually indexed. I have verified (by increasing the logging level and adding my own logging statements to aws_cloudtrail.py) that the events are indeed being read and processed by aws_cloudtrail.py, but for some reason the events are not being indexed by Splunk. I even traced it as far as the point where each event is converted to XML and written to stdout, and everything works fine up to there. I am completely at a loss as to why this is happening. The output log reports that all events are being written, and none show up as discarded or as errors. I can't find any errors pertaining to this on any of my systems.
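For context, a modular input like this one streams events to Splunk by writing XML to stdout. Below is a minimal sketch of that step, assuming the standard Splunk modular input stream format (`<stream><event><data>…</data></event></stream>`); the function name and stanza value are illustrative, not taken from the actual aws_cloudtrail.py:

```python
import sys
from xml.sax.saxutils import escape

def write_event(stream, raw, stanza="aws-cloudtrail://default"):
    # Each event is wrapped in <event><data>...</data></event>; Splunk reads
    # this XML stream from the modular input script's stdout.
    stream.write(
        '<stream><event stanza="%s"><data>%s</data></event></stream>'
        % (escape(stanza), escape(raw))
    )
    # An unflushed stdout buffer is one classic cause of "written but never
    # indexed" symptoms, so flush explicitly after each event.
    stream.flush()

if __name__ == "__main__":
    write_event(sys.stdout, '{"eventName": "DescribeInstances"}')
```

If events vanish between stdout and the index, it is worth confirming that each write is well-formed XML and actually flushed, since a malformed event can cause Splunk to silently drop the rest of the stream.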
I even tried sending all of the events to another log file immediately before they would be written to stdout. All of the events are correctly written to the log file, but they do not show up in Splunk.
I am running the TA from a search head which distributes the events to many indexers. When I perform searches over the AWS data, I see data from all indexers.
EDIT: I just set this up on a standalone test instance, and the same thing is happening, so it doesn't appear to be a problem with distributed environments. The app processed and claimed to write events from 7 files, but only 3 unique files showed up in Splunk, and the event counts for each of those files exactly matched what was reported in the aws_cloudtrail log.
EDIT 2: This happens with both versions 1.1.0 and 1.1.1
That means we're getting SQS messages that don't line up with the expected CloudTrail format. It could mean that the messages are invalid, that the expectations are invalid, or that the script is failing in some way.
It doesn't seem like there is anything wrong with the messages: the script picks them up just fine, processes them, and attempts to send them to Splunk. I added debugging statements to event_writer.py immediately before the events are written to stdout, so that the events are written to a file as well as sent to stdout. ALL of the events show up correctly in the output file, but only events from 1 SQS file per run show up in Splunk. I'm not sure how to continue debugging past this point.
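The tee-to-file debugging described above could be sketched roughly like this; the class name and log path are hypothetical, not from the actual event_writer.py:

```python
import sys

class TeeWriter(object):
    """Write-through wrapper that copies all stdout traffic to a debug file."""

    def __init__(self, stream, logfile_path):
        self.stream = stream
        self.logfile = open(logfile_path, "a")

    def write(self, data):
        self.logfile.write(data)   # keep a verbatim copy for comparison
        self.logfile.flush()
        return self.stream.write(data)

    def flush(self):
        self.logfile.flush()
        self.stream.flush()

# Installed once, before any events are emitted (path is illustrative):
# sys.stdout = TeeWriter(sys.stdout, "/tmp/aws_cloudtrail_debug.log")
```

Comparing the debug file against what Splunk indexed is a reasonable way to confirm the gap is downstream of the script: if the file contains every event but the index does not, the loss is happening in Splunk's consumption of the stdout stream rather than in the script's processing.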