We have configured a large number of CloudWatch log groups as separate inputs on our heavy forwarder. When pulling the logs from AWS, we get throttling exceptions for a few of the log groups, as shown below.
2016-10-25 10:33:18,164 ERROR pid=24573 tid=Thread-12 file=aws_cloudwatch_logs_data_loader.py:describe_cloudwatch_log_streams:74 | Failure in describing cloudwatch logs streams due to throttling exception for log_group=/okapi2/nprod/var/log/custom/vmstat, sleep=3.65761418489, reason=Traceback (most recent call last):
  File "/opt/app/splunk/etc/apps/Splunk_TA_aws/bin/cloudwatch_logs_mod/aws_cloudwatch_logs_data_loader.py", line 64, in describe_cloudwatch_log_streams
    group_name, next_token=buf["nextToken"])
  File "/opt/app/splunk/etc/apps/Splunk_TA_aws/bin/boto/logs/layer1.py", line 308, in describe_log_streams
    body=json.dumps(params))
  File "/opt/app/splunk/etc/apps/Splunk_TA_aws/bin/boto/logs/layer1.py", line 576, in make_request
    body=json_body)
JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Rate exceeded', u'__type': u'ThrottlingException'}
The error message comes from AWS, which Splunk has no control over. You can contact AWS and ask for an API rate limit increase. Otherwise, switch to the Kinesis input to work around it.
The message itself gives the reason for the error:
Throttling Exception, Rate Exceeded.
So you've gone over your limit for downloading CloudWatch logs. Yes, AWS support can help you, but you can probably find the solution faster by searching Google for "increase cloud watch download limits".
In fact, it was the first link returned when I googled that: http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_limits.html
I think you need to request a rate increase on your "list metrics" or "describe alarms" limit. The lowest one is describe alarms so you might want to start there.
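For anyone hitting the same "Rate exceeded" error from their own scripts: the sleep= value in the log line above suggests the add-on already waits and retries after being throttled, although I have not checked how it computes the delay. The usual client-side pattern is exponential backoff with jitter. Below is a minimal sketch of that pattern, not the add-on's actual code. ThrottledError and call_with_backoff are hypothetical names made up for illustration; in real code you would catch whatever throttling exception your AWS client library raises.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an AWS 'Rate exceeded' throttling error."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call func(); on ThrottledError, wait and retry.

    The wait is a random value between 0 and an upper bound that doubles
    on every attempt (capped at max_delay), so many inputs that were
    throttled at the same moment do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            # Give up and re-raise once the last allowed attempt has failed.
            if attempt == max_attempts - 1:
                raise
            upper_bound = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, upper_bound))
```

You would wrap each API call that can be throttled, for example call_with_backoff(lambda: fetch_streams(group_name)), where fetch_streams is your own (here hypothetical) function. Backoff only smooths out short bursts. If the steady request rate across all log groups is above the account limit, you still need a limit increase or fewer or less frequent inputs.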