I'm trying to set up CloudTrail log ingestion using the Splunk Add-on for AWS together with IAM roles.
Details:
Splunk 7.3.3
Splunk Add-on for AWS 4.6.0
Splunk App for AWS 5.1.0
We have an account 123 with key ID xyz, set to Region Category Global.
This was configured as an SQS-based S3 input with no assume role, in the US West (Oregon) region, with an SQS queue created in AWS.
Batch size is 10, the S3 file decoder is CloudTrail, the sourcetype is aws:cloudtrail, and the interval is 300 seconds.
This account works just fine, and we are getting logs in our index.
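For context, an SQS-based S3 input essentially polls the queue for S3 event notifications and downloads each referenced object. Below is a minimal boto3 sketch of that flow; the queue URL is a placeholder, and this only stands in for the add-on's handler logic, it is not the add-on's code:

import json
import boto3

# Placeholder queue URL for the working account 123 setup (us-west-2 / Oregon).
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123/cloudtrail-queue"

sqs = boto3.client("sqs", region_name="us-west-2")
s3 = boto3.client("s3", region_name="us-west-2")

# Poll the queue, mirroring the input's batch size of 10.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    body = json.loads(msg["Body"])
    # Notifications may be SNS-wrapped; unwrap to get the S3 event.
    if "Message" in body:
        body = json.loads(body["Message"])
    for record in body.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # This is roughly where the add-on's HeadObject/GetObject calls happen.
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(bucket, key, obj["ContentLength"])
    # The message is deleted only after a successful download.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])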
We also configured an IAM role for another account 345, which uses a role ARN of arn:aws:iam::345:role/some_iam_AWS_role.
We created another input with the same settings but with an assume role (i.e., we gave the input a different name and used the same account 123 with the assume role for 345).
This input doesn't get any logs into the index, and we have the following details:
1. Log errors extracted from splunk_ta_aws_aws_sqs_based_s3_input-name-345-1.log ($SPLUNK_HOME/var/log/splunk/splunk_ta_aws). The errors arrive at roughly 3 pairs per second, so the log files rotate quickly. (A standalone boto3 reproduction of the failing call is sketched after this list.)
Previously, we had:
ClientError: An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied
2019-12-11 13:46:47,101 level=ERROR pid=19928 tid=Thread-1 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_parse:318
After some configuration attempts on the AWS side, this changed to the errors below:
2019-12-11 19:51:09,232 level=ERROR pid=24254 tid=Thread-9 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_download:292 | start_time=1576088583 datainput="input-name-345-1", message_id="1234567-1234-1234-1234-1234567890abc" created=1576093868.94 job_id=abcdef-abcd-abbd-ssdd-2233aaabbccddd ttl=30 | message="Failed to download file."
2019-12-11 19:51:09,260 level=CRITICAL pid=24254 tid=Thread-9 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_process:268 | start_time=1576088583 datainput="input-name-345-1", message_id="1234567-1234-1234-1234-1234567890abc" created=1576093868.94 job_id=abcdef-abcd-abbd-ssdd-2233aaabbcc ttl=30 | message="An error occurred while processing the message."
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/modinputs/sqs_based_s3/handler.py", line 256, in _process
    headers = self._download(record, cache, session)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/modinputs/sqs_based_s3/handler.py", line 290, in _download
    return self._s3_agent.download(record, cache, session)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/modinputs/sqs_based_s3/handler.py", line 418, in download
    return bucket.transfer(s3, key, fileobj, **condition)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/common/s3.py", line 73, in transfer
    headers = client.head_object(Bucket=bucket, Key=key, **kwargs)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/3rdparty/botocore/client.py", line 324, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/3rdparty/botocore/client.py", line 622, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
2. Looking at the _internal index, I can see logs with message="request role credentials" every hour, plus the same "Failed to download file." messages.
3. These assume-role inputs have never worked. We see roughly 60k failed requests per hour in Splunk.
4. The assumed role has the following permissions:
"s3:GetAccelerateConfiguration",
"s3:GetBucketCORS",
"s3:GetBucketLocation",
"s3:GetBucketLogging",
"s3:GetBucketTagging",
"s3:GetLifecycleConfiguration",
"s3:GetObject",
"s3:ListAllMyBuckets",
"s3:ListBucket",
"sns:Get*",
Any input is appreciated, as are any questions if additional data is needed.
Hello, I have the same error. Could you tell me if you were able to solve it?
policy:
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"autoscaling:Describe*",
"cloudfront:ListDistributions",
"cloudwatch:Describe*",
"cloudwatch:Get*",
"cloudwatch:List*",
"config:DeliverConfigSnapshot",
"config:DescribeConfigRuleEvaluationStatus",
"config:DescribeConfigRules",
"config:GetComplianceDetailsByConfigRule",
"config:GetComplianceSummaryByConfigRule",
"ec2:DescribeAddresses",
"ec2:DescribeImages",
"ec2:DescribeInstances",
"ec2:DescribeKeyPairs",
"ec2:DescribeNetworkAcls",
"ec2:DescribeRegions",
"ec2:DescribeReservedInstances",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSnapshots",
"ec2:DescribeSubnets",
"ec2:DescribeVolumes",
"ec2:DescribeVpcs",
"elasticloadbalancing:DescribeInstanceHealth",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeTags",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetHealth",
"iam:GetAccessKeyLastUsed",
"iam:GetAccountPasswordPolicy",
"iam:GetUser",
"iam:ListAccessKeys",
"iam:ListUsers",
"inspector:Describe*",
"inspector:List*",
"kinesis:DescribeStream",
"kinesis:Get*",
"kinesis:ListStreams",
"kms:Decrypt",
"lambda:ListFunctions",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:GetLogEvents",
"rds:DescribeDBInstances",
"s3:GetAccelerateConfiguration",
"s3:GetBucketCORS",
"s3:GetBucketLocation",
"s3:GetBucketLogging",
"s3:GetBucketTagging",
"s3:GetLifecycleConfiguration",
"s3:GetObject",
"s3:ListAllMyBuckets",
"s3:ListBucket",
"sns:Get*",
"sns:List*",
"sns:Publish",
"sqs:DeleteMessage",
"sqs:GetQueueAttributes",
"sqs:GetQueueUrl",
"sqs:ListQueues",
"sqs:ReceiveMessage",
"sqs:SendMessage",
"sts:AssumeRole"
],
"Resource": [
"*"
]
}
]
}
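One way to sanity-check this policy is IAM's policy simulator via boto3. The ARNs below are placeholders, and this only evaluates identity-based policies, not the bucket policy on the other account's side:

import boto3

iam = boto3.client("iam")

# Placeholder ARNs; substitute the real role and CloudTrail bucket/object.
resp = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::345:role/some_iam_AWS_role",
    ActionNames=["s3:GetObject", "s3:GetBucketLocation"],
    ResourceArns=[
        "arn:aws:s3:::example-cloudtrail-bucket",
        "arn:aws:s3:::example-cloudtrail-bucket/AWSLogs/345/example.json.gz",
    ],
)
for result in resp["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])

An "allowed" decision here combined with a 403 from S3 would suggest the denial comes from the resource side (bucket policy or KMS key) rather than from this identity policy.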
Did you fix the issue? I'd like to know what you did beyond what was in the blog you posted.
I am running into this issue as well. One recommended solution from my research is to use Lambda, since this is said to be a limitation of the SNS queues. I'm referencing the following article:
Were you able to resolve this? If so, what approach did you take?