
How to resolve AWS SQS error message="Failed to download file." while using IAM Roles

ionutr
Explorer

I'm trying to set up CloudTrail log ingestion using the Splunk Add-on for AWS and IAM roles.

Details:
Splunk 7.3.3
Splunk Add-on for AWS 4.6.0
Splunk App for AWS 5.1.0

We have an account 123 with key ID xyz, set to Region Category Global.

This was configured as an SQS-based S3 input with no assume role, in the us-west-2 (Oregon) region, with an SQS queue created in AWS.
Batch size is 10, the S3 file decoder is CloudTrail, the sourcetype is aws:cloudtrail, and the interval is 300 seconds.

This account works just fine and we are getting logs in our index.
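
For reference, the working input corresponds to an inputs.conf stanza along these lines (a sketch: the attribute names are my recollection of the add-on's SQS-based S3 input spec, and the queue URL is a placeholder, so check them against the inputs.conf.spec shipped with your version):

    [aws_sqs_based_s3://input-name-123]
    aws_account = 123
    sqs_queue_region = us-west-2
    sqs_queue_url = https://sqs.us-west-2.amazonaws.com/123/<queue-name>
    sqs_batch_size = 10
    s3_file_decoder = CloudTrail
    sourcetype = aws:cloudtrail
    interval = 300

The failing input described below is identical except for the name and an aws_iam_role setting pointing at the 345 role.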

We also configured an IAM role for another account, 345, which uses a Role ARN of arn:aws:iam::345:role/some_iam_AWS_role.
We created another input with the same settings but with an assume role (i.e., we gave the input a different name and used the same account 123 with an assume role of 345).
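
For the assume role to work at all, the role in account 345 needs a trust policy that lets account 123 (the account whose key the add-on authenticates with) assume it, something along these lines (a sketch; a real trust policy may scope the principal to a specific IAM user rather than the account root):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::123:root" },
          "Action": "sts:AssumeRole"
        }
      ]
    }

Since the logs below show role credentials being requested every hour (see point 2), this part appears to be working, and the failure is on the S3 side.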

This input doesn't download any logs into the index, and we have the following details:

1. Log errors extracted from splunk_ta_aws_aws_sqs_based_s3_input-name-345-1.log (in $SPLUNK_HOME/var/log/splunk/splunk_ta_aws):

These come in at roughly 3 error pairs per second, so the log files rotate quickly.

Previously, we had:
2019-12-11 13:46:47,101 level=ERROR pid=19928 tid=Thread-1 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_parse:318
ClientError: An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied

After some configuration attempts on the AWS side, this changed to the error below:

2019-12-11 19:51:09,232 level=ERROR pid=24254 tid=Thread-9 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_download:292 | start_time=1576088583 datainput="input-name-345-1", message_id="1234567-1234-1234-1234-1234567890abc" created=1576093868.94 job_id=abcdef-abcd-abbd-ssdd-2233aaabbccddd ttl=30 | message="Failed to download file."
2019-12-11 19:51:09,260 level=CRITICAL pid=24254 tid=Thread-9 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_process:268 | start_time=1576088583 datainput="input-name-345-1", message_id="1234567-1234-1234-1234-1234567890abc" created=1576093868.94 job_id=abcdef-abcd-abbd-ssdd-2233aaabbcc ttl=30 | message="An error occurred while processing the message."
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/modinputs/sqs_based_s3/handler.py", line 256, in _process
    headers = self._download(record, cache, session)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/modinputs/sqs_based_s3/handler.py", line 290, in _download
    return self._s3_agent.download(record, cache, session)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/modinputs/sqs_based_s3/handler.py", line 418, in download
    return bucket.transfer(s3, key, fileobj, **condition)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws/common/s3.py", line 73, in transfer
    headers = client.head_object(Bucket=bucket, Key=key, **kwargs)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/3rdparty/botocore/client.py", line 324, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/3rdparty/botocore/client.py", line 622, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
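
One way to take Splunk out of the picture and reproduce this is to assume the role with the AWS CLI and issue the same HeadObject call. The bucket and key below are placeholders; use the values from one of the SQS messages:

    aws sts assume-role \
        --role-arn arn:aws:iam::345:role/some_iam_AWS_role \
        --role-session-name splunk-test
    # export the temporary credentials returned above, then:
    export AWS_ACCESS_KEY_ID=<AccessKeyId from the output>
    export AWS_SECRET_ACCESS_KEY=<SecretAccessKey from the output>
    export AWS_SESSION_TOKEN=<SessionToken from the output>
    aws s3api head-object --bucket <cloudtrail-bucket> --key <object-key>

If this also returns 403, the problem is on the AWS side (bucket policy, object ownership, or KMS), not in the add-on.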

2. Looking at the _internal index, I can see logs with message="request role credentials" every hour, plus the same "Failed to download file." messages.

3. These assume-role inputs have never worked. We see roughly 60,000 failed requests per hour in Splunk.

4. The role being assumed has the following permissions (excerpt; a bucket-policy sketch follows the list):
"s3:GetAccelerateConfiguration",
"s3:GetBucketCORS",
"s3:GetBucketLocation",
"s3:GetBucketLogging",
"s3:GetBucketTagging",
"s3:GetLifecycleConfiguration",
"s3:GetObject",
"s3:ListAllMyBuckets",
"s3:ListBucket",
"sns:Get*",

Any input is appreciated, as well as any questions for additional data needed.


vmorocho
New Member

Hello, I have the same error. Could you tell me whether you were able to solve it?


ionutr
Explorer
  • policy:

    "Version": "2012-10-17",
    "Statement": [
    {
    "Effect": "Allow",
    "Action": [
    "autoscaling:Describe*",
    "cloudfront:ListDistributions",
    "cloudwatch:Describe*",
    "cloudwatch:Get*",
    "cloudwatch:List*",
    "config:DeliverConfigSnapshot",
    "config:DescribeConfigRuleEvaluationStatus",
    "config:DescribeConfigRules",
    "config:GetComplianceDetailsByConfigRule",
    "config:GetComplianceSummaryByConfigRule",
    "ec2:DescribeAddresses",
    "ec2:DescribeImages",
    "ec2:DescribeInstances",
    "ec2:DescribeKeyPairs",
    "ec2:DescribeNetworkAcls",
    "ec2:DescribeRegions",
    "ec2:DescribeReservedInstances",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSnapshots",
    "ec2:DescribeSubnets",
    "ec2:DescribeVolumes",
    "ec2:DescribeVpcs",
    "elasticloadbalancing:DescribeInstanceHealth",
    "elasticloadbalancing:DescribeListeners",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeTags",
    "elasticloadbalancing:DescribeTargetGroups",
    "elasticloadbalancing:DescribeTargetHealth",
    "iam:GetAccessKeyLastUsed",
    "iam:GetAccountPasswordPolicy",
    "iam:GetUser",
    "iam:ListAccessKeys",
    "iam:ListUsers",
    "inspector:Describe*",
    "inspector:List*",
    "kinesis:DescribeStream",
    "kinesis:Get*",
    "kinesis:ListStreams",
    "kms:Decrypt",
    "lambda:ListFunctions",
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "rds:DescribeDBInstances",
    "s3:GetAccelerateConfiguration",
    "s3:GetBucketCORS",
    "s3:GetBucketLocation",
    "s3:GetBucketLogging",
    "s3:GetBucketTagging",
    "s3:GetLifecycleConfiguration",
    "s3:GetObject",
    "s3:ListAllMyBuckets",
    "s3:ListBucket",
    "sns:Get*",
    "sns:List*",
    "sns:Publish",
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
    "sqs:GetQueueUrl",
    "sqs:ListQueues",
    "sqs:ReceiveMessage",
    "sqs:SendMessage",
    "sts:AssumeRole"
    ],
    "Resource": [
    "*"
    ]
    }
    ]
    }
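
Also worth checking: kms:Decrypt in this policy only covers the IAM side. If the CloudTrail objects are SSE-KMS encrypted, the KMS key policy in the bucket owner's account must additionally allow the role to decrypt, or HeadObject/GetObject fails with a 403 like the one above. A sketch of the key-policy statement (role ARN from this thread):

    {
      "Sid": "AllowSplunkAssumedRoleDecrypt",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::345:role/some_iam_AWS_role" },
      "Action": ["kms:Decrypt", "kms:DescribeKey"],
      "Resource": "*"
    }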


youngsuh
Contributor

Did you fix the issue? I'd like to know what you did, other than what's in the blog posted.



yshabano
Explorer

I am running into this issue as well. Per my research, one recommended solution is to use Lambda, as this is said to be a limitation of the SNS queues. Referencing the following article:

https://www.splunk.com/en_us/blog/tips-and-tricks/making-the-collection-of-centralised-s3-logs-into-...

adnankhan5133
Communicator

Were you able to resolve this? If so, what approach did you take?
