Problem fetching logs from AWS S3 Buckets

msenebald
Explorer

Hi,

I'm trying to set up the Splunk Add-on for Amazon Web Services (v1.0.1 on Splunk 6.2.0), with little success.

It seems that my connection setup (access key + secret key) is working.
From my understanding of the documentation (http://docs.splunk.com/Documentation/AddOns/latest/AWS/ConfigureAWS),
all access rights are set up properly. Still, I get a lot of errors in Splunk.

S3
This seems to be the major issue. When I try to set up an S3 input, I can select the AWS account and the bucket, and then I get this:

 In handler 'splunk_ta_aws_s3key': Unexpected error "<class 'boto.exception.S3ResponseError'>" from python handler: "S3ResponseError: 400 Bad Request ". See splunkd.log for more details.

splunkd.log throws:

01-08-2015 20:21:04.055 +0100 ERROR AdminManagerExternal - Unexpected error "<class 'boto.exception.S3ResponseError'>" from python handler: "S3ResponseError: 400 Bad Request\n".  See splunkd.log for more details.
01-08-2015 20:21:04.055 +0100 ERROR AdminManagerExternal - Stack trace from python handler:
Traceback (most recent call last):
  File "/opt/splunk/splunk-6.2.1-245427-Linux-x86_64/lib/python2.7/site-packages/splunk/admin.py", line 70, in init
    hand.execute(info)
  File "/opt/splunk/splunk-6.2.1-245427-Linux-x86_64/lib/python2.7/site-packages/splunk/admin.py", line 527, in execute
    if self.requestedAction == ACTION_LIST:     self.handleList(confInfo)
  File "/opt/splunk/splunk-6.2.1-245427-Linux-x86_64/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws_s3key_handler.py", line 28, in wrapper
    result = func(*args, **kwargs)
  File "/opt/splunk/splunk-6.2.1-245427-Linux-x86_64/etc/apps/Splunk_TA_aws/bin/splunk_ta_aws_s3key_handler.py", line 53, in handleList
    bucket = connection.get_bucket(self.callerArgs['bucket_name'][0])
  File "/opt/splunk/splunk-6.2.1-245427-Linux-x86_64/etc/apps/Splunk_TA_aws/bin/boto/s3/connection.py", line 502, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "/opt/splunk/splunk-6.2.1-245427-Linux-x86_64/etc/apps/Splunk_TA_aws/bin/boto/s3/connection.py", line 549, in head_bucket
    response.status, response.reason, body)
S3ResponseError: S3ResponseError: 400 Bad Request
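
For reference, the failing call can be reproduced outside Splunk with a few lines of boto (a minimal sketch; the key pair and bucket name are placeholders, and boto must be installed; the add-on bundles its own copy under Splunk_TA_aws/bin):

import boto
from boto.s3.connection import S3Connection

conn = S3Connection('ACCESS_KEY', 'SECRET_KEY')
try:
    # the same call the splunk_ta_aws_s3key handler makes
    bucket = conn.get_bucket('my-bucket')
    print('OK, bucket reachable: %s' % bucket.name)
except boto.exception.S3ResponseError as e:
    # get_bucket() issues a HEAD request, so a 400 comes back with an
    # empty body; status and reason are still set on the exception
    print('failed: %s %s %s' % (e.status, e.reason, e.body))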

CloudTrail
I can set everything up: select the AWS account, region, SQS queue, and so forth, but I don't get any data in.
In aws_cloudtrail.log I see:

2015-01-08 21:19:27,955 INFO pid=27777 tid=MainThread file=aws_cloudtrail.py:<module>:419 | EXITED: 1
2015-01-08 21:19:27,954 CRITICAL pid=27777 tid=MainThread file=aws_cloudtrail.py:stream_events:286 | Outer catchall: ParseError: no element found: line 1, column 0
2015-01-08 21:19:27,491 DEBUG pid=27777 tid=MainThread file=aws_cloudtrail.py:stream_events:210 | Connect to S3 & Sqs sucessfully
2015-01-08 21:19:27,448 INFO pid=27777 tid=MainThread file=aws_cloudtrail.py:get_access_key_pwd_real:109 | get account name: test
2015-01-08 21:19:27,448 DEBUG pid=27777 tid=MainThread file=aws_cloudtrail.py:stream_events:196 | blacklist regex for eventNames is ^(?:Describe|List|Get)
2015-01-08 21:19:27,448 DEBUG pid=27777 tid=MainThread file=aws_cloudtrail.py:stream_events:178 | Start streaming.

Billing
Here I can select the AWS account and bucket, but when I try to save, the following error is thrown:

In handler 'aws_billing': Failed AWS Validation: S3ResponseError: 400 Bad Request (None):

To me it looks like the S3 access does not really work, but I have no idea why. Trying to set everything up manually via inputs.conf didn't bring any success either; the errors in splunkd.log seem to be the same.

Does someone have an idea? Did I miss something crucial? The Python errors and AWS validation messages don't make any sense to me.

Thanks in advance

awurster
Contributor

@jcoates I think there are actually multiple things going on here. We found the following issue while migrating from a homegrown aggregator with a single queue to a queue which serves up notifications from tons of S3 buckets.

I can confirm this was an issue in both the 1.1.1 and 2.0 versions of the TA.

Records would download fine in most cases, but then we'd see a stack trace before any events were ingested. We think it was possibly treating the directory/folder itself as a CloudTrail log and just giving up.

2015-10-28 01:59:04,521 INFO pid=10768 tid=MainThread file=aws_cloudtrail.py:process_S3_notifications:453 | fetched 31 records, wrote 31, discarded 0, redirected 0 from  s3:foo-cloudtrails/nnnnnnnn/folder/yyyyyyyyyy/CloudTrail/region/2015/10/28/aaaaaaaaaaaa-foo.json.gz
2015-10-28 01:59:35,704 CRITICAL pid=10798 tid=MainThread file=aws_cloudtrail.py:stream_events:331 | Outer catchall - Traceback:
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/aws_cloudtrail.py", line 269, in stream_events
    s3_completed, s3_keys_to_delete, s3_failed=self.process_S3_notifications(s3_conn, s3_notifications)
  File "/opt/splunk/etc/apps/Splunk_TA_aws/bin/aws_cloudtrail.py", line 405, in process_S3_notifications
    message['s3Bucket'], key, type(e).__name__, e))
KeyError: 's3Bucket'

Digging further with a cleaned-up debug message (which is super tricky in Splunk, TBH), it seems it was actually a formatting error in printing the error itself that killed the script 😕:

2015-10-28 05:16:14,005 ERROR pid=26577 tid=MainThread file=aws_cloudtrail.py:process_S3_notifications:407 | problems reading json from s3:foo-cloudtrails/nnnnnnnn/folder/yyyyyyyyyy/: ValueError No JSON object could be decoded

So effectively, we DoS'd ourselves until we patched the script. The main line for us was the one below, but other exceptions around there would fail similarly:

                except ValueError as e:
                    message_failed = True
                    # patched: the S3 notification nests the bucket name
                    # under message['s3']['bucket']['name']
                    logger.log(logging.ERROR, "problems reading json from s3:{}/{}: {} {}".format(
                        message['s3']['bucket']['name'], key, type(e).__name__, e))
                    # original line, which itself raised KeyError ('s3Bucket'):
                    #logger.log(logging.ERROR, "problems reading json from s3:{}/{}: {} {}".format(
                    #    message['s3Bucket'], key, type(e).__name__, e))
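
The underlying pitfall, as a minimal hypothetical sketch (the function and arguments below are made up for illustration, not the TA's actual code): if the logging call inside an except block can itself raise, the original error is masked and the secondary exception escapes to the outer catchall, which kills the whole input. Chained dict.get() lookups with fallbacks keep the log line from ever raising:

import json
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)

def handle_notification(message, key, body):
    try:
        return json.loads(body)
    except ValueError as e:
        # this lookup cannot raise, unlike message['s3Bucket'], which
        # threw KeyError and replaced the original ValueError
        bucket = message.get('s3', {}).get('bucket', {}).get('name', 'unknown')
        logger.error("problems reading json from s3:%s/%s: %s %s",
                     bucket, key, type(e).__name__, e)

# a malformed notification now logs an error instead of killing the run
handle_notification({}, 'some/key', 'not json')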

Is there a GitHub or Bitbucket repo which we can use to suggest changes? For now, I will toss them into Bitbucket:
https://bitbucket.org/awurster/splunk-ta-aws/commits/311f3828e422bf1583148039771a7c051e3cc6c0

0 Karma

jcoates_splunk
Splunk Employee

Hi,

These look like permissions issues, though the CloudTrail one could also be caused by having multiple modular inputs trying to read the same queue and bucket.

We are working on improving the logging to make it clearer what goes wrong when something goes wrong.
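
One way to check both theories outside Splunk is to peek at the queue with plain boto (a sketch; region, credentials, and queue name are placeholders):

import boto.sqs

conn = boto.sqs.connect_to_region(
    'us-east-1',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY')
queue = conn.get_queue('my-cloudtrail-queue')
if queue is None:
    print('queue not found, or no permission to access it')
else:
    # visibility_timeout=0 leaves the message visible, so this peek
    # does not steal work from the modular input
    msgs = queue.get_messages(num_messages=1, visibility_timeout=0)
    print('received %d message(s)' % len(msgs))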

kkossery
Communicator

I don't want to hijack this thread, but I want to make sure Splunk is aware of this.
I have a similar issue where I get the error:

 DEBUG pid=11513 tid=MainThread file=aws_cloudtrail.py:stream_events:210 | Connect to S3 & Sqs sucessfully
2015-02-12 19:23:56,799 CRITICAL pid=11513 tid=MainThread file=aws_cloudtrail.py:stream_events:286 | Outer catchall: TypeError: 'int' object has no attribute '__getitem__'
2015-02-12 19:23:56,799 INFO pid=11513 tid=MainThread file=aws_cloudtrail.py:<module>:419 | EXITED: 1

It looks like Splunk is getting the logs from AWS into its indexes, but the AWS add-on is unable to parse the data and generate meaningful reports, spewing out the above errors.
Can you guys help? I'm using the latest version of Splunk on Amazon Linux, if that helps.

0 Karma

jcoates_splunk
Splunk Employee

Every time I've seen that sort of error message, it has meant that the add-on is being directed to gather "cloudtrail" data from a bucket that actually contains something else.
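
A quick way to verify that (a sketch; credentials and bucket name are placeholders) is to list a few keys and check that they look like CloudTrail objects, i.e. gzipped JSON under a .../CloudTrail/... prefix:

from itertools import islice
from boto.s3.connection import S3Connection

conn = S3Connection('ACCESS_KEY', 'SECRET_KEY')
bucket = conn.get_bucket('my-bucket')
# CloudTrail keys normally look like
# AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/<file>.json.gz
for key in islice(bucket.list(), 10):
    print(key.name)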

0 Karma

jcoates_splunk
Splunk Employee

Another possibility is CloudTrail data where multiple accounts are being aggregated into single messages; this is now addressed in Add-on for AWS 1.1.1.

0 Karma

kkossery
Communicator

Re-creating the setup resolved this issue for me. I'm guessing there was a misstep somewhere while doing the configuration.

Thanks!

0 Karma

msenebald
Explorer

Hi,

Thanks for that answer. Will this be in the beta of 1.1.0?

I tried to get one input at a time to work, so it can't be multiple things trying to read the same queue or the same bucket.
At the moment nothing is fetching anything; my focus is on S3.
Two things come to mind:

  1. It is indeed a policy problem. Then my question would be: how would I find this out? I already tried using a root/admin access key that is allowed to do everything and that created the bucket (so the owner has all rights); no progress, same failures. Could it be something not related to S3 directly, an account setting?

  2. It is a problem with the AWS API/SDK. I found some similar problems and errors on the net related to boto and S3. Also, for the SQS/SNS services I try to set up most things against eu-central-1, which is quite new, and in a different situation a newer SDK version solved problems there (not sure if this applies to S3 anyway, since it is not tied to a region); see the sketch after this list.
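
One concrete possibility along those lines: eu-central-1 only accepts Signature Version 4, while older boto releases sign S3 requests with version 2 by default, and AWS rejects those with a bare 400 Bad Request, which would match the empty-bodied 400 from the HEAD bucket call above. A sketch for testing that theory (use-sigv4 is a real boto config key; credentials, endpoint, and bucket name are placeholders):

import boto
from boto.s3.connection import S3Connection

if not boto.config.has_section('s3'):
    boto.config.add_section('s3')
boto.config.set('s3', 'use-sigv4', 'True')

# SigV4 requires an explicit regional endpoint instead of the global one
conn = S3Connection('ACCESS_KEY', 'SECRET_KEY',
                    host='s3.eu-central-1.amazonaws.com')
print(conn.get_bucket('my-bucket').name)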

Is there a way for me to add a hook in the Python scripts that gives more verbose output, so I can see what actually fails and determine whether it is a permission problem or a script problem?
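
A crude hook for that (a sketch; where exactly to place it in the TA's scripts is a judgment call) is to enable boto's own logging near the top of the script and raise the connection debug level; in a modular input, stderr output should end up in splunkd.log:

import boto

boto.set_stream_logger('boto')   # boto's internal debug log, to stderr
# debug=2 makes boto dump the raw HTTP requests and responses
conn = boto.connect_s3('ACCESS_KEY', 'SECRET_KEY', debug=2)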

Thanks

0 Karma

jcoates_splunk
Splunk Employee

1.1.0 final was posted Friday with improved logging, see index=_internal source=*aws*

0 Karma