Hi, I have installed Splunk with the S3 add-on. I can add data for an S3 bucket, but I can't add data for an S3 bucket/directory.
I get an error saying no objects were found under the directory, even though the directory does contain subdirectories, with files inside those subdirectories. How can I work around this? Thanks.
In order to pull an entire directory from S3, your key must end in a "/".
In S3, this is perfectly legal:
/foo/bar
/foo
This would be illegal in a normal file system. Directories don't actually exist in S3; they are just an illusion for our benefit. Everything is just (key, object) pairs.
By ending your key in a "/", you're telling S3: I want everything that matches key/*.
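If you want to sanity-check what a given prefix actually matches, here is a minimal sketch using boto3 (the bucket and prefix names are made up for illustration, not taken from your setup):

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; the trailing "/" means only keys "under"
# the pseudo-directory are returned.
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="foo/bar/")
for obj in resp.get("Contents", []):
    print(obj["Key"])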
I'm not sure if this will fix your problem (there could also be a problem in the underlying code), but give it a try.
Thanks
The "answer" above is not valid since the S3 add-on does not seem to traverse into subdirectories correctly unless you add an entire bucket as the target.
i.e. given a bucket "log-bucket" that contains ELB logs you would only be able to monitor the entire bucket with a single input or a single directory/object. When setting up ELB and CloudTrail logging AWS manages the directory structure and organization of those logs in the S3 bucket you specify.
So in "log-bucket" you will have.
/AWSLogs
/AWSLogs/12345678890 (this is your account number)
/AWSLogs/12345678890/elasticloadbalancing
/AWSLogs/12345678890/elasticloadbalancing/us-east-1
Now you get to the actual log directories organized by date:
/AWSLogs/12345678890/elasticloadbalancing/us-east-1/{YEAR}/{Month}/{Day}/something.log
If you were to put CloudTrail logs in this same bucket they would be in the same dir structure.
/AWSLogs/12345678890/CloudTrail/us-east-1/{YEAR}/{Month}/{Day}/something.log
If you have CloudFront and S3 access logs in this same bucket then you would have more issues when monitoring the entire bucket.
Using an input of s3://log-bucket/AWSLogs/12345678890/CloudTrail/
would give the following error:
Encountered the following error while trying to update: In handler 's3': Invalid configuration specified: No objects found inside s3://log-bucket/AWSLogs/12345678890/CloudTrail/.
In addition to these problems, the S3 add-on's s3.py script does not appear to handle "paging" of buckets properly, i.e. if there are over 1000 objects in a bucket, the script will only ever see the first 1000 because it does not use markers to page through the results.
See: http://answers.splunk.com/answers/66611/splunk-for-amazon-s3-add-on-not-able-to-fetch-all-logs
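For reference, here is a minimal sketch of what marker-based paging looks like with boto3. The bucket and prefix below are just placeholders, and this only illustrates the pattern the add-on's script would need, not its actual code:

import boto3

s3 = boto3.client("s3")
bucket = "log-bucket"   # placeholder bucket name
prefix = "AWSLogs/"     # placeholder prefix
marker = ""
keys = []

while True:
    # list_objects returns at most 1000 keys per call; Marker tells S3
    # where to resume listing on the next call.
    resp = s3.list_objects(Bucket=bucket, Prefix=prefix, Marker=marker)
    contents = resp.get("Contents", [])
    keys.extend(obj["Key"] for obj in contents)
    if not resp.get("IsTruncated") or not contents:
        break
    # Without a Delimiter, S3 does not return NextMarker, so the last key
    # seen becomes the marker for the next page.
    marker = contents[-1]["Key"]

print(len(keys), "objects found")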
Is it possible to tell Splunk to ignore some subdirectories in the S3 input?
So if I have:
/foo/bar/1
/foo/bar/2
/foo/bar/3
can it be told to ignore /foo/bar/3?
Thanks
It looks to me like the code is passing the stanza name to S3 as-is.
That would mean you can use any valid s3 key.
To test whether you're using a valid S3 key, I really like the AWS CLI utility. It is available as a pip package; you can install it with pip install awscli.
Then, from the command line you can run:
aws s3 ls s3://
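For example, against the bucket from earlier in this thread (the exact path here is just an illustration, and it assumes your AWS credentials are already configured):

aws s3 ls s3://log-bucket/AWSLogs/

If that returns the objects you expect, the prefix itself is valid and the problem is more likely in the add-on.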
Can the key contain wildcards such as *? Thanks.