I installed Hunk and Hadoop 2.2.0 on my Hunk node and launched an EMR cluster with Hadoop 2.2.0. In indexes.conf, I set vix.fs.default.name to my S3 bucket. This results in the following error message:
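For reference, the relevant part of my indexes.conf looked roughly like this (provider and bucket names below are placeholders, not my actual values):

```ini
# indexes.conf -- hypothetical provider stanza; names are placeholders
[provider:emr-provider]
vix.family = hadoop
vix.fs.default.name = s3://my-bucket
```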
cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
To work around this, I configured core-site.xml to define fs.s3.impl and added several jars from AWS (e.g., emr-fs-1.0.0.jar). I started getting classpath errors and fixed them in hadoop-env.sh. However, I am now running into the same error again:
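The properties I added looked like the sketch below (the EMR class name is the one from the AWS jars I copied; treat it as a placeholder for your distribution). One caveat I learned along the way: fs.s3.impl is consulted by Hadoop's older FileSystem API, while the "No AbstractFileSystem for scheme" error comes from the newer FileContext API, which looks up fs.AbstractFileSystem.&lt;scheme&gt;.impl instead, so defining fs.s3.impl alone may not satisfy it.

```xml
<!-- core-site.xml: hypothetical sketch; class names depend on your EMR jars -->
<property>
  <name>fs.s3.impl</name>
  <value>com.amazon.ws.emr.hadoop.fs.EmrFileSystem</value>
</property>
<property>
  <name>fs.s3n.impl</name>
  <value>com.amazon.ws.emr.hadoop.fs.EmrFileSystem</value>
</property>
```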
cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3n
What else do I need to do?
Added the fs.s3n.* and fs.s3.* properties in core-site.xml to provide the AWS S3 credentials. I am now able to use hadoop fs -ls to get a listing of my bucket.
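The credential properties were along these lines (keys redacted; fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are the standard Hadoop property names for the s3n scheme, with fs.s3.* as the s3 counterparts):

```xml
<!-- core-site.xml: AWS credentials for the s3/s3n schemes; values redacted -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```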
Are you able to use the Hadoop CLI to access the s3 filesystem (hadoop fs -ls s3://... )?
Not sure how I messed up the formatting. It should say "$> hadoop fs -ls s3n://xxx/"
Can you please provide the full stack trace so we can see at which step Hunk is failing?
I also tried setting vix.fs.s3n.impl = com.amazon.ws.emr.hadoop.fs.EmrFileSystem and vix.fs.s3.impl = com.amazon.ws.emr.hadoop.fs.EmrFileSystem in the provider.
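In indexes.conf terms, that addition looked roughly like this (the provider name is a placeholder; vix.fs.* settings are passed through to the Hadoop job configuration):

```ini
# indexes.conf -- hypothetical; vix.fs.* is forwarded to the Hadoop config
[provider:emr-provider]
vix.fs.s3.impl = com.amazon.ws.emr.hadoop.fs.EmrFileSystem
vix.fs.s3n.impl = com.amazon.ws.emr.hadoop.fs.EmrFileSystem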
Fixed that, but I am still getting the same error.
Great idea. Unfortunately, I cannot: ls: 's3://xxx': No such file or directory. This tells me that the problem is in my Hadoop config, not my Hunk config.