You can’t mix bits of Hadoop and expect things to work. It’s not just the close coupling between internal classes in hadoop-common and hadoop-aws; it’s things like the specific version of the AWS SDK the hadoop-aws module was built with.
If you get ClassNotFoundException or NoSuchMethodError stack traces when trying to work with s3a:// URLs, JAR version mismatch is the likely cause.
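When one of these traces appears, it can help to see which JAR, if any, each class is actually loaded from. The following is a minimal sketch, assuming it runs with the same classpath as the failing application. The class name WhichJar and the helper locate are hypothetical names for this illustration; only standard JDK calls are used.

```java
import java.security.CodeSource;

class WhichJar {

    /** Reports where a class was loaded from, or why it could not be located. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null
                ? "(no code source; typically a JDK class)"
                : source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised by a missing dependency
            return "NOT FOUND: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.fs.FileSystem",        // lives in hadoop-common
            "org.apache.hadoop.fs.s3a.S3AFileSystem"  // lives in hadoop-aws
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two classes resolve to JARs with different version numbers in their file names, or one of them is reported as not found, that points at the classpath rather than at the code.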
Using the RFC 2119 MUST/SHOULD/MAY terminology, here are the rules to avoid this situation:
- The s3a connector is in the hadoop-aws JAR; it depends on the hadoop-common and aws-sdk-shaded JARs.
- All these JARs MUST be on the classpath.
- All versions of the hadoop-* JARs on your classpath MUST be exactly the same version, e.g. 3.3.1 everywhere, or 3.2.2. Otherwise: stack trace. Always.
- And they MUST be exclusively of that version; there MUST NOT be multiple versions of hadoop-common, hadoop-aws etc. on the classpath. Otherwise: stack trace. Always. Usually a ClassNotFoundException indicating a mismatch in hadoop-common and hadoop-aws.
- The exact missing class varies across Hadoop releases: it’s the first class depended on by org.apache.hadoop.fs.s3a.S3AFileSystem which the classloader can’t find, so the exact class depends on the mismatch of JARs.
- The AWS SDK version SHOULD be the one shipped. Otherwise: maybe stack trace, maybe not. Either way, you are in self-support mode or have opted to join a QE team for version testing.
- The specific version of the AWS SDK you need can be determined from Maven Repository: check the dependencies listed for your hadoop-aws release.
- Changing the AWS SDK versions MAY work. You get to test, and if there are compatibility problems: you get to fix. See Qualifying an AWS SDK Update for the least you should be doing.
- You SHOULD use the most recent version of Hadoop you can, or the most recent one Spark is tested with. Non-critical bug fixes do not get backported to old Hadoop releases, and the S3A and ABFS connectors are rapidly evolving. New releases will be better, stronger, faster. Generally.
- If none of this works, a bug report filed on the ASF JIRA server will get closed as WORKSFORME. Config issues aren’t treated as code bugs.
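The "same version everywhere, and only one version" rules can be checked mechanically before filing anything. The following is a rough sketch, assuming the JARs follow the usual artifact-version.jar naming. The class HadoopJarCheck and its methods are hypothetical names for this illustration. It skips hadoop-shaded-* JARs on the assumption that they come from hadoop-thirdparty and carry their own version numbers.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HadoopJarCheck {

    // Matches names such as hadoop-aws-3.3.1.jar: group 1 = artifact, group 2 = version
    private static final Pattern HADOOP_JAR =
        Pattern.compile("(hadoop-[a-z0-9-]+?)-(\\d[\\w.-]*)\\.jar");

    /** Maps each hadoop-* artifact found in the classpath string to the set of versions seen. */
    static Map<String, Set<String>> versionsByArtifact(String classpath) {
        Map<String, Set<String>> found = new TreeMap<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            Matcher m = HADOOP_JAR.matcher(new File(entry).getName());
            // Assumption: hadoop-shaded-* JARs are versioned independently, so skip them
            if (m.matches() && !m.group(1).startsWith("hadoop-shaded-")) {
                found.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        return found;
    }

    /** True when every hadoop-* JAR found shares one single version. */
    static boolean isConsistent(Map<String, Set<String>> found) {
        Set<String> all = new TreeSet<>();
        found.values().forEach(all::addAll);
        return all.size() <= 1;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> found =
            versionsByArtifact(System.getProperty("java.class.path"));
        found.forEach((artifact, versions) -> System.out.println(artifact + " " + versions));
        System.out.println(isConsistent(found) ? "consistent" : "MISMATCH");
    }
}
```

This only inspects the JVM's java.class.path property. Frameworks such as Spark may add JARs through other routes, so a clean result here does not prove the runtime classpath is clean.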
Finally: the ASF documentation: The S3A Connector