java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics

You can’t mix bits of Hadoop and expect things to work. It’s not just the close coupling between internal classes in hadoop-common and hadoop-aws; it’s also things like the specific version of the AWS SDK that the hadoop-aws module was built with.

If you get ClassNotFoundException, NoClassDefFoundError or NoSuchMethodError stack traces when trying to work with s3a:// URLs, a JAR version mismatch is the likely cause.
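A quick way to see what is actually on your classpath is to ask the JVM where it loaded each class from. The following is only a minimal sketch: the class name WhichJar and the default list of classes to probe are my own choices for illustration, and it only reports what the classloader of this one class can see. Run it with exactly the same classpath as the failing application.

```java
import java.security.CodeSource;

/** Prints where the JVM loaded (or failed to load) a class from. Illustrative helper only. */
class WhichJar {

    /** Returns the location a class was loaded from, or a note explaining why none is available. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(no code source, probably a JDK class)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and similar linkage failures
            return "NOT LOADABLE: " + e;
        }
    }

    public static void main(String[] args) {
        // Default probes are an assumption: two hadoop-common classes and the S3A filesystem class
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.fs.StorageStatistics",
            "org.apache.hadoop.fs.s3a.S3AFileSystem",
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the hadoop-common classes and S3AFileSystem resolve to JARs with different version numbers in their filenames, or one of them is not loadable at all, that points at the mismatch.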

Using the RFC 2119 MUST/SHOULD/MAY terminology, here are the rules to avoid this situation:

  1. The s3a connector is in the hadoop-aws JAR; it depends on the hadoop-common and aws-sdk-shaded JARs.
  2. All these JARs MUST be on the classpath.
  3. All versions of the hadoop-* JARs on your classpath MUST be exactly the same version, e.g. 3.3.1 everywhere, or 3.2.2. Otherwise: stack trace. Always.
  4. And they MUST be exclusively of that version; there MUST NOT be multiple versions of hadoop-common, hadoop-aws etc. on the classpath. Otherwise: stack trace. Always. Usually a ClassNotFoundException indicating a mismatch between hadoop-common and hadoop-aws.
  5. The exact missing class varies across Hadoop releases: it’s the first class depended on by org.apache.hadoop.fs.s3a.S3AFileSystem which the classloader can’t find, so it depends on the particular mismatch of JARs.
  6. The AWS SDK version SHOULD be the one shipped. Otherwise: maybe stack trace, maybe not. Either way, you are in self-support mode or have opted to join a QE team for version testing.
  7. The specific version of the AWS SDK you need can be determined from Maven Repository: look up your hadoop-aws version and check its dependencies.
  8. Changing the AWS SDK versions MAY work. You get to test, and if there are compatibility problems: you get to fix. See Qualifying an AWS SDK Update for the least you should be doing.
  9. You SHOULD use the most recent version of Hadoop that you can, or that Spark is tested with. Non-critical bug fixes do not get backported to old Hadoop releases, and the S3A and ABFS connectors are rapidly evolving. New releases will be better, stronger, faster. Generally.
  10. If none of this works, a bug report filed on the ASF JIRA server will get closed as WORKSFORME. Config issues aren’t treated as code bugs.
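Rules 3 and 4 can be checked mechanically. Here is a rough sketch, not an official Hadoop tool: the class name HadoopJarVersionCheck is hypothetical, it assumes the JARs follow the usual Maven naming of hadoop-<module>-<version>.jar, and it only inspects the java.class.path system property, so JARs added later by a framework's own classloader will not show up. It skips the hadoop-shaded-* artifacts because those come from the separate hadoop-thirdparty project and carry their own version numbers.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Lists hadoop-* JARs on the classpath and flags mixed versions. Illustrative sketch only. */
class HadoopJarVersionCheck {

    // Assumes Maven-style file names: hadoop-<module>-<version>.jar
    private static final Pattern HADOOP_JAR =
        Pattern.compile("(hadoop-[a-z][a-z0-9-]*?)-(\\d[\\w.-]*)\\.jar");

    /** Maps each hadoop-* artifact found in the classpath string to the versions seen for it. */
    static Map<String, Set<String>> versionsByArtifact(String classpath) {
        Map<String, Set<String>> found = new TreeMap<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            Matcher m = HADOOP_JAR.matcher(new File(entry).getName());
            // hadoop-shaded-* JARs are versioned independently of Hadoop itself, so ignore them
            if (m.matches() && !m.group(1).startsWith("hadoop-shaded-")) {
                found.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        return found;
    }

    /** True when all hadoop-* JARs found share a single version string. */
    static boolean consistent(Map<String, Set<String>> found) {
        Set<String> all = new TreeSet<>();
        found.values().forEach(all::addAll);
        return all.size() <= 1;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> found = versionsByArtifact(System.getProperty("java.class.path"));
        found.forEach((artifact, versions) -> System.out.println(artifact + " " + versions));
        System.out.println(consistent(found) ? "OK: one Hadoop version" : "MISMATCH: mixed Hadoop versions");
    }
}
```

Anything reported as a mismatch needs fixing before you look at any other cause. A clean report does not prove the AWS SDK version is right; that still has to be checked against rule 7.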

Finally: the ASF documentation, The S3A Connector.
