How to fix corrupt HDFS files

You can use

  hdfs fsck /

to determine which files have problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is very
verbose, especially on a large HDFS filesystem, so I normally cut it down to
the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

which ignores lines with nothing but dots and lines talking about replication.
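
If your version of fsck supports it, you can also get straight to the list of affected files; I believe the -list-corruptfileblocks option does exactly this, but confirm it against the usage output on your cluster:

  hdfs fsck / -list-corruptfileblocks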

Once you find a file that is corrupt, run

  hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where blocks might live. If the file is
larger than your block size, it might be split across multiple blocks.
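
If you just want the block ids out of that output, a quick filter like the following works, assuming the ids appear in the usual blk_<number> form:

  hdfs fsck /path/to/corrupt/file -locations -blocks -files | grep -o 'blk_[0-9]*' | sort -u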

You can take the reported block numbers and search the namenode and
datanode logs for the machine or machines on which those blocks lived.
Then look for filesystem errors on those machines: missing mount
points, a datanode that is not running, a filesystem that was
reformatted or reprovisioned. If you can find a problem that way and
bring the block back online, the file will be healthy again.
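
As a concrete sketch (the log location, data directory, and block id below are placeholders, not values from any real cluster), that hunt usually boils down to grepping the logs for the block id and looking for the replica file on the datanode disks, since each replica is stored as an ordinary file named after its block id:

  # Which logs mention this block? (placeholder log path and block id)
  grep -l 'blk_1073741825' /var/log/hadoop-hdfs/*.log
  # Does a replica file still exist under the datanode's data
  # directories (dfs.datanode.data.dir)?
  find /data/dfs/dn -name 'blk_1073741825*' 2>/dev/null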

Lather, rinse, and repeat until all files are healthy or you have
exhausted every alternative for finding the blocks.
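
To see where you stand after each pass, I just re-run the full fsck and look at the summary; something like this pulls out the interesting lines, though the exact summary wording can vary between Hadoop versions:

  hdfs fsck / | grep -i 'status\|corrupt\|missing'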

Once you determine what happened and you cannot recover any more blocks,
just use the

  hdfs dfs -rm /path/to/file/with/permanently/missing/blocks

command to get your HDFS filesystem back to a healthy state so you can
start tracking new errors as they occur.
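
If you have many affected files, fsck itself can also do the cleanup; as far as I know it accepts -move (relocate affected files to /lost+found) and -delete (remove them), but check the usage output on your version before relying on either:

  # Move files with missing blocks to /lost+found instead of deleting them
  hdfs fsck /path/to/corrupt/file -move
  # Or delete them outright
  hdfs fsck /path/to/corrupt/file -delete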
