Running Apache Hadoop 2.1.0 on Windows

I have followed following steps to install Hadoop 2.2.0

Steps to build Hadoop bin distribution for Windows

  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install Unix command-line tool Cygwin.

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

  5. Add Environment Variables JAVA_HOME, M2_HOME and Platform if not added already.
    Note : Variable name Platform is case sensitive. And value will be either x64 or Win32 for building on a 64-bit or 32-bit system.
    Edit Path Variable to add bin directory of Cygwin (say C:\cygwin64\bin), bin directory of Maven (say C:\maven\bin) and installation path of Protocol Buffers (say c:\protobuf).

  6. Download hadoop-2.2.0-src.tar.gz and extract to a folder having short path (say c:\hdfs) to avoid runtime problem due to maximum path length limitation in Windows.

  7. Select Start –> All Programs –> Microsoft Windows SDK v7.1 and open Windows SDK 7.1 Command Prompt. Change directory to Hadoop source code folder (c:\hdfs). Execute mvn package with options -Pdist,native-win -DskipTests -Dtar to create Windows binary tar distribution.

  8. If everything goes well in the previous step, then native distribution hadoop-2.2.0.tar.gz will be created inside C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Install Hadoop

  1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

  2. Add Environment Variable HADOOP_HOME and edit Path Variable to add bin directory of HADOOP_HOME (say C:\hadoop\bin).

Configure Hadoop

C:\hadoop\etc\hadoop\core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\hdfs-site.xml

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/hadoop/data/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/hadoop/data/dfs/datanode</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\mapred-site.xml

<configuration>
        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\ yarn-site.xml

<configuration>
        <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>

Format namenode

For the first time only, namenode needs to be formatted.

C:\Users\abhijitg>cd c:\hadoop\bin 
c:\hadoop\bin>hdfs namenode –format

Start HDFS (Namenode and Datanode)

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-dfs

Start MapReduce aka YARN (Resource Manager and Node Manager)

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-yarn
starting yarn daemons

Total four separate Command Prompt windows will be opened automatically to run Namenode, Datanode, Resource Manager, Node Manager

Reference : Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS

Leave a Comment