Hadoop Native Library


Having all the tools in place, we can now build the Hadoop native libraries. Assuming you have downloaded the Hadoop 2.2.0 source code, do:

$ tar xzvf hadoop-2.2.0-src.tar.gz
$ cd hadoop-2.2.0-src
$ mvn package -Pdist,native -DskipTests -Dtar

Note: there is a missing dependency in one of the Maven project modules that results in a build failure in hadoop-auth.

It is fairly easy to use the native Hadoop library: review the components, review the supported platforms, and either download a Hadoop release, which will include a pre-built version of the native Hadoop library, or build your own version. Whether you download or build, the name of the library is the same: libhadoop.so. Hadoop has native implementations of certain components, both for performance reasons and because Java implementations are not available. These components are packaged in a single, dynamically-linked native library called the native Hadoop library. On *nix platforms the library is named libhadoop.so.
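A commonly cited workaround for the hadoop-auth build failure in 2.2.0 (it is mentioned in the Hadoop 2.2.0 build notes) is to declare the missing jetty-util test dependency in the hadoop-auth module. This is a sketch assuming the standard 2.2.0 source layout; verify against your checkout before editing:

```xml
<!-- In hadoop-common-project/hadoop-auth/pom.xml, next to the existing
     org.mortbay.jetty test dependencies -->
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>
```

After adding the dependency, re-run the mvn package command above.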

Having just configured Eclipse for Hadoop development, I ran into the following problem:

Running jobs via the hadoop CLI command worked fine; the problem only appeared when I tried to run jobs directly from Eclipse, in local mode. After a little investigation, I found that the cause was a Java system property, java.library.path, which did not include the correct path.



Running from the hadoop CLI command, the java.library.path property was properly set to /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native (I am using the CDH 4.2.0 distribution of Hadoop). When the job was started from inside Eclipse, java.library.path held its system default value:


To set this property correctly, you can either configure Eclipse to launch the Java Virtual Machine with the setting, or (and this is the better way) add the native library to the respective entry in the Java Build Path. To do this, first right-click on your project and open the Build Path configuration screen:


In this screen, find the hadoop-common library, expand the row and add the native library by pointing to the correct location:

That’s it! From now on you’ll be able to run your MapReduce jobs in local mode and use the proper native libraries.
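As an aside, if you prefer the VM-argument route mentioned earlier, it amounts to adding one line to the Run Configuration's VM arguments. The path shown is the CDH 4.2.0 parcel location from above; adjust it for your distribution:

```
-Djava.library.path=/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop/lib/native
```

The Build Path approach is still preferable because it travels with the project rather than with each individual Run Configuration.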

The following are my notes on how to setup a freshly installed Centos 6.5 machine to compile the Hadoop native library. My reason for doing this was to add a modified version of the native library which utilized hardware compression. In order to accomplish that, I needed to recompile the native library. In a previous post, “Integrating Hardware Accelerated Gzip Compression into Hadoop”, I described the process for modifying the Hadoop native library to use a hardware compression accelerator.


In that previous post I utilized Hadoop 2.2. In this post I’m using Hadoop 2.4 because that’s the version of Hadoop that matches what is utilized by Ambari 1.6.0, the management tool I’ve chosen to administer my current cluster. Additionally, I was using Ubuntu 12.04 in the previous post, whereas in this post I’m using Centos 6.5. Ubuntu is typically my default Linux flavor, but because it is not compatible with Ambari, I installed Centos 6.5.

All of these steps assume you are logged in as “root”. If you aren’t, prefix the commands below with “sudo”.

Hadoop native libraries
  1. Install the development tools required to build the native library via the package manager. (Note: Some of these may already be installed)
  2. Optionally install these packages. (Note: These steps aren’t required to build Hadoop’s native library; I just find myself having to install them eventually)

    • Install the kernel development package, vi, and git using the package manager
    • Install hexedit manually
    • If you have an AHA Hardware Compression Accelerator installed, install/load the driver, and build the hardware accelerated zlib variant. See Steps 1-2 of this post.
  3. Install Oracle Java JDK 1.7
    You might already have this JDK installed; otherwise, download Oracle JDK 7 update 71. The file should be named something like “jdk-7u71-linux-x64.tar.gz”. On my machine, which is also a datanode (installed via Ambari), the JDK was located in /usr/jdk64/jdk1.7.0_67/.

    • Without the JDK already installed do this:
    • With the JDK already installed to /usr/jdk64/jdk1.7.0_67/ do this:
  4. Install Maven 3.0.5
  5. Install Ant 1.9.4
  6. Install Protobuf 2.5.0
  7. Download Hadoop 2.4.0 source
  8. Compile the Hadoop native library. This step takes a while. On my machine it takes around 50 minutes to complete. (Note: Make any changes to the native library, such as adding support for hardware compression, before this step).
  9. Back up Hadoop’s native library, and then replace it with the version compiled in step 8. (Note: The paths to the Hadoop libraries below are the default locations that Ambari installs to; your installation path will vary depending on how you went about your Hadoop installation)

    That’s it!
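As a consolidated sketch, the steps above might look like the following on CentOS 6.5. The version numbers match those named above, but the download URLs and the Ambari install path (/usr/lib/hadoop/lib/native) are assumptions on my part; mirrors move, so verify each against your environment before running:

```shell
#!/bin/bash
# Sketch of steps 1-9 above; run as root (or prefix each command with sudo).
set -e

# 1. Development tools required for the native build
yum -y groupinstall "Development Tools"
yum -y install cmake zlib-devel openssl-devel

# 2-3. Optional packages and the Oracle JDK install are covered in the notes
#      above; this assumes the JDK is already at the Ambari-installed location.
export JAVA_HOME=/usr/jdk64/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH

# 4. Maven 3.0.5
curl -LO https://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar xzf apache-maven-3.0.5-bin.tar.gz -C /opt
export PATH=/opt/apache-maven-3.0.5/bin:$PATH

# 5. Ant 1.9.4
curl -LO https://archive.apache.org/dist/ant/binaries/apache-ant-1.9.4-bin.tar.gz
tar xzf apache-ant-1.9.4-bin.tar.gz -C /opt
export PATH=/opt/apache-ant-1.9.4/bin:$PATH

# 6. Protobuf 2.5.0 (Hadoop 2.4.0 requires exactly 2.5.x)
curl -LO https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xzf protobuf-2.5.0.tar.gz
(cd protobuf-2.5.0 && ./configure && make && make install && ldconfig)

# 7. Hadoop 2.4.0 source
curl -LO https://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0-src.tar.gz
tar xzf hadoop-2.4.0-src.tar.gz

# 8. Compile the native library (the ~50 minute step; make any source
#    changes, e.g. for hardware compression, before this point)
(cd hadoop-2.4.0-src && mvn package -Pdist,native -DskipTests -Dtar)

# 9. Back up the installed native library and swap in the new build
#    (/usr/lib/hadoop/lib/native is an assumed Ambari default location)
cp -a /usr/lib/hadoop/lib/native /usr/lib/hadoop/lib/native.bak
cp -f hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/lib/native/* /usr/lib/hadoop/lib/native/

# Verify that Hadoop now loads the native code
hadoop checknative -a
```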


