Build Instructions

The sections below use $PROJDIR as shorthand for the common directory that holds the project source code.
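
As a minimal sketch, PROJDIR can be set once per shell session before running the later commands. The default path below is only an assumption for illustration; any writable directory should work.

```shell
# Hypothetical location; substitute whichever directory holds your checkouts.
PROJDIR="${PROJDIR:-$HOME/hdfs-xrootd-work}"
export PROJDIR

# Create the directory if it does not exist yet, then confirm it.
mkdir -p "$PROJDIR"
echo "PROJDIR is $PROJDIR"
```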

Obtain Dependencies

Follow the steps in "Install the Yum Repositories required by OSG" to obtain the OSG yum repo, version 3.2 or later.

Disable osg-release repo by setting enabled=0 in


Enable osg-testing repo by setting enabled=1 in

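The enabled= setting can be edited by hand, or scripted. The following is a rough sketch only: set_repo_enabled is a hypothetical helper, it takes the repo file path as an argument because the path is not given here, and the section names in the demonstration are assumptions. It changes enabled= only inside the named section and is exercised on a throwaway file.

```shell
# set_repo_enabled FILE SECTION VALUE
# Rewrites the "enabled=" line inside [SECTION] of a yum .repo file,
# leaving every other section untouched.
set_repo_enabled() {
    repo_tmp=$(mktemp) || return 1
    awk -v section="[$2]" -v value="$3" '
        /^\[/ { in_section = ($0 == section) }   # track the current section
        in_section && /^enabled[ \t]*=/ { print "enabled=" value; next }
        { print }
    ' "$1" > "$repo_tmp" && cat "$repo_tmp" > "$1"
    repo_status=$?
    rm -f "$repo_tmp"
    return "$repo_status"
}

# Demonstration on a throwaway file (section names are assumptions).
demo_repo=$(mktemp)
printf '[osg-testing]\nname=demo\nenabled=0\n\n[osg-testing-source]\nenabled=0\n' > "$demo_repo"
set_repo_enabled "$demo_repo" osg-testing 1
demo_result=$(cat "$demo_repo")
rm -f "$demo_repo"
printf '%s\n' "$demo_result"
```

Writing back with cat rather than mv keeps the original file's ownership and permissions.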

Install dependencies:

yum install java-1.7.0-openjdk-devel
yum install pcre-devel
yum install xrootd-client-devel
yum install hadoop-hdfs

NOTE: RHEL >= 6.5 may break some OSG Java 7 dependencies. If this happens, fix it by running:

yum install osg-java7-compat

Check out the hdfs-xrootd-fallback repo (run this from $PROJDIR):

svn checkout https://svn.gled.org/var/trunk/hdfs-xrootd-fallback

Make sure JAVA_HOME is set:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
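
Before building, it may help to confirm that JAVA_HOME really points at a JDK. The check below is a sketch; check_java_home is a hypothetical helper, and testing for bin/javac is only a heuristic for "the -devel package is installed".

```shell
# check_java_home DIR: succeed only if DIR looks like a JDK (has bin/javac).
check_java_home() {
    if [ -z "$1" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$1/bin/javac" ]; then
        echo "no javac under $1/bin; is the -devel package installed?" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $1"
}

# Report the state of the current environment without aborting the shell.
check_java_home "${JAVA_HOME:-}" || echo "fix JAVA_HOME before building"
```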

Build hdfs-xrootd-fallback

cd $PROJDIR/hdfs-xrootd-fallback
make

This will generate the following files:

  • hdfs-xrootd-fallback-1.0.0.jar
  • libXrdBlockFetcher.so.1.0.0
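
A quick way to confirm the build produced both files is sketched below. check_build_outputs is a hypothetical helper; it searches the whole tree because the exact output subdirectory is not stated here.

```shell
# check_build_outputs DIR: report whether each expected artifact exists
# somewhere under DIR; returns non-zero if any is missing.
check_build_outputs() {
    missing=0
    for artifact in hdfs-xrootd-fallback-1.0.0.jar libXrdBlockFetcher.so.1.0.0; do
        found=$(find "$1" -name "$artifact" -type f | head -n 1)
        if [ -n "$found" ]; then
            echo "ok:      $found"
        else
            echo "missing: $artifact"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (assumes PROJDIR is set as described earlier):
# check_build_outputs "$PROJDIR/hdfs-xrootd-fallback"
```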

Deployment for Testing

Assumptions: a Hadoop cluster is available and a Hadoop client node is properly configured to access it. The following steps are required only on the client node.

Install Binaries

make install prefix=/usr sysconfdir=/etc

This will install the following:


A template config file will also be installed in:



Append the following to /etc/hadoop/conf/core-site.xml:


Copy the /etc/hadoop/conf.osg/xfbfs-site.xml template file to /etc/hadoop/conf and modify the relevant values.

Start FUSE in Debug Mode

hadoop-fuse-dfs -oserver=xfbfs://cabinet-10-10-3.t2.ucsd.edu,port=8020,allow_other,rw,rdbuffer=4096 -d /mnt/hadoop
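
The namenode host, port, and mount point in that command are specific to one cluster. As a sketch, the same command can be assembled from variables and printed for review before running it. The variable names are hypothetical; the option string itself is unchanged from the command above.

```shell
# Values taken from the command above; change them for your cluster.
NAMENODE_HOST="cabinet-10-10-3.t2.ucsd.edu"
NAMENODE_PORT=8020
MOUNT_POINT="/mnt/hadoop"

# Assemble the same option string used above.
FUSE_OPTS="server=xfbfs://${NAMENODE_HOST},port=${NAMENODE_PORT},allow_other,rw,rdbuffer=4096"
FUSE_CMD="hadoop-fuse-dfs -o${FUSE_OPTS} -d ${MOUNT_POINT}"

# Print instead of executing, so the command can be reviewed first.
echo "$FUSE_CMD"
```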

Modifying Hadoop

Follow the instructions in this section only if you plan to modify extendable_client.patch.

Obtain Dependencies for HDFS

Install dependencies:

yum --enablerepo=osg-development install maven3
yum --enablerepo=osg-development install protobuf-compiler

Add mvn to path:

export PATH=$PATH:/usr/share/apache-maven-3.0.4/bin

Obtain the Cloudera Hadoop tarball and extract it (run this from $PROJDIR):

wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
tar -xzf hadoop-2.0.0-cdh4.1.1.tar.gz

Patch hadoop to allow DFSInputStream inheritance:

cd $PROJDIR/hadoop-2.0.0-cdh4.1.1
patch -p1 < $PROJDIR/hdfs-xrootd-fallback/extendable_client.patch

Building HDFS

Currently the only jar file we modify is hadoop-hdfs-2.0.0-cdh4.1.1.jar. To build this, run:

cd $PROJDIR/hadoop-2.0.0-cdh4.1.1/src/hadoop-hdfs-project/hadoop-hdfs
mvn install -DskipTests

The jar will be located in:


To test, this jar should replace:


Creating Updated Patch File

Assuming changes have been made in $PROJDIR/hadoop-2.0.0-cdh4.1.1 and an unmodified copy of the original tarball was extracted into $PROJDIR/hadoop-2.0.0-cdh4.1.1.orig, run the following to generate a patch file:

$PROJDIR/hdfs-xrootd-fallback/create_hdpatch.sh $PROJDIR/hadoop-2.0.0-cdh4.1.1.orig $PROJDIR/hadoop-2.0.0-cdh4.1.1

The new patch will be created in the current directory and be named extendable_client.patch.
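
The contents of create_hdpatch.sh are not shown here. Purely as an illustration of how a patch between an original and a modified tree can be produced with standard tools (not necessarily what the script does), the sketch below runs diff on two tiny stand-in trees. All file and directory names in it are made up.

```shell
# Work in a scratch directory with two tiny stand-in trees.
work=$(mktemp -d)
mkdir -p "$work/tree.orig/src" "$work/tree/src"
printf 'class A {}\n'        > "$work/tree.orig/src/A.java"
printf 'public class A {}\n' > "$work/tree/src/A.java"

# diff exits with status 1 when the trees differ, which is expected here.
# Running from the common parent keeps one leading path component in the
# patch, so it can later be applied with "patch -p1" from inside the tree.
( cd "$work" && diff -ruN tree.orig tree > example.patch )
diff_status=$?

cat "$work/example.patch"
```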

Useful Test Commands

Put File In Hadoop

This example shows how to specify blocksize:

hadoop fs -Ddfs.blocksize=51200 -put xxx_for_jni_test /store/user/matevz
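
To see what that block size implies, the arithmetic can be worked through in the shell. This sketch takes the 117499-byte file size from the fsck sample output in this document; the variable names are illustrative.

```shell
FILE_SIZE=117499    # bytes, from the fsck sample output in this document
BLOCK_SIZE=51200    # bytes, the -Ddfs.blocksize value used above

# Ceiling division gives the number of blocks.
N_BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
# Every block except the last is full.
LAST_BLOCK=$(( FILE_SIZE - (N_BLOCKS - 1) * BLOCK_SIZE ))

echo "blocks=$N_BLOCKS last_block_bytes=$LAST_BLOCK"
```

This yields 3 blocks with a final block of 15099 bytes, matching the block list that fsck reports for the file.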

Use Hadoop fsck

To obtain useful info about block locations:

hdfs fsck /store/user/matevz/xxx_for_jni_test -files -blocks -locations

Sample output:

Connecting to namenode via http://cabinet-10-10-3.t2.ucsd.edu:50070
FSCK started by jdost (auth:SIMPLE) from / for path /store/user/matevz/xxx_for_jni_test at Tue Jul 30 19:11:13 PDT 2013
/store/user/matevz/xxx_for_jni_test 117499 bytes, 3 block(s):  OK
0. BP-902182059- len=51200 repl=1 []
1. BP-902182059- len=51200 repl=1 []
2. BP-902182059- len=15099 repl=1 []

 Total size:   117499 B
 Total dirs:   0
 Total files:   1
 Total blocks (validated):   3 (avg. block size 39166 B)
 Minimally replicated blocks:   3 (100.0 %)
 Over-replicated blocks:   0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:      0 (0.0 %)
 Default replication factor:   1
 Average block replication:   1.0
 Corrupt blocks:      0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      1
 Number of racks:      1
FSCK ended at Tue Jul 30 19:11:13 PDT 2013 in 18 milliseconds

The filesystem under path '/store/user/matevz/xxx_for_jni_test' is HEALTHY
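
When checking many files, the per-block lines can be summarized with a small filter. The following is a sketch under the assumption that block lines start with an index followed by a period and carry a len= field, as in the sample above; sum_block_lengths is a hypothetical helper.

```shell
# Sum the len= fields of the block lines in fsck output read from stdin.
sum_block_lengths() {
    awk '
        /^[0-9]+\. / {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^len=/) { sub(/^len=/, "", $i); total += $i; blocks++ }
        }
        END { printf "%d blocks, %d bytes\n", blocks, total }
    '
}

# Block lines copied from the sample output above.
FSCK_SUMMARY=$(sum_block_lengths <<'EOF'
0. BP-902182059- len=51200 repl=1 []
1. BP-902182059- len=51200 repl=1 []
2. BP-902182059- len=15099 repl=1 []
EOF
)
echo "$FSCK_SUMMARY"
```

For the sample lines this prints "3 blocks, 117499 bytes", which agrees with the total size fsck reports. The filter can also be fed directly from the hdfs fsck command shown earlier by piping its output into sum_block_lengths.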

Corrupt A Block

Run fsck as above to find the node and filename:
1. BP-902182059- len=51200 repl=1 []

The block filename in this example is: blk_-1470524685933485700

Look in /etc/hadoop/conf/hdfs-site.xml to find the value of dfs.data.dir.

Locate the block, substituting the value of dfs.data.dir for the first argument:

find dfs.data.dir -name blk_-1470524685933485700
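
dfs.data.dir may hold a comma-separated list of directories, in which case each one has to be searched. A minimal sketch, assuming the value has no spaces around the commas (find_block is a hypothetical helper):

```shell
# find_block DIRS BLOCK: search each directory in the comma-separated
# DIRS value for a file named BLOCK and print every match.
find_block() {
    old_ifs=$IFS
    IFS=,
    for data_dir in $1; do
        IFS=$old_ifs
        find "$data_dir" -name "$2" -type f 2>/dev/null
    done
    IFS=$old_ifs
}

# Example call (the directory list is a placeholder, not a real setting):
# find_block "/data1/hdfs,/data2/hdfs" blk_-1470524685933485700
```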

To "corrupt" the block, temporarily rename the block file and its associated meta file (ending in .meta) with mv.
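
The rename and its reversal can be wrapped in two small functions so that the block and its meta file always move together. This is a sketch: hide_block and restore_block are hypothetical helpers, the ".moved" suffix is an arbitrary choice, and the meta file is assumed to be named after the block file with an extra _<number>.meta suffix.

```shell
# hide_block FILE: rename the block file and its .meta companion by
# appending a ".moved" suffix.
hide_block() {
    for f in "$1" "$1"_*.meta; do
        [ -e "$f" ] && mv "$f" "$f.moved"
    done
    return 0
}

# restore_block FILE: undo hide_block for the same block file path.
restore_block() {
    for f in "$1.moved" "$1"_*.meta.moved; do
        [ -e "$f" ] && mv "$f" "${f%.moved}"
    done
    return 0
}

# Example calls, using the path printed by the find command above:
# hide_block    /path/to/blk_-1470524685933485700
# restore_block /path/to/blk_-1470524685933485700
```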

NOTE: After restoring the moved block, you sometimes have to restart the datanode before Hadoop notices that the block is "fixed".

On cabinet-10-10-8.t2.ucsd.edu:

service hadoop-hdfs-datanode restart

-- JeffreyDost - 2013/07/25

