HDFS Xrootd Fallback Administration
Finding Files with a Given Replication
The examples here use the hadoop_lsr tool, which can be found on uaf at:
~jdost/bin/hadoop_lsr
List files of a given replication (2 in this example):
hadoop_lsr 2 /some/dir/in/hadoop
Find all files of replication 2 recursively in all subdirectories:
hadoop_lsr 2 -R /some/dir/in/hadoop
NOTE: the -R option can increase load on the namenode if the file namespace is large. Use with care!
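If hadoop_lsr is not available, a similar listing can be approximated with built-in commands. The following is a minimal sketch, not the hadoop_lsr implementation: the function name filter_by_replication is hypothetical, and it assumes the usual `hadoop fs -ls` column layout (permissions, replication, owner, group, size, date, time, path) and paths without whitespace.

```bash
# Hypothetical helper (not part of hadoop or hadoop_lsr).
# Reads `hadoop fs -ls` style lines on stdin and prints the paths of
# files whose replication factor equals the first argument.
filter_by_replication() {
    local want="$1"
    # NF >= 8 skips the "Found N items" header line.
    # Lines starting with "d" are directories (replication shown as "-").
    # Column 2 is the replication factor; the last column is the path.
    awk -v want="$want" 'NF >= 8 && $1 !~ /^d/ && $2 == want { print $NF }'
}

# Usage, assuming a client that supports `-ls -R` (older clients use -lsr):
#   hadoop fs -ls -R /some/dir/in/hadoop | filter_by_replication 2
```

A recursive listing piped through this filter puts the same kind of load on the namenode as the -R option described above.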
Changing File Replication
These examples only require built-in hadoop commands.
NOTE: to modify replication you need to either be the cmswriter user or have root access on a node with a hadoop client.
Set replication to 1 for a single file:
hadoop fs -setrep 1 /some/file/in/hadoop
Set replication to 1 recursively for all files in a directory and all its subdirectories:
hadoop fs -setrep -R 1 /some/dir/in/hadoop
NOTE: the -R option can increase load on the namenode if the file namespace is large. Use with care!
Use a bash while loop to set replication to 1 over a list of directories in a file called rep1.txt:
while read -r line; do echo "$(date +%T) $line"; hadoop fs -setrep -R 1 "$line" > /dev/null; done < rep1.txt
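For longer directory lists, the one-line loop above could be wrapped in a small function that skips blank and comment lines, reports failures, and pauses between directories to spread the namenode load. This is only a sketch: the function name set_replication_from_list and the SETREP_CMD and PAUSE_SECONDS variables are made up for illustration and are not existing hadoop settings.

```bash
# Hypothetical helper: set replication for every directory listed in a file.
#   $1 = replication factor, $2 = file with one HDFS directory per line.
# SETREP_CMD and PAUSE_SECONDS are illustrative overrides, not hadoop options.
set_replication_from_list() {
    local rep="$1" list="$2" dir
    while IFS= read -r dir; do
        # Skip empty lines and lines starting with "#".
        case "$dir" in ''|'#'*) continue ;; esac
        echo "$(date +%T) $dir"
        # stdin is redirected so the command cannot consume the list file.
        ${SETREP_CMD:-hadoop fs -setrep -R} "$rep" "$dir" < /dev/null > /dev/null \
            || echo "FAILED: $dir" >&2
        # Pause between directories to ease load on the namenode.
        sleep "${PAUSE_SECONDS:-5}"
    done < "$list"
}

# Usage:
#   set_replication_from_list 1 rep1.txt
```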
Namespaces with Replication Set to 1
As of 2014-03-27, replication has been reduced to 1 for the following:
- /cms/phedex/store/data/Run2012D/
2014-08-04:
- /cms/phedex/store/himc
- /cms/phedex/store/data/Run2012A
- /cms/phedex/store/data/Run2012B
- /cms/phedex/store/data/Run2012C
- /cms/phedex/store/data/Fall13
- /cms/phedex/store/data/Summer13
- /cms/phedex/store/relval
- /cms/phedex/store/mc/Summer12_DR53X
2015-06-30:
- /cms/phedex/store/mc/Summer11Backfill
- /cms/phedex/store/mc/Summer11
- /cms/phedex/store/mc/Summer12FS53
- /cms/phedex/store/mc/Summer12_DR53X
- /cms/phedex/store/mc/Summer12Backfill
- /cms/phedex/store/mc/Summer12pLHE
- /cms/phedex/store/mc/Summer11LegwmLHE
- /cms/phedex/store/mc/Summer11Leg
- /cms/phedex/store/mc/Summer11LegDR
- /cms/phedex/store/mc/Summer12WMLHE
- /cms/phedex/store/mc/Summer12 (waiting for FKW's go-ahead for this one since it also has a subdir in our x3 namespace)
- /cms/phedex/store/mc/Summer12DR53X
Namespaces with Replication Set to 3
CMS started running some digi-reco on 2015-05-28, which requires xrootd transfers from UCSDT2 to SDSC. To ensure throughput and file availability, we raised replication to 3 for the following namespaces:
- /cms/phedex/store/mc/RunIIWinter15GS/MinBias_TuneCUETP8M1_13TeV-pythia8
- /cms/phedex/store/mc/RunIIFall14GS/MinBias_TuneCUETP8M1_13TeV-pythia8
- /cms/phedex/store/mc/Summer12/MinBias_TuneZ2star_8TeV-pythia6
Healing progress
As of 2014-09-09, performed healing on a small subset:
- /cms/phedex/store/data/Run2012A/DoubleElectron/AOD/22Jan2013-v1/20000
This should have healed 33 of the 95 corrupt files as of this writing. To be confirmed on 09-10.
Monitoring Fallback
Currently the best place to check that fallback is working as expected is the udp log at xrootd-proxy.t2.ucsd.edu:
/var/log/xrootd/hdfs-mon-snatcher.log
This section will be updated once more monitoring tools are developed.
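Until such tools exist, one check that does not depend on the log format is whether the log has been written to recently. The sketch below assumes it is run on the host holding the log; the function name log_is_fresh and the 60-minute default are arbitrary choices for illustration. A stale log does not by itself prove fallback is broken, since there may simply have been no fallback traffic.

```bash
# Hypothetical helper: succeed if the file was modified within the
# last N minutes (default 60), fail if it is older or missing.
log_is_fresh() {
    local log="$1" max_age_min="${2:-60}"
    # find prints the path only if its mtime is newer than N minutes ago.
    [ -n "$(find "$log" -mmin "-$max_age_min" 2>/dev/null)" ]
}

# Usage:
#   log_is_fresh /var/log/xrootd/hdfs-mon-snatcher.log 30 || echo "log is stale"
```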
--
JeffreyDost - 2014/03/27