HDFS Xrootd Fallback Administration

Finding Files with a Given Replication

The examples here use the hadoop_lsr tool, which can be found on uaf at:

~jdost/bin/hadoop_lsr

List files of a given replication (2 in this example):

hadoop_lsr 2 /some/dir/in/hadoop

Find all files of replication 2 recursively in all subdirectories:

hadoop_lsr 2 -R /some/dir/in/hadoop

NOTE: The -R option can increase load on the namenode if the file namespace is large. Use with care!
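
If hadoop_lsr is not available, a similar listing can be approximated with built-in commands. The following is only a sketch, not the hadoop_lsr implementation: the function names are hypothetical, it assumes the usual `hadoop fs -ls` column layout (replication in the second column, path in the eighth), and it assumes a client that accepts `-ls -R` (older clients use `-lsr` instead). Paths containing spaces would be truncated.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of hadoop.

# Read `hadoop fs -ls` style lines on stdin and print the path of every
# entry whose replication column (field 2) equals $1.
# Directories show "-" in that column, so they never match.
# The NF check skips the "Found N items" header line.
filter_by_replication() {
    awk -v rep="$1" 'NF >= 8 && $2 == rep { print $8 }'
}

# List files of replication $1 directly under directory $2.
list_rep() {
    hadoop fs -ls "$2" | filter_by_replication "$1"
}

# Same, but recursive. This walks the whole subtree on the namenode,
# so the load warning for -R applies here as well.
list_rep_recursive() {
    hadoop fs -ls -R "$2" | filter_by_replication "$1"
}
```

For example, `list_rep 2 /some/dir/in/hadoop` should print one path per line for files whose replication is 2.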

Changing File Replication

These examples only require built-in hadoop commands.

NOTE: To modify replication you must either be the cmswriter user or have root access on a node with a hadoop client.

Set replication to 1 for a single file:

hadoop fs -setrep 1 /some/file/in/hadoop

Set replication to 1 recursively for all files in a directory and all its subdirectories:

hadoop fs -setrep -R 1 /some/dir/in/hadoop

NOTE: The -R option can increase load on the namenode if the file namespace is large. Use with care!

Use a bash while loop to set replication to 1 for every directory listed, one per line, in a file called rep1.txt:

while read -r line; do echo "$(date +%T) $line"; hadoop fs -setrep -R 1 "$line" > /dev/null; done < rep1.txt
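
After changing replication it can be useful to spot-check a file. The sketch below uses the built-in `hadoop fs -stat` command with the `%r` format, which prints a file's replication factor. The function names `get_rep` and `check_rep` are hypothetical helpers, not part of hadoop. Note that this reports the replication factor recorded by the namenode; the number of block replicas on the datanodes is adjusted in the background and may lag behind.

```shell
#!/usr/bin/env bash
# Sketch only: get_rep and check_rep are hypothetical helper names.

# Print the replication factor of one file.
get_rep() {
    hadoop fs -stat %r "$1"
}

# Return 0 if file $2 has replication $1.
# Return 1 on a mismatch, 2 if the stat call itself failed.
check_rep() {
    local expected=$1 path=$2 actual
    actual=$(get_rep "$path") || return 2
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH $path: expected $expected, got $actual" >&2
        return 1
    fi
}
```

For example, `check_rep 1 /some/file/in/hadoop` should exit silently with status 0 once the replication of that file has been set to 1.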

Namespaces with Replication Set to 1

As of 2014-03-27, replication has been reduced to 1 for the following:

  • /cms/phedex/store/data/Run2012D/

2014-08-04:

  • /cms/phedex/store/himc
  • /cms/phedex/store/data/Run2012A
  • /cms/phedex/store/data/Run2012B
  • /cms/phedex/store/data/Run2012C
  • /cms/phedex/store/data/Fall13
  • /cms/phedex/store/data/Summer13
  • /cms/phedex/store/relval
  • /cms/phedex/store/mc/Summer12_DR53X

2015-06-30:

  • /cms/phedex/store/mc/Summer11Backfill
  • /cms/phedex/store/mc/Summer11
  • /cms/phedex/store/mc/Summer12FS53
  • /cms/phedex/store/mc/Summer12_DR53X
  • /cms/phedex/store/mc/Summer12Backfill
  • /cms/phedex/store/mc/Summer12pLHE
  • /cms/phedex/store/mc/Summer11LegwmLHE
  • /cms/phedex/store/mc/Summer11Leg
  • /cms/phedex/store/mc/Summer11LegDR
  • /cms/phedex/store/mc/Summer12WMLHE
  • /cms/phedex/store/mc/Summer12 (waiting for FKW's go-ahead for this one since it also has a subdir in our x3 namespace)
  • /cms/phedex/store/mc/Summer12DR53X

Namespaces with Replication Set to 3

On 2015-05-28 CMS started running some digi-reco that requires xrootd transfers from UCSDT2 to SDSC. To ensure throughput and file availability we raised replication to 3 for the following namespaces:

  • /cms/phedex/store/mc/RunIIWinter15GS/MinBias_TuneCUETP8M1_13TeV-pythia8
  • /cms/phedex/store/mc/RunIIFall14GS/MinBias_TuneCUETP8M1_13TeV-pythia8
  • /cms/phedex/store/mc/Summer12/MinBias_TuneZ2star_8TeV-pythia6

Healing progress

As of 2014-09-09, healing has been performed on a small subset:

  • /cms/phedex/store/data/Run2012A/DoubleElectron/AOD/22Jan2013-v1/20000

At the time of writing, this should have healed 33 of the 95 corrupt files. To be confirmed on 2014-09-10.
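
One possible way to track the number of corrupt files under a path is the built-in `hadoop fsck` command with the `-list-corruptfileblocks` option. The sketch below is an assumption about how one might count them, not a record of how the numbers above were obtained. The function names are hypothetical, and the parsing assumes that each corrupt block is reported as a line of the form `blk_<id><TAB><path>`, which may differ between hadoop versions.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the output format
# of fsck (blk_<id><TAB><path>) is an assumption to verify locally.

# Read `hadoop fsck <path> -list-corruptfileblocks` output on stdin and
# print each affected file path once. A file with several corrupt
# blocks appears on several input lines, hence the sort -u.
corrupt_paths() {
    awk -F'\t' '/^blk_/ { print $2 }' | sort -u
}

# Print the number of distinct corrupt files under directory $1.
count_corrupt() {
    hadoop fsck "$1" -list-corruptfileblocks | corrupt_paths | wc -l
}
```

Running `count_corrupt` on the directory above before and after healing would give a rough measure of progress.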

Monitoring Fallback

Currently the best place to check that fallback is working as expected is the UDP log on xrootd-proxy.t2.ucsd.edu:

/var/log/xrootd/hdfs-mon-snatcher.log

This section will be updated once more monitoring tools are developed.

-- JeffreyDost - 2014/03/27

Topic revision: r7 - 2015/06/30 - 21:42:47 - JeffreyDost
 