Modifications to the NFS Lite installation to Support LIGO VO

Introduction

The following document is meant to provide a reference set of modifications to a local site's NFS Lite deployment so that it can support the LIGO VO. Due to differences between site implementations of NFS Lite and other core CE and WN components, this document should not be used as a howto. Instead, please use this document as a reference on how to adapt your local jobmanager and wrapper script to support the LIGO VO in NFS Lite. In writing this document it became clear that supporting the capability required by LIGO was as much a site policy issue as a technical one. To reflect that, I have included details of how UCSD approached supporting the LIGO VO both from a technical and a site policy perspective.

Where possible, detailed code and script examples are provided. While these could be used as drop-in replacements, I recommend that each site admin examine the code provided to determine how best to integrate it into their individual site.

NOTE: As of this writing the effectiveness of these modifications has been tested by UCSD using condor-g and DAGman submissions. These tests were based on scripts sent to UCSD by LIGO.

UPDATE: LIGO has now successfully tested the creation of their directories in InitialDir as specified by Remote_InitialDir and have transferred 5.1GB of Wave files successfully to UCSD.

Included below are a standalone condor-g script and a series of dag scripts used during testing. The dag scripts are based on scripts sent to UCSD by LIGO.

Glossary

  • wrapper: Refers to the USER_JOB_WRAPPER as specified by condor configuration
  • jobmanager: Refers to the condor.pm jobmanager script used in OSG Compute Elements to generate a condor submission script. Usually found in $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condor.pm

The Problem...Supporting Remote_InitialDir

When UCSD examined the problem of supporting the LIGO VO in NFS Lite installations, it was determined that the missing feature was support for the condor grid parameter Remote_InitialDir. Examination of the Condor documentation indicated that this parameter is intended to allow the submitter to specify the initial working directory of the job on the worker node.

By setting Remote_InitialDir, the condor globus submission interacts with the CE jobmanager to set condor's InitialDir parameter for the job script submitted to the cluster. Unfortunately, InitialDir is only honored by condor for certain universes if the cluster is using a shared file system. NFS Lite explicitly disables shared disk mode in condor, which results in the local condor cluster ignoring the InitialDir parameter.

So while none of the jobmanager modifications in NFS Lite explicitly disable support for the Remote_InitialDir parameter, NFS Lite does explicitly disable shared file systems, and therefore Remote_InitialDir is ignored.

The Solution

The solution to the problem is fairly straightforward: modify the jobmanager and the job wrapper on the worker node to change the job's initial directory to that specified by Remote_InitialDir. This parameter is available in the jobmanager as the return value of the $description->directory() method. This value then just needs to be passed to the worker node wrapper script so that the correct directory change can be made.

What Would Condor Do if the Wrapper Honored InitialDir?

Once it was determined that this solution should be effective, we examined potential concerns. The primary concern was that while Remote_InitialDir could be honored, we were not certain what implicit features were assumed about this directory other than that the job would start there.

One example is the copying of files specified by the transfer_input_files parameter on the globus submitter. LIGO indicated that they did not think it would affect them if UCSD did not support copying the files specified by transfer_input_files into their starting directory. As it turned out, condor appears to copy the files to the InitialDir where the job wrapper starts the job, even if the job came to be in that directory via the wrapper. The specifics behind what may be happening have not yet been investigated, but either behaviour should work for LIGO.

Site Policy: Architectural and Performance Concerns and Restricting Remote_InitialDir Support

One of the primary goals of the NFS Lite approach to OSG sites is to eliminate a scalability issue that can severely degrade or disable the Compute Element of an OSG site: the use by submitted jobs of a shared NFS directory for file IO intensive activities. Even though the OSG has some features that assist users in moving the bulk of their IO off of the shared NFS, this does not eliminate file IO over NFS completely, e.g. excessive standard IO. UCSD also found, in its own operational experience and that of other site admins, that a site cannot always depend on the user starting their job in the best location as far as file IO is concerned. To address this, UCSD explicitly places the user's job in a local disk working directory that is dynamically created for each individual job, and eliminated the NFS mounts from worker node to CE. This configuration was called NFS Lite. Unfortunately this configuration turned out to cause problems for the LIGO VO, as they required the initial working directory to be set for their job at the submitter side.

To balance the needs of LIGO, local site policy, and the fact that condor explicitly ignores InitialDir for non-shared file system clusters, UCSD has implemented support for Remote_InitialDir on a per-VO or per-user basis. This support is accomplished with modifications to the jobmanager and the wrapper that starts user jobs on the worker nodes. UCSD uses a pattern match based on the jobmanager $logname variable to determine if it should honor Remote_InitialDir for the current submitter. Given that condor does not support the InitialDir parameter for non-shared file systems as of this writing, UCSD feels the appropriate default policy is to ignore Remote_InitialDir but allow exceptions.

UCSD site policy will be updated to reflect this change.

Currently only LIGO is supported for Remote_InitialDir at UCSD.

Note: It may be possible to capture whether Remote_InitialDir was set by the submitter and from there assume that the VO knows what they are doing and honor the parameter. UCSD did not investigate this and opted for a VO identity pattern match approach.

Jobmanager Modifications

Some basic modifications to the condor.pm jobmanager are required for it to support Remote_InitialDir. The modifications include detecting that Remote_InitialDir should be honored for the submitting VO and then passing the required information to the USER_JOB_WRAPPER so that it can perform a directory relocation to the path specified by the submitter.

Passing Arguments to the Wrapper from the Job Manager

UCSD has a fairly complex perl based wrapper to execute a variety of pre-job steps. The wrapper has the ability to accept information using specially crafted command line arguments recognized by the wrapper. These arguments are removed from the job's command line prior to execution of the job itself so that they do not interfere with the job's own arguments.

When the jobmanager detects a submitter for which it should honor Remote_InitialDir, it sets the associated command line argument to the wrapper. $logname is always set to the user name running the jobmanager, which is mapped from GUMS or the gridmap file depending on the local site authentication configuration. You can be as simple or as complex as you want with the pattern match. The one below is a pretty greedy match and could be replaced with something less broad, and quicker.

These changes should occur before the actual submit script is produced by the jobmanager.

    map { if ($_->[0] eq "LOGNAME") { $logname = $_->[1]; } } @environment;

    if ($logname =~ /.*ligo.*/) {
        $wrapper_arguments .= " -wrapper_iwd " . $description->directory();
    }
    else {
        $wrapper_arguments .= " -wrapper_iwd " . ' $_CONDOR_SCRATCH_DIR';
    }

The jobmanager then appends the wrapper specific arguments to the end of the job's argument string.

    # START UCSD Modification
    print SCRIPT_FILE "Arguments = $argument_string $wrapper_arguments\n";
    # END UCSD Modification
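Outside the jobmanager, the same greedy username match can be sketched in shell for quick experimentation. The usernames and function name below are invented examples, not real accounts or production code:

```shell
#!/bin/sh
# Stand-alone sketch of the per-user policy check; mirrors the
# jobmanager's /.*ligo.*/ match with a shell case pattern.
honor_initialdir() {
    case "$1" in
        *ligo*) return 0 ;;   # LIGO-mapped accounts: honor Remote_InitialDir
        *)      return 1 ;;   # everyone else: leave the job in scratch
    esac
}

for logname in ligo001 uscms001; do
    if honor_initialdir "$logname"; then
        echo "$logname: honor Remote_InitialDir"
    else
        echo "$logname: default to scratch directory"
    fi
done
```

Tightening the pattern, for example anchoring it to a known account prefix, avoids accidental matches on unrelated usernames that merely contain the substring.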

Alternative: Setting the Environment of the Wrapper

*NOT TESTED* If you use this approach please let me know and I will update this wiki.

The following is an untested approach that should just work if your wrapper takes its input via environment variables. This is probably the best approach for most sites as it is fairly easy to pick up environment variables in the job wrapper.

    map { if ($_->[0] eq "LOGNAME") { $logname = $_->[1]; } } @environment;

    $environment_string = join(';',
                               map {$_->[0] . "=" . $_->[1]} @environment);
    # Added to detect LIGO VO for initial dir
    if ($logname =~ /.*ligo.*/) {
      $environment_string .= ";MY_INITIAL_DIR=" . $description->directory();
    }
    # end of added lines

Once you change the $environment_string it is automatically used as the environment for the job so you should not need to do anything else.

Condor's USER_JOB_WRAPPER

Condor supports the ability to run a site defined job wrapper. The name is somewhat inaccurate as the script specified in the worker node condor configuration is not really a wrapper but a script that can perform certain initializations before it performs an exec call to start the submitted job. As a result the submitted job completely replaces the USER_JOB_WRAPPER in memory and the wrapper script ceases to exist in active memory.

NFS Lite uses the USER_JOB_WRAPPER feature of condor to make the final directory relocation of the job as specified by the Remote_InitialDir parameter.

Example condor configuration for a job wrapper

This condor configuration parameter must be available to the worker nodes and the script accessible and executable by the local worker node condor process.

USER_JOB_WRAPPER=/usr/bin/myjobwrapper

Simple Job Wrapper

This job wrapper will do nothing but execute the command sent by the user job to condor including all of its parameters.

#!/bin/sh

exec "$@"
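Because the wrapper is just an ordinary script, it can be sanity-checked locally without condor. The path /tmp/passthrough_wrapper below is an arbitrary example, not the production location:

```shell
#!/bin/sh
# Install the minimal pass-through wrapper in a scratch location and
# run a command through it; arguments, including ones containing
# spaces, must arrive at the job unchanged.
cat > /tmp/passthrough_wrapper <<'EOF'
#!/bin/sh
exec "$@"
EOF
chmod 755 /tmp/passthrough_wrapper

/tmp/passthrough_wrapper echo "hello world"
```

The quoted "$@" is what preserves argument boundaries; an unquoted $@ or $* would re-split arguments that contain whitespace.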

Job Wrapper That Gets the InitialDir? from the Environment

#!/bin/sh

if [ -n "$MY_INITIAL_DIR" ]
then
    cd "$MY_INITIAL_DIR"
fi

exec "$@"
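The environment-driven wrapper can likewise be exercised by hand, with MY_INITIAL_DIR set in the calling environment. The wrapper path and target directory below are invented for the test:

```shell
#!/bin/sh
# Install the environment-driven wrapper in a scratch location
# (example path) and confirm the job starts in MY_INITIAL_DIR.
cat > /tmp/env_wrapper <<'EOF'
#!/bin/sh
if [ -n "$MY_INITIAL_DIR" ]
then
    cd "$MY_INITIAL_DIR"
fi
exec "$@"
EOF
chmod 755 /tmp/env_wrapper

mkdir -p /tmp/ligo_initial_dir
MY_INITIAL_DIR=/tmp/ligo_initial_dir /tmp/env_wrapper pwd
```

The final command should report the directory named in MY_INITIAL_DIR rather than the directory the wrapper was launched from.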

Job Wrapper That Parses its Command Line

Warning!! perl code

You probably do not want to do things this way, but you can. It requires that you detect your arguments and then remove them so as not to interfere with the job being run.

# Scan for the -wrapper_iwd flag, record its value, and splice both
# the flag and its value out of @ARGV so the job never sees them.
# (Using delete on mid-array elements would leave undef holes in @ARGV.)
for (my $i = 0; $i <= $#ARGV; $i++) {
        if ($ARGV[$i] eq "-wrapper_iwd") {
                $wrapper_iwd = $ARGV[$i + 1];
                splice(@ARGV, $i, 2);
                $ourargs = 1;
                last;
        }
}
#....
chdir $wrapper_iwd if $ourargs;
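For readers more comfortable in shell, the same strip-our-flag-then-run idea looks roughly like this; the flag name matches the perl above, while the function name and argument values are invented for illustration:

```shell
#!/bin/sh
# Sketch: pull -wrapper_iwd VALUE out of the argument list and
# rebuild the job's arguments without it, then report both parts.
strip_wrapper_args() {
    wrapper_iwd=""
    out=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -wrapper_iwd) wrapper_iwd="$2"; shift 2 ;;
            *) out="${out:+$out }$1"; shift ;;
        esac
    done
    echo "iwd=$wrapper_iwd args=$out"
}

strip_wrapper_args 120 230 -wrapper_iwd /osgfs/data/example
```

Note that flattening the arguments into a single string, as here, loses quoting of arguments that contain whitespace; the perl splice approach keeps each argument intact.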

Jobmanager Output Script

The following is the script generated by a modified jobmanager to support Remote_InitialDir. You can see how the Arguments parameter includes the added argument to be passed to the wrapper. The environment variable approach would instead update the job's environment to include the required information. You might also notice that the InitialDir parameter is correctly set as well, although condor should ignore this since the file systems are not shared between the worker and the CE. Does this have something to do with why condor is copying the files correctly?

#
# description file for condor submission
#
Universe = vanilla
Notification = Never
Executable = /osglocal/users/cms/uscms001/.globus/.gass_cache/local/md5/8d/6c90b8c481ea15d364ba4f3b29b8ba/md5/e1/24b786af3b11e1157c6d24e293ad07/data
Requirements = OpSys == "LINUX" && Arch == "X86_64"
X509UserProxy = /osglocal/users/cms/uscms001/.globus/job/osg-gw-3.local/32758.1158172385/x509_up
Environment = OSG_GANGLIA_HOST=t2gw01.local;OSG_DATA=/osgfs/data;OSG_SITE_LONGITUDE=-117.26;GRID3_TMP_WN_DIR=/state/data/osgtmp;OSG_LOCATION=/osglocal/osgcore;OSG_JOB_MANAGER_HOME=/condor/release;OSG_JOB_MANAGER=condor;GRID3_TRANSFER_CONTACT=;GRID3_SITE_NAME=osg-gw-3.t2.ucsd.edu;OSG_JOB_CONTACT=osg-gw-3.t2.ucsd.edu/jobmanager-condor;GRID3_DATA_DIR=/osgfs/data;OSG_GANGLIA_PORT=8649;OSG_GANGLIA_SUPPORT=y;OSG_SITE_INFO=https://tier2.ucsd.edu/t2/index.php?option=com_content&task=view&id=2&Itemid=6;OSG_DEFAULT_SE=gsiftp://osg-gw-3.t2.ucsd.edu:2811/;OSG_GRID=/wn-client;LOGNAME=uscms001;OSG_SITE_NAME=osg-gw-3.t2.ucsd.edu;GRID3_JOB_CONTACT=osg-gw-3.t2.ucsd.edu/jobmanager-condor;OSG_GROUP=OSG;GRID3_USER_VO_MAP=/osglocal/osgcore/monitoring/grid3-user-vo-map.txt;OSG_LSF_LOCATION=;OSG_USER_VO_MAP=/osglocal/osgcore/monitoring/grid3-user-vo-map.txt;OSG_WN_TMP=/state/data/osgtmp;GRID3_GRIDFTP_LOG=/osglocal/osgcore/globus/var/gridftp.log;OSG_MONALISA_SERVICE=y;OSG_UTIL_CONTACT=osg-gw-3.t2.ucsd.edu/jobmanager;OSG_SITE_READ=dcap://dcopy-1.local:22137//pnfs/sdsc.edu/;GRID3_SITE_INFO=https://tier2.ucsd.edu/t2/index.php?option=com_content&task=view&id=2&Itemid=6;OSG_FBS_LOCATION=;OSG_SITE_CITY=La Jolla;HOME=/osglocal/users/cms/uscms001;OSG_SITE_COUNTRY=USA;OSG_CONTACT_NAME=Terrence Martin;LD_LIBRARY_PATH=/osglocal/osgcore/MonaLisa/Service/VDTFarm/pgsql/lib:/osglocal/osgcore/voms/lib:/osglocal/osgcore/prima/lib:/osglocal/osgcore/mysql/lib/mysql:/osglocal/osgcore/jdk1.4/jre/lib/i386:/osglocal/osgcore/jdk1.4/jre/lib/i386/server:/osglocal/osgcore/jdk1.4/jre/lib/i386/client:/osglocal/osgcore/berkeley-db/lib:/osglocal/osgcore/expat/lib:/osglocal/osgcore/globus/lib:;GRID3_TMP_DIR=/osgfs/data;OSG_GRIDFTP_LOG=/osglocal/osgcore/globus/var/gridftp.log;OSG_PBS_LOCATION=;OSG_SGE_LOCATION=;OSG_SGE_ROOT=;OSG_STORAGE_ELEMENT=y;OSG_APP=/code/osgcode;GRID3_BASE_DIR=/osglocal/osgcore;OSG_CONDOR_LOCATION=/condor/release;GLOBUS_GRAM_JOB_CONTACT=https://osg-gw-3.local:51797/32758/1158172385/;GLOBUS_LOCATION=/wn-client/globus;OSG_CONDOR_CONFIG=/etc/condor/condor_config;GLOBUS_REMOTE_IO_URL=/osglocal/users/cms/uscms001/.globus/job/osg-gw-3.local/32758.1158172385/remote_io_url;OSG_SPONSOR=cms:50 cdf:50;OSG_SITE_LATITUDE=32.85;GLOBUS_GRAM_MYJOB_CONTACT=URLx-nexus://osg-gw-3.local:51798/;OSG_SITE_WRITE=srm://t2data2.t2.ucsd.edu:8443/;GRID3_SPONSOR=cms:50 cdf:50;GRID3_APP_DIR=/code/osgcode;CHANGED_X509=/osglocal/users/cms/uscms001/.globus/job/osg-gw-3.local/32758.1158172385/x509_up;GRID3_UTIL_CONTACT=osg-gw-3.t2.ucsd.edu/jobmanager;OSG_CONTACT_EMAIL=tmartin@physics.ucsd.edu;OSG_VO_MODULES=y
Arguments = 120 230  -wrapper_iwd /osgfs/data/tmartin/inspiral-0-20060911T161959-0700/
InitialDir = /osgfs/data/tmartin/inspiral-0-20060911T161959-0700/
Input = /dev/null
Log = /osglocal/osgcore/globus/tmp/gram_job_state/gram_condor_log.32758.1158172385
log_xml = True
+AccountingGroup = "group_cms.uscms001"
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_output = true
transfer_input_files =
#Extra attributes specified by client

Output = /osglocal/users/cms/uscms001/.globus/job/osg-gw-3.local/32758.1158172385/stdout
Error = /osglocal/users/cms/uscms001/.globus/job/osg-gw-3.local/32758.1158172385/stderr

I still expect the proxy to be read from $_CONDOR_SCRATCH_DIR and for condor to set X509_USER_PROXY to reflect this. This appears to be what happens, based on this snippet from the environment dump of my DAG test of the Remote_InitialDir changes.

X509_USER_PROXY=/state/data/condor_local/execute/dir_15457/x509_up

Test Scripts

These scripts are provided as is and will need to be modified to support your own local cluster.

Condor-G test script

This condor-g test script was used to test the Remote_InitialDir functionality at UCSD. You may wish to use this script, with modifications, to test your own implementations of the above changes. This job was submitted via condor_submit.

my-script.cmd

universe=globus
GlobusScheduler=osg-gw-3.t2.ucsd.edu:/jobmanager-condor
executable=/home/users/tmartin/Cluster_Tests/ENV_test/var1.sh
stream_output = False
stream_error  = False
WhenToTransferOutput = ON_EXIT
transfer_input_files = /bin/ps,/bin/hostname,/code/osgcode/tmartin/HitsTest-condor.sh,/code/osgcode/tmartin/OscarTest-condor.sh,/code/osgcode/tmartin/ValidateCMSSWSoftware-condor.sh
remote_initialdir = /osgfs/data/tmartin/

arguments=120 230
output = ./output/initial_dir-2.out
error  = ./output/initial_dir-2.err
log    = ./output/initial_dir-2.log
queue

DAG Test Script

These DAG test scripts were based on an initial script sent to UCSD by LIGO and are meant to be similar to a LIGO submitted job. UCSD used this test to confirm functionality when using DAGman, which is what LIGO uses. These scripts were submitted using condor_submit_dag.

mydag.dag

Job setup startup.cmd
Job run middle.cmd
Job clean finish.cmd
PARENT setup CHILD run clean
PARENT run CHILD clean

startup.cmd

######################################################################
# GRIPHYN VDS SUBMIT FILE GENERATOR
# DAG : inspiral, Index = 0, Count = 1
# SUBMIT FILE NAME : inspiral_0_osg_gw_3.t2.ucsd.edu_cdir.sub
######################################################################
environment = app=/code/osgcode;data=/osgfs/data;grid3=/code/osgcode/wn-client;tmp=/osgfs/data;wntmp=/state/data/osgtmp;
arguments = -n dirmanager -N Pegasus::dirmanager:1.0 -R osg_gw_3.t2.ucsd.edu /code/osgcode/wn-client/vds/bin/dirmanager --create --dir /osgfs/data/tmartin/inspiral-0-20060911T161959-0700
copy_to_spool = false
error = ./output/vds.err
executable = /code/osgcode/wn-client/vds/bin/kickstart
globusrsl = (jobtype=single)
globusscheduler = osg-gw-3.t2.ucsd.edu/jobmanager-condor
log = ./output/vds.log
notification = NEVER
output = ./output/vds.out
periodic_release = (NumSystemHolds <= 3)
periodic_remove = (NumSystemHolds > 3)
remote_initialdir = /osgfs/data/tmartin/
submit_event_user_notes = pool:osg_gw_3.t2.ucsd.edu
transfer_error = true
transfer_executable = false
transfer_output = true
universe = globus
+vds_generator = "Pegasus"
+vds_version = "1.4.7cvs"
+vds_wf_name = "inspiral-0"
+vds_wf_time = "20060911T161959-0700"
+vds_wf_xformation = "dirmanager"
+vds_wf_derivation = "Pegasus::dirmanager:1.0"
+vds_job_class = 6
+vds_job_id = "inspiral_0_osg_gw_3.t2.ucsd.edu_cdir"
+vds_site = "osg_gw_3.t2.ucsd.edu"
queue

middle.cmd

universe=globus
GlobusScheduler=osg-gw-3.t2.ucsd.edu:/jobmanager-condor
executable=/home/users/tmartin/Cluster_Tests/DAGtest/middle.sh
stream_output = False
stream_error  = False
WhenToTransferOutput = ON_EXIT
transfer_input_files = /code/osgcode/tmartin/HitsTest-condor.sh,/code/osgcode/tmartin/OscarTest-condor.sh,/code/osgcode/tmartin/ValidateCMSSWSoftware-condor.sh
remote_initialdir = /osgfs/data/tmartin/inspiral-0-20060911T161959-0700/

arguments=120 230
output = ./output/initial_dir-1.out
error  = ./output/initial_dir-1.err
log    = ./output/initial_dir-1.log
queue

middle.sh

This sh script is a UCSD regression test script we often use to test basic cluster functionality. Please replace it with your own script.

#!/bin/sh

host=`/bin/hostname`
date=`/bin/date`
who=`/usr/bin/whoami`
ps=`/bin/ps awx`
pwd=`/bin/pwd`

echo "Where is perl"

echo $GLOBUS_LOCATION

echo "$who\@$host on $date"

echo

echo "PWD $pwd"

echo $X509_USER_PROXY


echo

echo "Scratch Dir : $_CONDOR_SCRATCH_DIR"

ls -alF $_CONDOR_SCRATCH_DIR
cp -fv /etc/group $_CONDOR_SCRATCH_DIR/myoutput.txt
ls -alF $_CONDOR_SCRATCH_DIR

echo "ls -alF $OSG_DATA"
ls -alF $OSG_DATA
#echo "cp -fv /etc/group $OSG_DATA/$RANDOM.$RANDOM.file"
#cp -fv /etc/group $OSG_DATA/$RANDOM.$RANDOM.file

echo
echo "-----------------------------"
echo

echo "Checking for srmcp"
which srmcp
echo "Sourcing setup.sh from wn-client"
ls -l $OSG_GRID/setup.sh
source $OSG_GRID/setup.sh
echo "Checking for srmcp again"
which srmcp

echo "Checking path"
echo $PATH

echo "Attempting to run srmcp"

$OSG_GRID/srmclient/bin/srmcp --help

echo
echo "-----------------------------"
echo

ls -alF $_CONDOR_SCRATCH_DIR
date >> $_CONDOR_SCRATCH_DIR/myoutput.txt
hostname >> $_CONDOR_SCRATCH_DIR/myoutput.txt
whoami >> $_CONDOR_SCRATCH_DIR/myoutput.txt
ls -alF $_CONDOR_SCRATCH_DIR

echo
echo "-----------------------------"
echo

ls -l $OSG_APP/cmssoft/cms/Releases/CMSSW/CMSSW_0_7_0/bin/slc3_ia32_gcc323/cmsRun

#echo "Sleeping for $1"

#sleep $1
echo


echo "=========="
echo "Checking for /uaf/clustertmp/"
ls -l /uaf/clustertmp/
echo "========="


echo "=========="
echo "testing running srmcp"
DATE=`date +%s`
FILE="$RANDOM-$DATE.out"
time $OSG_GRID/srmclient/bin/srmcp --debug=true srm://t2data2.t2.ucsd.edu:8443//data4/cms/userdata/tmartin/9298.out file://localhost//dev/null
time rm -fv $FILE
echo "=========="


echo "Running ValidateCMSSWSoftware-condor.sh"
echo "======================================="
chmod 755 ./ValidateCMSSWSoftware-condor.sh
./ValidateCMSSWSoftware-condor.sh
echo
echo
echo

echo "Running OscarTest-condor.sh"
echo "======================================="
chmod 755 ./OscarTest-condor.sh
./OscarTest-condor.sh
echo
echo
echo

echo "Running HitsTest-condor.sh"
echo "======================================="
chmod 755 ./HitsTest-condor.sh
./HitsTest-condor.sh
echo
echo
echo

finish.cmd

universe=globus
GlobusScheduler=osg-gw-3.t2.ucsd.edu:/jobmanager-condor
executable=/home/users/tmartin/Cluster_Tests/DAGtest/finish.sh
stream_output = False
stream_error  = False
WhenToTransferOutput = ON_EXIT
transfer_input_files =

remote_initialdir = /osgfs/data/tmartin/inspiral-0-20060911T161959-0700/

arguments=120 230
output = ./output/initial_dir-2.out
error  = ./output/initial_dir-2.err
log    = ./output/initial_dir-2.log
queue

finish.sh

#!/bin/sh

/bin/hostname
/bin/date
/usr/bin/whoami
/bin/pwd

ls -la .

Final Thoughts and Comments

Condor documentation is fairly clear that InitialDir is not supported in non-shared file system configurations. Since the OSG does not require shared file systems, and sites have clearly shown they benefit from NFS Lite, it is probably correct behaviour that Remote_InitialDir is not guaranteed to work, and VOs should take this into account. However, since the condor documentation is not explicit that Remote_InitialDir works only if the remote site implements a particular file system configuration, it is probably reasonable that Remote_InitialDir functionality be supported in some circumstances.

For now it is UCSD site policy that Remote_InitialDir will be supported for the LIGO VO; UCSD will consider requests for support of this parameter on a VO by VO basis.

References and External Documents

USER_JOB_WRAPPER: http://www.cs.wisc.edu/condor/manual/v6.8/7_3Running_Condor.html

Remote_InitialDir and InitialDir: http://www.cs.wisc.edu/condor/manual/v6.8/condor_submit.html

Authors

-- TerrenceMartin - 19 Sep 2006

 