Documentation for Submitting Jobs to the Local Batch Queue

Setting up a CMSSW release area to work in

One thing we often do is run cmsRun-type jobs on local datasets or generate small samples. For these it is often better to run a batch job instead of a local job, so the interactive machines stay free for other people to use. In that case you will need to set up a CMSSW release area on the node where the job runs. It may be that not all of these export/setup type commands are necessary, but this combination works. Feel free to test which are redundant or unnecessary - I didn't.

What follows is an example of the setup section of myscript.bash:


#!/bin/bash

# Point scram at the CMS software area and pick the architecture.
export CMS_PATH=/code/osgcode/cmssoft/cms
export SCRAM_ARCH=slc4_ia32_gcc345
export CERN=$CMS_PATH/lcg/external/cern
export CERN_LEVEL=2004
export CERN_ROOT=$CERN/$CERN_LEVEL/slc4_ia32_gcc345
export GROUP_DIR=$CMS_PATH/setup
export CMS_SYS=`$CMS_PATH/utils/fake-sys`
export CMS_SYS=slc4_ia32_gcc345    # override the guessed platform explicitly
source $GROUP_DIR/group_aliases.sh
export PATH=${CMS_PATH}/bin/${CMS_SYS}:${CMS_PATH}/utils:${CERN_ROOT}/bin:$PATH

# Grid client setup (needed later for srmcp) and the site http proxy.
source /data/vdt/setup.sh
source $OSG_GRID/setup.sh
export http_proxy=clarens-1.local:3128

# Set up the CMS environment and create a CMSSW_1_6_8 project area named globe168.
source /code/osgcode/cmssoft/cmsset_default.sh CMSSW_1_6_8
export SCRAM_ARCH=slc4_ia32_gcc345

scramv1 p -n globe168 CMSSW CMSSW_1_6_8

cd globe168/src/

# Pick up the runtime environment of the new release area.
eval `scramv1 ru -sh`
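
At this point the release environment should be in place. As an optional sanity check (just a suggestion, nothing below depends on it) you can print a couple of things before moving on:

echo "CMSSW_BASE = $CMSSW_BASE"    # set by the scramv1 runtime eval above
which cmsRun                       # should now resolve inside the release area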


Running cmsRun

Now you are ready to do what you actually came for: run cmsRun. You have set up your working area, globe168; next you retrieve a tarball containing the things you need to run. In this case, the tarball contains some CMSSW modules that I modified and need to build, as well as my .cfg and .cff files. I just tarred the whole directory structure of a cleaned working area (~/.../globe168/src/*) on uaf-2; by cleaned I mean without big ROOT files and other junk. This saves me from having to recreate the directory structure by hand.
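
For reference, a tarball like this can be made on uaf-2 roughly as follows (a sketch only; the paths and the exclude pattern are up to you, as long as the file ends up somewhere wget can reach, e.g. your web area on hepuser.ucsd.edu):

# Sketch: pack up a cleaned working area, skipping large ROOT files.
cd ~/path/to/globe168/src          # wherever your cleaned working area lives
tar --exclude='*.root' -czf ~/globeRun.tgz .
# then copy globeRun.tgz to wherever hepuser.ucsd.edu serves your files from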

So, I will use wget to pull my tarball into my working area, untar it, build my stuff, and then run cmsRun. After that you must copy your output (with srmcp) and whatever logs you want to your area in dCache, and clean up after yourself. Recall that I am continuing from the script above, so I am in /someworkingdirectory/globe168/src.

myscript.bash section 2:


# Pull in the tarball with my modified CMSSW modules and config files.
wget http://hepuser.ucsd.edu/~edusinberre/globeRun.tgz

gtar -zxf globeRun.tgz

# Build, then run; $1 is the argument passed in from the submit file.
scramv1 b
cd Analyzers/GlobeAnalyzer/test/

cmsRun HWW140_$1.cfg

# Copy the output to my area in dCache.
srmcp file:////$PWD/HWW140_$1.root srm://t2data2.t2.ucsd.edu:8443/srm/managerv1?SFN=/pnfs/sdsc.edu/data3/cms/phedex/store/users/edusinberre/crab_output/V41/HWW140/HWW140_$1.root

# Clean up the working area.
cd ../../../../../
ls
rm -r globe168

 

An Example Submission Script

Now you need to send your job to the queue. It's pretty simple: the script that follows (mysub.submit) will submit your jobs to the condor queue when you say:

 > condor_submit mysub.submit 

The first section of this file contains attributes that apply to all the jobs you submit. The lower section has a set of four lines for each individual job: here you give the (optional) arguments that myscript.bash takes, pick unique output and error file names (you can do this however you want, or not at all), and then queue the job.

mysub.submit:

universe=grid
Grid_Resource=gt2 osg-gw-2.t2.ucsd.edu:/jobmanager-condor
executable=myscript.bash
stream_output = False
stream_error  = False
WhenToTransferOutput = ON_EXIT
transfer_input_files =
transfer_Output_files =
log    = /data/tmp/edusinberre/condor/test.log
Notification = Never

arguments=1
output = ./output/diLepton.$(Cluster).$(Process).out
error  = ./output/diLepton.$(Cluster).$(Process).err
queue

arguments=2
output = ./output/diLepton.$(Cluster).$(Process).out
error  = ./output/diLepton.$(Cluster).$(Process).err
queue
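
Each additional job is just one more stanza of the same form, e.g. a hypothetical third job:

arguments=3
output = ./output/diLepton.$(Cluster).$(Process).out
error  = ./output/diLepton.$(Cluster).$(Process).err
queue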

condor_q will now show output like:

62030.0   yourusername     4/16 11:15   0+00:00:00 I  0   0.0  myscript.bash 1  
62030.1   yourusername     4/16 11:15   0+00:00:00 I  0   0.0  myscript.bash 2 

and soon it should look more like:

62030.0   yourusername     4/16 11:15   0+00:20:47 R  0   0.0  myscript.bash 1  
62030.1   yourusername     4/16 11:15   0+00:20:47 R  0   0.0  myscript.bash 2 

A Second Example (using ROOT without CMSSW)

In this example my bash script (job.bash) executes a program that is set up to run on ROOT files in dCache. I copy the output to dCache using srmcp.

In the submission script (job.sub) you just need to set the executable to be your bash script.

submission script (job.sub)

universe=grid
Grid_Resource=gt2 osg-gw-4.t2.ucsd.edu:/jobmanager-condor
executable=job.bash
stream_output = False
stream_error  = False
WhenToTransferOutput = ON_EXIT
transfer_input_files =
transfer_Output_files =
log    = /data/tmp/ssimon/reduce_Apr26_110333/test1.log
Notification = Never
output = ./output/condor_batch.$(Cluster).$(Process).out
error  = ./output/condor_batch.$(Cluster).$(Process).err
queue


bash script (job.bash)

## setup root env
source ${OSG_GRID}/setup.sh
export ROOTSYS=${OSG_APP}/UCSD_root/root_v5.18.00
export PATH=${PATH}:${ROOTSYS}/bin
export LD_LIBRARY_PATH=${ROOTSYS}/lib:${LD_LIBRARY_PATH}

## some basic output for debugging
umask 002
/bin/hostname
echo "Who am I?"
/usr/bin/whoami
echo "pwd ---"
pwd
echo "ls ---"
ls

## begin my job script commands

mkdir globe
cd globe
wget http://hepuser.ucsd.edu/~ssimon/condorfiles/reduceApr26_110333.tgz
tar -zxvf reduceApr26_110333.tgz
rm reduceApr26_110333.tgz
cd reduce

./reduce_batch hgg 400 0 Hgamgam120glufus 
/bin/ls -1 ~/globereduce/reduce/*.root > theoutputfiles.txt

mv theoutputfiles.txt ~/globereduce/reduce
cd ~/globereduce/reduce

for ifile in `awk -F/ '{print $NF}' theoutputfiles.txt`; do
srmcp file:///${ifile} srm://(...)/managerv1?SFN=/pnf(...)/${ifile}
done

cd

## end my job script commands

echo "clean up ---"
rm -rf globe
rm -rf globereduce

echo "ls ---"
ls

exit

With these two files created, you can run:

condor_submit job.sub

Some Useful Condor Stuff

condor_submit mysub.submit - submits your jobs to the queue using the submission script mysub.submit

condor_q - shows the status, run time, etc. of your jobs in the queue. Optionally, give your username as an argument to see only your jobs.

condor_rm 62030.2 - kills job 62030.2; you can also give your username instead of a job ID to kill all of your jobs.
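
For example, using the cluster ID from the condor_q output above:

condor_q yourusername        # show only your jobs
condor_rm 62030.2            # remove a single job
condor_rm yourusername       # remove all of your jobs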

-- ElizabethDusinberre - 16 Apr 2008
