Documentation for Submitting Jobs to the Local Batch Queue
Setting up a CMSSW release area to work in
One of the things we often do is run cmsRun-type jobs on local datasets or to generate small samples. For these tasks it is usually better to run a batch job instead of a local job, so that the interactive machines stay free for other people to use. In that case you will need to set up a CMSSW release area on the node where your job runs. It may be that not all of the export/setup commands below are necessary, but this combination works. Feel free to test which ones are redundant - I didn't.
What follows is an example of the setup section of myscript.bash:
#!/bin/bash
## point at the local CMS software installation
export CMS_PATH=/code/osgcode/cmssoft/cms
export SCRAM_ARCH=slc4_ia32_gcc345
export CERN=$CMS_PATH/lcg/external/cern
export CERN_LEVEL=2004
export CERN_ROOT=$CERN/$CERN_LEVEL/slc4_ia32_gcc345
export GROUP_DIR=$CMS_PATH/setup
export CMS_SYS=`$CMS_PATH/utils/fake-sys`
export CMS_SYS=slc4_ia32_gcc345   # hard-code the architecture, overriding fake-sys
source $GROUP_DIR/group_aliases.sh
export PATH=${CMS_PATH}/bin/${CMS_SYS}:${CMS_PATH}/utils:${CERN_ROOT}/bin:$PATH
## grid client tools and the local web proxy
source /data/vdt/setup.sh
source $OSG_GRID/setup.sh
export http_proxy=clarens-1.local:3128
## create a CMSSW_1_6_8 project area called globe168 and set up its runtime environment
source /code/osgcode/cmssoft/cmsset_default.sh CMSSW_1_6_8
export SCRAM_ARCH=slc4_ia32_gcc345
scramv1 p -n globe168 CMSSW CMSSW_1_6_8
cd globe168/src/
eval `scramv1 ru -sh`
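If you want a quick sanity check that the environment came up correctly (this is just a suggestion, not part of the original script), you can append something like:
## optional sanity check: both should point into the release/project area
echo "CMSSW_BASE = $CMSSW_BASE"
which cmsRun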
Running cmsRun
Now you are ready to do what you came for: run cmsRun. You have set up your working area, globe168, so the next step is to retrieve a tarball containing the things you need to run. In this case, the tarball contains some CMSSW modules that I modified and need to build, as well as my .cfg and .cff files. I just tarred the whole directory structure of a cleaned working area (~/.../globe168/src/*) on uaf-2. By cleaned I mean without big root files and other junk. This keeps me from having to recreate the directory structure by hand.
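For reference, making such a tarball on uaf-2 might look roughly like the sketch below. The working-area path, the exclude pattern, and the web-visible destination are assumptions for illustration; use whatever location your files actually live in and however you normally make a file reachable by wget.
## on uaf-2: tar the cleaned working area, leaving out the big root files
cd ~/somewhere/globe168            # hypothetical path to your working area
tar --exclude='*.root' -czf globeRun.tgz src/
## then put globeRun.tgz somewhere wget can fetch it, e.g. a web-exposed directory
cp globeRun.tgz ~/public_html/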
So, I will use wget to pull my tarball into my working area, untar it, build my stuff, and then run cmsRun. After that you must copy out (with srmcp) your work and whatever logs you want to your area in dcache, and clean up after yourself. Recall that I am continuing from the script above, so I am in /someworkingdirectory/globe168/src.
myscript.bash section 2:
## fetch the tarball prepared above, unpack it, and build the modified modules
wget http://hepuser.ucsd.edu/~edusinberre/globeRun.tgz
gtar -zxf globeRun.tgz
scramv1 b
## run the job; $1 is the argument passed in from the submit file
cd Analyzers/GlobeAnalyzer/test/
cmsRun HWW140_$1.cfg
## copy the output to dcache, then clean up after yourself
srmcp file:////$PWD/HWW140_$1.root srm://t2data2.t2.ucsd.edu:8443/srm/managerv1?SFN=/pnfs/sdsc.edu/data3/cms/phedex/store/users/edusinberre/crab_output/V41/HWW140/HWW140_$1.root
cd ../../../../../
ls
rm -r globe168
An Example Submission Script
Now you need to be able to send your job to the queue. It's pretty simple: the file that follows (mysub.submit) is an example of a submission script that will send your jobs to the condor queue when you say:
> condor_submit mysub.submit
The first section of this file sets attributes that apply to all the jobs you submit. The lower section has a set of four lines for each individual job: here you give the (optional) arguments that myscript.bash takes, set unique output and error file names (you can do this however you want, or not at all), and then queue the job.
mysub.submit:
universe=grid
Grid_Resource=gt2 osg-gw-2.t2.ucsd.edu:/jobmanager-condor
executable=myscript.bash
stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
transfer_input_files =
transfer_Output_files =
log = /data/tmp/edusinberre/condor/test.log
Notification = Never
arguments=1
output = ./output/diLepton.$(Cluster).$(Process).out
error = ./output/diLepton.$(Cluster).$(Process).err
queue
arguments=2
output = ./output/diLepton.$(Cluster).$(Process).out
error = ./output/diLepton.$(Cluster).$(Process).err
queue
condor_q will now show output like:
62030.0 yourusername 4/16 11:15 0+00:00:00 I 0 0.0 myscript.bash 1
62030.1 yourusername 4/16 11:15 0+00:00:00 I 0 0.0 myscript.bash 2
and soon it should look more like:
62030.0 yourusername 4/16 11:15 0+00:20:47 R 0 0.0 myscript.bash 1
62030.1 yourusername 4/16 11:15 0+00:20:47 R 0 0.0 myscript.bash 2
A Second Example (using Root without CMSSW)
In this example my bash script (job.bash) executes a script which is set up to run on root files in dcache. I copy the output to dcache using srmcp.
In the submission script (job.sub) you just need to set the executable to be your bash script.
submission script (job.sub)
universe=grid
Grid_Resource=gt2 osg-gw-4.t2.ucsd.edu:/jobmanager-condor
executable=job.bash
stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
transfer_input_files =
transfer_Output_files =
log = /data/tmp/ssimon/reduce_Apr26_110333/test1.log
Notification = Never
output = ./output/condor_batch.$(Cluster).$(Process).out
error = ./output/condor_batch.$(Cluster).$(Process).err
queue
bash script (job.bash)
#!/bin/bash
## setup root env
source ${OSG_GRID}/setup.sh
export ROOTSYS=${OSG_APP}/UCSD_root/root_v5.18.00
export PATH=${PATH}:${ROOTSYS}/bin
export LD_LIBRARY_PATH=${ROOTSYS}/lib:${LD_LIBRARY_PATH}
## some basic output for debugging
umask 002
/bin/hostname
echo "Who am I?"
/usr/bin/whoami
echo "pwd ---"
pwd
echo "ls ---"
ls
## begin my job script commands
mkdir globe
cd globe
wget http://hepuser.ucsd.edu/~ssimon/condorfiles/reduceApr26_110333.tgz
tar -zxvf reduceApr26_110333.tgz
rm reduceApr26_110333.tgz
cd reduce
./reduce_batch hgg 400 0 Hgamgam120glufus
## collect the names of the output root files that ended up under ~/globereduce/reduce
/bin/ls -1 ~/globereduce/reduce/*.root > theoutputfiles.txt
mv theoutputfiles.txt ~/globereduce/reduce
cd ~/globereduce/reduce
## copy each output file to dcache; awk strips the path, leaving just the file name
for ifile in `cat theoutputfiles.txt | awk -F/ '{print $NF}'`; do
  srmcp file:///${ifile} srm://(...)/managerv1?SFN=/pnf(...)/${ifile}
done
cd
## end my job script commands
echo "clean up ---"
rm -rf globe
rm -rf globereduce
echo "ls ---"
ls
exit
With these two files created you can run:
condor_submit job.sub
Some Useful Condor Stuff
condor_submit mysub.submit
- submit your jobs to the queue using a submission script, mysub.submit
condor_q
- lets you see the status, runtime, etc. of your jobs in the queue. Optionally, give your username as an argument to see only your jobs (examples below)
condor_rm 62030.2
- kills job 62030.2; you can also substitute your username to kill all of your jobs
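For example (yourusername is just a placeholder for your actual account name):
condor_q yourusername        # show only your jobs
condor_rm 62030.2            # remove one specific job
condor_rm yourusername       # remove all of your jobs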
--
ElizabethDusinberre - 16 Apr 2008