Install and Configure BOSCO for Glidein-Based Submission
About this Document
This document describes how to install and configure BOSCO so that a glideinWMS factory can submit glideins to the BOSCO resource's local batch queue on behalf of a VO frontend. The process outlined below is specific to the case where you have ONLY ssh-key login access to the user account on the BOSCO resource, i.e., you do not have the ssh password. This document is also preliminary, so it may not represent the best or easiest way to install and configure BOSCO; it simply tries to minimize changes to the standard BOSCO installation and configuration process.
This document follows the general OSG documentation conventions:
- A User Command Line is illustrated by a green box that displays a prompt:
[user@client ~]$
- A Root Command Line is illustrated by a red box that displays the root prompt:
[root@client ~]$
- Lines in a file are illustrated by a yellow box that displays the desired lines in a file:
priorities=1
Definitions
Hostnames:
- BOSCO_HOST is the hostname of the host from which glideins will be submitted to the BOSCO resource's local batch queue.
- FACTORY_HOST is the hostname of the host where you've installed and configured your glideinWMS factory.
- FRONTEND_HOST is the hostname of the host where you've installed and configured your VO's glideinWMS frontend.
Usernames:
- BOSCO_USER is the username of the user on the BOSCO_HOST that has access to the BOSCO resource's local batch queue; e.g., cmsbosco
- FACTORY_ADMIN_USER is the username of the user on the FACTORY_HOST used for all non-root administrative tasks; e.g., gfactory
- FACTORY_VO_USER is the username of the user on the FACTORY_HOST from which glideins are submitted to the BOSCO_HOST; e.g., fecmsglobal
- FRONTEND_USER is the username of the user on the FRONTEND_HOST that submits requests for glideins to the FACTORY_HOST; e.g., frontend
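For convenience, the placeholders above can be kept as shell variables while working through the steps below. This is only a sketch: the usernames are the examples given above, while the hostnames (bosco.example.org and so on) and the BOSCO_TARGET variable are hypothetical and should be replaced with your own values.

```shell
# Placeholder values for the steps in this document.
# Usernames are the examples from the Definitions list.
# Hostnames are hypothetical stand-ins; replace them with your own hosts.
BOSCO_HOST=bosco.example.org
FACTORY_HOST=factory.example.org
FRONTEND_HOST=frontend.example.org

BOSCO_USER=cmsbosco
FACTORY_ADMIN_USER=gfactory
FACTORY_VO_USER=fecmsglobal
FRONTEND_USER=frontend

# Hypothetical helper variable: the user@host form used by bosco_cluster
# and by the gatekeeper attribute of the factory entry.
BOSCO_TARGET="${BOSCO_USER}@${BOSCO_HOST}"
echo "$BOSCO_TARGET"
```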
Requirements
- glideinWMS Factory 3.2.6 or later (3.2.8 recommended)
- HTCondor 8.2.4 or later
- condor-bosco
- An existing ssh login to the target BOSCO_HOST
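The minimum versions above can be compared with a small helper. This is a minimal sketch, assuming a sort that supports the -V (version sort) option, as GNU coreutils does. version_ge is a hypothetical helper name, and the version strings passed to it would be read off your own installation.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes "sort -V" (version sort) is available; it is not part of POSIX.
version_ge() {
    # After a version sort, the smaller of the two versions comes first.
    # A >= B holds exactly when that first line is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare an installed HTCondor version (typed in by hand here)
# against the 8.2.4 minimum from the Requirements list.
if version_ge "8.2.4" "8.2.4"; then
    echo "HTCondor version OK"
fi
```

A plain string comparison would get multi-digit components wrong (8.10.0 would sort before 8.2.4), which is why version sort is used here.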
Installation and Configuration
- Log in to the FRONTEND_HOST via ssh as the FRONTEND_USER. NOTE: It is important to log in with -A (ssh agent forwarding); this assumes you already have personal public-key access to the BOSCO_HOST. The bosco_cluster --add command will use this login to copy the BOSCO credentials over to the node.
[user@client ~]$ ssh -A FRONTEND_USER@FRONTEND_HOST
- Download the BOSCO installer tarball into the FRONTEND_USER home directory.
[FRONTEND_USER@FRONTEND_HOST ~]$ wget ftp://ftp.cs.wisc.edu/condor/bosco/1.2/boscoinstaller.tar.gz
- Unzip and untar the BOSCO installer in the FRONTEND_USER home directory.
[FRONTEND_USER@FRONTEND_HOST ~]$ tar -xzf boscoinstaller.tar.gz
- Run the boscoinstaller script to install BOSCO on the FRONTEND_HOST.
[FRONTEND_USER@FRONTEND_HOST ~]$ python boscoinstaller
- Generate a passwordless RSA key: press Enter twice when prompted for a passphrase. Note: It is important to name the key bosco_key.rsa.
[FRONTEND_USER@FRONTEND_HOST ~]$ ssh-keygen -t rsa -f ~/.ssh/bosco_key.rsa
- Since BOSCO is not installed in your FRONTEND_HOST path, you must (at least temporarily) source its environment configuration file, bosco_setenv:
[FRONTEND_USER@FRONTEND_HOST ~]$ source ~/bosco/bosco_setenv
- The installer does not create the .bosco directory, so create it manually:
[FRONTEND_USER@FRONTEND_HOST ~]$ mkdir ~/.bosco
- Start up BOSCO:
[FRONTEND_USER@FRONTEND_HOST ~]$ bosco_start
- Add the BOSCO_HOST by running the bosco_cluster script with the following parameters. This forwards the passwordless BOSCO ssh key and installs BOSCO on the remote side.
[FRONTEND_USER@FRONTEND_HOST ~]$ bosco_cluster --add BOSCO_USER@BOSCO_HOST BATCH_TYPE
where BATCH_TYPE = pbs, condor, etc.
- Run a BOSCO test job to check the connection between the FRONTEND_HOST and the BOSCO_HOST and its worker nodes.
[FRONTEND_USER@FRONTEND_HOST ~]$ bosco_cluster --test BOSCO_USER@BOSCO_HOST
- If successful, run bosco_stop on the FRONTEND_HOST.
[FRONTEND_USER@FRONTEND_HOST ~]$ bosco_stop
- Finally, add the following elements to your frontend configuration file, frontend.xml. You may add them to either the group or the global credential definition. Note: All paths should be absolute, not relative.
<credentials>
<credential absfname="/path/to/grid_proxy" security_class="frontend" trust_domain="grid" type="grid_proxy"/>
<credential absfname="/home/frontend/.ssh/bosco_key.rsa.pub" keyabsfname="/home/frontend/.ssh/bosco_key.rsa" pilotabsfname="/path/to/grid_proxy" security_class="frontend" trust_domain="bosco" type="key_pair"/>
</credentials>
- Stop, reconfigure, and restart your frontend. If successful, the FRONTEND_HOST is now properly configured.
[root@FRONTEND_HOST ~]$ service gwms-frontend stop
[root@FRONTEND_HOST ~]$ service gwms-frontend reconfig
[root@FRONTEND_HOST ~]$ service gwms-frontend start
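The BOSCO steps run as the FRONTEND_USER above (key generation through bosco_stop) can be collected into one script. The following is a minimal sketch, not a tested installer: run and setup_bosco_frontend are hypothetical helper names, DRY_RUN is a hypothetical switch that defaults to printing the commands instead of running them, and the -N "" option is an addition that gives ssh-keygen an empty passphrase so it does not prompt. It assumes the installer has already been run, so that ~/bosco exists.

```shell
#!/bin/sh
# Sketch: frontend-side BOSCO steps from this document in one function.
# With DRY_RUN=1 (the default here) each command is only printed.
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# setup_bosco_frontend BOSCO_USER@BOSCO_HOST BATCH_TYPE
setup_bosco_frontend() {
    target=$1       # e.g. cmsbosco@<your BOSCO_HOST>
    batch_type=$2   # pbs, condor, etc.

    # -N "" sets an empty passphrase without prompting.
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/bosco_key.rsa" || return 1

    # bosco_setenv has to be sourced into this shell, so it cannot go
    # through run(); it is skipped entirely in dry-run mode.
    if [ "$DRY_RUN" != "1" ]; then
        . "$HOME/bosco/bosco_setenv" || return 1
    fi

    # -p makes the mkdir safe to repeat if ~/.bosco already exists.
    run mkdir -p "$HOME/.bosco" || return 1
    run bosco_start || return 1
    run bosco_cluster --add "$target" "$batch_type" || return 1
    run bosco_cluster --test "$target" || return 1
    run bosco_stop
}
```

Calling setup_bosco_frontend BOSCO_USER@BOSCO_HOST pbs with the default DRY_RUN=1 prints the six commands for review; setting DRY_RUN=0 executes them and stops at the first failure. Note that ssh-keygen asks for confirmation before overwriting an existing key file, so the script is only non-interactive when ~/.ssh/bosco_key.rsa does not exist yet.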
- Next, log in to the FACTORY_HOST via ssh as root.
[user@client ~]$ ssh root@FACTORY_HOST
- Install condor-bosco on the FACTORY_HOST as root.
[root@FACTORY_HOST ~]$ yum install condor-bosco
- Remove and re-create (touch) the 60-campus_factory.config file so that it is left empty.
[root@FACTORY_HOST ~]$ rm /etc/condor/config.d/60-campus_factory.config
[root@FACTORY_HOST ~]$ touch /etc/condor/config.d/60-campus_factory.config
- Now, add the entry for the BOSCO_HOST to the factory configuration file, glideinWMS.xml.
<entry name="CMS_TX_US_XXXXX_BOSCO" auth_method="key_pair" enabled="True" gatekeeper="BOSCO_USER@BOSCO_HOST" gridtype="batch BATCH_TYPE" rsl="" trust_domain="bosco" verbosity="std" work_dir="~/">
<config>
<max_jobs>
<default_per_frontend glideins="256" held="50" idle="50"/>
<per_entry glideins="256" held="50" idle="50"/>
<per_frontends>
</per_frontends>
</max_jobs>
<release max_per_cycle="20" sleep="0.2"/>
<remove max_per_cycle="5" sleep="0.2"/>
<restrictions require_glidein_glexec_use="False" require_voms_proxy="False"/>
<submit cluster_size="10" max_per_cycle="100" sleep="0.2" slots_layout="fixed">
<submit_attrs>
</submit_attrs>
</submit>
</config>
<allow_frontends>
</allow_frontends>
<attrs>
<attr name="CONDOR_VERSION" const="False" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="default"/>
<attr name="GLEXEC_JOB" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="False" type="string" value="False"/>
<attr name="GLIDEIN_CMSSite" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="TX_US_XXXXX"/>
<attr name="GLIDEIN_CPUS" const="True" glidein_publish="False" job_publish="True" parameter="True" publish="True" type="string" value="8"/>
<attr name="GLIDEIN_Country" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="US"/>
<attr name="GLIDEIN_Glexec_Use" comment="This has been REQUIRED for historical reasons, OPTIONAL/NONE alt values" const="False" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="string" value="NONE"/>
<attr name="GLIDEIN_MaxMemMBs" const="True" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="int" value="49152"/>
<attr name="GLIDEIN_Max_Walltime" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="int" value="171000"/>
<attr name="GLIDEIN_ResourceName" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="TX_US_XXXXX"/>
<attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="TX_US_XXXXX"/>
<attr name="GLIDEIN_Supported_VOs" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="CMS,MIS"/>
<attr name="USE_CCB" const="True" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="string" value="True"/>
<attr name="X509_CERT_DIR" const="True" glidein_publish="False" job_publish="True" parameter="True" publish="True" type="string" value="/cvmfs/oasis.opensciencegrid.org/mis/certificates"/>
</attrs>
<files>
</files>
<infosys_refs>
</infosys_refs>
<monitorgroups>
</monitorgroups>
</entry>
- Finally, build up a global ssh known-hosts list so that the FACTORY_HOST trusts the host keys of both the BOSCO_HOST and the FRONTEND_HOST.
[root@FACTORY_HOST ~]$ ssh-keyscan -t rsa,dsa BOSCO_HOST >> /etc/ssh/ssh_known_hosts
[root@FACTORY_HOST ~]$ ssh-keyscan -t rsa,dsa FRONTEND_HOST >> /etc/ssh/ssh_known_hosts
- Stop, reconfigure, and restart your factory. If successful, the FACTORY_HOST is now properly configured. You may now submit user jobs to the BOSCO_HOST via the FRONTEND_HOST.
[root@FACTORY_HOST ~]$ service gwms-factory stop
[root@FACTORY_HOST ~]$ service gwms-factory reconfig
[root@FACTORY_HOST ~]$ service gwms-factory start
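One note on the ssh-keyscan step above: because it appends with >>, running it again adds duplicate entries to /etc/ssh/ssh_known_hosts. A minimal sketch of an idempotent alternative follows. append_unique is a hypothetical helper name, and the usage line in the comments assumes that scanning only RSA keys is sufficient for your hosts.

```shell
# append_unique FILE: read lines on stdin and append to FILE only those
# lines that are not already present in it.
append_unique() {
    file=$1
    touch "$file" || return 1
    while IFS= read -r line; do
        # Skip blank lines and comment lines.
        [ -n "$line" ] || continue
        case $line in
            '#'*) continue ;;
        esac
        # -x matches whole lines, -F treats the key line as a fixed string.
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}

# Intended use on the FACTORY_HOST, as root:
#   ssh-keyscan -t rsa BOSCO_HOST    | append_unique /etc/ssh/ssh_known_hosts
#   ssh-keyscan -t rsa FRONTEND_HOST | append_unique /etc/ssh/ssh_known_hosts
```

Running the same pipeline twice leaves the file unchanged the second time, as long as the remote host key has not changed in between.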
Troubleshooting
If glideins or direct BOSCO user jobs fail to submit to a local PBS/SLURM batch system, it may be useful to modify the ~/bosco/glite/bin/pbs_submit.sh submission script on the BOSCO_HOST to see the qsub/sbatch error messages directly.
Before:
jobID=`${pbs_binpath}/qsub $bls_tmp_file` # actual submission
retcode=$?
if [ "$retcode" != "0" ] ; then
rm -f $bls_tmp_file
exit 1
fi
After:
jobID=`${pbs_binpath}/qsub $bls_tmp_file` # actual submission
retcode=$?
echo "Full qsub output: $jobID" 1>&2
if [ "$retcode" != "0" ] ; then
rm -f $bls_tmp_file
exit 1
fi
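The capture-and-log pattern used in the modification above can be tried in isolation. The sketch below is a standalone demonstration, not a patch for pbs_submit.sh: fake_qsub is a hypothetical stand-in for qsub that fails with a message on stderr, and submit is a hypothetical wrapper. It also adds 2>&1 inside the command substitution, on the assumption that the submit command writes its diagnostics to stderr, which a plain substitution does not capture.

```shell
# Hypothetical stand-in for qsub: prints an error on stderr and fails.
fake_qsub() {
    echo "qsub: Unknown queue" 1>&2
    return 1
}

# submit JOBFILE: run the submit command, log everything it printed to
# stderr, and propagate failure to the caller.
submit() {
    # 2>&1 inside the substitution captures stderr as well as stdout.
    out=$(fake_qsub "$1" 2>&1)
    # The exit status of an assignment is that of the command substitution.
    retcode=$?
    echo "Full qsub output: $out" 1>&2
    if [ "$retcode" != "0" ]; then
        return 1
    fi
    # On success, stdout carries the job ID back to the caller.
    echo "$out"
}
```

In the real script the captured value is later used as the job ID, so merging stderr into it there could change what gets parsed. Treat the 2>&1 variant as a temporary debugging aid only.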
Additional Documentation
--
JeffreyDost - 2015/05/12