Install and Configure condor_annex
About this Document
condor_annex is a Perl-based script that utilizes the Amazon Web Services (AWS) command-line interface (CLI) and other AWS services to orchestrate the delivery of HTCondor execute nodes running on AWS Elastic Compute Cloud (EC2) instances to an HTCondor pool. This document describes how to install, configure, and run condor_annex successfully from your own local HTCondor pool.
This document follows the general Open Science Grid (OSG) documentation conventions:
- A User Command Line is illustrated by a green box that displays a prompt:
[user@client ~]$
- A Root Command Line is illustrated by a red box that displays the root prompt:
[root@client ~]$
- Lines in a file are illustrated by a yellow box that displays the desired lines in a file:
priorities=1
Definitions
- SUBMIT is the hostname of an HTCondor submit node, where users submit their jobs to your local pool.
- CENTRAL_MANAGER is the hostname of your HTCondor central manager, where job and machine class ads are matched.
- EXECUTE is the hostname of an HTCondor execute node in your local pool.
- ANNEX is the hostname of an EC2 instance configured as a condor_annex execute node.
Requirements
- An HTCondor pool
- An Amazon Web Services Account
Step 1: Install and Configure an HTCondor Pool
If you do not already have your own HTCondor pool, you may want to start by installing a
personal HTCondor pool to experiment with condor_annex. Please consult the
HTCondor Manual and/or
Wiki for more information.
Step 2: Obtain an Amazon Web Services Account
In order to use condor_annex, you must already have an AWS account. You may establish an AWS account under the UC-wide agreement by following the
instructions provided by Blink.
Step 3: Obtain AWS Account Credentials
condor_annex issues programmatic requests to AWS services via the AWS command-line interface (CLI). In order to issue these requests, the AWS CLI must sign them using your AWS account credentials. These credentials consist of an
Access Key ID and a
Secret Access Key. If you do not have these access keys, you may create them using the AWS Management Console. AWS recommends that you use Identity and Access Management (IAM) access keys instead of your root account access keys.
To create access keys, you must have permissions to perform the required IAM actions.
- Open the IAM console.
- In the navigation pane, choose Users.
- If you do not already have an IAM username, then select Add User. Each new user is issued credentials.
- If you already have an IAM username, then choose your IAM username (not the check box).
- Next, select the Security Credentials tab and then choose Create Access Key.
- Your credentials will look something like this:
- Access Key ID: AKIAIOSFODNN7EXAMPLE
- Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- Choose Download .csv file, and store the keys in a secure location. Your secret key will no longer be available through the AWS Management Console; you will have the only copy. Keep it confidential in order to protect your account, and never email it. Do not share it outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your secret key.
Save your Access Key ID and Secret Access Key. You will need to provide them later when configuring the AWS CLI. If you need more information about AWS Security Credentials, please consult the
AWS documentation.
Step 4: Select a Region for the Annex
Amazon Elastic Compute Cloud (EC2) instances are hosted in multiple locations worldwide. These locations are composed of
Regions and
Availability Zones. Each Region is a separate geographic area that contains multiple, isolated locations known as Availability Zones (AZs). However,
not all AWS Regions are created equal: each Region may offer only a subset of AWS services. You can find out which services are offered in each Region from
the table provided here.
When selecting a Region for your annex, you must select a region that offers all of the AWS services required by condor_annex to function properly. These services are:
AWS Lambda currently has the most limited deployment of any AWS service required by condor_annex. For example, AWS Lambda is only available in the following Regions within the United States at this time:
- Northern Virginia (us-east-1)
- Ohio (us-east-2)
- Oregon (us-west-2)
Select your desired Region accordingly from the drop-down menu in the upper-right-hand side of the AWS Management Console.
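As a rough illustration, the Region list above can be turned into a small pre-flight check. The sketch below is plain POSIX shell. The names ANNEX_LAMBDA_REGIONS and annex_region_ok are hypothetical helpers, not part of condor_annex, and the list only mirrors the three Regions named in this step, so it may be out of date.

```shell
# Regions this step lists as offering AWS Lambda within the United States.
# Copied from the text above; the real list may have changed since.
ANNEX_LAMBDA_REGIONS="us-east-1 us-east-2 us-west-2"

# Succeeds (exit status 0) if the given Region code is in the list above.
annex_region_ok() {
    for region in $ANNEX_LAMBDA_REGIONS; do
        [ "$region" = "$1" ] && return 0
    done
    return 1
}

# Example: warn before going any further with an unsuitable Region.
# (us-west-1 is used here only as an illustration.)
if ! annex_region_ok "us-west-1"; then
    echo "us-west-1 is not in the Lambda Region list"
fi
```

A check like this could run at the top of any wrapper script your site writes around condor_annex, so that a mistyped or unsupported Region is caught before any AWS resources are requested.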
Step 5: Generate an EC2 Key Pair
After selecting a Region for your annex, you will need to generate an SSH key pair that will allow you to log in to your EC2 instances in that Region. You can create a key pair using the EC2 console or the command line. You will specify this key pair when launching your instances with condor_annex.
To create your key pair using the Amazon EC2 console
- Open the EC2 console.
- In the navigation pane, under NETWORK & SECURITY, choose Key Pairs.
- Choose Create Key Pair.
- Enter a name for the new key pair in the Key pair name field of the Create Key Pair dialog box, and then choose Create.
- The private key file is automatically downloaded by your browser. The base file name is the name you specified as the name of your key pair, and the file name extension is .pem. Save the private key file in a safe place. This is the only chance for you to save the private key file. You'll need to provide the name of your key pair when you launch an instance and the corresponding private key each time you connect to the instance.
- Use the following command to set the permissions of your private key file so that only you can read it.
[user@SUBMIT ~]$ chmod 400 my-key-pair.pem
If you would like to create your SSH key pair using the AWS CLI or import your own key pair, please consult the
AWS documentation.
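Because ssh generally refuses to use a private key file that other users can read, it may help to verify the permissions before the first connection attempt. The following is a minimal sketch, not an official tool: key_is_private is a hypothetical helper name, and the stat -c form assumes GNU coreutils, as found on CentOS.

```shell
# Succeeds if the file exists and grants no permissions to group or others
# (for example mode 400 or 600). Assumes GNU coreutils `stat -c %a`.
key_is_private() {
    [ -f "$1" ] || return 1
    mode=$(stat -c %a "$1") || return 1
    case "$mode" in
        [0-7]00) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example usage (my-key-pair.pem is the file name used above):
#   key_is_private my-key-pair.pem || chmod 400 my-key-pair.pem
```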
Step 6: Configure Default VPC Security Group
condor_annex will automatically create and configure an AWS Security Group (i.e., a virtual firewall) around all of the instances within an annex. However, depending on your HTCondor pool configuration, it may also be useful to place some on-demand resources in AWS. For example, you may want a separate HTCondor central manager instance located in AWS in order to flock user jobs over to the annex instead of connecting the annex instances back to your local central manager.
Any such on-demand resources may be placed in your AWS Region's default Virtual Private Cloud (VPC) Security Group. To configure the default VPC Security Group:
- Open the VPC console.
- In the navigation pane, under Security, choose Security Groups.
- Select the Security Group in the list that has Group Name default and Description default VPC security group.
- Next, select the Inbound Rules tab and then click on the Edit button.
By default, the only inbound rule should be one allowing all traffic from instances assigned to the default VPC Security Group.
We recommend the following set of inbound rules be used for the default VPC Security Group:
- Keep the default rule allowing all traffic from instances assigned to the default VPC Security Group.
- Allow all inbound traffic from instances within your AWS Region's default VPC's private network IP address space.
- Allow inbound SSH traffic on port 22.
- Allow inbound ICMP traffic.
- Allow inbound HTCondor UDP traffic on port 9618.
- Allow inbound HTCondor TCP traffic on port 9618.
In their most permissive form, these inbound rules for the default VPC security group will look something like this:
| Type | Protocol | Port Range | Source |
| All traffic | All | All | sg-5437332d (default) |
| All traffic | All | All | 172.31.0.0/16 |
| SSH | TCP | 22 | 0.0.0.0/0 |
| All ICMP | All | N/A | 0.0.0.0/0 |
| Custom UDP Rule | UDP | 9618 | 0.0.0.0/0 |
| Custom TCP Rule | TCP | 9618 | 0.0.0.0/0 |
Of course, you should try to restrict the
Source IP address space for these rules as much as possible. For example, you may want to limit them to inbound traffic from your home institution's public IP address space.
By default, each Security Group, including the default VPC Security Group, allows ALL outbound traffic.
If you would like to restrict outbound traffic from the default VPC Security Group, select the
Outbound Rules tab, click on the
Edit button, and then configure the outbound rules accordingly.
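If you prefer the command line to the VPC console, the port-based inbound rules in this step could in principle be added with the AWS CLI's ec2 authorize-security-group-ingress command, once the CLI is installed and configured. The sketch below only prints the commands so they can be reviewed first. The function name print_annex_ingress_rules is hypothetical, the group ID is the example value shown in this step, and 203.0.113.0/24 is a documentation placeholder that stands in for your institution's public address range. The ICMP and all-traffic rules are left out of this sketch.

```shell
# Print (do not run) AWS CLI commands that would add the SSH and HTCondor
# inbound rules to a security group.
#   $1  security group ID (placeholder, e.g. sg-5437332d)
#   $2  source CIDR block (placeholder; restrict it as much as possible)
print_annex_ingress_rules() {
    group_id=$1
    cidr=$2
    for rule in "tcp 22" "udp 9618" "tcp 9618"; do
        protocol=${rule% *}   # text before the space
        port=${rule#* }       # text after the space
        echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol $protocol --port $port --cidr $cidr"
    done
}

# Example with placeholder values; review the output before running any of it.
print_annex_ingress_rules sg-5437332d 203.0.113.0/24
```

Printing rather than executing keeps the sketch safe to try on any machine, and the output can be pasted into a terminal one line at a time after the source range has been checked.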
Step 7: Create a condor_annex-compatible Amazon Machine Image
Each HTCondor execute instance within your annex must run a condor_annex-compatible Amazon Machine Image (AMI). By default, condor_annex will attempt to use one of the publicly available Amazon Linux AMIs with HTCondor 8.4.2 pre-installed currently provided by the HTCondor team. These condor_annex-compatible AMIs are available in the following AWS Regions within the United States:
| Region | AMI ID |
| us-west-1 | ami-7f06731f |
| us-east-1 | ami-91e1a3fb |
| us-west-2 | ami-ac8890cd |
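For scripting purposes, the Region-to-AMI mapping listed above can be captured as a small lookup. This is only a sketch: default_annex_ami is a hypothetical helper name, and the AMI IDs are copied from this document without checking that they still exist.

```shell
# Print the AMI ID listed above for a Region; fail if the Region is not listed.
default_annex_ami() {
    case "$1" in
        us-west-1) echo "ami-7f06731f" ;;
        us-east-1) echo "ami-91e1a3fb" ;;
        us-west-2) echo "ami-ac8890cd" ;;
        *)         return 1 ;;
    esac
}

# Example: fall back to a message when no default AMI is listed.
default_annex_ami us-east-2 || echo "no default AMI listed for us-east-2"
```

Note that us-east-2 appears in the Lambda Region list in Step 4 but has no AMI listed here, while us-west-1 has an AMI but is not in the Lambda list.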
If these preconfigured AMIs cannot be successfully modified to suit your needs, you will need to create your own condor_annex-compatible AMI. We have done so for our own purposes by building a condor_annex-compatible CentOS 6-based AMI.
To build your own condor_annex-compatible AMI, open the Elastic Compute Cloud (EC2) dashboard in the Region where you will run your annex. Click on the
Launch Instance button. This will open the instance launch configuration wizard. Follow these steps.
- Choose an Amazon Machine Image (AMI): We configured our annex's execute instances to use CentOS 6. To find a suitable CentOS 6 AMI to start from, select the AWS Marketplace tab and then enter "CentOS 6" in the search box. Your search will return multiple results. However, the most up-to-date AMI should be the first one in the list. Unless you have special requirements for your configuration, select this AMI by clicking on the Select button.
- Choose an Instance Type: Once you have selected an AMI, the launch configuration wizard will prompt you to select an instance type on which to build your condor_annex execute node. Choose one that suits your needs. Once you have selected your instance type, click on the Next: Configure Instance Details button.
- Configure Instance Details: Only one instance is required to configure your condor_annex-compatible AMI. Therefore, you may leave the Number of Instances at 1. Next, select one of your Network VPCs. In general, you should choose the default VPC whose Security Group was pre-configured in the previous step. Once you have determined which VPC will host this instance, select a specific Subnet in which to place it. The other networking options Auto-assign Public IP and Placement group may be left set to their default settings of Use subnet setting (Enabled) and No placement group, respectively. After configuring the networking details, if you would like to apply a specific IAM role to the instance, then select an appropriate role for it. Otherwise, leave IAM role set to its default value of None. All other instance details may be configured with their default values. Once you have completed configuring your instance details, click on the Next: Add Storage button.
- Add Storage: In general, you will not have to modify the configuration of your root storage volume for the instance. However, the launch wizard may still default to a Magnetic volume type, even though the General Purpose SSD option is becoming AWS's recommended default. Our instance launch wizard still defaults to Magnetic. As such, we changed our root Volume Type from an 8 GiB Magnetic volume to an 8 GiB General Purpose SSD volume and selected Delete on Termination. Once you have completed the configuration of your root volume, click on the Next: Tag Instance button.
- Tag Instance: Add a Name to your instance and then click on the Next: Configure Security Group button.
- Configure Security Group: Select an existing security group and choose your default VPC security group. Once you have selected a security group, click on the Review and Launch button.
- Review Instance Launch: Review the configuration of your instance and make any necessary changes. Once done, click on the Launch button. You will be prompted to Select an existing key pair or create a new key pair, which will enable your SSH access to the instance. Select one of your existing key pairs (or create a new one) and then agree to the acknowledgement statement by clicking on the checkbox next to it. Once you have selected your key pair, click on the Launch Instances button. After launching your instance, the wizard will display the Launch Status page. To return to the main EC2 dashboard, scroll down and click on the View Instances button.
Once the instance has started up and entered a
running Instance State, you will install and configure the software required to create a condor_annex-compatible AMI on the instance. To begin, open a terminal and log in to the instance via SSH.
[user@client ~]$ ssh -i ~/.ssh/HTCondorAnnex.pem centos@ANNEX.PUBLIC.IP
Then switch to root.
[centos@ANNEX-PRIVATE-IP ~]$ sudo -i
First, update the instance's base OS configuration.
[root@ANNEX-PRIVATE-IP ~]$ yum update
Then install the Extra Packages for Enterprise Linux (EPEL) repository.
[root@ANNEX-PRIVATE-IP ~]$ yum install epel-release
Next, install the yum priorities package
[root@ANNEX-PRIVATE-IP ~]$ yum install yum-plugin-priorities
and the appropriate
Open Science Grid (OSG) repositories.
[root@ANNEX-PRIVATE-IP ~]$ rpm -Uvh https://repo.grid.iu.edu/osg/3.3/osg-3.3-el6-release-latest.rpm
Once the OSG repositories are available on the instance,
install the CA certificates and fetch-crl.
[root@ANNEX-PRIVATE-IP ~]$ yum install osg-ca-certs
[root@ANNEX-PRIVATE-IP ~]$ yum install fetch-crl
Next,
install the OSG Worker Node Client.
[root@ANNEX-PRIVATE-IP ~]$ yum install osg-wn-client
After the client software is installed, manually create both a
condor group and user and then
install HTCondor.
[root@ANNEX-PRIVATE-IP ~]$ groupadd condor
[root@ANNEX-PRIVATE-IP ~]$ useradd condor -g condor
[root@ANNEX-PRIVATE-IP ~]$ yum install condor.x86_64
This would complete the typical software installation of a standard OSG HTCondor execute node, except for
CVMFS. However, in order to support condor_annex, several other software packages must be properly installed on the instance.
In addition to the standard OSG software, you must also install
cloud-init
[root@ANNEX-PRIVATE-IP ~]$ yum install cloud-init
and several other Python packages, including
pip.
[root@ANNEX-PRIVATE-IP ~]$ yum install pystache
[root@ANNEX-PRIVATE-IP ~]$ yum install python-argparse
[root@ANNEX-PRIVATE-IP ~]$ yum install python-daemon
[root@ANNEX-PRIVATE-IP ~]$ yum install python-requests
[root@ANNEX-PRIVATE-IP ~]$ yum install python-pip
You may also want to make sure that pip itself is up to date.
[root@ANNEX-PRIVATE-IP ~]$ pip install --upgrade pip
Once these packages are installed,
install the AWS CLI.
[root@ANNEX-PRIVATE-IP ~]$ pip install awscli
and the
AWS CloudFormation Helper Scripts.
[root@ANNEX-PRIVATE-IP ~]$ easy_install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
Several standard directories and symbolic links found on Amazon Linux AMIs must be created to successfully use the CloudFormation Helper Scripts on CentOS 6.
[root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-hup /etc/init.d/cfn-hup
[root@ANNEX-PRIVATE-IP ~]$ chmod 775 /usr/bin/cfn-hup
[root@ANNEX-PRIVATE-IP ~]$ mkdir /opt/aws
[root@ANNEX-PRIVATE-IP ~]$ mkdir /opt/aws/bin
[root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-hup /opt/aws/bin/cfn-hup
[root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-init /opt/aws/bin/cfn-init
[root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-signal /opt/aws/bin/cfn-signal
[root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-get-metadata /opt/aws/bin/cfn-get-metadata
If you are using a different base OS AMI, please see
this link for some possible changes to the CloudFormation Helper Script configuration.
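Before moving on, it may be worth confirming that the tools installed so far can actually be found on the instance. The following sketch assumes nothing beyond a POSIX shell. missing_tools is a hypothetical helper name, and the list of commands is our reading of what the steps above install, so adjust it to your own setup.

```shell
# Print the name of each given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands expected after the installation steps above; any name printed
# here is missing from PATH.
missing_tools aws cfn-init cfn-signal cfn-hup cfn-get-metadata condor_version
```

An empty result suggests the packages and helper scripts landed where the shell can find them. It does not prove they are configured correctly.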
Activate (or deactivate) the following services as indicated and then log out of the instance.
[root@ANNEX-PRIVATE-IP ~]$ chkconfig iptables off
[root@ANNEX-PRIVATE-IP ~]$ service iptables stop
[root@ANNEX-PRIVATE-IP ~]$ chkconfig fetch-crl-boot on
[root@ANNEX-PRIVATE-IP ~]$ chkconfig fetch-crl-cron on
[root@ANNEX-PRIVATE-IP ~]$ service fetch-crl-boot start
[root@ANNEX-PRIVATE-IP ~]$ service fetch-crl-cron start
[root@ANNEX-PRIVATE-IP ~]$ chkconfig condor on
[root@ANNEX-PRIVATE-IP ~]$ service condor start
[root@ANNEX-PRIVATE-IP ~]$ exit
Now that you have prepared a condor_annex-compatible AMI on this instance, you'll need to save it for future use on other instances. To do so:
- Return to your web browser and go to the EC2 console.
- In the navigation pane, under INSTANCES, choose Instances.
- There you will see a list of each individual instance available in the Region. Select the instance you've just configured your condor_annex-compatible AMI on.
- From the dropdown menu Actions, go to Image and select Create Image.
- You will be prompted to make changes to the AMI before its creation. You'll likely want to add an Image name and check the Delete on Termination box. Make any other adjustments you find necessary and then click on the Create Image button. This will create an AMI from your instance that can be used with condor_annex.
- Go ahead and Close the Create Image request received dialog box to return to the EC2 Dashboard.
- In the navigation pane, under IMAGES, click on AMIs. There you will see a list of your custom AMIs, including the condor_annex-compatible AMI that was just created from your instance. Note the AMI ID for this image, as it will be one of the required inputs when calling condor_annex.
Step 8: Configure HTCondor Pool for Password Authentication
condor_annex currently assumes that your local HTCondor pool allows daemon-to-daemon communication via Password Authentication. If your local pool is not yet configured to use a pool password, you must first
generate and store a password file on both the
SUBMIT node and
CENTRAL_MANAGER by running the following command on each:
[root@SUBMIT ~]$ condor_store_cred -c add
This command will prompt you to enter a pool password. Once entered, a password file will be stored on the local machine. By default, the password file created on each machine is /etc/condor/condor_pool_password. Make sure that you run this pool password command, entering the same password each time, on both the SUBMIT node and the CENTRAL_MANAGER of your local pool. You may also use Password Authentication with your local
EXECUTE nodes. However, this is not required by condor_annex. Only the
ANNEX instances require the use of Password Authentication.
Once your SUBMIT node and CENTRAL_MANAGER have the pool password file, you must configure their HTCondor daemons to use Password Authentication. On both machines, log in as root and go to the HTCondor config.d directory.
[root@CENTRAL_MANAGER ~]$ cd /etc/condor/config.d
In this directory, create the following HTCondor configuration file (99_condor_annex_passwd.config)
ALLOW_DAEMON = $(ALLOW_DAEMON), condor_pool@*
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = $(SEC_DEFAULT_AUTHENTICATION_METHODS), PASSWORD
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_INTEGRITY = REQUIRED
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
SEC_PASSWORD_FILE = /etc/condor/condor_pool_password
and then restart condor.
[root@CENTRAL_MANAGER ~]$ service condor restart
Your local HTCondor pool should now be ready to use Password Authentication with condor_annex.
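Since the same configuration file has to exist on both the SUBMIT node and the CENTRAL_MANAGER, a quick check that no setting was dropped while copying can save some debugging. The sketch below is an assumption-laden convenience, not an HTCondor tool: missing_annex_settings is a hypothetical helper that only looks for an assignment to each setting name listed in this step and does not validate the values.

```shell
# Print each setting name from this step that has no assignment in the file.
#   $1  path to the configuration file, e.g.
#       /etc/condor/config.d/99_condor_annex_passwd.config
missing_annex_settings() {
    for name in ALLOW_DAEMON \
                SEC_DEFAULT_AUTHENTICATION \
                SEC_DEFAULT_AUTHENTICATION_METHODS \
                SEC_DEFAULT_ENCRYPTION \
                SEC_DEFAULT_INTEGRITY \
                SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION \
                SEC_PASSWORD_FILE; do
        # Match "NAME =" at the start of a line, allowing surrounding spaces.
        grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$1" || echo "$name"
    done
}

# Example usage:
#   missing_annex_settings /etc/condor/config.d/99_condor_annex_passwd.config
```

To see the value the daemons actually use after the restart, HTCondor's own condor_config_val command (for example, condor_config_val SEC_PASSWORD_FILE) is likely the more authoritative check.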
Step 9: Install and Configure the AWS CLI
The
AWS Command Line Interface (CLI) is a tool to manage your AWS resources and services from the command line as well as automate your interaction with them via scripting. Remember, condor_annex itself is a Perl-based script that relies on the AWS CLI to automate the construction of an annex given the inputs provided by a user. As such, the AWS CLI must be installed and configured on any host that will run condor_annex.
If you plan to let your users run condor_annex for themselves when they need additional resources, then you should install the AWS CLI on your HTCondor pool's SUBMIT node. To install the AWS CLI, log in as root to your SUBMIT node and run the following commands.
[root@SUBMIT ~]$ yum install python-pip
[root@SUBMIT ~]$ pip install awscli
Once the AWS CLI is installed, each user who wants to run condor_annex will have to configure the CLI using their AWS Security Credentials. To configure the CLI, they must run the following command and enter the requested information.
[user@SUBMIT ~]$ aws configure
AWS Access Key ID [None]: ****************4FSQ
AWS Secret Access Key [None]: ****************RbV6
Default region name [None]: us-east-1
Default output format [None]: json
For the
Default region name and
Default output format, please make sure to instruct your users to enter (1) the codename for the AWS Region that contains your pre-configured condor_annex-compatible AMI and (2) json, respectively. Once a user completes this AWS CLI configuration process, they should find the settings stored in the new .aws directory that has been created in their home directory.
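The AWS CLI normally records the Region and output format in ~/.aws/config under a [default] section, and the access keys in ~/.aws/credentials. To confirm that a user's default Region matches the Region of your AMI, something like the following sketch could be used. default_region is a hypothetical helper that does a simple text scan of the file and ignores named profiles.

```shell
# Print the region recorded under the [default] section of an AWS CLI
# config file (normally ~/.aws/config). Prints nothing if none is set.
default_region() {
    awk '
        # Track whether the current section header is [default].
        /^\[/ { in_default = ($0 == "[default]"); next }
        in_default && /^[ \t]*region[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop "region =" and leading spaces
            sub(/[ \t]+$/, "")         # drop trailing spaces
            print
            exit
        }
    ' "$1"
}

# Example usage:
#   default_region "$HOME/.aws/config"
```

The CLI can report the same value itself with aws configure get region. The helper above is only useful where you want to inspect a file without invoking the CLI.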
Step 10: Install and Configure condor_annex
Now that the AWS CLI is installed and configured on your local pool's SUBMIT node, you can also install condor_annex on it. Since condor_annex is not currently distributed via RPMs, you will have to clone one of the git repositories where it is stored. As such, you must begin by installing git.
[root@SUBMIT ~]$ yum install git
Once git is installed, you should clone one of the repositories containing condor_annex into /opt.
[root@SUBMIT ~]$ cd /opt
The current development version of condor_annex from the HTCondor team is available in:
[root@SUBMIT ~]$ git clone https://github.com/htcondor/htcondor.git -b V8_5-condor_annex-branch
Note, however, this is the
entire HTCondor project's development branch for condor_annex. The only components necessary to run condor_annex are actually self-contained within the directory /htcondor/src/condor_annex. A forked repository that only contains these condor_annex components as well as a few minor modifications to them is also available at:
[root@SUBMIT ~]$ git clone https://github.com/mkandes/condor_annex.git
This repository is intended to remain more stable while the HTCondor team continues to develop condor_annex into an HTCondor daemon. We recommend using this repository while evaluating condor_annex.
Finally, condor_annex requires the perl-JSON module. Don't forget to install it after you've cloned condor_annex from one of the repositories.
[root@SUBMIT ~]$ yum install perl-JSON
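A quick way to confirm that the module is visible to the system perl is to ask perl to load it. The sketch below wraps that in a small function. perl_module_ok is a hypothetical helper name, and the check only shows that the module loads, not that condor_annex will run.

```shell
# Succeeds if the named Perl module can be loaded by the perl on PATH.
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_module_ok JSON; then
    echo "Perl JSON module found"
else
    echo "Perl JSON module missing; install perl-JSON"
fi
```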
Step 11: Launch a condor_annex