Install and Configure condor_annex

About this Document

condor_annex is a Perl-based script that utilizes the Amazon Web Services (AWS) command-line interface (CLI) and other AWS services to orchestrate the delivery of HTCondor execute nodes running on AWS Elastic Compute Cloud (EC2) instances to an HTCondor pool. This document describes how to install, configure, and run condor_annex successfully from your own local HTCondor pool.

This document follows the general Open Science Grid (OSG) documentation conventions:

  1. A User Command Line is illustrated by a green box that displays a prompt:
     [user@client ~]$ 
  2. A Root Command Line is illustrated by a red box that displays the root prompt:
     [root@client ~]$ 
  3. Lines in a file are illustrated by a yellow box that displays the desired lines in a file:
     priorities=1 

Definitions

  • SUBMIT is the hostname of an HTCondor submit node, where users submit their jobs to your local pool.
  • CENTRAL_MANAGER is the hostname of your HTCondor central manager, where job and machine class ads are matched.
  • EXECUTE is the hostname of an HTCondor execute node in your local pool.
  • ANNEX is the hostname of an EC2 instance configured as a condor_annex execute node.

Requirements

  • An HTCondor pool
  • An Amazon Web Services Account

Step 1: Install and Configure an HTCondor Pool

If you do not already have your own HTCondor Pool, you may want to first start by installing your own personal HTCondor pool to experiment with condor_annex. Please consult the HTCondor Manual and/or Wiki for more information.

Step 2: Obtain an Amazon Web Services Account

In order to use condor_annex, you must already have an AWS account. You may establish an AWS account under the UC-wide agreement by following the instructions provided by Blink.

Step 3: Obtain AWS Account Credentials

condor_annex issues programmatic requests to AWS services via the AWS command-line interface (CLI). In order to issue these requests, the AWS CLI must sign them using your AWS account credentials. These credentials consist of an Access Key ID and a Secret Access Key. If you do not have these access keys, you may create them using the AWS Management Console. AWS recommends that you use Identity and Access Management (IAM) access keys instead of your root account access keys.

To create access keys, you must have permissions to perform the required IAM actions.

  1. Open the IAM console.
  2. In the navigation pane, choose Users.
  3. If you do not already have an IAM username, then select Add User. Each new user is issued credentials.
  4. If you already have an IAM username, then choose your IAM username (not the check box).
  5. Next, select the Security Credentials tab and then choose Create Access Key.
  6. Your credentials will look something like this:
    • Access Key ID: AKIAIOSFODNN7EXAMPLE
    • Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  7. Choose Download .csv file, and store the keys in a secure location. Your secret key will no longer be available through the AWS Management Console; you will have the only copy. Keep it confidential in order to protect your account, and never email it. Do not share it outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your secret key.

Save your Access Key ID and Secret Access Key. You will need to provide them later when configuring the AWS CLI. If you need more information about AWS Security Credentials, please consult the AWS documentation.

Step 4: Select a Region for the Annex

Amazon Elastic Compute Cloud (EC2) instances are hosted in multiple locations world-wide. These locations are composed of Regions and Availability Zones. Each Region is a separate geographic area that contains multiple, isolated locations known as Availability Zones (AZs). Not all AWS Regions are created equal, however: each Region may offer only a subset of AWS services. You can find out what services are offered in each Region from the table provided here.

When selecting a Region for your annex, you must select a region that offers all of the AWS services required by condor_annex to function properly. These services are:

AWS Lambda currently has the most limited deployment of any AWS service required by condor_annex. For example, AWS Lambda is only available in the following Regions within the United States at this time:

  • Northern Virginia (us-east-1)
  • Ohio (us-east-2)
  • Oregon (us-west-2)

Select your desired Region accordingly from the drop-down menu in the upper-right-hand side of the AWS Management Console.
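The Region check described above can be sketched as a small shell helper. This is only an illustration: the function name region_is_listed and the variable ANNEX_REGIONS are hypothetical, and the list is copied from this document, so it may no longer match what AWS actually offers.

```shell
# Regions named in this document as offering AWS Lambda within the
# United States. Assumption: copied from the text above, possibly out of date.
ANNEX_REGIONS="us-east-1 us-east-2 us-west-2"

# region_is_listed REGION: succeed if REGION appears in ANNEX_REGIONS.
region_is_listed() {
    for r in $ANNEX_REGIONS; do
        [ "$r" = "$1" ] && return 0
    done
    return 1
}
```

For example, region_is_listed us-west-2 succeeds, while region_is_listed us-west-1 fails because that Region is not in the list above.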

Step 5: Generate an EC2 Key Pair

After selecting a Region for your annex, you will need to generate an SSH key pair that will allow you to log in to your EC2 instances in that Region. You can create a key pair using the EC2 console or the command line. You will specify this key pair when launching your instances with condor_annex.

To create your key pair using the Amazon EC2 console

  1. Open the EC2 console.
  2. In the navigation pane, under NETWORK & SECURITY, choose Key Pairs.
  3. Choose Create Key Pair.
  4. Enter a name for the new key pair in the Key pair name field of the Create Key Pair dialog box, and then choose Create.
  5. The private key file is automatically downloaded by your browser. The base file name is the name you specified as the name of your key pair, and the file name extension is .pem. Save the private key file in a safe place. This is the only chance for you to save the private key file. You'll need to provide the name of your key pair when you launch an instance and the corresponding private key each time you connect to the instance.
  6. Use the following command to set the permissions of your private key file so that only you can read it.
     [user@SUBMIT ~]$ chmod 400 my-key-pair.pem 

If you would like to create your SSH key pair using the AWS CLI or import your own key pair, please consult the AWS documentation.
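As a rough sketch of the command-line route, the helper below wraps the AWS CLI's aws ec2 create-key-pair command. It assumes the AWS CLI is already installed and configured for the Region chosen in Step 4; the function name create_key_pair is hypothetical and not part of condor_annex.

```shell
# create_key_pair NAME: create an EC2 key pair called NAME in the CLI's
# default Region and save the private key to ./NAME.pem.
create_key_pair() {
    name=$1
    [ -n "$name" ] || { echo "usage: create_key_pair NAME" >&2; return 2; }
    [ ! -e "$name.pem" ] || { echo "$name.pem already exists" >&2; return 1; }
    # The subshell's umask keeps the file private from the moment it is created.
    # If the AWS call fails, the partial file is removed.
    (umask 077 && aws ec2 create-key-pair --key-name "$name" \
        --query 'KeyMaterial' --output text > "$name.pem") \
        || { rm -f "$name.pem"; return 1; }
    # Same permissions as the chmod 400 step above.
    chmod 400 "$name.pem"
}
```

Calling create_key_pair my-key-pair would leave my-key-pair.pem in the current directory, readable only by you. The helper refuses to overwrite an existing file of the same name.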

Step 6: Configure Default VPC Security Group

condor_annex will automatically create and configure an AWS Security Group (i.e., a virtual firewall) around all of the instances within an annex. However, depending on your HTCondor pool configuration, it may also be useful to place some on-demand resources in AWS. For example, you may want to run a separate HTCondor central manager instance in AWS in order to flock user jobs over to the annex instead of connecting the annex instances back to your local central manager.

Any such on-demand resources may be placed in your AWS Region's default Virtual Private Cloud (VPC) Security Group. To configure the default VPC Security Group:

  1. Open the VPC console.
  2. In the navigation pane, under Security, choose Security Groups.
  3. Select the Security Group in the list that has Group Name default and Description default VPC security group.
  4. Next, select the Inbound Rules tab and then click on the Edit button.

By default, the only inbound rule should be one allowing all traffic from instances assigned to the default VPC Security Group.

Type          Protocol   Port Range   Source
ALL Traffic   ALL        ALL          The security group ID (sg-xxxxxxxx)

We recommend the following set of inbound rules be used for the default VPC Security Group:

  1. Keep the default rule allowing all traffic from instances assigned to the default VPC Security Group.
  2. Allow all inbound traffic from instances within your AWS Region's default VPC's private network IP address space.
  3. Allow inbound SSH traffic on port 22.
  4. Allow inbound ICMP traffic.
  5. Allow inbound HTCondor UDP traffic on port 9618.
  6. Allow inbound HTCondor TCP traffic on port 9618.

In their most permissive form, these inbound rules for the default VPC security group will look something like this:

Type              Protocol   Port Range   Source
All traffic       All        All          sg-5437332d (default)
All traffic       All        All          172.31.0.0/16
SSH               TCP        22           0.0.0.0/0
All ICMP          All        N/A          0.0.0.0/0
Custom UDP Rule   UDP        9618         0.0.0.0/0
Custom TCP Rule   TCP        9618         0.0.0.0/0

Of course, you should try to restrict the Source IP address space for these rules as much as possible. For example, you may want to limit them to inbound traffic from your home institution's public IP address space.
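If you prefer the command line to the VPC console, recommended rules 3 through 6 can be expressed with the AWS CLI's aws ec2 authorize-security-group-ingress command. The sketch below only prints the commands so they can be reviewed first. The function name print_ingress_commands is hypothetical, and the group ID and source range in the usage note are placeholders, not values from a real account.

```shell
# print_ingress_commands GROUP_ID CIDR: print (but do not run) one AWS CLI
# command per recommended inbound rule 3-6, restricted to source range CIDR.
print_ingress_commands() {
    sg=$1
    cidr=$2
    base="aws ec2 authorize-security-group-ingress --group-id $sg --cidr $cidr"
    echo "$base --protocol tcp --port 22"      # rule 3: SSH
    echo "$base --protocol icmp --port -1"     # rule 4: all ICMP types
    echo "$base --protocol udp --port 9618"    # rule 5: HTCondor UDP
    echo "$base --protocol tcp --port 9618"    # rule 6: HTCondor TCP
}
```

For example, print_ingress_commands sg-xxxxxxxx 203.0.113.0/24 prints four commands limited to the example range 203.0.113.0/24; substitute your institution's public address space. Once the output looks right, it can be piped to sh on a host with a configured AWS CLI.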

By default, each Security Group, including the default VPC Security Group, allows ALL outbound traffic.

Type          Protocol   Port Range   Destination
ALL Traffic   ALL        ALL          0.0.0.0/0

If you would like to restrict outbound traffic from the default VPC Security Group, select the Outbound Rules tab, click on the Edit button, and then configure the outbound rules accordingly.

Step 7: Create a condor_annex-compatible Amazon Machine Image

Each HTCondor execute instance within your annex must run a condor_annex-compatible Amazon Machine Image (AMI). By default, condor_annex will attempt to use one of the publicly available Amazon Linux AMIs with HTCondor 8.4.2 pre-installed currently provided by the HTCondor team. These condor_annex-compatible AMIs are available in the following AWS Regions within the United States:

Region      AMI ID
us-east-1   ami-91e1a3fb
us-west-1   ami-7f06731f
us-west-2   ami-ac8890cd

If these preconfigured AMIs cannot be successfully modified to suit your needs, you will need to create your own condor_annex-compatible AMI. We have done so for our own purposes by building a condor_annex-compatible CentOS 6-based AMI.

To build your own condor_annex-compatible AMI, open the Elastic Compute Cloud (EC2) dashboard in the Region where you will run your annex. Click on the Launch Instance button. This will open the instance launch configuration wizard. Follow these steps.

  1. Choose an Amazon Machine Image (AMI): We configured our annex's execute instances to use CentOS 6. To find a suitable CentOS 6 AMI to start from, select the AWS Marketplace tab and then enter "CentOS 6" in the search box. Your search will return multiple results. However, the most up-to-date AMI should be the first one in the list. Unless you have special requirements for your configuration, select this AMI by clicking on the Select button.
  2. Choose an Instance Type: Once you have selected an AMI, the launch configuration wizard will prompt you to select an instance type on which to build your condor_annex execute node. Choose one that suits your needs. Once you have selected your instance type, click on the Next: Configure Instance Details button.
  3. Configure Instance Details: Only one instance is required to configure your condor_annex-compatible AMI. Therefore, you may leave the Number of Instances at 1. Next, select one of your Network VPCs. In general, you should choose the default VPC whose Security Group was pre-configured in the previous step. Once you have determined which VPC will host this instance, select a specific Subnet in which to place it. The other networking options Auto-assign Public IP and Placement group may be left set to their default settings of Use subnet setting (Enabled) and No placement group, respectively. After configuring the networking details, if you would like to apply a specific IAM role to the instance, then select an appropriate role for it. Otherwise, leave IAM role set to its default value of None. All other instance details may be configured with their default values. Once you have completed configuring your instance details, click on the Next: Add Storage button.
  4. Add Storage: In general, you will not have to modify the configuration of your root storage volume for the instance. However, the launch wizard may still default to a Magnetic volume type, even though General Purpose SSD is becoming the default recommended by AWS. Our instance launch wizard still defaults to Magnetic. As such, we changed our root Volume Type from an 8 GiB Magnetic volume to an 8 GiB General Purpose SSD volume and selected Delete on Termination. Once you have completed the configuration of your root volume, click on the Next: Tag Instance button.
  5. Tag Instance: Add a Name to your instance and then click on the Next: Configure Security Group button.
  6. Configure Security Group: Select an existing security group and choose your default VPC security group. Once you have selected a security group, click on the Review and Launch button.
  7. Review Instance Launch: Review the configuration of your instance and make any necessary changes. Once done, click on the Launch button. You will be prompted to Select an existing key pair or create a new key pair, which will enable your SSH access to the instance. Select one of your existing key pairs (or create a new one) and then agree to the acknowledgement statement by clicking on the checkbox next to it. Once you have selected your key pair, click on the Launch Instances button. After launching your instance, the wizard will display the Launch Status page. To return to the main EC2 dashboard, scroll down and click on the View Instances button.

Once the instance has started up and entered a running Instance State, you will install and configure the software required to create a condor_annex-compatible AMI on the instance. To begin, open a terminal and log in to the instance via SSH.

 [user@client ~]$ ssh -i ~/.ssh/HTCondorAnnex.pem centos@ANNEX.PUBLIC.IP 

Then switch to root.

 [centos@ANNEX-PRIVATE-IP ~]$ sudo -i 

First, update the instance's base OS configuration.

 [root@ANNEX-PRIVATE-IP ~]$ yum update

Then install the Extra Packages for Enterprise Linux (EPEL) repository.

 [root@ANNEX-PRIVATE-IP ~]$ yum install epel-release

Next, install the yum priorities package

 [root@ANNEX-PRIVATE-IP ~]$ yum install yum-plugin-priorities 

and the appropriate Open Science Grid (OSG) repositories.

 [root@ANNEX-PRIVATE-IP ~]$ rpm -Uvh https://repo.grid.iu.edu/osg/3.3/osg-3.3-el6-release-latest.rpm

Once the OSG repositories are available on the instance, install the CA certificates and fetch-crl.

 [root@ANNEX-PRIVATE-IP ~]$ yum install osg-ca-certs
 [root@ANNEX-PRIVATE-IP ~]$ yum install fetch-crl 

Next, install the OSG Worker Node Client.

 [root@ANNEX-PRIVATE-IP ~]$ yum install osg-wn-client 

After the client software is installed, manually create both a condor group and user and then install HTCondor.

 [root@ANNEX-PRIVATE-IP ~]$ groupadd condor
 [root@ANNEX-PRIVATE-IP ~]$ useradd condor -g condor
 [root@ANNEX-PRIVATE-IP ~]$ yum install condor.x86_64 

This would complete the typical software installation of a standard OSG HTCondor execute node, except for CVMFS. However, in order to support condor_annex, several other software packages must be properly installed on the instance.

In addition to the standard OSG software, you must also install cloud-init

 [root@ANNEX-PRIVATE-IP ~]$ yum install cloud-init

and several other Python packages, including pip.

 [root@ANNEX-PRIVATE-IP ~]$ yum install pystache
 [root@ANNEX-PRIVATE-IP ~]$ yum install python-argparse
 [root@ANNEX-PRIVATE-IP ~]$ yum install python-daemon
 [root@ANNEX-PRIVATE-IP ~]$ yum install python-requests 
 [root@ANNEX-PRIVATE-IP ~]$ yum install python-pip 

You may also want to make sure these packages are up-to-date.

 [root@ANNEX-PRIVATE-IP ~]$ pip install --upgrade pip 
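Before moving on, it can help to confirm that none of the yum installs above were skipped. The following sketch relies on rpm -q, which exits non-zero for a package that is not installed. The function name missing_packages is hypothetical.

```shell
# missing_packages PKG...: print the name of each PKG that rpm does not
# report as installed; print nothing if all of them are present.
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

For example, missing_packages cloud-init pystache python-argparse python-daemon python-requests python-pip should print nothing on a correctly prepared instance. Any names it does print can be passed back to yum install.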

Once these packages are installed, install the AWS CLI

 [root@ANNEX-PRIVATE-IP ~]$ pip install awscli 

and the AWS CloudFormation Helper Scripts.

 [root@ANNEX-PRIVATE-IP ~]$ easy_install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz 

Several standard directories and symbolic links found on Amazon Linux AMIs must be created to successfully use the CloudFormation Helper Scripts on CentOS 6.

 [root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-hup /etc/init.d/cfn-hup
 [root@ANNEX-PRIVATE-IP ~]$ chmod 775 /usr/bin/cfn-hup
 [root@ANNEX-PRIVATE-IP ~]$ mkdir /opt/aws
 [root@ANNEX-PRIVATE-IP ~]$ mkdir /opt/aws/bin
 [root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-hup /opt/aws/bin/cfn-hup
 [root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-init /opt/aws/bin/cfn-init
 [root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-signal /opt/aws/bin/cfn-signal
 [root@ANNEX-PRIVATE-IP ~]$ ln -s /usr/bin/cfn-get-metadata /opt/aws/bin/cfn-get-metadata 
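The /opt/aws/bin links above can also be created in a repeatable way, which is convenient if you rebuild the image more than once. This is a minimal sketch, assuming the helper scripts were installed under /usr/bin as shown; the function name link_cfn_helpers is hypothetical.

```shell
# link_cfn_helpers SRC_DIR DEST_DIR: link each CloudFormation helper script
# found in SRC_DIR into DEST_DIR, creating DEST_DIR if needed.
# Safe to run repeatedly: existing links are replaced rather than duplicated.
link_cfn_helpers() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for tool in cfn-hup cfn-init cfn-signal cfn-get-metadata; do
        if [ -e "$src/$tool" ]; then
            ln -sfn "$src/$tool" "$dest/$tool" || return 1
        else
            echo "warning: $src/$tool not found, skipping" >&2
        fi
    done
}
```

Running link_cfn_helpers /usr/bin /opt/aws/bin as root would reproduce the four /opt/aws/bin links. The /etc/init.d/cfn-hup link and the chmod step are not covered by this sketch.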

If you are using a different base OS AMI, please see this link for some possible changes to the CloudFormation Helper Script configuration.

Activate (or deactivate) the following services as indicated and then log out of the instance.

 [root@ANNEX-PRIVATE-IP ~]$ chkconfig iptables off
 [root@ANNEX-PRIVATE-IP ~]$ service iptables stop
 [root@ANNEX-PRIVATE-IP ~]$ chkconfig fetch-crl-boot on
 [root@ANNEX-PRIVATE-IP ~]$ chkconfig fetch-crl-cron on
 [root@ANNEX-PRIVATE-IP ~]$ service fetch-crl-boot start
 [root@ANNEX-PRIVATE-IP ~]$ service fetch-crl-cron start
 [root@ANNEX-PRIVATE-IP ~]$ chkconfig condor on
 [root@ANNEX-PRIVATE-IP ~]$ service condor start
 [root@ANNEX-PRIVATE-IP ~]$ exit 

Now that you have prepared a condor_annex-compatible AMI on this instance, you'll need to save it for future use on other instances. To do so:

  1. Return to your web browser and go to the EC2 console.
  2. In the navigation pane, under INSTANCES, choose Instances.
  3. There you will see a list of each individual instance available in the Region. Select the instance you've just configured your condor_annex-compatible AMI on.
  4. From the dropdown menu Actions, go to Image and select Create Image.
  5. You will be prompted to make changes to the AMI before its creation. You'll likely want to add an Image name and check the Delete on Termination box. Make any other adjustments you find necessary and then click on the Create Image button. This will create an AMI from your instance that can be used with condor_annex.
  6. Go ahead and Close the Create Image request received dialog box to return to the EC2 Dashboard.
  7. In the navigation pane, under IMAGES, click on AMIs. There you will see a list of your custom AMIs, including the condor_annex-compatible AMI that was just created from your instance. Note the AMI ID for this image, as it will be one of the required inputs when calling condor_annex.
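The same image can be requested from the command line with the AWS CLI's aws ec2 create-image command, which returns the new AMI ID directly. The sketch below assumes a configured AWS CLI pointed at the instance's Region; the function name create_annex_ami is hypothetical, and the instance ID and image name in the usage note are placeholders.

```shell
# create_annex_ami INSTANCE_ID IMAGE_NAME: request an AMI from the given
# instance and print the new AMI ID on success.
create_annex_ami() {
    ami=$(aws ec2 create-image --instance-id "$1" --name "$2" \
        --query 'ImageId' --output text) || return 1
    # Only accept something that looks like an AMI ID.
    case $ami in
        ami-*) echo "$ami" ;;
        *) echo "unexpected response: $ami" >&2; return 1 ;;
    esac
}
```

For example, AMI_ID=$(create_annex_ami i-xxxxxxxx condor-annex-centos6) would store the ID for later use. The image may take several minutes to become available after the ID is returned.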

Step 8: Configure HTCondor Pool for Password Authentication

condor_annex currently assumes that your local HTCondor pool allows daemon-to-daemon communication via Password Authentication. If your local pool is not yet configured to use a pool password, you must first generate and store a password file on both the SUBMIT node and CENTRAL_MANAGER by running the following command on each:

 [root@SUBMIT ~]$ condor_store_cred -c add 

This command will prompt you to enter a pool password. Once entered, a password file will be stored on the local machine. By default, the password file created on each machine is /etc/condor/condor_pool_password. Make sure that you run this pool password command, entering the same password, on both the SUBMIT node and the CENTRAL_MANAGER of your local pool. You may also use Password Authentication with your local EXECUTE nodes. However, this is not required by condor_annex. Only the ANNEX instances require the use of Password Authentication.

Once your SUBMIT node and CENTRAL_MANAGER have the pool password file, you must configure their HTCondor daemons to use Password Authentication. On both machines, log in as root and go to the HTCondor config.d directory.

 [root@CENTRAL_MANAGER ~]$ cd /etc/condor/config.d 

In this directory, create the following HTCondor configuration file (99_condor_annex_passwd.config)

 ALLOW_DAEMON = $(ALLOW_DAEMON), condor_pool@*
 SEC_CLIENT_AUTHENTICATION = REQUIRED
 SEC_CLIENT_AUTHENTICATION_METHODS = $(SEC_CLIENT_AUTHENTICATION_METHODS), PASSWORD
 SEC_CLIENT_ENCRYPTION = OPTIONAL
 SEC_CLIENT_INTEGRITY = REQUIRED
 SEC_DAEMON_AUTHENTICATION = REQUIRED
 SEC_DAEMON_AUTHENTICATION_METHODS = $(SEC_DAEMON_AUTHENTICATION_METHODS), PASSWORD
 SEC_DAEMON_ENCRYPTION = OPTIONAL
 SEC_DAEMON_INTEGRITY = REQUIRED
 SEC_NEGOTIATOR_AUTHENTICATION = REQUIRED
 SEC_NEGOTIATOR_AUTHENTICATION_METHODS = $(SEC_NEGOTIATOR_AUTHENTICATION_METHODS), PASSWORD
 SEC_NEGOTIATOR_ENCRYPTION = OPTIONAL
 SEC_NEGOTIATOR_INTEGRITY = REQUIRED
 SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
 SEC_PASSWORD_FILE = /etc/condor/condor_pool_password 

and then restart condor.

 [root@CENTRAL_MANAGER ~]$ service condor restart 

Your local HTCondor pool should now be ready to use Password Authentication with condor_annex.
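A quick sanity check of the configuration file can catch a missed line before annex instances fail to authenticate. This sketch only inspects the text of the file; it does not ask HTCondor for the effective configuration. The function name check_passwd_config is hypothetical.

```shell
# check_passwd_config FILE: fail if any of the three *_AUTHENTICATION_METHODS
# settings in FILE lacks PASSWORD, or if SEC_PASSWORD_FILE is not set there.
check_passwd_config() {
    file=$1
    status=0
    for ctx in CLIENT DAEMON NEGOTIATOR; do
        key="SEC_${ctx}_AUTHENTICATION_METHODS"
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=.*PASSWORD" "$file"; then
            echo "missing PASSWORD in $key" >&2
            status=1
        fi
    done
    if ! grep -Eq "^[[:space:]]*SEC_PASSWORD_FILE[[:space:]]*=" "$file"; then
        echo "missing SEC_PASSWORD_FILE" >&2
        status=1
    fi
    return $status
}
```

For example, check_passwd_config /etc/condor/config.d/99_condor_annex_passwd.config should succeed silently on both the SUBMIT node and the CENTRAL_MANAGER. To see the value HTCondor itself resolves, condor_config_val SEC_PASSWORD_FILE can be run on each machine.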

Step 9: Install and Configure the AWS CLI

The AWS Command Line Interface (CLI) is a tool to manage your AWS services from the command line and automate your interaction with them via scripting. Remember, condor_annex itself is a Perl-based script that relies on the AWS CLI to automate the construction of an annex given the inputs provided by a user. As such, the AWS CLI must be installed and configured on any host that will run condor_annex.

If you plan to let your users run condor_annex for themselves when they need additional resources, then you should install the AWS CLI on your HTCondor pool's SUBMIT node. To install the AWS CLI, log in as root on your SUBMIT node and run the following commands.

 [root@SUBMIT ~]$ yum install python-pip
 [root@SUBMIT ~]$ pip install awscli 

Once the AWS CLI is installed, each user who wants to run condor_annex will have to configure the CLI using their AWS Security Credentials. To configure the CLI, they must run the following command and enter the requested information.

 [user@SUBMIT ~]$ aws configure
 AWS Access Key ID [None]: ****************4FSQ
 AWS Secret Access Key [None]: ****************RbV6
 Default region name [None]: us-east-1
 Default output format [None]: json

For the Default region name and Default output format, please make sure to instruct your users to enter (1) the codename for the AWS Region that contains your pre-configured condor_annex-compatible AMI and (2) json, respectively. Once a user completes this AWS CLI configuration process, they should find the settings stored in the new .aws directory that has been created in their home directory.
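The aws configure command stores the keys in ~/.aws/credentials and the Region and output format in ~/.aws/config as plain "name = value" lines. A user's setup can therefore be checked with a few lines of shell. This is a rough sketch: the function name check_aws_config is hypothetical, and it does not distinguish between named profiles.

```shell
# check_aws_config DIR REGION: verify that DIR/credentials exists and that
# DIR/config sets "region = REGION" and "output = json".
check_aws_config() {
    dir=$1
    region=$2
    [ -f "$dir/credentials" ] || { echo "no $dir/credentials" >&2; return 1; }
    grep -Eq "^region[[:space:]]*=[[:space:]]*${region}[[:space:]]*$" \
        "$dir/config" 2>/dev/null \
        || { echo "default region is not $region" >&2; return 1; }
    grep -Eq "^output[[:space:]]*=[[:space:]]*json[[:space:]]*$" \
        "$dir/config" 2>/dev/null \
        || { echo "output format is not json" >&2; return 1; }
}
```

For example, check_aws_config ~/.aws us-east-1 succeeds silently for the configuration shown above and reports which setting is off otherwise.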

Step 10: Install and Configure condor_annex

 yum install git
 yum install perl-JSON
 cd /opt
 git clone https://github.com/mkandes/condor_annex.git

Step 11: Launch a condor_annex


Topic revision: r13 - 2016/11/21 - 22:26:13 - MartinKandes
 