CREAM Tests

Introduction

OSG is evaluating various Grid Compute Elements (aka Gatekeepers) to determine which ones it should support. Feature list, ease of use, performance and reliability are all important aspects of the evaluation.

CREAM is a Grid Compute Element developed in Europe by gLite.

Client tests

OSG relies heavily on Condor-G for Grid submissions, so it was used for the client testing.

Condor-G first gained functional support for CREAM in v7.3.2, but one should use at least the next stable release series, 7.4.X. The v7.5.X development series adds further improvements, so anyone looking for maximum performance should use that.

Condor-G also needs gridFTP and VOMS certs installed in order to talk to a CREAM CE.
The glideinWMS installer can be used for this purpose:

cvs -d :pserver:anonymous@cdcvs.fnal.gov:/cvs/cd_read_only co -r snapshot_100518_v2plus_Igor_CREAM glideinWMS 

Moreover, our group has collaborated with the Condor team for a long time to get to a release of Condor-G with CREAM support. Details can be found on the CREAM Support for CMS page.

Server tests

CREAM installation is only supported via RPMs.

Installation on osg-gw-3

Massimo Sgaravatto helped with the installation, resulting in the following instructions:

Installation

(updated Apr 8th 2011)

Copied in /etc/yum.repos.d the following repos:
http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/dag.repo
http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/glite-CREAM.repo
http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo

The OS repo files are of course needed as well.
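
The repo copy step can be scripted. What follows is only a minimal sketch: the REPO_DIR and DRY_RUN variables and the repo_dest/fetch_repo helpers are illustrative names, not part of the original procedure, and the use of curl is an assumption (the notes do not say how the files were copied). By default it only prints what it would do.

```shell
# Sketch: fetch the three repo files listed above into the yum repo directory.
# REPO_DIR, DRY_RUN, repo_dest and fetch_repo are hypothetical helper names.
REPO_DIR="${REPO_DIR:-/etc/yum.repos.d}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = download

# Map a repo URL to the local file it would be saved as.
repo_dest() {
  printf '%s/%s\n' "$REPO_DIR" "$(basename "$1")"
}

# Download one repo file, or just report the command in dry-run mode.
fetch_repo() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: curl -fsSL -o $(repo_dest "$1") $1"
  else
    curl -fsSL -o "$(repo_dest "$1")" "$1"
  fi
}

for url in \
  "http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/dag.repo" \
  "http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/glite-CREAM.repo" \
  "http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo"
do
  fetch_repo "$url"
done
```

Running it as root with DRY_RUN=0 would perform the actual downloads.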


yum clean all
yum update
yum install ca-policy-egi-core
yum install java-1.6.0-openjdk
yum install xml-commons-apis
yum install glite-CREAM


Installed/updated these RPMs (this won't be needed anymore when glite-CONDOR_utils is released):
http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/glite-info-dynamic-scheduler-condor/1.0.0/noarch/glite-info-dynamic-scheduler-condor-1.0.0-1.noarch.rpm
http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/glite-yaim-condor-utils/5.1.0/noarch/glite-yaim-condor-utils-5.1.0-1.noarch.rpm
http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/org.glite.apel.condor/2.0.6/slc4_ia32_gcc346/glite-apel-condor-2.0.6-2.noarch.rpm

PS: The official instructions are located at http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32 but do not include how to install the Condor part.

Configuration and customizations:

Customized the conf files (/root/SiteInfo/site-info.def and
/root/SiteInfo/services/glite-creamce), starting from the example
site-info.def file (/opt/glite/yaim/examples/siteinfo/site-info.def)

(attached is the siteinfo used on osg-gw-3)

(updated April 8th 2011: Add CONDOR_GROUP_ENABLE=False to the end of site-info.def)
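
One way to apply the CONDOR_GROUP_ENABLE update without adding the line twice on a re-run is sketched below. The ensure_setting helper is a hypothetical name, and the demonstration runs on a scratch file with a placeholder first line rather than on the real /root/SiteInfo/site-info.def.

```shell
# ensure_setting FILE KEY VALUE: append KEY=VALUE unless KEY is already assigned.
# (hypothetical helper, not part of yaim)
ensure_setting() {
  file="$1"; key="$2"; value="$3"
  if grep -q "^${key}=" "$file"; then
    echo "${key} already set in ${file}; leaving it untouched"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demonstration on a scratch file; EXAMPLE_VAR is a placeholder entry.
demo="$(mktemp)"
echo 'EXAMPLE_VAR=placeholder' > "$demo"
ensure_setting "$demo" CONDOR_GROUP_ENABLE False
ensure_setting "$demo" CONDOR_GROUP_ENABLE False   # second call changes nothing
tail -n 1 "$demo"
```

Against the real file, the call would be ensure_setting /root/SiteInfo/site-info.def CONDOR_GROUP_ENABLE False, followed by the yaim run.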

Run yaim:

/opt/glite/yaim/bin/yaim -c -s /root/SiteInfo/site-info.def -n creamCE -n CONDOR_utils



Because the proper environment is not already defined on the WN, the
CREAM JobWrapper had to be customized.
This was done following the instructions reported at:

http://grid.pd.infn.it/cream/field.php?n=Main.HowToCustomizeTheCREAMJobWrapper

("Instructions for CREAM CE >= 1.6 (glite-ce-cream >= 1.12)" section)

adding the following 2 lines:

export OSG_GRID=/code/osgcode/wn-client
. $OSG_GRID/setup.sh

after:

for((idx=0; idx<${#__environment[*]}; idx++)); do
eval export ${__environment[$idx]}
done


Because the UCSD Condor installation requires a special arguments setting
in the Condor submit file, /opt/glite/bin/condor_submit.sh was modified, changing:

arguments = $arguments

into:

arguments = -wrapper_iwd $_CONDOR_SCRATCH_DIR $arguments
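
The same change could be applied with sed, so that it can be repeated after an RPM update. This is a sketch only: patch_arguments is a hypothetical helper, and it assumes the line appears in condor_submit.sh exactly as quoted above, with no leading whitespace. It is demonstrated on a scratch file, not on the installed script.

```shell
# patch_arguments FILE: rewrite the "arguments" line in place, keeping FILE.bak.
# Single quotes keep the shell from expanding the $ signs; they are passed to sed literally.
patch_arguments() {
  sed -i.bak 's|^arguments = \$arguments$|arguments = -wrapper_iwd $_CONDOR_SCRATCH_DIR $arguments|' "$1"
}

# Demonstration on a scratch file holding only the line of interest.
demo="$(mktemp)"
printf '%s\n' 'arguments = $arguments' > "$demo"
patch_arguments "$demo"
cat "$demo"
```

Checking the result with diff against the .bak copy before restarting any service is advisable, since a non-matching line leaves the file silently unchanged.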

Service startup

- service tomcat5 stop
- /opt/glite/etc/init.d/glite-ce-blahparser stop
- service tomcat5 start

Client configuration

Condor 7.5.0 was used to run against CREAM.

A GridFTP server with an appropriate grid-mapfile had to be installed.

The condor submit file had the following lines in it:

Universe = grid
grid_resource = cream https://osg-gw-3.t2.ucsd.edu:8443/ce-cream/services/CREAM2 condor osg-gw-3
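
For context, a complete submit file built around those two lines might look like the sketch below. Only the Universe and grid_resource lines come from the test setup; the payload (a 30-minute sleep), the output file names and the SUBMIT_FILE location are assumptions made for illustration, and a valid VOMS proxy is assumed to be in place.

```shell
# Write an illustrative Condor-G submit file for the CREAM CE.
SUBMIT_FILE="${SUBMIT_FILE:-${TMPDIR:-/tmp}/cream_test.submit}"

cat > "$SUBMIT_FILE" <<'EOF'
# Grid universe job routed to the CREAM CE (these two lines are from the test setup)
universe      = grid
grid_resource = cream https://osg-gw-3.t2.ucsd.edu:8443/ce-cream/services/CREAM2 condor osg-gw-3

# Assumed payload: a 30-minute sleep using a binary already present on the worker node
executable          = /bin/sleep
arguments           = 1800
transfer_executable = false

# Assumed output locations on the submit host
output = job.$(Cluster).$(Process).out
error  = job.$(Cluster).$(Process).err
log    = job.log

queue 1
EOF

echo "wrote $SUBMIT_FILE; submit with: condor_submit $SUBMIT_FILE"
```

The three fields after the URL are the batch system (condor) and the queue name (osg-gw-3) on the CE.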

The CREAM client in Condor-G needs valid VOMS CA pub keys; i.e. the vomsdir must be populated.
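
A quick pre-flight check for this requirement can be sketched as follows. It assumes the conventional vomsdir location /etc/grid-security/vomsdir (overridable through X509_VOMS_DIR) and treats any .pem or .lsc file as evidence that the directory is populated; vomsdir_populated is a hypothetical helper name, and the check does not validate the files themselves.

```shell
# Assumed default location; X509_VOMS_DIR overrides it when set.
VOMSDIR="${X509_VOMS_DIR:-/etc/grid-security/vomsdir}"

# Succeeds if the directory exists and holds at least one .pem or .lsc file at any depth.
vomsdir_populated() {
  [ -d "$1" ] && [ -n "$(find "$1" -type f \( -name '*.pem' -o -name '*.lsc' \) | head -n 1)" ]
}

if vomsdir_populated "$VOMSDIR"; then
  echo "vomsdir looks populated: $VOMSDIR"
else
  echo "vomsdir missing or empty: $VOMSDIR"
fi
```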

During the test, the load on the client became very high:
top - 21:08:42 up 310 days, 9:30, 1 user, load average: 41.64, 40.04, 38.89
This was likely due to all the gridFTP sessions that were calling back.

Test run

A test on glidein-c against osg-gw-3 (on 2010/02/19):

  • Ran 10k 30 minute jobs against CREAM on osg-gw-3, using Condor-G v7.5.0 on glidein-c
  • CREAM finished in about 7 hours; for comparison, a similar GT2 run took about 9 hours.
  • Condor-G numbers are really close to the CE ones under CREAM
    • Under CREAM Condor-G was almost always reporting the same numbers as the CE
    • For comparison, under GT2 Condor-G thought jobs were running almost 5 hours after they finished on the CE.
  • The test run ran only ~8k jobs; ~2k jobs got held
    • ~1.4k failed while staging in the input sandbox
    • ~600 failed while staging out the output sandbox
    • For comparison, under GT2 I observed no held jobs
  • CREAM jobs took much longer than 30min to complete
    • Under CREAM, over half took more than 90mins, with a non-negligible fraction taking over 3 hours
      This may be related to the heavy load on the Condor-G node, due to I/O handled by the gridFTP server
    • For comparison, under GT2 all jobs finished within 31 minutes
  • For detailed results, see cream_10k.ods.
    cream_abs.png
    cream_30min_job_spread.png
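
The held-job figures above can be cross-checked with a little arithmetic. The sketch below uses the rounded counts from the list (10k submitted, ~1.4k and ~600 held), so the percentages are approximate; the variable names are illustrative.

```shell
# Rounded counts taken from the test run summary above.
submitted=10000
held_stage_in=1400
held_stage_out=600

held=$((held_stage_in + held_stage_out))        # total held jobs
completed=$((submitted - held))                 # jobs that actually ran
held_pct=$((100 * held / submitted))            # share of all submitted jobs
stage_in_share=$((100 * held_stage_in / held))  # share of held jobs lost at stage-in

echo "held: $held (${held_pct}% of submitted), completed: $completed"
echo "stage-in share of held jobs: ${stage_in_share}%"
```

With these inputs, about one job in five was held, and roughly 70% of the held jobs failed at input staging rather than output staging.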

OSG use of CREAM

OSG is planning on officially supporting CREAM in the near term.

The first step in the process is represented by this planning document.

-- IgorSfiligoi - 2009/11/03

Topic attachments
Attachment | Size | Date | Who | Comment
cream_10k_abs.png | 45.4 K | 2010/02/20 - 05:49 | IgorSfiligoi | Abs values of the 10k run
cream_30min_job_spread.png | 27.2 K | 2010/02/20 - 05:58 | IgorSfiligoi | Time spread of the 30min jobs
cream_10k.ods | 59.5 K | 2010/02/20 - 06:10 | IgorSfiligoi | Running 10k jobs against CREAM (and compared to GT2) - UCSD LAN - Condor 7.5.0
SiteInfo_gw3.tgz | 7.6 K | 2010/05/06 - 17:08 | IgorSfiligoi | SiteInfo used on the test machine (osg-gw-3)
Topic revision: r13 - 2011/04/08 - 20:36:26 - IgorSfiligoi
 