
CREAM Tests

Introduction

OSG is evaluating various Grid Compute Elements (aka Gatekeepers) to determine which ones it should support. Feature list, ease of use, performance and reliability are all important aspects of the evaluation.

CREAM is a Grid Compute Element developed in Europe by gLite.

Client tests

OSG relies heavily on Condor-G for Grid submissions, so it was used for the client testing.

Condor-G only added functional support for CREAM in v7.3.2, but one should use at least the next stable release series, 7.4.X.

The v7.5.X development series has added additional improvements, so anyone looking for maximum performance should use that.

Our group has collaborated with the Condor team for a long time to arrive at a Condor-G release with CREAM support. Details can be found on the CREAM Support for CMS page.

Server tests

CREAM installation is only supported via RPMs.

Installation on osg-gw-3

Massimo Sgaravatto helped with the installation, resulting in the following instructions:

Installation


Copied in /etc/yum.repos.d the following repos:
http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/dag.repo
http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/lcg-CA.repo
http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/glite-CREAM.repo
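
The copy step above could be scripted roughly as follows. This is only a sketch, assuming wget is available and that the script is run as root when not in dry-run mode; the repo_dest and fetch_repos helpers and the DRY_RUN switch are illustrative names, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: fetch the gLite 3.2 repo files into /etc/yum.repos.d.
# DRY_RUN defaults to 1, so the wget commands are only printed.
REPO_DIR=${REPO_DIR:-/etc/yum.repos.d}
BASE_URL=http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2

# Map a repo URL to its destination path under REPO_DIR.
repo_dest() {
    echo "$REPO_DIR/$(basename "$1")"
}

# Print (dry run) or execute one wget per repo file listed above.
fetch_repos() {
    for name in dag.repo lcg-CA.repo glite-CREAM.repo; do
        url="$BASE_URL/$name"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "wget -O $(repo_dest "$url") $url"
        else
            wget -O "$(repo_dest "$url")" "$url"
        fi
    done
}

fetch_repos
```

Setting DRY_RUN=0 would perform the downloads instead of printing them.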


The OS repo files are of course needed as well.


yum clean all
yum update
yum install java-1.6.0-openjdk tomcat5
yum install lcg-CA
yum install xml-commons-apis
yum install glite-CREAM


Installed/updated these RPMs (this won't be needed anymore when glite-CONDOR_utils is released):
http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/glite-info-dynamic-scheduler-condor/1.0.0/noarch/glite-info-dynamic-scheduler-condor-1.0.0-1.noarch.rpm
http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/glite-yaim-condor-utils/5.1.0/noarch/glite-yaim-condor-utils-5.1.0-1.noarch.rpm
http://eticssoft.web.cern.ch/eticssoft/repository/org.glite/org.glite.apel.condor/2.0.6/slc4_ia32_gcc346/glite-apel-condor-2.0.6-2.noarch.rpm

Configuration and customizations:

Customized the conf files (/root/SiteInfo/site-info.def and
/root/SiteInfo/services/glite-creamce).
Example site-info.def file: /opt/glite/yaim/examples/siteinfo/site-info.def

(attached is the siteinfo used on osg-gw-3)

Run yaim:

/opt/glite/yaim/bin/yaim -c -s /root/SiteInfo/site-info.def -n creamCE -n CONDOR_utils
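
Before running yaim it may help to confirm that both customized files are in place. The following is a minimal sketch under that assumption; the check_siteinfo and yaim_cmd helpers and the SITEINFO_DIR variable are hypothetical names, while the paths and the yaim options are the ones used above.

```shell
#!/bin/sh
# Sketch: verify the customized files exist, then print the yaim command.
SITEINFO_DIR=${SITEINFO_DIR:-/root/SiteInfo}

# Succeeds only if both customized conf files are readable.
check_siteinfo() {
    [ -r "$1/site-info.def" ] && [ -r "$1/services/glite-creamce" ]
}

# Build the yaim command line used on osg-gw-3.
yaim_cmd() {
    echo "/opt/glite/yaim/bin/yaim -c -s $1/site-info.def -n creamCE -n CONDOR_utils"
}

if check_siteinfo "$SITEINFO_DIR"; then
    yaim_cmd "$SITEINFO_DIR"    # printed only; run it as root to configure
else
    echo "missing site-info.def or services/glite-creamce in $SITEINFO_DIR" >&2
fi
```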



Because the proper environment is not already defined on the WN, the
CREAM JobWrapper had to be customized.
This was done following the instructions reported at:

http://grid.pd.infn.it/cream/field.php?n=Main.HowToCustomizeTheCREAMJobWrapper

("Instructions for CREAM CE >= 1.6 (glite-ce-cream >= 1.12)" section)

adding the following 2 lines:

export OSG_GRID=/code/osgcode/wn-client
. $OSG_GRID/setup.sh

after:

for((idx=0; idx<${#__environment[*]}; idx++)); do
eval export ${__environment[$idx]}
done
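
Put together, the customized fragment would look roughly like the sketch below. The __environment array is given stand-in values here so the snippet can be run on its own (in the real JobWrapper it is set elsewhere in the wrapper), and the readability check around setup.sh is an addition for this sketch only, not part of the two lines that were actually added.

```shell
#!/bin/bash
# Sketch of the customized JobWrapper fragment.
# Hypothetical sample values; the real array is defined by the JobWrapper.
__environment=("DEMO_VAR=demo" "DEMO_OTHER=42")

# Original JobWrapper loop: export every NAME=VALUE entry.
for((idx=0; idx<${#__environment[*]}; idx++)); do
    eval export ${__environment[$idx]}
done

# The two added lines; the "if" guard is an assumption added for this sketch
# so that it does not fail on hosts without the OSG worker node client.
export OSG_GRID=/code/osgcode/wn-client
if [ -r "$OSG_GRID/setup.sh" ]; then
    . $OSG_GRID/setup.sh
fi
```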


Because the UCSD Condor installation requires a special arguments setting
in the Condor submit file, /opt/glite/bin/condor_submit.sh was modified, changing:

arguments = $arguments

into:

arguments = -wrapper_iwd $_CONDOR_SCRATCH_DIR $arguments
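
The same substitution can be expressed with sed. This is a sketch only: it assumes the text "arguments = $arguments" appears literally in the script, and it is demonstrated on a scratch file rather than on /opt/glite/bin/condor_submit.sh itself. The patch_arguments helper name is illustrative.

```shell
#!/bin/sh
# Sketch: rewrite the "arguments" line of a condor_submit.sh-like file.
# Prints the patched text to stdout; the input file is left untouched.
patch_arguments() {
    sed 's/arguments = \$arguments/arguments = -wrapper_iwd $_CONDOR_SCRATCH_DIR $arguments/' "$1"
}

# Demo on a scratch file containing only the line of interest.
sample=$(mktemp)
printf '%s\n' 'arguments = $arguments' > "$sample"
patch_arguments "$sample"
rm -f "$sample"
```

Applying the helper to already patched text leaves it unchanged, since the original pattern no longer matches.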

Service startup

- service tomcat5 stop
- /opt/glite/etc/init.d/glite-ce-blahparser stop
- service tomcat5 start

Client configuration

Condor 7.5.0 was used to run against CREAM.

Had to install a GridFTP server with an appropriate grid-mapfile.

The condor submit file had the following lines in it:

Universe = grid
grid_resource = cream https://osg-gw-3.t2.ucsd.edu:8443/ce-cream/services/CREAM2 condor osg-gw-3
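
For context, a complete submit file built around these two lines might look like the output of the sketch below. Only the Universe and grid_resource lines come from the test setup; the executable, its argument, the file names, and the queue count are illustrative assumptions, and make_submit_file is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: emit a minimal Condor-G submit file for the CREAM endpoint above.
# Everything except "Universe" and "grid_resource" is an assumption.
make_submit_file() {
    cat <<'EOF'
Universe = grid
grid_resource = cream https://osg-gw-3.t2.ucsd.edu:8443/ce-cream/services/CREAM2 condor osg-gw-3
executable = /bin/sleep
arguments = 1800
output = job.$(Cluster).$(Process).out
error = job.$(Cluster).$(Process).err
log = job.log
queue 1
EOF
}

make_submit_file
```

The output could be redirected to a file and passed to condor_submit.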

The CREAM client in Condor-G needs valid VOMS CA pub keys; i.e. the vomsdir must be populated.
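
A quick way to check this on the client is sketched below. It assumes the conventional /etc/grid-security/vomsdir location, which may differ on a given installation; vomsdir_populated is a hypothetical helper, and the check only tests that the directory is non-empty, not that its contents are valid.

```shell
#!/bin/sh
# Sketch: report whether a vomsdir exists and contains at least one entry.
# /etc/grid-security/vomsdir is an assumed default; override with VOMSDIR.
VOMSDIR=${VOMSDIR:-/etc/grid-security/vomsdir}

# Succeeds if the directory exists and is not empty.
vomsdir_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if vomsdir_populated "$VOMSDIR"; then
    echo "vomsdir populated: $VOMSDIR"
else
    echo "vomsdir missing or empty: $VOMSDIR"
fi
```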

During the test, the load on the client became very high:
top - 21:08:42 up 310 days, 9:30, 1 user, load average: 41.64, 40.04, 38.89
This was likely due to all the gridFTP sessions that were calling back.
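
To track the load during a run, the 1-minute figure can be pulled out of such a header line. The sketch below is one way to do it, shown on the line quoted above; one_min_load is a hypothetical helper and it assumes the "load average: a, b, c" layout with dot decimals.

```shell
#!/bin/sh
# Sketch: extract the 1-minute load average from a top/uptime header line.
one_min_load() {
    sed -n 's/.*load average: \([0-9.]*\),.*/\1/p'
}

# Prints 41.64 for the header line recorded during the test.
echo 'top - 21:08:42 up 310 days, 9:30, 1 user, load average: 41.64, 40.04, 38.89' | one_min_load
```

On a live system the same helper could be fed from uptime, for example "uptime | one_min_load".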

Test run

A test on glidein-c against osg-gw-3 (on 2010/02/19):

  • Ran 10k 30 minute jobs against CREAM on osg-gw-3, using Condor-G v7.5.0 on glidein-c
  • CREAM finished in about 7 hours; for comparison, a similar GT2 run took about 9 hours.
  • Condor-G numbers are really close to the CE ones under CREAM
    • Under CREAM Condor-G was almost always reporting the same numbers as the CE
    • For comparison, under GT2 Condor-G thought jobs were running almost 5 hours after they finished on the CE.
  • The test run ran only ~8k jobs; ~2k jobs got held
    • ~1.4k failed while staging in the input sandbox
    • ~600 failed while staging out the output sandbox
    • For comparison, under GT2 I observed no held jobs
  • CREAM jobs took much longer than 30min to complete
    • Under CREAM, over half took more than 90 minutes, with a non-negligible fraction taking over 3 hours
      This may be related to the heavy load on the Condor-G node, due to I/O handled by the gridFTP server
    • For comparison, under GT2 all jobs finished within 31 minutes
  • For detailed results, see cream_10k.ods.
    cream_abs.png
    cream_30min_job_spread.png
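
The approximate job counts in the list above can be cross-checked with a little arithmetic. The figures in this sketch are the rounded values quoted in the bullets, not exact counts from cream_10k.ods.

```shell
#!/bin/sh
# Sketch: cross-check the approximate job counts reported above.
total=10000          # jobs submitted (~10k)
held_stage_in=1400   # ~1.4k failed while staging in the input sandbox
held_stage_out=600   # ~600 failed while staging out the output sandbox

held=$((held_stage_in + held_stage_out))
ran=$((total - held))
held_pct=$((100 * held / total))

# Prints: held=2000 ran=8000 held_pct=20%
echo "held=$held ran=$ran held_pct=$held_pct%"
```

The two staging failure classes together account for the ~2k held jobs, about a fifth of the submitted jobs.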

-- IgorSfiligoi - 2009/11/03

Topic attachments
| Attachment | Size | Date | Who | Comment |
| SiteInfo_gw3.tgz | 7.6 K | 2010/05/06 - 17:08 | IgorSfiligoi | SiteInfo used on the test machine (osg-gw-3) |
| cream_10k.ods | 59.5 K | 2010/02/20 - 06:10 | IgorSfiligoi | Running 10k jobs against CREAM (and compared to GT2) - UCSD LAN - Condor 7.5.0 |
| cream_10k_abs.png | 45.4 K | 2010/02/20 - 05:49 | IgorSfiligoi | Abs values of the 10k run |
| cream_30min_job_spread.png | 27.2 K | 2010/02/20 - 05:58 | IgorSfiligoi | Time spread of the 30min jobs |
Topic revision: r10 - 2010/05/06 - 17:10:06 - IgorSfiligoi