GRAM5 tests

This page is meant to contain the information we gather while testing GRAM5.

GRAM5 validation tests

Jeff and Igor did a first round of tests to check if GRAM5 (alpha2) indeed worked and how it compared to GT2 on a small scale.

GRAM5 alpha benchmarking

GRAM5 alpha2 was installed on osg-gw-5.t2.ucsd.edu.

  • File transfer is not working properly. All further tests are without input or output files.
  • Submitting from a machine on the local LAN, a job turnaround of about 1Hz (50 jobs/min) is easy to achieve. Peaks of about 3Hz (170 jobs/min) have also been observed.
    If monitoring is not needed, a sustained rate of about 3Hz (200 jobs/min) should be achievable.
    For more details: rates_gt5a2_ucsd.pdf
  • Submitting from a machine on the other side of the world, the job turnaround is about 17 jobs/min. Peaks of 30 jobs/min have also been observed.
    If monitoring is not needed, a sustained rate of 200 jobs/min (3Hz) should be achievable.
    For more details: rates_gt5a2_ucsd_r2.pdf
    Note: No clear conclusions yet. I have found that GRAM5 gets slower over time, so the results may be affected by that.
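
The rates above are quoted both in Hz and in jobs/min, and the pairs are rounded rather than exact (1 Hz is 60 jobs/min). For reference, a minimal sketch of the conversion; the helper names are made up for this page and are not part of any GRAM5 tool:

```shell
#!/bin/sh
# Convert between jobs per minute and Hz (jobs per second).
# Helper names are hypothetical; 1 Hz corresponds to exactly 60 jobs/min.

jobs_per_min_to_hz() {
    # $1: rate in jobs per minute
    awk -v r="$1" 'BEGIN { printf "%.2f\n", r / 60 }'
}

hz_to_jobs_per_min() {
    # $1: rate in Hz
    awk -v r="$1" 'BEGIN { printf "%.0f\n", r * 60 }'
}

jobs_per_min_to_hz 17    # remote turnaround quoted above; prints 0.28
hz_to_jobs_per_min 3     # prints 180
```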

GRAM5 beta1

Installation

Up to alpha3, GRAM5 played well on top of an existing VDT 1.10.1 installation; beta1 needs some changes:

  • The initial install used the VDT perl, and this created some problems. Make sure you use the system perl.
    The perl path is configured in globus.v5b1/libexec/globus-sh-tools-vars.sh
  • jobmanager-fork is not the default anymore; one needs to change the symlink in globus/etc/grid-services
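
A possible way to script the symlink change is sketched below. It assumes that the default service is a symlink named jobmanager inside etc/grid-services, which should be checked against the actual installation; the function name and the choice of target are only illustrative.

```shell
#!/bin/sh
# Sketch: point the default GRAM service at a chosen job manager.
# ASSUMPTION: the default is a symlink named "jobmanager" inside
# etc/grid-services; verify this against your own installation.

set_default_jobmanager() {
    services_dir=$1      # e.g. $GLOBUS_LOCATION/etc/grid-services
    target=$2            # e.g. jobmanager-condor (illustrative choice)

    if [ ! -e "$services_dir/$target" ]; then
        echo "no such service: $services_dir/$target" >&2
        return 1
    fi
    # Create a relative link, replacing any existing one; -n avoids
    # following an existing link that points at a directory.
    ( cd "$services_dir" && ln -sfn "$target" jobmanager )
}

# Usage (commented out so that running the file changes nothing):
# set_default_jobmanager "$GLOBUS_LOCATION/etc/grid-services" jobmanager-condor
```

The -n option of ln is available in the GNU and BSD versions but is not required by POSIX.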

More changes:

  • copied the VDT $GLOBUS_LOCATION/etc/gridftp.conf
  • created $GLOBUS_LOCATION/var/log (used by gridFTP)
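
These two manual fixes can be sketched as a small helper. The two arguments stand for the old (VDT) and new (GRAM5) Globus trees; both the function name and the argument names are placeholders, not something shipped with either installation.

```shell
#!/bin/sh
# Sketch of the two manual fixes listed above.
# ASSUMPTION: old_location (the VDT Globus tree) and new_location (the
# GRAM5 tree) are placeholders for the real $GLOBUS_LOCATION values.

carry_over_gridftp_setup() {
    old_location=$1
    new_location=$2

    # Reuse the GridFTP configuration from the VDT installation.
    mkdir -p "$new_location/etc" || return 1
    cp -p "$old_location/etc/gridftp.conf" \
          "$new_location/etc/gridftp.conf" || return 1

    # GridFTP logs go here; on this page the directory had to be
    # created by hand.
    mkdir -p "$new_location/var/log"
}

# Usage (paths are illustrative only):
# carry_over_gridftp_setup /path/to/vdt/globus "$GLOBUS_LOCATION"
```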

OSG jobmanager-condor

OSG uses a patched/augmented version of jobmanager-condor (condor.in->condor.pm). The jobmanager-condor changed slightly between GT2 and GRAM5, so we had to merge the two.

The work was performed by Christopher Theissen, and is documented here.

GRAM5 beta2

Just installed GRAM5 beta2 on Nov 25th.

Steps:

  • copy over etc/gridftp.conf
  • create var/log
  • patch lib/perl/Globus/GRAM/JobManager/condor.pm to enable file transfer (the diff compares the patched file against the original; lines marked < are the patched version)
    298,299c298,299
    < #$requirements = "OpSys == \"" . $description->condor_os() . "\" ";
    < #$requirements .= " && Arch == \"" . $description->condor_arch() . "\" ";
    ---
    > $requirements = "OpSys == \"" . $description->condor_os() . "\" ";
    > $requirements .= " && Arch == \"" . $description->condor_arch() . "\" ";
    318,347d317
    <
    < ####################
    < # add input files
    < my @flist;
    <
    < my $sdir;
    < opendir($sdir,$description->directory());
    < my @sfiles = grep { !/^\./} readdir($sdir);
    < close $sdir;
    <
    < foreach $f ( @sfiles ) {
    < my $fpath = $description->directory() . "/" . $f;
    < if (-d $fpath) {
    < # do nothing
    < } else {
    < my $age = -M $fpath;
    < { #if ( $age < $0.01 ) {
    < # protection for when the dir is shared with other jobs
    < # get only recent files
    < $fpath =~ s{\/\/}{\/}g;
    < push (@flist,"$fpath");
    < }
    < }
    < }
    <
    < print SCRIPT_FILE "transfer_input_files = " . join(",",@flist) . "\n";
    < print SCRIPT_FILE "should_transfer_files = YES\n";
    < print SCRIPT_FILE "when_to_transfer_output = ON_EXIT\n";
    < ###################
    <
  • add -seg-module condor to etc/grid-services/jobmanager-condor
  • Start the scheduler event generator, with $lrm set to the local resource manager name (condor here):
    $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator \
    -scheduler $lrm \
    -background \
    -pidfile $GLOBUS_LOCATION/var/seg-$lrm.pid
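
Since the event generator is started in the background, it can be handy to check later whether it is still alive. The sketch below does this through the pid file; it assumes the file holds a single process id on its first line, and the function name is made up for this page.

```shell
#!/bin/sh
# Sketch: report whether the process recorded in a pid file is alive.
# ASSUMPTION: the pid file holds a single process id on its first line.

seg_is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    # Signal 0 delivers nothing; it only tests that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

# Usage, with lrm set to the name passed to -scheduler:
# seg_is_running "$GLOBUS_LOCATION/var/seg-$lrm.pid" && echo "SEG is up"
```

A stale pid file whose number has been reused by another process would still be reported as running, so this is only a quick check.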

GRAM5 RC benchmarking

GRAM5 rc was installed on osg-gw-5.t2.ucsd.edu.

  • No major problems found anymore. Condor-G timed out on a few jobs and put them on hold; this is likely a Condor-G problem
  • The benchmarking seemed limited by the submit machine CPU (uaf-1); the jobs were submitted to GRAM5 as fast as jobs were being submitted to the client's schedd
  • Rates of up to ~2Hz (100 jobs/min) were observed
  • Detailed numbers can be found at rates_gram5rc_ucsd.pdf

A second test on the same installation, with direct comparison with GT2 (on 2010/02/10):

  • Ran 10k 30-minute jobs against both GT2 and GRAM5 on osg-gw-5, using Condor-G v7.4.1 on uaf-1
  • GRAM5 finished in about 6 hours, GT2 took about 9 hours.
  • Condor-G numbers were much closer to reality under GRAM5 than under GT2.
    • Under GRAM5, Condor-G noticed the termination of the last job close to the 6 hour mark, although it was not always that close during the run.
    • Under GT2, Condor-G thought jobs were running for almost 14 hours.
  • For detailed results, see bench_30min_10k.ods.
    bench_30min_10k_abs.png
    bench_30min_10k_rates.png
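
For a rough feel of what the two completion times mean, the average completion rate over the whole run can be worked out as below. The inputs are the approximate figures quoted above (10k jobs, about 6 and about 9 hours), so the results are ballpark averages, not measured rates; the function name is made up for this page.

```shell
#!/bin/sh
# Average completion rate for a batch of jobs, in jobs per minute.
# Inputs are the approximate figures quoted in the text above.

avg_jobs_per_min() {
    # $1: number of jobs, $2: wall-clock duration in hours
    awk -v n="$1" -v h="$2" 'BEGIN { printf "%.1f\n", n / (h * 60) }'
}

avg_jobs_per_min 10000 6    # GRAM5, about 6 hours; prints 27.8
avg_jobs_per_min 10000 9    # GT2, about 9 hours; prints 18.5
```

With these figures, the GT2 run took about 1.5 times as long as the GRAM5 run.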

Backward compatibility, using Condor-G's gt2 mode:

  • everything worked, but only 10 jobs at a time were in the CE queue
  • enough to allow casual use of a GRAM5 CE, but Condor-G 7.4+ with its gt5 mode is NEEDED for use on a large scale
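
On the Condor-G side, the two modes differ in the grid type named in the grid_resource line of the submit file. The sketch below writes a minimal submit file for either mode. Only the host name comes from this page; the jobmanager name, the executable and the file names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: write a minimal Condor-G submit file for a GRAM CE.
# ASSUMPTIONS: the jobmanager name, executable and file names below are
# placeholders; only the host name comes from this page.

write_submit_file() {
    grid_type=$1     # gt5 for GRAM5, gt2 for the compatibility mode
    outfile=$2
    cat > "$outfile" <<EOF
universe      = grid
grid_resource = $grid_type osg-gw-5.t2.ucsd.edu/jobmanager-condor
executable    = /bin/hostname
output        = job.out
error         = job.err
log           = job.log
queue
EOF
}

# Usage (commented out so that running the file writes nothing):
# write_submit_file gt5 gram5_test.sub && condor_submit gram5_test.sub
```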

OSG use of GRAM5

OSG is planning to officially support GRAM5 in the near term.

The first step in the process is represented by this planning document.

-- IgorSfiligoi - 2009/08/12

Topic attachments
  • bench_30min_10k_abs.png (54.4 K, 2010/02/11 - 19:52, IgorSfiligoi): Abs values of the 10k run
  • rates_gt5a2_ucsd_r2.odt (83.7 K, 2009/08/25 - 21:50, IgorSfiligoi): GRAM5 alpha2 benchmarks - Italy to UCSD
  • rates_gt5a2_ucsd_r2.pdf (529.6 K, 2009/08/25 - 21:49, IgorSfiligoi): GRAM5 alpha2 benchmarks - Italy to UCSD
  • rates_gt5a2_ucsd.odt (82.7 K, 2009/08/24 - 19:39, IgorSfiligoi): GRAM5 alpha2 benchmarks - UCSD LAN
  • rates_gt5a2_ucsd.pdf (527.0 K, 2009/08/24 - 19:38, IgorSfiligoi): GRAM5 alpha2 benchmarks - UCSD LAN
  • rates_gram5rc_ucsd.odt (50.9 K, 2009/12/21 - 17:46, IgorSfiligoi): GRAM5 rc benchmarks - UCSD LAN
  • rates_gram5rc_ucsd.pdf (492.5 K, 2009/12/21 - 17:47, IgorSfiligoi): GRAM5 rc benchmarks - UCSD LAN
  • gram5basetest.pdf (435.5 K, 2009/08/12 - 20:10, IgorSfiligoi): GRAM5 validation test
  • gram5basetest.odt (94.8 K, 2009/08/12 - 20:10, IgorSfiligoi): GRAM5 validation test - source
  • bench_30min_10k_rates.png (119.1 K, 2010/02/11 - 19:52, IgorSfiligoi): Rates of the 10k run
  • bench_30min_10k.ods (127.9 K, 2010/02/20 - 05:28, IgorSfiligoi): Running 10k jobs against both GT2 and GRAM5 - UCSD LAN - Condor 7.4.1
Topic revision: r15 - 2010/06/10 - 22:50:28 - IgorSfiligoi
 