This page is meant to contain the information we gather while testing GRAM5.
GRAM5 validation tests
Jeff and Igor did a first round of tests to check if GRAM5 (alpha2) indeed worked and how it compared to GT2 on a small scale.
GRAM 5 alpha benchmarking
GRAM5 alpha2 was installed on osg-gw-5.t2.ucsd.edu.
- File transfer is not working properly. All further tests are without input or output files.
- Submitting from a machine on the local LAN, a job turnaround of about 1Hz (50 jobs/min) is easy to achieve. Peaks of 3Hz (170 jobs/min) have also been observed.
If monitoring is not needed, a sustained rate of 3Hz (200 jobs/min) should be achievable.
For more details: rates_gt5a2_ucsd.pdf
- Submitting from a machine on the other side of the world, the job turnaround is about 17 jobs/min. Peaks of 30 jobs/min have also been observed.
If monitoring is not needed, a sustained rate of 200 jobs/min (3Hz) should be achievable.
For more details: rates_gt5a2_ucsd_r2.pdf
Note: No clear conclusions yet. I have found that GRAM5 gets slower over time, so the results may be affected by that.
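The rates above are quoted both in Hz and in jobs/min, and the Hz values are rounded (1Hz is exactly 60 jobs/min, so 50 jobs/min is closer to 0.83Hz). A minimal sketch of the conversion, useful when comparing the numbers in this section; the function names and labels are illustrative only:

```python
def hz_to_jobs_per_min(rate_hz):
    """Convert a job rate in Hz (jobs per second) to jobs per minute."""
    return rate_hz * 60.0


def jobs_per_min_to_hz(jobs_per_min):
    """Convert a job rate in jobs per minute to Hz (jobs per second)."""
    return jobs_per_min / 60.0


# Rates quoted in this section, in jobs/min (labels are illustrative).
quoted_rates = [
    ("LAN, typical", 50),
    ("LAN, peak", 170),
    ("LAN, no monitoring", 200),
    ("remote, typical", 17),
    ("remote, peak", 30),
]

for label, jobs_per_min in quoted_rates:
    rate_hz = jobs_per_min_to_hz(jobs_per_min)
    print(f"{label}: {jobs_per_min} jobs/min = {rate_hz:.2f} Hz")
```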
GRAM 5 beta 1
Up to alpha3, GRAM5 played well on top of an existing VDT 1.10.1 installation; beta1 needs some changes:
- The initial install used the VDT perl, and this created some problems. Make sure you use the system perl.
- jobmanager-fork is not the default anymore; one needs to change the symlink in globus/etc/grid-services
- copied gridftp.conf from the VDT $GLOBUS_LOCATION/etc
- created $GLOBUS_LOCATION/var/log (used by gridFTP)
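The last three changes above are plain filesystem operations, so they can be scripted. The following is only a sketch under several assumptions: that the default jobmanager is selected by a symlink named `jobmanager` inside `etc/grid-services`, that the goal is to point it back at `jobmanager-fork`, and that the gridFTP configuration file is named `gridftp.conf`. The function name and both path parameters are hypothetical; check them against the actual installation before use.

```python
import os
import shutil
from pathlib import Path


def apply_beta1_fixups(globus_location, vdt_globus_location,
                       default_jobmanager="jobmanager-fork"):
    """Apply the manual beta1 changes to a GRAM5 install tree.

    globus_location:     GRAM5 $GLOBUS_LOCATION (hypothetical parameter)
    vdt_globus_location: $GLOBUS_LOCATION of the older VDT install
    """
    globus = Path(globus_location)
    services = globus / "etc" / "grid-services"

    # Assumption: the default jobmanager is chosen by this symlink.
    link = services / "jobmanager"
    target = services / default_jobmanager
    if not target.exists():
        raise FileNotFoundError(f"no such service file: {target}")
    if link.is_symlink() or link.exists():
        link.unlink()
    # Relative link, so the install tree stays relocatable.
    os.symlink(default_jobmanager, link)

    # Reuse the gridFTP configuration from the VDT installation.
    shutil.copy2(Path(vdt_globus_location) / "etc" / "gridftp.conf",
                 globus / "etc" / "gridftp.conf")

    # gridFTP writes its logs here; create the directory if missing.
    (globus / "var" / "log").mkdir(parents=True, exist_ok=True)
    return link


# Example call (paths are placeholders, not the real UCSD layout):
# apply_beta1_fixups("/opt/gram5", "/opt/vdt/globus")
```

Switching to the system perl is not covered here, since that depends on how the environment is set up on the host.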
OSG uses a patched/augmented version of jobmanager-condor (condor.in->condor.pm). jobmanager-condor changed slightly between GT2 and GRAM5, so we had to merge the two.
The work was performed by Christopher Theissen, and is documented here.
Just installed GRAM5 beta2 on Nov 25th.
GRAM 5 RC benchmarking
GRAM5 rc was installed on osg-gw-5.t2.ucsd.edu.
- No major problems found anymore. Condor-G timed out on a few jobs, putting them on hold; this is likely a Condor-G problem.
- The benchmarking seemed limited by the CPU of the submit machine (uaf-1); jobs were submitted to GRAM5 as fast as they were being submitted to the client's schedd
- Rates of up to ~2Hz (100 jobs/min) were observed
- Detailed numbers can be found at rates_gram5rc_ucsd.pdf
A second test on the same installation, with a direct comparison against GT2 (on 2010/02/10):
- Ran 10k 30 minute jobs against both GT2 and GRAM5 on osg-gw-5, using Condor-G v7.4.1 on uaf-1
- GRAM5 finished in about 6 hours, GT2 took about 9 hours.
- Condor-G numbers were much closer to reality under GRAM5 than under GT2.
- Under GRAM5, Condor-G noticed the termination of the last job close to the 6 hour mark, although it was not always that close during the run.
- Under GT2, Condor-G thought jobs were running for almost 14 hours.
- For detailed results, see bench_30min_10k.ods.
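To put the two wall-clock times in perspective, the average completion rate and the average number of concurrently running jobs can be derived from the figures above. This is a back-of-the-envelope sketch: the wall times are the approximate values quoted in the list, and the function name is illustrative.

```python
def throughput(n_jobs, job_minutes, wall_hours):
    """Average rates implied by running n_jobs of fixed length in wall_hours."""
    wall_minutes = wall_hours * 60.0
    return {
        # Completed jobs per minute, averaged over the whole run.
        "jobs_per_min": n_jobs / wall_minutes,
        # Average number of jobs running at once (total job time / wall time).
        "avg_running": n_jobs * job_minutes / wall_minutes,
    }


gram5 = throughput(10_000, 30, 6)  # about 6 hours under GRAM5
gt2 = throughput(10_000, 30, 9)    # about 9 hours under GT2

for name, result in (("GRAM5", gram5), ("GT2", gt2)):
    print(f"{name}: {result['jobs_per_min']:.1f} jobs/min, "
          f"~{result['avg_running']:.0f} jobs running on average")

print(f"GT2/GRAM5 wall-time ratio: {9 / 6:.1f}")
```

With these inputs GRAM5 averages roughly 28 jobs/min against roughly 19 jobs/min for GT2, which is the same 1.5x ratio as the wall times.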
Backward compatibility, using Condor-G's gt2 mode:
- Everything worked, but only 10 jobs at a time were in the CE queue.
- This is enough to allow casual use of a GRAM5 CE, but Condor-G 7.4+ with its gt5 mode is NEEDED for large-scale use.
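On the Condor-G side, the difference between the two modes comes down to the grid type named in the `grid_resource` line of the submit description. The helper below is a minimal sketch that generates such a description for either mode; the executable, arguments, and file names are placeholders rather than the ones used in the tests, and the function name is hypothetical.

```python
def condor_g_submit(host, mode="gt5", jobmanager="jobmanager-condor",
                    executable="/bin/sleep", arguments="1800", count=1):
    """Build a Condor-G submit description for a GRAM CE.

    mode selects the Condor-G grid type: "gt2" (backward compatible)
    or "gt5" (available in Condor-G 7.4+).
    """
    if mode not in ("gt2", "gt5"):
        raise ValueError(f"unsupported grid type: {mode}")
    lines = [
        "universe = grid",
        f"grid_resource = {mode} {host}/{jobmanager}",
        f"executable = {executable}",  # placeholder payload
        f"arguments = {arguments}",    # 1800 s = a 30 minute job
        "transfer_executable = false",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


print(condor_g_submit("osg-gw-5.t2.ucsd.edu", mode="gt5"))
```

Passing `mode="gt2"` produces the backward-compatible variant against the same CE.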
OSG use of GRAM5
OSG is planning to officially support GRAM5 in the near term.
The first step in the process is represented by this planning document