OSG Scalability Testbed
Scope
The OSG Scalability, Reliability and Usability activity needs a test infrastructure to carry out its mission.
This page hosts the information about the currently used infrastructure, as well as thoughts on how to provide the missing parts.
CE/Computing testbed
The current testbed uses a dedicated test CE and a shadow Condor pool running alongside the production pool.
Condor Shadow Pool
The Condor shadow pool uses the same condor_collector and condor_negotiator as the production pool.
However, the CE runs a dedicated schedd, and each worker node is configured to have more slots than available CPUs.
Only authorized users with a "SleepSlot" attribute are allowed to run on those additional slots.
As of the end of August, there were 2.8k shadow batch slots available, distributed across 100 nodes.
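As an illustration only, the slot overcommit and the SleepSlot restriction can be expressed with a few lines of Condor startd configuration; the knob values and the slot split below are placeholders, not the actual testbed settings.

    # Advertise more slots than there are physical CPUs, so that many
    # lightweight test jobs can be packed on each worker node
    # (the value is a placeholder, not the real testbed setting).
    NUM_CPUS = 32

    # Let the first slots (the real cores, assumed to be 8 here) run any job,
    # while the additional slots only accept jobs carrying a SleepSlot attribute.
    START = (SlotID <= 8) || (TARGET.SleepSlot =?= True)

On the submit side, a test job would then carry the attribute via a line such as "+SleepSlot = True" in its submit description.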
The Grid Compute Element
The CE runs the Grid gatekeeper; it may be the default OSG CE, or a different gatekeeper (GRAM5, CREAM, ARC), depending on the test being performed.
The hardware of the machine is as follows:
- 2 x AMD Opteron 275 (4 cores total)
- 8GB of memory
- 2TB disk space, mounted as RAID0
It runs a RHEL5-compatible OS.
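As one hedged example of how load can be driven at the gatekeeper, a Condor-G submit description for a GRAM5-style CE could look like the sketch below; the host name, jobmanager, and job count are placeholders rather than the actual testbed values.

    # Hypothetical Condor-G submit file for sending sleep jobs to the test CE.
    universe      = grid
    grid_resource = gt5 test-ce.example.edu/jobmanager-condor

    executable    = /bin/sleep
    arguments     = 3600

    # Custom attribute intended to match the SleepSlot shadow slots; whether it
    # propagates through the gatekeeper depends on the jobmanager configuration.
    +SleepSlot    = True

    log           = sleep_test.log
    notification  = never
    queue 1000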
SE/Storage testbed
We effectively don't have a storage testbed at this point.
We are looking into ways to reuse some of the old hardware to build such a testbed.
Here is what we expect to set up:
- O(16) data nodes, each with 2GB of RAM and a 250GB disk
- O(4) service nodes, each with 8GB of RAM
This should be enough to set up a test Hadoop pool populated with many small files, and to use it for stress tests.
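As a sketch of how such a pool could be populated, the Python snippet below writes a large number of small files into HDFS via the standard hadoop fs command line tool; the file count, file size, and target path are placeholders.

    #!/usr/bin/env python
    # Populate a test HDFS instance with many small files (illustrative sketch;
    # assumes the 'hadoop' command is on the PATH; counts and paths are placeholders).
    import os
    import subprocess

    NUM_FILES = 100000                       # placeholder: number of small files
    FILE_SIZE = 4 * 1024                     # placeholder: 4kB per file
    HDFS_DIR = "/user/scaletest/smallfiles"  # placeholder: target directory

    def main():
        # Create the target directory in HDFS (ignore the error if it already exists).
        subprocess.call(["hadoop", "fs", "-mkdir", HDFS_DIR])

        local_path = "smallfile.tmp"
        for i in range(NUM_FILES):
            # Write a small local file filled with pseudo-random data ...
            with open(local_path, "wb") as f:
                f.write(os.urandom(FILE_SIZE))
            # ... and copy it into HDFS under a unique name.
            subprocess.check_call(
                ["hadoop", "fs", "-put", local_path,
                 "%s/file_%06d" % (HDFS_DIR, i)])
        os.remove(local_path)

    if __name__ == "__main__":
        main()

Since each file adds its own metadata entry in the namenode, populating the pool this way mainly stresses the namenode, which is where small-file scalability limits are typically expected to show up.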
Condor/glideinWMS testbed
We currently don't have a test pool for Condor and/or glideinWMS scalability.
We presently rely on hardware provided by Fermilab to perform this task.
--
IgorSfiligoi - 2009/09/02