Condor Scalability tests

This page documents the tests performed against Condor to probe its scalability limits.

Oct 2010

Condor 7.5.4 pre-release, glideinWMS v2, loadtest_condor 1.1

Using a 64GB schedd node at FNAL, a 16GB collector node (1+400) at FNAL, and getting glideins from shadow pools at FNAL, UCSD and Madison, we were able to achieve ~40k long-running jobs on a single schedd. Beyond that, the system became unstable.
50k_one_q_2.png

Using several schedds, we were able to run 90k jobs on a single collector. We did not observe any limits, and stopped at that threshold only for lack of additional compute resources.
90k_s.png

Jan/Feb 2011

Condor 7.5.5 pre-release, glideinWMS v2, loadtest_condor 1.1

Using a 64GB schedd node at FNAL, a 16GB collector node (1+200) at FNAL, and getting glideins from shadow pools at FNAL, UCSD and Madison, we were able to achieve 60k long-running jobs for an extended period of time with no user-level problems. The limit was purely the memory available on the schedd node.

cq_60k.png

Using 10 minute jobs submitted by a single dagman, the same system stabilizes around 6k running jobs.


During the scalability tests, we also measured the matching speed of the negotiator; the test was the best-case scenario, with a single autocluster and very basic requirements. On the test node (dual Intel Xeon E5430 @ 2.66GHz) it managed to match between 8 and 15 jobs per second.

During the test, we noticed that the Negotiator was wasting a lot of time gathering statistics when O(3k) jobs were matched in a single cycle. This seems to be due to heap management; dynamically linking the negotiator with TCMalloc seems to solve the problem.

We also observed the collector becoming nearly unresponsive, especially when a large number of glideins terminated at the same time. Again, the problem seemed to be related to heap management, and using TCMalloc solved it.

Apr 2011

Condor 7.6.0, glideinWMS v2_5_1 (+minor patches), loadtest_condor 1.1 (+minor patches)

This time the test was about Negotiator scalability.

The test consisted of running jobs with 10k-15k glideins, once with a simple requirement, and once with a complex one that created one autocluster per job:

  • simple:
    Requirements=True
  • complex:
    Requirements = ( ( stringListMember(GLIDEIN_Site,string(ClusterId)) || stringListMember(GLIDEIN_Gatekeeper,string(ClusterId)) || (GLIDEIN_Fake=?=UNDEFINED)) && Arch=!Dummy) ) && ( ( Memory > 1 ) && ( Disk >= 1 )
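The effect of referencing ClusterId in the job requirements can be sketched in Python. This is only a toy model, assuming that jobs are grouped into autoclusters by the values of the attributes that matter for matchmaking; the function name, the attribute lists and the job dictionaries are hypothetical and are not Condor APIs.

```python
from collections import defaultdict

def group_into_autoclusters(jobs, significant_attrs):
    """Group jobs by the tuple of values of the significant attributes."""
    clusters = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(attr) for attr in significant_attrs)
        clusters[key].append(job["ClusterId"])
    return clusters

# 100 hypothetical jobs, identical except for ClusterId
jobs = [{"ClusterId": i, "Owner": "tester", "JobUniverse": 5} for i in range(1, 101)]

# Simple case: Requirements=True references no per-job attribute
simple = group_into_autoclusters(jobs, ["Owner", "JobUniverse"])

# Complex case: the expression references ClusterId, so (in this model)
# ClusterId becomes significant and every job lands in its own group
complex_ = group_into_autoclusters(jobs, ["Owner", "JobUniverse", "ClusterId"])

print(len(simple), len(complex_))
```

In this model the simple case collapses to a single group, while the complex case yields as many groups as there are jobs, so the negotiator cannot reuse a match decision from one job to the next.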

Simple glidein start expression

Using a single sleeper pool, with an accept-all Start condition, and with the negotiator limited to 20s per pie spin (40s per submitter)
NEGOTIATOR_MAX_TIME_PER_SUBMITTER=40
NEGOTIATOR_MAX_TIME_PER_PIESPIN=20

the system behaved pretty much the same way with either the simple or the complex job requirements.

However, a closer look at the Negotiator behavior makes it clear that most jobs are never considered for matching; the NegotiatorLog has a
Reached max time per spin: 20 ... stopping
line at the end of each cycle, and the Negotiator ClassAd reports that only ~250 jobs, out of a total of ~1.6k idle jobs, were considered for matchmaking:
LastNegotiationCycleNumJobsConsidered0 = 254
LastNegotiationCycleRejections0 = 234
LastNegotiationCycleNumIdleJobs0 = 1620
LastNegotiationCycleTotalSlots0 = 15081
LastNegotiationCycleCandidateSlots0 = 2002
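As a rough check of these numbers, the following Python sketch parses the attributes listed above and computes the fraction of idle jobs that were considered. The parsing helper is hypothetical and only handles simple "name = integer" lines; it is not a general ClassAd parser.

```python
# Values copied from the Negotiator ClassAd attributes listed above
stats_text = """\
LastNegotiationCycleNumJobsConsidered0 = 254
LastNegotiationCycleRejections0 = 234
LastNegotiationCycleNumIdleJobs0 = 1620
LastNegotiationCycleTotalSlots0 = 15081
LastNegotiationCycleCandidateSlots0 = 2002
"""

def parse_int_attrs(text):
    """Parse 'Name = integer' lines into a dict (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            stats[name.strip()] = int(value.strip())
    return stats

stats = parse_int_attrs(stats_text)
considered = stats["LastNegotiationCycleNumJobsConsidered0"]
idle = stats["LastNegotiationCycleNumIdleJobs0"]
fraction = considered / idle

print(f"{considered} of {idle} idle jobs considered ({fraction:.1%})")
```

With the values above, fewer than one idle job in six is looked at before the time limit stops the cycle.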

Selective glidein start expression

To test how the above-described behaviour affects the system (when there are many autoclusters), new glideins were configured to only access a subset of the jobs:
GLIDEIN_Entry_Start="(round(ClusterId/10)*10==ClusterId)"
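To make the selection explicit, here is a small Python equivalent of that start expression; entry_start is a hypothetical name used only for illustration. For positive integer ClusterIds the result is the same whether the division is treated as integer or real: only multiples of 10 satisfy the expression, i.e. roughly one job in ten.

```python
def entry_start(cluster_id):
    """Python rendering of (round(ClusterId/10)*10 == ClusterId)."""
    return round(cluster_id / 10) * 10 == cluster_id

# Which of the first 50 cluster ids would be allowed to start?
matching = [cid for cid in range(1, 51) if entry_start(cid)]
print(matching)  # [10, 20, 30, 40, 50]
```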

As expected, only the first ~20 idle jobs that matched (out of ~200) started running, even though there were plenty (>200) of unclaimed glideins in the system.
The net result was both delayed job execution and wasted CPU cycles.
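The starvation can be reproduced with a toy model in Python. It assumes that the negotiator considers idle jobs in queue order, one autocluster per job, and stops when the time budget is spent; the matching rate of 12 jobs per second is an assumed value inside the 8-15 jobs per second range measured earlier, and the queue of 2000 jobs is made up for illustration.

```python
def can_start(cluster_id):
    """Jobs accepted by the selective start expression (multiples of 10)."""
    return cluster_id % 10 == 0

def negotiate(idle_cluster_ids, time_budget_s, jobs_per_second):
    """Toy model: consider jobs in queue order until the time budget is spent."""
    max_considered = int(time_budget_s * jobs_per_second)
    considered = idle_cluster_ids[:max_considered]
    matched = [cid for cid in considered if can_start(cid)]
    return considered, matched

idle = list(range(1, 2001))                   # hypothetical queue of 2000 idle jobs
eligible = [cid for cid in idle if can_start(cid)]

# 20s budget at an assumed 12 jobs/s: only the head of the queue is examined
considered, matched = negotiate(idle, time_budget_s=20, jobs_per_second=12)
print(len(eligible), len(considered), len(matched))  # 200 240 24

# A budget large enough to cover the whole queue lets every eligible job match
_, matched_all = negotiate(idle, time_budget_s=1000, jobs_per_second=12)
print(len(matched_all))  # 200
```

In this model the number of jobs that start is bounded by the time budget rather than by the number of free glideins, which is the same pattern as observed in the test.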

So this configuration is not really functional.

To correct for the above, the negotiator limits were commented out:
#NEGOTIATOR_MAX_TIME_PER_SUBMITTER=40
#NEGOTIATOR_MAX_TIME_PER_PIESPIN=20

All the deserving jobs thus started to run, but at the expense of the negotiator cycle time, which increased to ~3 minutes (with 1.2k idle jobs in the queue), compared to the ~50s it took before.

-- IgorSfiligoi - 2011/02/08


Topic revision: r3 - 2011/04/26 - 19:47:29 - IgorSfiligoi
 