
Condor Scalability tests

This page summarizes the tests performed to push the scalability limits of Condor.

Oct 2010

Condor 7.5.4 pre-release, glideinWMS v2, loadtest_condor 1.1

Using a 64GB schedd node at FNAL, a 16GB collector node (1+400) at FNAL, and getting glideins from shadow pools at FNAL, UCSD and Madison, we were able to achieve ~40k long-running jobs on a single schedd. Beyond that, the system became unstable.
50k_one_q_2.png

Using several schedds, we were able to run 90k jobs on a single collector. We did not observe any limits, and stopped at that threshold only for lack of additional compute resources.
90k_s.png

Jan/Feb 2011

Condor 7.5.5 pre-release, glideinWMS v2, loadtest_condor 1.1

Using a 64GB schedd node at FNAL, a 16GB collector node (1+200) at FNAL, and getting glideins from shadow pools at FNAL, UCSD and Madison, we were able to sustain 60k long-running jobs for an extended period of time with no user-level problems. The limit was purely the memory available on the schedd node.

cq_60k.png
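
Since the limit in this test was schedd memory, a rough per-job memory budget can be estimated from the two numbers above. The sketch below is only back-of-the-envelope arithmetic, not a measurement: it assumes that "64GB" means 64 GiB and that the whole node memory is spread evenly over the running jobs, so the result is an upper bound on the average footprint per running job. The function name is hypothetical.

```python
def memory_budget_per_job(total_bytes: int, running_jobs: int) -> float:
    """Average bytes of node memory available per running job."""
    if running_jobs <= 0:
        raise ValueError("running_jobs must be positive")
    return total_bytes / running_jobs


GIB = 1024 ** 3
MIB = 1024 ** 2

schedd_memory = 64 * GIB  # assumption: "64GB" read as 64 GiB
running_jobs = 60_000     # sustained long-running jobs reported above

budget = memory_budget_per_job(schedd_memory, running_jobs)
print(f"{budget / MIB:.2f} MiB per running job")  # prints "1.09 MiB per running job"
```

The real per-job cost is lower than this, since the operating system and the schedd itself also consume part of the node memory.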

Using 10-minute jobs submitted by a single DAGMan, the same system stabilizes at around 6k running jobs.

cq_10min.png

During the scalability tests, we also measured the matching speed of the negotiator; the test was the best-case scenario with a single autocluster and very basic requirements. On the test node (dual Intel Xeon E5430 @ 2.66GHz) it matched between 8 and 15 jobs per second.
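
As a consistency check, the steady state seen with 10-minute jobs can be compared against this matching rate using Little's law (running jobs = start rate × job duration). This is an illustrative sketch only: it assumes every job runs for exactly 10 minutes with no startup overhead, and it does not claim that the negotiator was the actual bottleneck in that test. The function name is hypothetical.

```python
def steady_state_start_rate(running_jobs: int, job_duration_s: float) -> float:
    """Little's law: L = lambda * W, hence lambda = L / W."""
    if job_duration_s <= 0:
        raise ValueError("job_duration_s must be positive")
    return running_jobs / job_duration_s


running_jobs = 6_000            # steady state observed with 10-minute jobs
job_duration_s = 10 * 60        # assumption: each job runs exactly 10 minutes
match_rate_range = (8.0, 15.0)  # measured negotiator matching rate, jobs/s

rate = steady_state_start_rate(running_jobs, job_duration_s)
within = match_rate_range[0] <= rate <= match_rate_range[1]
print(f"implied start rate: {rate:.1f} jobs/s, within measured range: {within}")
# prints "implied start rate: 10.0 jobs/s, within measured range: True"
```

Under these assumptions, keeping 6k such jobs running requires about 10 job starts per second, which falls inside the measured 8 to 15 matches per second.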

During the test, we noticed that the negotiator was wasting a lot of time gathering statistics when O(3k) jobs were matched in a single cycle. This seems to be due to heap management; dynamically linking the negotiator with TCMalloc seems to solve the problem.

We also observed the collector entering a state of very low responsiveness, especially when a large number of glideins terminated at the same time. Again, the problem seemed to be related to heap management, and using TCMalloc solved the problem.

-- IgorSfiligoi - 2011/02/08

Topic revision: r2 - 2011/02/09 - 00:20:29 - IgorSfiligoi
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback