Difference: DCTest (12 vs. 13)

Revision 13 - 2011/04/21 - Main.IgorSfiligoi

dCache Scalability Tests

The tested system peaked at ~4.3Hz with 50 clients and slowly degraded to just below 2Hz at 1000 clients. Errors started to show up with 600 clients, but became really problematic after 800 clients.

Run 4

-- IgorSfiligoi - 2010/06/17
This test used postgres and kwp, with the following parameters:
max_connections - 250
shared_buffers - 512MB
work_mem - 16MB
max_fsm_pages - 1000000
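Assuming these map directly onto postgresql.conf settings (the values are the ones listed above; everything else is left at its defaults), the configuration fragment for this run would look like:

```ini
# postgresql.conf fragment for this test run
max_connections = 250     # allowed concurrent client connections
shared_buffers = 512MB    # shared memory for data page caching
work_mem = 16MB           # per-sort/per-hash memory before spilling to disk
max_fsm_pages = 1000000   # free-space-map pages (PostgreSQL 8.3 and earlier only)
```

Note that max_fsm_pages was removed in PostgreSQL 8.4, which dates this configuration to the 8.x series in use at the time.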
The tested system peaked at 2.7Hz with 50 clients, stayed in the 2Hz range until about 400 clients, and then started to degrade. Errors started to appear at 400 clients, but got really problematic at around 800.

Run 5

-- IgorSfiligoi - 2010/06/17
 This test was similar to Run 4, but using GUMS.

The glideTester jobs were configured to run for 40 minutes (2400s).

However, once it reached 800 clients, the system misbehaved badly.

Run 6

-- IgorSfiligoi - 2010/12/22
 This test was similar to Run 5, but using XACML GUMS and dcache version 1.9.5-23.

The glideTester jobs were configured to run for 40 minutes (2400s).

 The concurrency limit is still around the 600 mark.

Run 7

-- IgorSfiligoi - 2011/04/19
This test was similar to Run 6, except that the OS was upgraded to SL5 and the dcache version to dcache-server-1.9.5-25. The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) above that.

 
1000 3.4k (0.7Hz) 2.1M
1200 0 6.9M
The tested system performs similarly to the previous test, although it is marginally better. The concurrency limit seems to have improved to about the 800 mark.
  The server hung up during the 1.2k run, and had to be manually restarted.

FNAL lcg-cp tests

The tested system peaked at 50 clients, delivering files at 2.2Hz, or 200Mbit/s, and then declined to ~1.4Hz.
The first errors appear with 150 clients, but are still bearable up to about 600 clients.
With 800 clients, more than half of all attempts failed, while with 1000 clients all the attempts failed.
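As a sanity check on the peak numbers above, the average file size implied by 2.2 files/s at an aggregate 200Mbit/s can be computed directly (the per-file size is not stated in the text; this is just the arithmetic consequence of the two figures):

```python
# Estimate the average file size implied by the peak transfer numbers:
# 2.2 files/s delivered at an aggregate 200 Mbit/s.
rate_hz = 2.2           # files per second at the 50-client peak
bandwidth_mbit = 200.0  # aggregate bandwidth in Mbit/s

mbit_per_file = bandwidth_mbit / rate_hz  # ~90.9 Mbit per file
mb_per_file = mbit_per_file / 8           # ~11.4 MB per file

print(f"{mb_per_file:.1f} MB per file")   # prints "11.4 MB per file"
```

So the test files were on the order of 11 MB each, consistent with a transfer test dominated by per-file overhead rather than raw bandwidth.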

Run 2

-- IgorSfiligoi - 2010/06/17
This test used postgres and kwp, with the following parameters:
max_connections - 250
shared_buffers - 512MB
work_mem - 16MB
max_fsm_pages - 1000000
 By 400 clients the system was practically unusable.

Run 3

-- IgorSfiligoi - 2010/06/17
 This test was similar to Run 2, but using GUMS.

The glideTester jobs were configured to run for 40 minutes (2400s).

 The deterioration rate was much faster, though. At 200 clients the system was already unusable.

Run 4

-- IgorSfiligoi - 2010/12/22
 This test was similar to Run 3, but using XACML GUMS and dcache version 1.9.5-23.

The glideTester jobs were configured to run for 40 minutes (2400s).

 
400 3.0k (1.3Hz) 1200
450 3.3k (1.4Hz) 530
500 3.2k (1.3Hz) 820
550 1.6k (0.7Hz) 3.2k + 26 hung
 
600 0 all
650 0 all

As with Run 3, the tested system peaked at 2.2Hz with 50 clients and 200Mbit/s. But the deterioration is much slower; while errors start to appear around the 150 mark, the system is still usable (with retries) up to about 500 concurrent clients.


Run 5

-- IgorSfiligoi - 2011/04/20

This test was similar to Run 4, except that the OS was upgraded to SL5 and the dcache version to dcache-server-1.9.5-25. The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) above that.

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
25 1.7k (1.4Hz) 0
50 2.4k (2.0Hz) 1
75 2.4k (2.0Hz) 0
100 2.4k (2.0Hz) 0
150 2.3k (1.9Hz) 33
200 3.6k (1.5Hz) 712
300 3.4k (1.4Hz) 907
400 3.3k (1.4Hz) 1.2k
600 5.6k (1.2Hz) 2.8k
800 6.1k (1.3Hz) 4.6k
1000 0.9k 5.6M
1200 0 6.9M
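The rates in the table are consistent with successes divided by the configured run length (1200s up to 150 clients, 2400s up to 400, 4800s above that); a quick cross-check on a few rows:

```python
# Cross-check a few table rows above: rate ~= succeeded / run length.
# Run lengths come from the glideTester configuration quoted in the text.
def run_length(concurrency):
    """Job run length in seconds for a given concurrency level."""
    if concurrency <= 150:
        return 1200
    if concurrency <= 400:
        return 2400
    return 4800

rows = [  # (concurrency, succeeded, reported rate in Hz)
    (50, 2400, 2.0),
    (200, 3600, 1.5),
    (800, 6100, 1.3),
]

for conc, succeeded, reported in rows:
    rate = succeeded / run_length(conc)
    print(f"{conc:4d} clients: {rate:.1f}Hz (reported {reported}Hz)")
```

The computed rates (2.0Hz, 1.5Hz, 1.3Hz) match the reported ones, which suggests the "Succeeded" column counts transfers completed within a single run window.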

The tested system performs similarly to the previous test, although it is marginally better. The concurrency limit seems to have improved to about the 800 mark.

The server hung up during the 1.2k run, and had to be manually restarted.

  -- IgorSfiligoi - 2010/05/07
 