Difference: DCTest (1 vs. 15)

Revision 15 - 2011/06/03 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 196 to 196
  The server hung up during the 1.2k run, and had to be manually restarted.
Added:
>
>

Run 8

-- IgorSfiligoi - 2011/05/23

This test was similar to Run 7, just that the dCache version is now dcache-server-1.9.5-26. The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) after that.

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
25 4.5k (3.7Hz) 0
50 5.5k (4.6Hz) 1
75 5.0k (4.3Hz) 0
100 4.5k (3.9Hz) 0
150 4.0k (3.3Hz) 0
200 6.7k (2.8Hz) 2
300 6.3k (2.6Hz) 0.4k
400 5.1k (2.1Hz) 1.0k
600 6.3k (1.3Hz) 1.7k
800 2.5k (0.5Hz) 76k
1000 0 ALL
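(For reference, the quoted rate appears to be simply the total success count divided by the run length: e.g. 4.5k successes over the 1200s run at concurrency 25 gives 4500/1200 ≈ 3.7Hz, and 6.3k over the 4800s run at concurrency 600 gives ≈ 1.3Hz.)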

The system performed slightly worse than before, hitting the wall at 1k. But it did recover by itself once the jobs stopped.

 

FNAL lcg-cp tests

The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

Line: 333 to 358
  The server hung up during the 1.2k run, and had to be manually restarted.
Deleted:
<
<
-- IgorSfiligoi - 2010/05/07
 \ No newline at end of file
Added:
>
>

Run 6

-- IgorSfiligoi - 2011/05/23

This test was similar to Run 5, just that the dCache version is now dcache-server-1.9.5-26. The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, and 80 minutes (4800s) after that.

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
25 2.0k (1.7Hz) 0
50 2.5k (2.1Hz) 1
75 2.4k (2.0Hz) 0
100 2.3k (1.9Hz) 0
150 2.1k (1.8Hz) 43
200 6.7k (1.4Hz) 1.0k
300 5.9k (1.2Hz) 2.3k
400 4.9k (1.0Hz) 6.7k
600 2.6k (0.5Hz) 28k

The system performed slightly worse than before, severely degrading already at 600. But it never hung.

Revision 14 - 2011/05/23 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Revision 13 - 2011/04/21 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 101 to 101
 The tested system peaked at ~4.3Hz with 50 clients, and was slowly degrading to just below 2Hz at 1000 clients. The error rates started to show up with 600 clients, but were really problematic after 800 clients.

Run 4

Added:
>
>
-- IgorSfiligoi - 2010/06/17
 This test used postgres and kwp, with the following parameters:
max_connections - 250, shared_buffers - 512MB, work_mem - 16MB, max_fsm_pages - 1000000
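For reference, a minimal sketch of how these settings would appear in postgresql.conf (standard name = value syntax; assuming the pre-8.4 PostgreSQL series, where max_fsm_pages still existed):

 # postgresql.conf for the srm database -- values as listed above
 max_connections = 250    # raised from the default of 100
 shared_buffers = 512MB
 work_mem = 16MB
 max_fsm_pages = 1000000  # free-space-map pages; parameter removed in PostgreSQL 8.4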
Line: 122 to 124
 The tested system peaked at 2.7Hz with 50 clients, stayed in the 2Hz range until about 400 clients, and then started to degrade. Errors started to appear with 400 clients, but got really problematic at around 800.

Run 5

Added:
>
>
-- IgorSfiligoi - 2010/06/17
 This test was similar to Run 4, but using GUMS.

The glideTester jobs were configured to run for 40 minutes (2400s).

Line: 144 to 148
 However, once it reached 800 clients, the system seemed to misbehave badly.

Run 6

Added:
>
>
-- IgorSfiligoi - 2010/12/22
 This test was similar to Run 5, but using XACML GUMS and dcache version 1.9.5-23.

The glideTester jobs were configured to run for 40 minutes (2400s).

Line: 165 to 171
 The concurrency limit is still around the 600 mark.

Run 7

Added:
>
>
-- IgorSfiligoi - 2011/04/19
 This test was similar to Run 6, just that the OS was upgraded to SL5 and the dCache version is dcache-server-1.9.5-25. The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) after that.

Line: 184 to 192
 
1000 3.4k (0.7Hz) 2.1M
1200 0 6.9M
Changed:
<
<
The tested system performs similarly to the previous test, although it is marginally better. The concurrency limit seems to have improved to abotu the 800 mark.
>
>
The tested system performs similarly to the previous test, although it is marginally better. The concurrency limit seems to have improved to about the 800 mark.
  The server hung up during the 1.2k run, and had to be manually restarted.

FNAL lcg-cp tests

Line: 232 to 240
 The tested system peaked at 50 clients, delivering files at 2.2Hz, or 200Mbit/s, and then declined to ~1.4Hz.
The first errors appeared with 150 clients, but were still bearable up to about 600 clients.
With 800 clients, more than half of all attempts failed, while with 1000 clients all the attempts failed.
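(As a cross-check of the quoted bandwidth: 2.2 files/s × 10 Mbytes/file × 8 bits/byte ≈ 180 Mbit/s, i.e. on the order of the 200 Mbit/s stated above.)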

Run 2

Added:
>
>
-- IgorSfiligoi - 2010/06/17
This test used postgres and kwp, with the following parameters:
max_connections - 250, shared_buffers - 512MB, work_mem - 16MB, max_fsm_pages - 1000000
Line: 250 to 260
 By 400 clients the system was practically unusable.

Run 3

Added:
>
>
-- IgorSfiligoi - 2010/06/17
 This test was similar to Run 2, but using GUMS.

The glideTester jobs were configured to run for 40 minutes (2400s).

Line: 266 to 278
 The deterioration rate was much faster, though. At 200 clients the system was already unusable.

Run 4

Added:
>
>
-- IgorSfiligoi - 2010/12/22
 This test was similar to Run 3, but using XACML GUMS and dcache version 1.9.5-23.

The glideTester jobs were configured to run for 40 minutes (2400s).

Line: 286 to 300
 
400 3.0k (1.3Hz) 1200
450 3.3k (1.4Hz) 530
500 3.2k (1.3Hz) 820
Changed:
<
<
550 1.6k (2.7 Hz) 3.2k + 26 hung
>
>
550 1.6k (0.7Hz) 3.2k + 26 hung
 
600 0 all
650 0 all

Like with Run 3, the tested system peaked at 2.2Hz with 50 clients and 200Mbit/s. But the deterioration is much slower; while errors start to appear around the 150 mark, the system is still usable (with retries) up to about 500 concurrent clients.

Added:
>
>

Run 5

-- IgorSfiligoi - 2011/04/20

This test was similar to Run 4, just that the OS was upgraded to SL5 and the dCache version is dcache-server-1.9.5-25. The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) after that.

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
25 1.7k (1.4Hz) 0
50 2.4k (2.0Hz) 1
75 2.4k (2.0Hz) 0
100 2.4k (2.0Hz) 0
150 2.3k (1.9Hz) 33
200 3.6k (1.5Hz) 712
300 3.4k (1.4Hz) 907
400 3.3k (1.4Hz) 1.2k
600 5.6k (1.2Hz) 2.8k
800 6.1k (1.3Hz) 4.6k
1000 0.9k 5.6M
1200 0 6.9M
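(The very large failure counts at the highest concurrencies reflect how quickly a failing attempt returns compared to a successful copy: 5.6M failures over a 4800s run corresponds to over a thousand failed attempts per second across the 1000 clients.)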

The tested system performs similarly to the previous test, although it is marginally better. The concurrency limit seems to have improved to about the 800 mark.

The server hung up during the 1.2k run, and had to be manually restarted.

  -- IgorSfiligoi - 2010/05/07

Revision 12 - 2011/04/20 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 165 to 165
 The concurrency limit is still around the 600 mark.

Run 7

Changed:
<
<
This test was similar to Run 6, just the dcache version is dcache-server-1.9.5-25.The clients were running on the UCSD sleeper pool.
>
>
This test was similar to Run 6, just that the OS was upgraded to SL5 and the dCache version is dcache-server-1.9.5-25. The clients were running on the UCSD sleeper pool.
  The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) after that.
Line: 184 to 184
 
1000 3.4k (0.7Hz) 2.1M
1200 0 6.9M
Changed:
<
<
The tested system performs similarly to the previous test, although it is marginally better.
>
>
The tested system performs similarly to the previous test, although it is marginally better. The concurrency limit seems to have improved to abotu the 800 mark.
 
Changed:
<
<
The concurrency limit is still around the 600 mark.
>
>
The server hung up during the 1.2k run, and had to be manually restarted.
 

FNAL lcg-cp tests

The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

Revision 11 - 2011/04/20 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 167 to 167
  This test was similar to Run 6, just the dcache version is dcache-server-1.9.5-25.The clients were running on the UCSD sleeper pool.
Changed:
<
<
The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, and 40 minutes (2400s) afterwards.
>
>
The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, 40 minutes (2400s) up to 400, and 80 minutes (4800s) after that.
  Complete results can be seen below:
Concurrency Succeeded (Rate) Failed
Line: 179 to 179
 
200 7.4k (3.1Hz) 0
300 6.3k (2.6Hz) 0
400 6.8k (2.8Hz) 0
Changed:
<
<
     
     
     
>
>
600 12k (2.5Hz) 1.1k
800 7.6k (1.6Hz) 8.5k
1000 3.4k (0.7Hz) 2.1M
1200 0 6.9M
 
Changed:
<
<

FNAL lcg-cp tests

>
>
The tested system performs similarly to the previous test, although it is marginally better.

The concurrency limit is still around the 600 mark.

FNAL lcg-cp tests

  The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

All glideTester jobs were running lcg-cp in a tight loop for a specified amount of time, copying a 10-Mbyte file from the SE to the local disk.

Changed:
<
<
 lcg-cp -b -D srmv2 $mytestdir/igors_file.dat file:$PWD/igors_file.dat 
Note: lcg-cp returns 0 even when it fails!
>
>
 lcg-cp -b -D srmv2 $mytestdir/igors_file.dat file:$PWD/igors_file.dat 
Note: lcg-cp returns 0 even when it fails!
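Since the exit code cannot be trusted, counting real successes requires checking the transferred file itself. A hypothetical sketch of such a timed test loop (not the actual glideTester payload script, which is not shown on this page; the size check assumes the 10-Mbyte file is exactly 10 MiB):

 # hypothetical timed loop around lcg-cp; success judged by file size, not exit code
 end=$(( $(date +%s) + 2400 ))   # run length in seconds, e.g. a 2400s run
 ok=0; bad=0
 while [ $(date +%s) -lt $end ]; do
   rm -f igors_file.dat
   lcg-cp -b -D srmv2 $mytestdir/igors_file.dat file:$PWD/igors_file.dat
   if [ "$(stat -c %s igors_file.dat 2>/dev/null)" = "10485760" ]; then
     ok=$((ok+1))
   else
     bad=$((bad+1))
   fi
 done
 echo "succeeded=$ok failed=$bad"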
 

Run 1

Revision 10 - 2011/04/19 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 169 to 169
  The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, and 40 minutes (2400s) afterwards.
Changed:
<
<
Complete results can be seen below:
>
>
Complete results can be seen below:
 
Concurrency Succeeded (Rate) Failed
25 4.5k (3.7Hz) 0
50 5.5k (4.6Hz) 0
Line: 178 to 178
 
150 4.3k (3.6Hz) 0
200 7.4k (3.1Hz) 0
300 6.3k (2.6Hz) 0
Changed:
<
<
     
>
>
400 6.8k (2.8Hz) 0
 
     
     
     

Revision 9 - 2011/04/19 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 160 to 160
 
800 0 all
1000 0 all
Changed:
<
<
The tested system performs significantly better at low concurrencies ( 4.7Hz vs .27Hz), but it steadily declines and is only marginally better at higher concurrencies.
>
>
The tested system performs significantly better at low concurrencies (4.7Hz vs 2.7Hz), but it steadily declines and is only marginally better at higher concurrencies.
  The concurrency limit is still around the 600 mark.
Added:
>
>

Run 7

This test was similar to Run 6, just the dcache version is dcache-server-1.9.5-25.The clients were running on the UCSD sleeper pool.

The glideTester jobs were configured to run for 20 minutes (1200s) for concurrencies up to 150, and 40 minutes (2400s) afterwards.

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
25 4.5k (3.7Hz) 0
50 5.5k (4.6Hz) 0
75 5.1k (4.3Hz) 0
100 4.6k (3.9Hz) 0
150 4.3k (3.6Hz) 0
200 7.4k (3.1Hz) 0
300 6.3k (2.6Hz) 0
     
     
     
     
 

FNAL lcg-cp tests

Revision 8 - 2010/12/22 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 102 to 102
 

Run 4

This test used postgres and kwp, with the following parameters:

Changed:
<
<
max_connections - 250 
shared_buffers - 512MB
work_mem - 16MB 
max_fsm_pages - 1000000  
>
>
max_connections - 250, shared_buffers - 512MB, work_mem - 16MB, max_fsm_pages - 1000000
  The glideTester jobs were configured to run for 40 minutes (2400s).
Line: 145 to 142
 At low concurrencies, the tested system gave results similar to Run 4. It peaked at 2.7Hz with 50 clients, stayed in the 2Hz range until about 400 clients, and then started to degrade.

However, once it reached 800 clients, the system seemed to misbehave badly.

Added:
>
>

Run 6

This test was similar to Run 5, but using XACML GUMS and dcache version 1.9.5-23.

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
50 11.3k (4.7Hz) 0
100 9.0k (3.7Hz) 0
150 7.5k (3.1Hz) 0
200 6.4k (2.7Hz) 0
300 5.9k (2.5Hz) 0
400 5.0k (2.1Hz) 0
600 4.9k (2.0Hz) 135
800 0 all
1000 0 all

The tested system performs significantly better at low concurrencies ( 4.7Hz vs .27Hz), but it steadily declines and is only marginally better at higher concurrencies.

The concurrency limit is still around the 600 mark.

 

FNAL lcg-cp tests

Line: 192 to 210
 

Run 2

This test used postgres and kwp, with the following parameters:

Changed:
<
<
max_connections - 250 
shared_buffers - 512MB
work_mem - 16MB 
max_fsm_pages - 1000000  
>
>
max_connections - 250, shared_buffers - 512MB, work_mem - 16MB, max_fsm_pages - 1000000
  The glideTester jobs were configured to run for 40 minutes (2400s).
Line: 226 to 241
 Like with Run 2, the tested system peaked at 2.2Hz with 50 clients and 200Mbit/s, and then rapidly deteriorated.

The deterioration rate was much faster, though. At 200 clients the system was already unusable.

Added:
>
>

Run 4

This test was similar to Run 3, but using XACML GUMS and dcache version 1.9.5-23.

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
25 4.3k (1.8Hz) 0
50 5.2k (2.2Hz) 0
75 5.0k (2.1Hz) 0
100 4.9k (2.0Hz) 0
125 4.7k (2.0Hz) 0
150 4.5k (1.9Hz) 15
175 4.1k (1.7Hz) 250
200 3.8k (1.6Hz) 520
250 3.4k (1.4Hz) 720
300 3.3k (1.4Hz) 860
350 3.4k (1.4Hz) 840
400 3.0k (1.3Hz) 1200
450 3.3k (1.4Hz) 530
500 3.2k (1.3Hz) 820
550 1.6k (2.7 Hz) 3.2k + 26 hung
600 0 all
650 0 all

Like with Run 3, the tested system peaked at 2.2Hz with 50 clients and 200Mbit/s. But the deterioration is much slower; while errors start to appear around the 150 mark, the system is still usable (with retries) up to about 500 concurrent clients.

  -- IgorSfiligoi - 2010/05/07 \ No newline at end of file

Revision 7 - 2010/06/17 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 102 to 102
 

Run 4

This test used postgres and kwp, with the following parameters:

Changed:
<
<
max_connections - 250 
shared_buffers - 512MB
work_mem - 16MB 
max_fsm_pages - 1000000  
>
>
max_connections - 250 
shared_buffers - 512MB
work_mem - 16MB 
max_fsm_pages - 1000000  
  The glideTester jobs were configured to run for 40 minutes (2400s).
Line: 186 to 189
 * - Two jobs got stuck and did not finish for over 1 hour and had to be hard killed.
^ - Twenty jobs got stuck and did not finish for over 1 hour and had to be hard killed.

The tested system peaked at 50 clients, delivering files at 2.2Hz, or 200Mbit/s, and then declined to ~1.4Hz.
The first errors appeared with 150 clients, but were still bearable up to about 600 clients.
With 800 clients, more than half of all attempts failed, while with 1000 clients all the attempts failed.

Added:
>
>

Run 2

This test used postgres and kwp, with the following parameters:

max_connections - 250 
shared_buffers - 512MB
work_mem - 16MB 
max_fsm_pages - 1000000  

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
50 5353 (2.2Hz) 0
100 4678 (1.9Hz) 0
200 3472 (1.4Hz) 320 + 1 hung client
400 391 (0.2Hz) 29198
600 0 24000

The tested system peaked at 2.2Hz with 50 clients and 200Mbit/s, and then rapidly deteriorated.

By 400 clients the system was practically unusable.

Run 3

This test was similar to Run 2, but using GUMS.

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
10 1921 (0.8Hz) 1
50 5399 (2.2Hz) 1
100 4694 (2.0Hz) 1
200 113 (0.0Hz) 4790 + 12 hung

Like with Run 2, the tested system peaked at 2.2Hz with 50 clients and 200Mbit/s, and then rapidly deteriorated.

The deterioration rate was much faster, though. At 200 clients the system was already unusable.

  -- IgorSfiligoi - 2010/05/07

Revision 6 - 2010/06/17 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 99 to 99
 
1000 3791 (1.6Hz) 2682

The tested system peaked at ~4.3Hz with 50 clients, and was slowly degrading to just below 2Hz at 1000 clients. The error rates started to show up with 600 clients, but were really problematic after 800 clients.

Added:
>
>

Run 4

This test used postgres and kwp, with the following parameters:

max_connections - 250 
shared_buffers - 512MB
work_mem - 16MB 
max_fsm_pages - 1000000  

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
10 4520 (1.8Hz) 0
50 6527 (2.7Hz) 0
100 5457 (2.3Hz) 0
150 4933 (2.1Hz) 0
200 4697 (2.0Hz) 0
300 4216 (1.8Hz) 0
400 4038 (1.7Hz) 62
600 3707 (1.5Hz) 102
800 3524 (1.5Hz) 850
1000 2338 (1.0Hz) 9727

The tested system peaked at 2.7Hz with 50 clients, stayed in the 2Hz range until about 400 clients, and then started to degrade. Errors started to appear with 400 clients, but got really problematic at around 800.

Run 5

This test was similar to Run 4, but using GUMS.

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
10 4499 (1.8Hz) 0
50 6482 (2.7Hz) 0
100 5537 (2.3Hz) 0
150 4947 (2.1Hz) 0
200 4719 (2.0Hz) 0
300 4123 (1.8Hz) 0
400 4025 (1.7Hz) 0
600 3721 (1.5Hz) 114
800 2806 (1.2Hz) 6491
1000 931 (0.4Hz) 21980+38 hung clients

At low concurrencies, the tested system gave results similar to Run 4. It peaked at 2.7Hz with 50 clients, stayed in the 2Hz range until about 400 clients, and then started to degrade.

However, once it reached 800 clients, the system seemed to misbehave badly.

 

FNAL lcg-cp tests

The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

Revision 5 - 2010/05/21 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

This page contains the results of tests run against dCache as part of the OSG scalability and reliability area activity.

FNAL test instance

Changed:
<
<
The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.
The machine is a single 3.6GHz Xeon, 4GB of RAM and GigE Ethernet, running a 32-bit version of SL4.
>
>
The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.

The test instance is composed of 6 nodes:

  • dCache admin node
  • chimera/pnfs node
  • srm node
  • a dCache door, and
  • 2 pool nodes.

The SRM machine is a single 3.6GHz Xeon, 4GB of RAM and GigE Ethernet, running a 32-bit version of SL4.

 

FNAL lcg-ls tests

The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

Line: 95 to 104
 The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

All glideTester jobs were running lcg-cp in a tight loop for a specified amount of time, copying a 10-Mbyte file from the SE to the local disk.

Changed:
<
<
 lcg-cp -b -D srmv2 $mytestdir/igors_file.dat file:$PWD/igors_file.dat 
>
>
 lcg-cp -b -D srmv2 $mytestdir/igors_file.dat file:$PWD/igors_file.dat 
Note: lcg-cp returns 0 even when it fails!
 

Run 1

Revision 4 - 2010/05/20 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

Line: 121 to 122
 
10 1933 (0.8Hz) 0
50 5372 (2.2Hz) 0
100 4810 (2Hz) 0
Changed:
<
<
150 4129 (2.9Hz) 0
200 3801 (1,6Hz) 0*
300 4084 (1.7Hz) 0
400 3825 (1.6Hz) 0
600 4713 (2Hz) 0*
800 ? (?Hz) ?
1000 ? (?Hz) ?
>
>
150 4090 (1.7Hz) 39
200 3403 (1.4Hz) 398*
300 3442 (1.4Hz) 642
400 3404 (1.4Hz) 421
600 3946 (1.6Hz) 767*
800 2287 (1Hz) 3072^
1000 0 all failed
 
Changed:
<
<
* - A few jobs got stuck and did not finish for over 1 hour and had to be hard killed.
>
>
* - Two jobs got stuck and did not finish for over 1 hour and had to be hard killed.
^ - Twenty jobs got stuck and did not finish for over 1 hour and had to be hard killed.
 
Changed:
<
<
The tested system seems to deliver files at about 2Hz, or 200Mbit/s.
>
>
The tested system peaked at 50 clients, delivering files at 2.2Hz, or 200Mbit/s, and then declined to ~1.4Hz.
The first errors appeared with 150 clients, but were still bearable up to about 600 clients.
With 800 clients, more than half of all attempts failed, while with 1000 clients all the attempts failed.
  -- IgorSfiligoi - 2010/05/07 \ No newline at end of file

Revision 3 - 2010/05/20 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

This page contains the results of tests run against dCache as part of the OSG scalability and reliability area activity.

FNAL test instance

Changed:
<
<
The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.
The machine is a single 3.6GHz Xeon and 4GB of RAM running a 32-bit version of SL4.

Local lcg-ls tests

>
>
The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.
The machine is a single 3.6GHz Xeon, 4GB of RAM and GigE Ethernet, running a 32-bit version of SL4.

FNAL lcg-ls tests

  The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

All glideTester jobs were running lcg-ls in a tight loop for a specified amount of time.

Added:
>
>
 lcg-ls -b -D srmv2 $mytestdir
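The loop itself is not shown on this page; a hypothetical sketch of what each client could be running (assuming $mytestdir points at the directory under test):

 # hypothetical per-client loop: run lcg-ls back-to-back until the time budget expires
 end=$(( $(date +%s) + 1200 ))   # e.g. a 1200s run
 n=0
 while [ $(date +%s) -lt $end ]; do
   lcg-ls -b -D srmv2 $mytestdir && n=$((n+1))
 done
 # the tables report the sum of n over all clients, divided by the run length
 echo "completed $n successful lcg-ls calls"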
 

Run 1

This run ran against dCache v1.9.5-17 with Chimera. Default parameters were used.

Line: 88 to 90
 
1000 3791 (1.6Hz) 2682

The tested system peaked at ~4.3Hz with 50 clients, and was slowly degrading to just below 2Hz at 1000 clients. The error rates started to show up with 600 clients, but were really problematic after 800 clients.

Added:
>
>

FNAL lcg-cp tests

The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.

All glideTester jobs were running lcg-cp in a tight loop for a specified amount of time, copying a 10-Mbyte file from the SE to the local disk.

 lcg-cp -b -D srmv2 $mytestdir/igors_file.dat file:$PWD/igors_file.dat 

Run 1

This run ran against dCache v1.9.5-19 with Chimera. The following parameters have been tuned:

  1. modify max_connections from default 100 to 250 in postgresql.conf of srm database
  2. in dCacheSetup, make sure:
    srmAsynchronousLs=true
  3. in /opt/d-cache/libexec/apache-tomcat-5.5.20/conf/server.xml,
    find element: <Connector className="org.globus.tomcat.coyote.net.HTTPSConnector"
    set these parameters (a reconstructed element is sketched after this list):
  • maxThreads="1000"
  • minSpareThreads="25"
  • maxSpareThreads="200"
  • maxProcessors="1000"
  • minProcessors="25"
  • maxSpareProcessors="200"
  • enableLookups="false"
  • disableUploadTimeout="true"
  • acceptCount="1024"
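Assembled from the list above, the tuned element would look roughly as follows (a reconstruction for illustration; the trailing ... stands for the element's pre-existing attributes, such as the port, which are not listed on this page):

 <Connector className="org.globus.tomcat.coyote.net.HTTPSConnector"
            maxThreads="1000" minSpareThreads="25" maxSpareThreads="200"
            maxProcessors="1000" minProcessors="25" maxSpareProcessors="200"
            enableLookups="false" disableUploadTimeout="true"
            acceptCount="1024" ...>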

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
10 1933 (0.8Hz) 0
50 5372 (2.2Hz) 0
100 4810 (2Hz) 0
150 4129 (2.9Hz) 0
200 3801 (1,6Hz) 0*
300 4084 (1.7Hz) 0
400 3825 (1.6Hz) 0
600 4713 (2Hz) 0*
800 ? (?Hz) ?
1000 ? (?Hz) ?

* - A few jobs got stuck and did not finish for over 1 hour and had to be hard killed.

The tested system seems to deliver files at about 2Hz, or 200Mbit/s.

  -- IgorSfiligoi - 2010/05/07

Revision 2 - 2010/05/19 - Main.IgorSfiligoi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

dCache Scalability Tests

This page contains the results of tests run against dCache as part of the OSG scalability and reliability area activity.

FNAL test instance

Changed:
<
<
The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.
The machine is a singe 3.6GHz Xeon and 4GB of RAM running a 32-bit version of SL4.
>
>
The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.
The machine is a single 3.6GHz Xeon and 4GB of RAM running a 32-bit version of SL4.

Local lcg-ls tests

 
Changed:
<
<
The testest reported in this section were performed against that instance, using a glideTester instance using FNAL sleeper pool resources.
>
>
The tests reported in this section were performed against that instance, using a glideTester instance running on FNAL sleeper pool resources.
 
Changed:
<
<
All glideTester jobs were unning lcg-ls in a tight loop for a specified amount of time.

Run 1

>
>
All glideTester jobs were running lcg-ls in a tight loop for a specified amount of time.

Run 1

  This run ran against dCache v1.9.5-17 with Chimera. Default parameters were used.
Line: 29 to 30
 The tested system peaked at about 9Hz with 50 clients. After that level it became painfully slow, basically unusable, although no user visible errors could be noticed.

It should be noted that once the unstable situation was reached, the system remained very slow even with a small number of clients, until a human operator fixed the problem on the server side.

Changed:
<
<

Run 2

>
>

Run 2

 This run ran against dCache v1.9.5-12 with PNFS. Default parameters were used.
Line: 53 to 54
 
1000 2625 (1Hz) 11847

The tested system was pretty consistent at 2Hz up to 800 clients. However, starting at 700 clients, users started to see errors.

Added:
>
>

Run 3

This run ran against dCache v1.9.5-19 with Chimera. The following parameters have been tuned:

  1. modify max_connections from default 100 to 250 in postgresql.conf of srm database
  2. in dCacheSetup, make sure:
    srmAsynchronousLs=true
  3. in /opt/d-cache/libexec/apache-tomcat-5.5.20/conf/server.xml,
    find element: <Connector className="org.globus.tomcat.coyote.net.HTTPSConnector"
    set these parameters:
  • maxThreads="1000"
  • minSpareThreads="25"
  • maxSpareThreads="200"
  • maxProcessors="1000"
  • minProcessors="25"
  • maxSpareProcessors="200"
  • enableLookups="false"
  • disableUploadTimeout="true"
  • acceptCount="1024"

The glideTester jobs were configured to run for 40 minutes (2400s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
10 4408 (1.8Hz) 0
50 10450 (4.3Hz) 0
100 8087 (3.4Hz) 0
150 6922 (2.9Hz) 0
200 6375 (2.7Hz) 0
300 5442 (2.3Hz) 0
400 4931 (2.1Hz) 0
600 4469 (1.9Hz) 231
800 4193 (1.7Hz) 784
1000 3791 (1.6Hz) 2682

The tested system peaked at ~4.3Hz with 50 clients, and was slowly degrading to just below 2Hz at 1000 clients. The error rates started to show up with 600 clients, but were really problematic after 800 clients.

  -- IgorSfiligoi - 2010/05/07

Revision 1 - 2010/05/07 - Main.IgorSfiligoi

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

dCache Scalability Tests

This page contains the results of tests run against dCache as part of the OSG scalability and reliability area activity.

FNAL test instance

The FNAL team, composed mainly of Tanya Levshina and Neha Sharma, operates a test instance of dCache.
The machine is a singe 3.6GHz Xeon and 4GB of RAM running a 32-bit version of SL4.

The testest reported in this section were performed against that instance, using a glideTester instance using FNAL sleeper pool resources.

All glideTester jobs were unning lcg-ls in a tight loop for a specified amount of time.

Run 1

This run ran against dCache v1.9.5-17 with Chimera. Default parameters were used.

The glideTester jobs were configured to run for 20 minutes (1200s).

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
5 6264 (5Hz) 0
20 10510 (9Hz) 0
50 10324 (9Hz) 0
100 716 (0.5Hz) 0
150 300 (<0.5Hz) 0
200 400 (<0.5Hz) 0
300 600 (<0.5Hz) 0

The tested system peaked at about 9Hz with 50 clients. After that level it became painfully slow, basically unusable, although no user visible errors could be noticed.

It should be noted that once the unstable situation was reached, the system remained very slow even with a small number of clients, until a human operator fixed the problem on the server side.

Run 2

This run ran against dCache v1.9.5-12 with PNFS. Default parameters were used.

The glideTester jobs were configured to run for 20 minutes (1200s) up to 200 jobs, and for 40 minutes (2400s) for higher concurrencies.

Complete results can be seen below:

Concurrency Succeeded (Rate) Failed
5 2191 (2Hz) 0
20 2230 (2Hz) 0
50 2243 (2Hz) 0
100 2270 (2Hz) 0
150 2307 (2Hz) 0
200 2312 (2Hz) 0
300 4607 (2Hz) 0
400 4618 (2Hz) 0
500 4676 (2Hz) 0
600 4790 (2Hz) 0
700 4709 (2Hz) 1280
800 4899 (2Hz) 4722
900 2645 (1Hz) 8011
1000 2625 (1Hz) 11847

The tested system was pretty consistent at 2Hz up to 800 clients. However, starting at 700 clients, users started to see errors.

-- IgorSfiligoi - 2010/05/07

 