Difference: CMSLoadTestLHCONE (1 vs. 17)

Revision 17 2011/06/02 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 84 to 84
 

Results

Changed:
<
<

Detailed Results

>
>
A summary table of the test results is given here.
 
Changed:
<
<
Detailed results of the tests are given here.
>
>
Daniele's log is here.
 

Questions

Revision 16 2011/06/02 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 23 to 23
  A list of 13 sites which will be tested in the first phase is given in the table.
Changed:
<
<
Site Name Subscriptions Injections Activity (1h) Transfer Queue
T2_BE_IIHE Debug Prod Debug Debug Prod Debug Prod
T2_DE_DESY Debug Prod Debug Debug Prod Debug Prod
T2_DE_RWTH Debug Prod Debug Debug Prod Debug Prod
T2_ES_IFCA Debug Prod Debug Debug Prod Debug Prod
T2_FR_GRIF_LLR Debug Prod Debug Debug Prod Debug Prod
T2_IN_TIFR        
T2_IT_Legnaro Debug Prod Debug Debug Prod Debug Prod
T2_IT_Pisa Debug Prod Debug Debug Prod Debug Prod
T2_RU_RRC_KI Debug Prod Debug Debug Prod Debug Prod
T2_UK_London_IC Debug Prod Debug Debug Prod Debug Prod
T2_US_MIT Debug Prod Debug Debug Prod Debug Prod
T2_US_Purdue Debug Prod Debug Debug Prod Debug Prod
T2_US_Wisconsin Debug Prod Debug Debug Prod Debug Prod
>
>
Site Name Subscriptions Injections Activity (1h) Transfer Queue NREN/OPN (Gbps)
T2_BE_IIHE Debug Prod Debug Debug Prod Debug Prod 1.0
T2_DE_DESY Debug Prod Debug Debug Prod Debug Prod 4.0
T2_DE_RWTH Debug Prod Debug Debug Prod Debug Prod 10.0
T2_ES_IFCA Debug Prod Debug Debug Prod Debug Prod 2.0
T2_FR_GRIF_LLR Debug Prod Debug Debug Prod Debug Prod 5.0/10.0
T2_IN_TIFR         1.0
T2_IT_Legnaro Debug Prod Debug Debug Prod Debug Prod 2.0
T2_IT_Pisa Debug Prod Debug Debug Prod Debug Prod 2.0
T2_RU_RRC_KI Debug Prod Debug Debug Prod Debug Prod 1.0
T2_UK_London_IC Debug Prod Debug Debug Prod Debug Prod 10.0
T2_US_MIT Debug Prod Debug Debug Prod Debug Prod 10.0
T2_US_Purdue Debug Prod Debug Debug Prod Debug Prod 20.0
T2_US_Wisconsin Debug Prod Debug Debug Prod Debug Prod 10.0
 

Status of Data Transfer Links

Line: 97 to 97
 
    • T2_BR_SPRACE and T2_US_Purdue: for transfers from T2_US sites only
    • T2_US_Florida: Site is running 7 download agents. Uses FTS backend for transfers from most Tier-1 sites, but uses SRM backend with srm-copy client for OSG sites, Vienna, and the Tier-1 in Taiwan. They use SRM backend with lcg-cp for all other source sites including the German Tier-1. Somewhat complicated by personnel issues.
Changed:
<
<
-- JamesLetts - 2011/05/25
>
>
-- JamesLetts - 2011/06/01

Revision 15 2011/06/01 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 62 to 62
 

Suspend other transfers

Changed:
<
<
In general, we would like a stable transfer environment to run the test. This means that other Debug transfers to and from the sites being tested are suspended, and any significant production transfers are also temporarily halted. Using the PhEDEx web interface via the links above:
  • STOP all LoadTest file injections from the two sites, except the one between them in the direction you wish to test.
  • SUSPEND all LoadTest subscriptions to the two sites, except the one between them in the direction you wish to test.
  • CHECK if there are significant unfinished Production transfers to the two sites. You can tell if a subscription is incomplete if the value under the "% Bytes" column is red and is not "100.0%". It's not really possible to suspend them any time soon once the files are in the transfer queue. If there are significant production transfers, consider doing this test later.
  • WAIT for the remaining Debug transfers in progress to complete, usually on the order of 1 hour. Transfer activity in the past hour can be checked by the Activity links above.
>
>
In general, we would like a stable transfer environment to run the test. This means that other Debug transfers to and from the sites being tested are suspended, and any significant production transfers are not taking place. Using the PhEDEx web interface via the links above:
  • STOP all LoadTest file injections from the two sites, except the one between them in the direction you wish to test (with the exception of critical links such as to the Tier-1 sites).
  • SUSPEND all LoadTest subscriptions to the two sites, except the one between them in the direction you wish to test (with the exception of critical links such as from the Tier-1 sites).
  • CHECK if there are significant unfinished Production or Debug transfers to the two sites. You can tell if a subscription is incomplete if the value under the "% Bytes" column is red and is not "100.0%". It's not really possible to suspend transfers once the files are in the transfer queue. If there are significant Production (or Debug) transfers, consider doing this test later.
 

Inject LoadTest files

Line: 85 to 84
 

Results

Deleted:
<
<

Maximum Transfer Rate

Maximum transfer rate over one hour between sites (from site in left column to site in top row) in MiB/s.

T2_BE_IIHE T2_DE_DESY T2_DE_RWTH T2_ES_IFCA T2_FR_GRIF_LLR T2_IN_TIFR T2_IT_Legnaro T2_IT_Pisa T2_RU_RRC_KI T2_UK_London_IC T2_US_MIT T2_US_Purdue T2_US_Wisconsin
T2_BE_IIHE                          
T2_DE_DESY                          
T2_DE_RWTH         108.3           261.4    
T2_ES_IFCA                          
T2_FR_GRIF_LLR     448.9                    
T2_IN_TIFR                          
T2_IT_Legnaro                          
T2_IT_Pisa             136.9            
T2_RU_RRC_KI                          
T2_UK_London_IC                       115.9  
T2_US_MIT                          
T2_US_Purdue                          
T2_US_Wisconsin                     329.6    
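
To compare the measured rates above with a nominal link capacity, both need to be in the same unit. The following is a minimal Python sketch of that conversion; it is not part of the original tests. The function names are made up for illustration, and pairing the 448.9 MiB/s entry with a 10 Gbps link is only an example assumption, not a statement about which link that measurement used.

```python
MIB = 2 ** 20  # bytes per MiB


def gbps_to_mib_per_s(gbps):
    """Convert a nominal capacity in Gbps (10^9 bits/s) to MiB/s."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second / MIB


def fraction_of_capacity(measured_mib_per_s, nominal_gbps):
    """Fraction of the nominal capacity reached by a measured rate."""
    return measured_mib_per_s / gbps_to_mib_per_s(nominal_gbps)


if __name__ == "__main__":
    # Assumed example: one measured rate from the table against a 10 Gbps link.
    print(round(gbps_to_mib_per_s(10.0), 1))
    print(round(fraction_of_capacity(448.9, 10.0), 3))
```

The nominal figure ignores protocol overhead and any sharing of the link, so the fraction is an optimistic upper bound on what a single transfer link could reach.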
 

Detailed Results

Detailed results of the tests are given here.

Revision 14 2011/05/27 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 97 to 97
 
T2_FR_GRIF_LLR     448.9                    
T2_IN_TIFR                          
T2_IT_Legnaro                          
Changed:
<
<
T2_IT_Pisa                          
>
>
T2_IT_Pisa             136.9            
 
T2_RU_RRC_KI                          
T2_UK_London_IC                       115.9  
T2_US_MIT                          

Revision 13 2011/05/26 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 51 to 51
 
  • T2_US_Wisconsin dealing with storage element instabilities 2011/05/25
  • T2_IN_TIFR upgrading dpm 2011/05/25
  • T2_US_Purdue has a huge (>10K files) transfer queue in prod 2011/05/25
Added:
>
>
  • T2_DE_RWTH all transfers from site fail with AsyncWait error. 2011/05/26
  • T2_US_MIT proxy expired, all transfers to site fail. 2011/05/26
 

Detailed Procedures

Line: 90 to 92
 
T2_BE_IIHE T2_DE_DESY T2_DE_RWTH T2_ES_IFCA T2_FR_GRIF_LLR T2_IN_TIFR T2_IT_Legnaro T2_IT_Pisa T2_RU_RRC_KI T2_UK_London_IC T2_US_MIT T2_US_Purdue T2_US_Wisconsin
T2_BE_IIHE                          
T2_DE_DESY                          
Changed:
<
<
T2_DE_RWTH         108.3                
>
>
T2_DE_RWTH         108.3           261.4    
 
T2_ES_IFCA                          
T2_FR_GRIF_LLR     448.9                    
T2_IN_TIFR                          

Revision 12 2011/05/25 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 46 to 46
 
    • Commissioning activity for T2_IN_TIFR can be found here.
  • Current Debug link status between these sites can be found here. Links will show red if the agents are down, e.g.
Added:
>
>

Site Issues

  • T2_US_Wisconsin dealing with storage element instabilities 2011/05/25
  • T2_IN_TIFR upgrading dpm 2011/05/25
  • T2_US_Purdue has a huge (>10K files) transfer queue in prod 2011/05/25
 

Detailed Procedures

All data transfer links should transfer a few files per day in the LoadTest. Check whether the link in question is actually working by looking at the Debug instance activity, filling in the appropriate "To" and "From" sites here. There is no point in testing a broken link, or a link that has more transfer errors than successes.

Line: 77 to 83
 

Results

Added:
>
>

Maximum Transfer Rate

Maximum transfer rate over one hour between sites (from site in left column to site in top row) in MiB/s.

T2_BE_IIHE T2_DE_DESY T2_DE_RWTH T2_ES_IFCA T2_FR_GRIF_LLR T2_IN_TIFR T2_IT_Legnaro T2_IT_Pisa T2_RU_RRC_KI T2_UK_London_IC T2_US_MIT T2_US_Purdue T2_US_Wisconsin
T2_BE_IIHE                          
T2_DE_DESY                          
T2_DE_RWTH         108.3                
T2_ES_IFCA                          
T2_FR_GRIF_LLR     448.9                    
T2_IN_TIFR                          
T2_IT_Legnaro                          
T2_IT_Pisa                          
T2_RU_RRC_KI                          
T2_UK_London_IC                       115.9  
T2_US_MIT                          
T2_US_Purdue                          
T2_US_Wisconsin                     329.6    

Detailed Results

 Detailed results of the tests are given here.

Questions

Line: 88 to 117
 
    • T2_BR_SPRACE and T2_US_Purdue: for transfers from T2_US sites only
    • T2_US_Florida: Site is running 7 download agents. Uses FTS backend for transfers from most Tier-1 sites, but uses SRM backend with srm-copy client for OSG sites, Vienna, and the Tier-1 in Taiwan. They use SRM backend with lcg-cp for all other source sites including the German Tier-1. Somewhat complicated by personnel issues.
Deleted:
<
<
-- JamesLetts - 2011/05/04

-- JamesLetts - 2011/05/19

 \ No newline at end of file
Added:
>
>
-- JamesLetts - 2011/05/25

Revision 11 2011/05/19 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 57 to 57
 In general, we would like a stable transfer environment to run the test. This means that other Debug transfers to and from the sites being tested are suspended, and any significant production transfers are also temporarily halted. Using the PhEDEx web interface via the links above:
  • STOP all LoadTest file injections from the two sites, except the one between them in the direction you wish to test.
  • SUSPEND all LoadTest subscriptions to the two sites, except the one between them in the direction you wish to test.
Changed:
<
<
  • CHECK if there are significant unfinished Production subscriptions to the two sites, and suspend them temporarily if necessary. You can tell if a subscription is incomplete if the value under the "% Bytes" column is red and is not "100.0%".
  • We don't currently try to stop Production instance file exports from the sites (this can most easily be done by turning off the export PhEDEx agent by the site admin), nor can we control SRM activity from CRAB.
  • WAIT for the remaining transfers in progress to complete, usually on the order of 1 hour. Transfer activity in the past hour can be checked by the Activity links above.
>
>
  • CHECK if there are significant unfinished Production transfers to the two sites. You can tell if a subscription is incomplete if the value under the "% Bytes" column is red and is not "100.0%". It's not really possible to suspend them any time soon once the files are in the transfer queue. If there are significant production transfers, consider doing this test later.
  • WAIT for the remaining Debug transfers in progress to complete, usually on the order of 1 hour. Transfer activity in the past hour can be checked by the Activity links above.
 

Inject LoadTest files

Revision 10 2011/05/19 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 21 to 21
 

List of CMS Tier-2 Sites in First Stage of LHCONE

Changed:
<
<
A list of 14 sites which will be tested in the first phase is given in the table.
>
>
A list of 13 sites which will be tested in the first phase is given in the table.
 
Site Name Subscriptions Injections Activity (1h) Transfer Queue
T2_BE_IIHE Debug Prod Debug Debug Prod Debug Prod
Line: 91 to 91
 

-- JamesLetts - 2011/05/04

Changed:
<
<
-- JamesLetts - 2011/05/18
>
>
-- JamesLetts - 2011/05/19

Revision 9 2011/05/18 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 19 to 19
  If we are to test the performance of a data transfer link between two SRM endpoints, it would be best to perform the test under a consistent networking load. For example, at a Tier-2 site there are Production data transfers to and from many possible Tier-1, Tier-2 and Tier-3 sites, Debug LoadTest data transfers, as well as production and user analysis job traffic both to and from the site. Production and other Debug PhEDEx data transfers are easy to deal with and can be suspended during the testing period. Job traffic, by contrast, is practically impossible to control without shutting down the site, and is not measurable with the PhEDEx infrastructure or usually with any publicly available interface. However, if the tests are run over a sufficiently long time period (several hours), then any bursts in user or production traffic should still allow a good measurement of the maximum transfer rate.
Deleted:
<
<

Procedures

To benchmark test a single link between a source and destination site:

  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
  • Suspend LoadTest file injection at both the source and destination sites.
  • If there are significant production instance PhEDEx transfers from the site, they can be suspended too (a lot of work).
  • For the link you want to test, set the injection rate in the LoadTest to zero.

  • After things have quieted down and there is little residual PhEDEx traffic in or out of either site, inject a block of LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
  • Set the injection rate to some level 0.5MB/s for FTS-based tests and check the transfer success rate ("Quality"). Or just take the transfer quality from the above test.
  • After the benchmark test, return the sites to the status quo ante.
 

List of CMS Tier-2 Sites in First Stage of LHCONE

Changed:
<
<
A list of 14 sites which will be tested in the first phase is given in the table. The current LoadTest activity between these sites is given here.
>
>
A list of 14 sites which will be tested in the first phase is given in the table.
 
Site Name Subscriptions Injections Activity (1h) Transfer Queue
Changed:
<
<
T2_BE_IIHE        
T2_DE_RWTH        
T2_DE_DESY        
T2_ES_IFCA        
T2_FR_GRIF_IRFU        
T2_FR_GRIF_LLR        
T2_IN_TIFR        
T2_IT_Legnaro Debug Prod On/Off Debug Prod Debug Prod
T2_IT_Pisa        
T2_RU_RRC_KI Debug Prod On/Off Debug Prod Debug Prod
T2_UK_London_IC Debug Prod On/Off Debug Prod Debug Prod
T2_US_MIT Debug Prod On/Off Debug Prod Debug Prod
T2_US_Purdue Debug Prod On/Off Debug Prod Debug Prod
T2_US_Wisconsin Debug Prod On/Off Debug Prod Debug Prod
>
>
T2_BE_IIHE Debug Prod Debug Debug Prod Debug Prod
T2_DE_DESY Debug Prod Debug Debug Prod Debug Prod
T2_DE_RWTH Debug Prod Debug Debug Prod Debug Prod
T2_ES_IFCA Debug Prod Debug Debug Prod Debug Prod
T2_FR_GRIF_LLR Debug Prod Debug Debug Prod Debug Prod
T2_IN_TIFR        
T2_IT_Legnaro Debug Prod Debug Debug Prod Debug Prod
T2_IT_Pisa Debug Prod Debug Debug Prod Debug Prod
T2_RU_RRC_KI Debug Prod Debug Debug Prod Debug Prod
T2_UK_London_IC Debug Prod Debug Debug Prod Debug Prod
T2_US_MIT Debug Prod Debug Debug Prod Debug Prod
T2_US_Purdue Debug Prod Debug Debug Prod Debug Prod
T2_US_Wisconsin Debug Prod Debug Debug Prod Debug Prod

Status of Data Transfer Links

  • The current LoadTest activity between these sites is given here.
  • Current DDT Commissioning status in the Production instance of PhEDEx between these sites can be found here. Note that Tier-2 links to and from T2_IN_TIFR and T2_FR_GRIF_IRFU are in general not commissioned.
    • Links to and from T2_FR_GRIF_IRFU cannot be commissioned at this time due to network bandwidth limitations. See the savannah ticket.
    • Commissioning activity for T2_IN_TIFR can be found here.
  • Current Debug link status between these sites can be found here. Links will show red if the agents are down, e.g.

Detailed Procedures

All data transfer links should transfer a few files per day in the LoadTest. Check whether the link in question is actually working by looking at the Debug instance activity, filling in the appropriate "To" and "From" sites here. There is no point in testing a broken link, or a link that has more transfer errors than successes.

If there are no transfer attempts, then one will have to investigate why not before benchmark testing. Common causes include sites or links which are down (check here), suspended subscriptions (check here) or stopped file injections (check here).
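
The pre-check described above can be written down as a simple rule. This is a minimal Python sketch under the assumption that the success and failure counts have been read off the Debug activity page by hand; the function names are hypothetical and nothing here queries PhEDEx.

```python
def transfer_quality(successes, failures):
    """Fraction of transfer attempts that succeeded, or None if there were none."""
    attempts = successes + failures
    if attempts == 0:
        return None
    return successes / attempts


def worth_benchmarking(successes, failures):
    """Apply the rule of thumb: skip links with no attempts or more errors than successes."""
    quality = transfer_quality(successes, failures)
    if quality is None:
        # No attempts at all: investigate the link before benchmarking it.
        return False
    return failures <= successes


if __name__ == "__main__":
    print(worth_benchmarking(12, 3))   # mostly successful link
    print(worth_benchmarking(2, 9))    # more errors than successes
    print(worth_benchmarking(0, 0))    # no attempts
```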

Suspend other transfers

In general, we would like a stable transfer environment to run the test. This means that other Debug transfers to and from the sites being tested are suspended, and any significant production transfers are also temporarily halted. Using the PhEDEx web interface via the links above:

  • STOP all LoadTest file injections from the two sites, except the one between them in the direction you wish to test.
  • SUSPEND all LoadTest subscriptions to the two sites, except the one between them in the direction you wish to test.
  • CHECK if there are significant unfinished Production subscriptions to the two sites, and suspend them temporarily if necessary. You can tell if a subscription is incomplete if the value under the "% Bytes" column is red and is not "100.0%".
  • We don't currently try to stop Production instance file exports from the sites (this can most easily be done by turning off the export PhEDEx agent by the site admin), nor can we control SRM activity from CRAB.
  • WAIT for the remaining transfers in progress to complete, usually on the order of 1 hour. Transfer activity in the past hour can be checked by the Activity links above.

Inject LoadTest files

Next we inject files for the benchmark test, and record the testing information:

  • After things have quieted down and there is little residual PhEDEx traffic in or out of either site, inject a block of LoadTest files at the source site using the link above. For a transfer link fully over a 10Gbps network connection end-to-end, 1000 files is sufficient for a few hours of transfers. For sites connected to 1Gbps links or less, a few hundred files should be sufficient. Note that the DDT commissioning metric, which all links passed at some point, was to transfer in less than 24 hours a total of 421GiB, or approximately 168 files of 2.5GiB size.
  • Record in the results twiki the number of files injected over the link and the start time (UNIX time since epoch is most useful).
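
The file counts above follow from simple arithmetic on the 2.5GiB LoadTest file size. Below is a minimal Python sketch of that arithmetic, plus a helper for noting the injection start time as UNIX time. The function names and the 300 MiB/s example rate are assumptions chosen for illustration, not values from the tests.

```python
import time

FILE_SIZE_GIB = 2.5  # LoadTest file size quoted above


def files_for_volume(total_gib, file_gib=FILE_SIZE_GIB):
    """Number of files (possibly fractional) needed to make up a data volume."""
    return total_gib / file_gib


def hours_to_transfer(n_files, rate_mib_per_s, file_gib=FILE_SIZE_GIB):
    """Hours needed to move n_files at a sustained rate given in MiB/s."""
    total_mib = n_files * file_gib * 1024
    return total_mib / rate_mib_per_s / 3600


def injection_record(n_files, start_unix=None):
    """Bundle the values to record: file count and start time (UNIX epoch seconds)."""
    if start_unix is None:
        start_unix = time.time()
    return {"files": n_files, "start_unix": int(start_unix)}


if __name__ == "__main__":
    print(files_for_volume(421))            # DDT metric: about 168 files
    print(hours_to_transfer(1000, 300.0))   # assumed sustained rate of 300 MiB/s
    print(injection_record(1000))
```

Running the estimate with a rough guess of the achievable rate before injecting helps pick a block size that keeps the link busy for a few hours without running much longer.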

The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over one hour will give the maximum transfer rate benchmark. There is a Python [[http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/CMSLoadTestLHCONEScripts][script]] under development to extract these values from the PhEDEx data service. Record the results in the results twiki page.
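
As an illustration of the two benchmark numbers, here is a minimal Python sketch that derives them from a list of file completion records. It is not the script linked above. The input format, a list of (completion time in UNIX seconds, file size in bytes) pairs, is an assumption made for this example and does not reflect the actual output of the PhEDEx data service.

```python
MIB = 2 ** 20  # bytes per MiB


def latency_seconds(start_unix, completions):
    """Seconds from injection until the last file of the block completed."""
    return max(t for t, _ in completions) - start_unix


def max_window_rate_mib(completions, window_s=3600):
    """Highest average rate (MiB/s) over any window_s-long interval.

    Each file's bytes are attributed to its completion time, and candidate
    windows start at completion times, so this is an approximation.
    """
    events = sorted(completions)
    best_bytes = 0
    window_bytes = 0
    right = 0
    for left in range(len(events)):
        window_end = events[left][0] + window_s
        # Grow the window to include every completion before window_end.
        while right < len(events) and events[right][0] < window_end:
            window_bytes += events[right][1]
            right += 1
        best_bytes = max(best_bytes, window_bytes)
        # Drop the leftmost completion before sliding the window start.
        window_bytes -= events[left][1]
    return best_bytes / window_s / MIB


if __name__ == "__main__":
    # Hypothetical data: three 2.5 GiB files completing within one hour.
    size = int(2.5 * 2 ** 30)
    done = [(1000, size), (1600, size), (2200, size)]
    print(latency_seconds(0, done))
    print(max_window_rate_mib(done))
```

The one-hour window matches the benchmark definition above; a shorter window would show burst rates rather than the sustained rate.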

Cleaning up

After the benchmark test, return the sites to the status quo ante.

  • Un-suspend any Production or Debug transfers at both sites.
  • Start all injections from the sites.
  • On the tested link, set the injection rate to 0.5MB/s for FTS-based tests to be done later.
 

Results

Line: 64 to 88
 
  • 3 sites are not using the FTS backend of the PhEDEx download agent for some transfers:
    • T2_BR_SPRACE and T2_US_Purdue: for transfers from T2_US sites only
    • T2_US_Florida: Site is running 7 download agents. Uses FTS backend for transfers from most Tier-1 sites, but uses SRM backend with srm-copy client for OSG sites, Vienna, and the Tier-1 in Taiwan. They use SRM backend with lcg-cp for all other source sites including the German Tier-1. Somewhat complicated by personnel issues.
Added:
>
>
 -- JamesLetts - 2011/05/04
Added:
>
>
-- JamesLetts - 2011/05/18

Revision 8 2011/05/17 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 17 to 17
 
  • Latency of transferring a complete "block" of files (seconds for n files)
  • Transfer success percentage ("transfer quality") over a time interval at a particular injection rate
Changed:
<
<
If we are to test the performance of a data transfer link between two SRM endpoints, it would be best to perform the test under a consistent networking load. For example, at a Tier-2 site there are Production data transfers to and from many possible Tier-1, Tier-2 and Tier-3 sites, Debug LoadTest data transfers, as well as production and user analysis job traffic both to and from the site. Production and other Debug PhEDEx data transfers are easy to deal with and can be suspended during the testing period. Job traffic, by contrast, is practically impossible to control without shutting down the site, and is not measurable with the PhEDEx infrastructure or usually with any publicly available interface. However, if the tests are run over a sufficiently long time period (several hours), then any bursts in user or production traffic should still allow a good measurement of the maximum transfer rate. ---++ Procedures
>
>
If we are to test the performance of a data transfer link between two SRM endpoints, it would be best to perform the test under a consistent networking load. For example, at a Tier-2 site there are Production data transfers to and from many possible Tier-1, Tier-2 and Tier-3 sites, Debug LoadTest data transfers, as well as production and user analysis job traffic both to and from the site. Production and other Debug PhEDEx data transfers are easy to deal with and can be suspended during the testing period. Job traffic, by contrast, is practically impossible to control without shutting down the site, and is not measurable with the PhEDEx infrastructure or usually with any publicly available interface. However, if the tests are run over a sufficiently long time period (several hours), then any bursts in user or production traffic should still allow a good measurement of the maximum transfer rate.

Procedures

  To benchmark test a single link between a source and destination site:
  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
Line: 29 to 31
 
  • Set the injection rate to some level 0.5MB/s for FTS-based tests and check the transfer success rate ("Quality"). Or just take the transfer quality from the above test.
  • After the benchmark test, return the sites to the status quo ante.
Deleted:
<
<
How to measure the latency using the PhEDEx data service?
 

List of CMS Tier-2 Sites in First Stage of LHCONE

A list of 14 sites which will be tested in the first phase is given in the table. The current LoadTest activity between these sites is given here.

Line: 43 to 43
 
T2_FR_GRIF_IRFU        
T2_FR_GRIF_LLR        
T2_IN_TIFR        
Changed:
<
<
T2_IT_Legnaro Debug On/Off Prod Debug  
>
>
T2_IT_Legnaro Debug Prod On/Off Debug Prod Debug Prod
 
T2_IT_Pisa        
Changed:
<
<
T2_RU_RRC_KI Debug On/Off Debug Prod  
T2_UK_London_IC Debug On/Off Debug Prod  
>
>
T2_RU_RRC_KI Debug Prod On/Off Debug Prod Debug Prod
T2_UK_London_IC Debug Prod On/Off Debug Prod Debug Prod
 
T2_US_MIT Debug Prod On/Off Debug Prod Debug Prod
T2_US_Purdue Debug Prod On/Off Debug Prod Debug Prod
T2_US_Wisconsin Debug Prod On/Off Debug Prod Debug Prod

Revision 7 2011/05/14 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 21 to 21
  To benchmark test a single link between a source and destination site:
  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
Changed:
<
<
  • Suspend LoadTest file injection at both the source and destination sites. (How to handle Production exports?)
  • After things have quieted down and there is little residual network traffic in or out of either site, inject a block of LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
>
>
  • Suspend LoadTest file injection at both the source and destination sites.
  • If there are significant production instance PhEDEx transfers from the site, they can be suspended too (a lot of work).
  • For the link you want to test, set the injection rate in the LoadTest to zero.

  • After things have quieted down and there is little residual PhEDEx traffic in or out of either site, inject a block of LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
 
  • Set the injection rate to some level 0.5MB/s for FTS-based tests and check the transfer success rate ("Quality"). Or just take the transfer quality from the above test.
  • After the benchmark test, return the sites to the status quo ante.
Added:
>
>
How to measure the latency using the PhEDEx data service?
 

List of CMS Tier-2 Sites in First Stage of LHCONE

A list of 14 sites which will be tested in the first phase is given in the table. The current LoadTest activity between these sites is given here.

Revision 6 2011/05/13 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 17 to 17
 
  • Latency of transferring a complete "block" of files (seconds for n files)
  • Transfer success percentage ("transfer quality") over a time interval at a particular injection rate
Changed:
<
<
If we are to test the performance of a data transfer link between two SRM endpoints, it would be best to perform the test under a consistent networking load. For example, at a Tier-2 site there are Production data transfers to and from many possible Tier-1, Tier-2 and Tier-3 sites, Debug LoadTest data transfers, as well as production and user analysis job traffic both to and from the site. The latter is practically impossible to control without shutting down the site, and is not measurable with the PhEDEx infrastructure. Therefore, to measure whether there is a consistent networking load to and from the sites in question during benchmarking tests, it will be necessary to have some access to the local sites' network monitoring and define a range of traffic that is consistent with a low load on the site, for example. Production and other Debug PhEDEx data transfers are easy to deal with and can be suspended during the testing period.

Procedures

>
>
If we are to test the performance of a data transfer link between two SRM endpoints, it would be best to perform the test under a consistent networking load. For example, at a Tier-2 site there are Production data transfers to and from many possible Tier-1, Tier-2 and Tier-3 sites, Debug LoadTest data transfers, as well as production and user analysis job traffic both to and from the site. Production and other Debug PhEDEx data transfers are easy to deal with and can be suspended during the testing period. Job traffic, by contrast, is practically impossible to control without shutting down the site, and is not measurable with the PhEDEx infrastructure or usually with any publicly available interface. However, if the tests are run over a sufficiently long time period (several hours), then any bursts in user or production traffic should still allow a good measurement of the maximum transfer rate. ---++ Procedures
  To benchmark test a single link between a source and destination site:
Deleted:
<
<
 
  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
Changed:
<
<
  • Suspend LoadTest file injection at both the source and destination sites.
  • Check local site network monitoring to see there is little (limit to be defined?) other network traffic during the test.
  • After things have quieted down and there is little residual network traffic in or out of either site, inject a block of (how many?) LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
  • Set the injection rate to some level (need to define) and check the transfer success rate ("Quality") if needed. Or just take the transfer quality from the above test?
>
>
  • Suspend LoadTest file injection at both the source and destination sites. (How to handle Production exports?)
  • After things have quieted down and there is little residual network traffic in or out of either site, inject a block of LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
  • Set the injection rate to some level 0.5MB/s for FTS-based tests and check the transfer success rate ("Quality"). Or just take the transfer quality from the above test.
 
  • After the benchmark test, return the sites to the status quo ante.
Deleted:
<
<
To define:
  • Limit for acceptable other network traffic for a valid benchmark test.
  • Number of files to inject for the benchmark test. Should be set high enough to give several hours (6?) of transfers at the maximum possible rate, but not too many hours. Therefore, we need an estimate of the maximum rate before making the test.
  • Injection rate for transfer quality test. 5MB/s?
 

List of CMS Tier-2 Sites in First Stage of LHCONE

Changed:
<
<
A list of sites which will be tested, with their SRM endpoints and network monitoring url:

SITE SRM Network Monitor URL Est. Max Rate
T2_US_MIT se01.cmsaf.mit.edu Site 10 Gbps
T2_US_Caltech cit-se.ultralight.org Site 10 Gbps
T2_US_UCSD bsrm-1.t2.ucsd.edu Site, Network 10 Gbps

Results

Neither Endpoint Connected to LHCONE

SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %
>
>
A list of 14 sites which will be tested in the first phase is given in the table. The current LoadTest activity between these sites is given here.
 
Changed:
<
<

Source Endpoint Connected to LHCONE only

>
>
Site Name Subscriptions Injections Activity (1h) Transfer Queue
T2_BE_IIHE        
T2_DE_RWTH        
T2_DE_DESY        
T2_ES_IFCA        
T2_FR_GRIF_IRFU        
T2_FR_GRIF_LLR        
T2_IN_TIFR        
T2_IT_Legnaro Debug On/Off Prod Debug  
T2_IT_Pisa        
T2_RU_RRC_KI Debug On/Off Debug Prod  
T2_UK_London_IC Debug On/Off Debug Prod  
T2_US_MIT Debug Prod On/Off Debug Prod Debug Prod
T2_US_Purdue Debug Prod On/Off Debug Prod Debug Prod
T2_US_Wisconsin Debug Prod On/Off Debug Prod Debug Prod
 
Deleted:
<
<
SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %
 
Changed:
<
<

Destination Endpoint Connected to LHCONE only

SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %

Both Endpoints Connected to LHCONE

>
>

Results

 
Changed:
<
<
SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %
>
>
Detailed results of the tests are given here.
 

Questions

Revision 52011/05/13 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Revision 42011/05/10 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 21 to 21
 

Procedures

Changed:
<
<
To benchmark test a single link between a source and destination site (add some links here, perhaps to automate it all?):
>
>
To benchmark test a single link between a source and destination site:
 
Changed:
<
<
  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
  • Suspend LoadTest file injection at both the source and destination sites.
>
>
  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
  • Suspend LoadTest file injection at both the source and destination sites.
 
  • Check local site network monitoring to see there is little (limit to be defined?) other network traffic during the test.
Changed:
<
<
  • After things have quieted down and there is little residual network traffic in or out of either site, inject a block of (how many?) LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
>
>
  • After things have quieted down and there is little residual network traffic in or out of either site, inject a block of (how many?) LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
 
  • Set the injection rate to some level (need to define) and check the transfer success rate ("Quality") if needed. Or just take the transfer quality from the above test?
  • After the benchmark test, return the sites to the status quo ante.
Line: 41 to 41
 A list of sites which will be tested, with their SRM endpoints and network monitoring URLs:

SITE SRM Network Monitor URL Est. Max Rate
Changed:
<
<
T2_US_MIT se01.cmsaf.mit.edu http://??? 10 Gbps
    http://???  
>
>
T2_US_MIT se01.cmsaf.mit.edu Site 10 Gbps
T2_US_Caltech cit-se.ultralight.org Site 10 Gbps
T2_US_UCSD bsrm-1.t2.ucsd.edu Site, Network 10 Gbps
 

Results

Revision 32011/05/09 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 14 to 14
  CMS would like to measure the effect (if any) of moving sites to the LHCONE network compared to current operations. To this end, we need measurable benchmarks of data transfer performance. Using the LoadTest infrastructure of PhEDEx, several benchmarks are easy to calculate from information stored in TMDB and available through the PhEDEx web interface or data service:
  • Maximum transfer rate over a time interval
Deleted:
<
<
  • Transfer success percentage ("transfer quality") over a time interval at a particular injection rate
 
  • Latency of transferring a complete "block" of files (seconds for n files)
Added:
>
>
  • Transfer success percentage ("transfer quality") over a time interval at a particular injection rate
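The three benchmarks listed above can be sketched in a few lines of Python. This is only an illustration, not the PhEDEx implementation: the `Transfer` record layout, the function names, the sliding-window step and the convention 1 MB = 10^6 bytes are all assumptions made for the example. In practice the per-transfer information would come from TMDB or the PhEDEx data service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transfer:
    """One attempted file transfer on the link under test (hypothetical layout)."""
    start: float       # seconds after the block was injected
    end: float         # seconds after injection when the attempt finished
    size_bytes: float  # file size in bytes
    ok: bool           # True if the attempt succeeded


def max_rate_mb_s(transfers: List[Transfer],
                  window_s: float = 3600.0,
                  step_s: float = 600.0) -> float:
    """Highest average rate (MB/s) of successful transfers over any window.

    Bytes are credited at the completion time of each successful transfer.
    Windows of length window_s are slid forward in steps of step_s.
    """
    done = [(t.end, t.size_bytes) for t in transfers if t.ok]
    if not done:
        return 0.0
    last_end = max(end for end, _ in done)
    best = 0.0
    window_start = 0.0
    while window_start <= last_end:
        moved = sum(size for end, size in done
                    if window_start <= end < window_start + window_s)
        best = max(best, moved / window_s / 1e6)
        window_start += step_s
    return best


def quality_percent(transfers: List[Transfer]) -> Optional[float]:
    """Transfer quality: successful attempts as a percentage of all attempts."""
    if not transfers:
        return None
    return 100.0 * sum(t.ok for t in transfers) / len(transfers)


def block_latency_h(transfers: List[Transfer]) -> Optional[float]:
    """Latency of the block: hours from injection to the last successful transfer."""
    ends = [t.end for t in transfers if t.ok]
    if not ends:
        return None
    return max(ends) / 3600.0
```

A 1h window matches the "Max. Rate / 1h" column used in the results tables; the 10 minute step is an arbitrary choice for the sketch.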
 
Changed:
<
<
In LoadTests, there are two testing cases:
  • Multiple data transfer links are active to or from a site or pair of sites.
  • Only a single transfer link is active for both the source and destination sites.
>
>
If we are to test the performance of a data transfer link between two SRM endpoints, it would be best to perform the test under a consistent networking load. For example, at a Tier-2 site there are Production data transfers to and from many possible Tier-1, Tier-2 and Tier-3 sites, Debug LoadTest data transfers, and production and user analysis job traffic both to and from the site. The job traffic is practically impossible to control without shutting down the site, and is not measurable with the PhEDEx infrastructure. Therefore, to check that the networking load to and from the sites in question is consistent during benchmarking tests, we will need some access to each site's local network monitoring, and we will need to define a range of traffic that is consistent with a low load on the site. Production and other Debug PhEDEx data transfers are easy to deal with and can be suspended during the testing period.
 

Procedures

Added:
>
>
To benchmark test a single link between a source and destination site (add some links here, perhaps to automate it all?):

  • Suspend all Production and Debug subscriptions to the source and destination sites, except the Debug subscription over the path you wish to test.
  • Suspend LoadTest file injection at both the source and destination sites.
  • Check local site network monitoring to see there is little (limit to be defined?) other network traffic during the test.
  • After things have quieted down and there is little residual network traffic in or out of either site, inject a block of (how many?) LoadTest files at the source. The time to complete the transfer of the files will give a latency number, and the maximum sustained transfer rate over a time period (1h is natural) will give the maximum transfer rate benchmark.
  • Set the injection rate to some level (need to define) and check the transfer success rate ("Quality") if needed. Or just take the transfer quality from the above test?
  • After the benchmark test, return the sites to the status quo ante.

To define:

  • Limit for acceptable other network traffic for a valid benchmark test.
  • Number of files to inject for the benchmark test. Should be set high enough to give several hours (6?) of transfers at the maximum possible rate, but not too many hours. Therefore, we need an estimate of the maximum rate before making the test.
  • Injection rate for transfer quality test. 5MB/s?
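The number of files to inject follows directly from the estimated maximum rate and the desired test duration. The sketch below shows the arithmetic under stated assumptions: the function names are made up for the example, the 6 hour duration is the tentative figure from the list above, and the 2500 MB file size is a placeholder that should be replaced by the actual size of the LoadTest files in use.

```python
import math


def gbps_to_mb_s(gbps: float) -> float:
    """Convert a link speed in Gbps to MB/s (1 Gbps = 1000 Mbit/s = 125 MB/s)."""
    return gbps * 1000.0 / 8.0


def files_to_inject(est_max_rate_mb_s: float,
                    hours: float = 6.0,
                    file_size_mb: float = 2500.0) -> int:
    """Files needed to keep the link busy for `hours` at the estimated maximum rate.

    file_size_mb is an assumed placeholder, not a value taken from this page.
    """
    total_mb = est_max_rate_mb_s * hours * 3600.0
    return math.ceil(total_mb / file_size_mb)
```

For example, a site with an estimated maximum rate of 10 Gbps gives `gbps_to_mb_s(10.0)` = 1250 MB/s, and `files_to_inject(1250.0)` then asks for 10800 files of the assumed size. The real sustained rate will normally be lower than the nominal link speed, so this is an upper bound on what is needed.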
 

List of CMS Tier-2 Sites in First Stage of LHCONE

Added:
>
>
A list of sites which will be tested, with their SRM endpoints and network monitoring URLs:

SITE SRM Network Monitor URL Est. Max Rate
T2_US_MIT se01.cmsaf.mit.edu http://??? 10 Gbps
    http://???  

Results

Neither Endpoint Connected to LHCONE

SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %

Source Endpoint Connected to LHCONE only

SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %

Destination Endpoint Connected to LHCONE only

SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %

Both Endpoints Connected to LHCONE

SOURCE SITE DESTINATION SITE Files Injected Max. Rate / 1h Latency (h) Quality (%)
T2_US_MIT     MB/s h %
T2_US_MIT     MB/s h %
 

Questions

Line: 33 to 77
 
  • 7 out of 10 sites are using only FTS for file transfers in the Debug instance of PhEDEx: T2_BR_UERJ, T2_US_Caltech, T2_US_MIT, T2_US_Nebraska, T2_US_UCSD, T2_US_Vanderbilt, T2_US_Wisconsin.
  • 3 sites are not using the FTS backend of the PhEDEx download agent for some transfers:
    • T2_BR_SPRACE and T2_US_Purdue: for transfers from T2_US sites only
Changed:
<
<
    • T2_US_Florida: Site is running 7 download agents. Uses FTS backend for transfers from most Tier-1 sites, but uses SRM backend with srm-copy client for OSG sites, Vienna, and the Tier-1 in Taiwan. They use SRM backend with lcg-cp for all other source sites including the German Tier-1.
>
>
    • T2_US_Florida: Site is running 7 download agents. Uses FTS backend for transfers from most Tier-1 sites, but uses SRM backend with srm-copy client for OSG sites, Vienna, and the Tier-1 in Taiwan. They use SRM backend with lcg-cp for all other source sites including the German Tier-1. Somewhat complicated by personnel issues.
 -- JamesLetts - 2011/05/04

Revision 22011/05/05 - Main.JamesLetts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Line: 6 to 6
 

Introduction

Changed:
<
<
Something about LHCONE commissioning...
>
>

The LHC Open Network Environment

 
Added:
>
>
The objective of LHCONE is to provide a collection of access locations that are effectively entry points into a network that is private to the LHC T1/2/3 sites. LHCONE is not intended to replace the LHCOPN but rather to complement it. Up until now, T1-T2, T2-T2, and T3 data movements have been using the shared General Purpose Network infrastructure. LHCONE is a robust and scalable solution for a global system serving LHC’s T1, T2 and T3 sites’ needs and fits the new less-hierarchical computing models.
 

Benchmarks

Changed:
<
<
Several benchmarks are easy to calculate from information stored in TMDB and available through the PhEDEx web interface or data service:
>
>
CMS would like to measure the effect (if any) of moving sites to the LHCONE network compared to current operations. To this end, we need measurable benchmarks of data transfer performance. Using the LoadTest infrastructure of PhEDEx, several benchmarks are easy to calculate from information stored in TMDB and available through the PhEDEx web interface or data service:
 
  • Maximum transfer rate over a time interval
  • Transfer success percentage ("transfer quality") over a time interval at a particular injection rate
  • Latency of transferring a complete "block" of files (seconds for n files)
Changed:
<
<
  • Single active link vs. multiple active links to or from the source or destination sites?
>
>
In LoadTests, there are two testing cases:
  • Multiple data transfer links are active to or from a site or pair of sites.
  • Only a single transfer link is active for both the source and destination sites.
 

Procedures

Line: 29 to 31
 Are OSG Tier-2 sites still using srm copy clients instead of FTS?

  • 7 out of 10 sites are using only FTS for file transfers in the Debug instance of PhEDEx: T2_BR_UERJ, T2_US_Caltech, T2_US_MIT, T2_US_Nebraska, T2_US_UCSD, T2_US_Vanderbilt, T2_US_Wisconsin.
Changed:
<
<
  • 3 sites are using srm copy clients for some transfers:
>
>
  • 3 sites are not using the FTS backend of the PhEDEx download agent for some transfers:
 
    • T2_BR_SPRACE and T2_US_Purdue: for transfers from T2_US sites only
Changed:
<
<
    • T2_US_Florida: Site is running 7 download agents. Using srm-copy client from OSG sites and some European sites (Vienna), and lcg-cp from other sites. Will have to ping the admin to find out what they are really using.
>
>
    • T2_US_Florida: Site is running 7 download agents. Uses FTS backend for transfers from most Tier-1 sites, but uses SRM backend with srm-copy client for OSG sites, Vienna, and the Tier-1 in Taiwan. They use SRM backend with lcg-cp for all other source sites including the German Tier-1.
  -- JamesLetts - 2011/05/04

Revision 12011/05/04 - Main.JamesLetts

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

Benchmark Tests Using the CMS PhEDEx LoadTest Infrastructure

Introduction

Something about LHCONE commissioning...

Benchmarks

Several benchmarks are easy to calculate from information stored in TMDB and available through the PhEDEx web interface or data service:

  • Maximum transfer rate over a time interval
  • Transfer success percentage ("transfer quality") over a time interval at a particular injection rate
  • Latency of transferring a complete "block" of files (seconds for n files)

  • Single active link vs. multiple active links to or from the source or destination sites?

Procedures

List of CMS Tier-2 Sites in First Stage of LHCONE

Questions

Are OSG Tier-2 sites still using srm copy clients instead of FTS?

  • 7 out of 10 sites are using only FTS for file transfers in the Debug instance of PhEDEx: T2_BR_UERJ, T2_US_Caltech, T2_US_MIT, T2_US_Nebraska, T2_US_UCSD, T2_US_Vanderbilt, T2_US_Wisconsin.
  • 3 sites are using srm copy clients for some transfers:
    • T2_BR_SPRACE and T2_US_Purdue: for transfers from T2_US sites only
    • T2_US_Florida: Site is running 7 download agents. Using srm-copy client from OSG sites and some European sites (Vienna), and lcg-cp from other sites. Will have to ping the admin to find out what they are really using.

-- JamesLetts - 2011/05/04

 