Difference: WSGramTests (2 vs. 3)

Revision 3 - 2008/01/18 - Main.TerrenceMartin

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

WS Gram Testing

Line: 83 to 83
  uaf-2-load-2kx2.png
Added:
>
>

Followup WS GRAM 2000 x 2 submitters 5%+ Hold result

This test resulted in a greater than 5% job hold rate and excessive gatekeeper load. Errors included problems with authentication.

  • Condor Load 2kx2 WS GRAM Test 2:
    osg-gw-5-CondorLoad-2kx2-2.png

  • OSG GW5 Load 2kx2 Test 2:
    osg-gw-5-load-2kx2-2.png

  • OSG GW5 Mem 2kx2 Test 2:
    osg-gw-5-mem-2kx2-2.png
 

Pre-WS GRAM Comparison Test 2000 Jobs Submitted from 2 Submitters (4K Jobs)

Added:
>
>
  • Of particular note is that the rate of submission of Pre-WS GRAM jobs is approximately 1 Hz (1380/1356 jobs ~= 1.0 Hz).
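
The rate quoted above can be reproduced with a short calculation. This is only a sketch: it assumes the 1356 in "1380/1356" is elapsed wall-clock time in seconds, which the note does not state explicitly, and the function name is hypothetical.

```python
def submission_rate_hz(jobs_submitted: int, elapsed_seconds: float) -> float:
    """Return the average submission rate in jobs per second (Hz)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return jobs_submitted / elapsed_seconds


# Figures from the note above; treating 1356 as seconds is an assumption.
rate = submission_rate_hz(1380, 1356)
print(f"{rate:.2f} Hz")  # prints "1.02 Hz"
```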
 

Notes

  • As the number of queued jobs increases (while staying within acceptable levels for Pre-WS GRAM), about 5% of jobs become held for a variety of reasons. Configuration changes to timeouts have reduced the variety of timeout errors.
Changed:
<
<
  • One of the more recent tests (2K x 1 submitter) resulted in 0 held jobs; however, two jobs never started. A hold and release cycle got them started. A 2K x 2 submitter test needs to be done to follow up.
>
>
  • One of the more recent tests (2K x 1 submitter) resulted in 0 held jobs; however, two jobs never started. A hold and release cycle got them started.

  • In a follow-up 2K x 2 submitter test, about 5% of the jobs (204/4000) had gone into various hold states roughly 2 hours in.

HoldReason = "Globus error: GT4_GRAM_JOB_SUBMIT timed out"
...
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
...
HoldReason = "Globus error: GT4_GRAM_JOB_SUBMIT timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_SUBMIT timed out"
HoldReason = "Globus error: GT4_GRAM_JOB_SUBMIT timed out"
LastHoldReason = "Spooling input data files"
HoldReason = "Globus error 155: the job manager could not stage out a file"
HoldReason = "Globus error 155: the job manager could not stage out a file"
HoldReason = "Globus error 155: the job manager could not stage out a file"
HoldReason = "Globus error 155: the job manager could not stage out a file"
...
HoldReason = "Globus error: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: \"/DC=org/DC=doegrids/OU=People/CN=Terrence Martin 525658\" is not authorized to use operation: {http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob on this service"
HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"
HoldReason = "Globus error: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: \"/DC=org/DC=doegrids/OU=People/CN=Terrence Martin 525658\" is not authorized to use operation: {http://www.globus.org/namespaces/2004/10/gram/job}createManagedJob on this service"
 
  • Configuration changes to the submitter have resulted in the Gridmanager's Java process growing to more than 1.1 GB of used memory. Each user that submits from a specific host will have their own Java process, so this is only the current largest Java process size. It may be necessary to allow Java to consume even more memory if more than 2K jobs are submitted.
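
The HoldReason lines listed in the notes above fall into a few recurring categories (submit timeouts, destroy timeouts, stage-out failures, authorization failures). A minimal sketch of how such lines could be tallied is shown here. It assumes the input is a list of text lines in the same `HoldReason = "..."` form as the excerpt; the category names and the helper functions are hypothetical and not part of any Condor or Globus tool.

```python
import re
from collections import Counter

# Matches only lines that begin with HoldReason, so LastHoldReason is skipped.
HOLD_REASON_RE = re.compile(r'^HoldReason = "(.*)"$')


def classify(reason: str) -> str:
    """Map a hold reason string to a coarse, hypothetical category label."""
    if "GT4_GRAM_JOB_SUBMIT timed out" in reason:
        return "submit timeout"
    if "GT4_GRAM_JOB_DESTROY timed out" in reason:
        return "destroy timeout"
    if "could not stage out a file" in reason:
        return "stage-out failure"
    if "AuthorizationException" in reason:
        return "authorization failure"
    return "other"


def tally_hold_reasons(lines):
    """Count hold reasons per category from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        match = HOLD_REASON_RE.match(line.strip())
        if match:
            counts[classify(match.group(1))] += 1
    return counts


# A few lines copied from the excerpt above, used only as sample input.
sample = [
    'HoldReason = "Globus error: GT4_GRAM_JOB_SUBMIT timed out"',
    'HoldReason = "Globus error: GT4_GRAM_JOB_DESTROY timed out"',
    'HoldReason = "Globus error 155: the job manager could not stage out a file"',
    'LastHoldReason = "Spooling input data files"',
]
counts = tally_hold_reasons(sample)
for category, count in sorted(counts.items()):
    print(f"{category}: {count}")

# Held fraction reported for the 2K x 2 follow-up test (204 of 4000 jobs).
held_fraction = 204 / 4000
print(f"held: {held_fraction:.1%}")  # prints "held: 5.1%"
```

Grouping by category rather than by exact string keeps the authorization failures together even though each one embeds the full certificate subject.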
Line: 122 to 165
 
META FILEATTACHMENT attr="" autoattached="1" comment="OSG GW 5 Memory 1K" date="1200614548" name="gw5-mem.png" path="gw5-mem.png" size="24426" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="OSG 5 Load WS 2K x 2" date="1200622716" name="osg-5-load-2kx2.png" path="osg-5-load-2kx2.png" size="33458" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="GUMS Load WS 2K x 2" date="1200622675" name="gums-load-2kx2.png" path="gums-load-2kx2.png" size="34877" user="Main.TerrenceMartin" version="1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="Condor Load 2kx2 WS GRAM Test 2" date="1200627888" name="osg-gw-5-CondorLoad-2kx2-2.png" path="osg-gw-5-CondorLoad-2kx2-2.png" size="31404" user="Main.TerrenceMartin" version="1"
 
META FILEATTACHMENT attr="" autoattached="1" comment="UAF 1 Load 2K" date="1200614946" name="uaf-1-load-2k.png" path="uaf-1-load-2k.png" size="31368" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="OSG 5 Condor Load 2K" date="1200614808" name="osg-5-condor-load-2K.png" path="osg-5-condor-load-2K.png" size="30651" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="OSG 5 Load 1K" date="1200614220" name="gw5-load.png" path="gw5-load.png" size="27977" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="UAF Load 1K" date="1200614420" name="uaf-1-load.png" path="uaf-1-load.png" size="25876" user="Main.TerrenceMartin" version="1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="OSG GW5 Load 2kx2 Test 2" date="1200627926" name="osg-gw-5-load-2kx2-2.png" path="osg-gw-5-load-2kx2-2.png" size="37049" user="Main.TerrenceMartin" version="1"
 
META FILEATTACHMENT attr="" autoattached="1" comment="UAF 2 Load WS 2K x 2" date="1200622804" name="uaf-2-load-2kx2.png" path="uaf-2-load-2kx2.png" size="32033" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="UAF 1 Memory 2K" date="1200614956" name="uaf-1-mem-2k.png" path="uaf-1-mem-2k.png" size="27913" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="OSG 5 Mem 2K" date="1200614827" name="osg-5-mem-2k.png" path="osg-5-mem-2k.png" size="29810" user="Main.TerrenceMartin" version="1"
META FILEATTACHMENT attr="" autoattached="1" comment="GUMS Load 2K" date="1200615000" name="Gums-Load-2K.png" path="Gums-Load-2K.png" size="35674" user="Main.TerrenceMartin" version="1"
Added:
>
>
META FILEATTACHMENT attr="" autoattached="1" comment="OSG GW5 Mem 2kx2 Test 2" date="1200627955" name="osg-gw-5-mem-2kx2-2.png" path="osg-gw-5-mem-2kx2-2.png" size="30078" user="Main.TerrenceMartin" version="1"
 
META FILEATTACHMENT attr="" autoattached="1" comment="OSG GW 5 CE 1K" date="1200614144" name="gw5-condor.png" path="gw5-condor.png" size="27117" user="Main.TerrenceMartin" version="1"
 