Difference: GlideinWMSCrab (11 vs. 12)

Revision 12 2009/04/15 - Main.SanjayPadhi

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

List of Issues (Logbook)

Line: 8 to 8
  2.
Changed:
<
<
2009-03-31 11:32:00,441:Registering information:
{'submittedJobs': None, 'SE-White': "['grid-srm.physik.rwth-aachen.de']", 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTaskBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 397, in submitTaskBlocks\n task = self.blSchedSession.submit(task[\'id\'], sub_jobs[ii], reqs_jobs[ii])\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BossLite/API/BossLiteAPISched.py", line 129, in submit\n self.scheduler.submit( task, requirements )\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BossLite/Scheduler/Scheduler.py", line 95, in submit\n job.runningJob[\'schedulerId\'] = jobAttributes[ job[\'name\'] ]\nKeyError: \'spadhi_crab_0_090331_202909_4p39kj_job114\'\n', 'skippedJobs': None, 'error': 'WorkerError worker_1. 
Task spadhi_crab_0_090331_202909_4p39kj.', 'reason': 'Failure during jobs submission', 'SE-Black': None, 'unmatchedJobs': None, 'range': '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce02.grid.sinica.edu.tw', 'lcg00125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
2009-03-31 11:32:00,441:WorkerError worker_1. Task spadhi_crab_0_090331_202909_4p39kj.
>
>
2009-03-31 11:32:00,441:Registering information:
{'submittedJobs': None, 'SE-White': "['grid-srm.physik.rwth-aachen.de']", 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTaskBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 397, in submitTaskBlocks\n task = self.blSchedSession.submit(task[\'id\'], sub_jobs[ii], reqs_jobs[ii])\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BossLite/API/BossLiteAPISched.py", line 129, in submit\n self.scheduler.submit( task, requirements )\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BossLite/Scheduler/Scheduler.py", line 95, in submit\n job.runningJob[\'schedulerId\'] = jobAttributes[ job[\'name\'] ]\nKeyError: \'spadhi_crab_0_090331_202909_4p39kj_job114\'\n', 'skippedJobs': None, 'error': 'WorkerError worker_1. 
Task spadhi_crab_0_090331_202909_4p39kj.', 'reason': 'Failure during jobs submission', 'SE-Black': None, 'unmatchedJobs': None, 'range': '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce02.grid.sinica.edu.tw', 'lcg00125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
2009-03-31 11:32:00,441:WorkerError worker_1. Task spadhi_crab_0_090331_202909_4p39kj.
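
The traceback above ends in a `KeyError` raised while `Scheduler.submit` looks up `job['name']` in `jobAttributes`, which suggests the scheduler returned no entry for job 114. The following is a minimal sketch, not the ProdCommon code: plain dicts stand in for the BossLite job objects, and the function name `assign_scheduler_ids` is hypothetical. It records ids for the jobs that were reported and collects the rest, rather than aborting the whole task on the first missing name.

```python
def assign_scheduler_ids(jobs, job_attributes):
    """Attach scheduler ids to jobs; return names the scheduler did not report.

    jobs: list of dicts with a 'name' key (stand-ins for BossLite jobs).
    job_attributes: dict mapping job name -> scheduler id.
    """
    missing = []
    for job in jobs:
        scheduler_id = job_attributes.get(job["name"])
        if scheduler_id is None:
            # No id came back for this job: record it instead of raising KeyError.
            missing.append(job["name"])
            continue
        job["schedulerId"] = scheduler_id
    return missing
```

Whether unreported jobs should be resubmitted or marked as failed is a separate decision that this sketch leaves to the caller.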
  2009/03/21
Line: 33 to 33
 2009-03-20 18:04:23,639:Registering information:
{'submittedJobs': None, 'SE-White': None, 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/
cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTa
skBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/
lib/CrabServerWorker/FatWorker.py", line 378, in submitTaskBlocks\n self.SendMLpre(task)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_
gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 615, in SendMLpre\n params = self.collect_MLInfo(task)\n File "/h
ome/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 647, in collect_ML
Info\n params = {\'tool\': \'crab\',\\\n File "/build/fvlingen/CMS_BUILD/comp-nightly-prodagent/w/slc4_ia32_gcc345/external/python/2.4.2-cmp4/lib/pyth
on2.4/UserDict.py", line 17, in __getitem__\n def __getitem__(self, key): return self.data[key]\nKeyError: \'HOSTNAME\'\n', 'skippedJobs': None, 'error
': 'WorkerError worker_0. Task spadhi_crab_0_090321_020302_3vfw41.', 'reason': 'Failure during jobs submission', 'SE-Black': None, 'unmatchedJobs': None,
'range': '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77
, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]
', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce
02.grid.sinica.edu.tw', 'lcg00125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
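
The traceback above fails with `KeyError: 'HOSTNAME'` inside `collect_MLInfo`, raised from `UserDict.__getitem__`. That is consistent with `HOSTNAME` not being exported in the environment of the process running the worker. A minimal sketch of a tolerant lookup follows; the helper name `worker_hostname` is hypothetical and this is not the CRAB server fix, only an illustration of falling back to the operating system's hostname.

```python
import os
import socket


def worker_hostname(environ=None):
    """Return HOSTNAME from the environment, falling back to the OS hostname."""
    if environ is None:
        environ = os.environ
    # .get() avoids the KeyError seen when HOSTNAME is not exported.
    name = environ.get("HOSTNAME")
    if name:
        return name
    return socket.gethostname()
```

Exporting `HOSTNAME` before starting the server would avoid the error equally well; the fallback only makes the code independent of how the service is launched.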

2009/03/07

Deleted:
<
<

1. GCB Issues from the WN(Some examples)

 
Deleted:
<
<
HOSTNAME=n74.lcgwn.kiae, n18.lcgwn.kiae

CE = gate.grid.kiae.ru:2119/jobmanager-lcgpbs-cms

HOSTNAME=wn011.polgrid.pl

CE=ce.polgrid.pl:2119/jobmanager-lcgpbs-cms

HOSTNAME=gaew0213.ciemat.es, gaew0225.ciemat.es

CE=lcg02.ciemat.es:2119/jobmanager-lcgpbs-cms

HOSTNAME=wn002.jinr.ru

CE=lcgce02.jinr.ru:2119/jobmanager-lcgpbs-cms

3/7 22:32:15 (pid:6761) GCB: ERROR "handleActiveBlkedConn(8): setting status=CONN_FAILED, error 111 Connection refused"
3/7 22:32:15 (pid:6761) attempt to connect to <169.228.130.23:9629> failed: Connection refused (connect errno = 111).
3/7 22:32:15 (pid:6761) ERROR: SECMAN:2004:Failed to create security session to <169.228.130.23:9629> with TCP|SECMAN:2003:TCP connection to <169.228.130.23:9629> failed

3/7 22:32:15 (pid:6761) Failed to start non-blocking update to <169.228.130.23:9629>.
3/7 22:37:46 (pid:6761) GCB: ERROR "handleActiveBlkedConn(8): setting status=CONN_FAILED, error 111 Connection refused"
3/7 22:37:46 (pid:6761) attempt to connect to <169.228.130.23:9629> failed: Connection refused (connect errno = 111).
3/7 22:37:46 (pid:6761) ERROR: SECMAN:2004:Failed to create security session to <169.228.130.23:9629> with TCP|SECMAN:2003:TCP connection to <169.228.130.23:9629> failed

3/7 22:37:46 (pid:6761) Failed to start non-blocking update to <169.228.130.23:9629>.
3/7 22:43:15 (pid:6761) GCB: ERROR "handleActiveBlkedConn(8): setting status=CONN_FAILED, error 111 Connection refused"
3/7 22:43:15 (pid:6761) attempt to connect to <169.228.130.23:9629> failed: Connection refused (connect errno = 111).
3/7 22:43:15 (pid:6761) ERROR: SECMAN:2004:Failed to create security session to <169.228.130.23:9629> with TCP|SECMAN:2003:TCP connection to <169.228.130.23:9629> failed

3/7 22:43:15 (pid:6761) Failed to start non-blocking update to <169.228.130.23:9629>.
3/7 22:48:45 (pid:6761) GCB: ERROR "handleActiveBlkedConn(8): setting status=CONN_FAILED, error 111 Connection refused"
3/7 22:48:45 (pid:6761) attempt to connect to <169.228.130.23:9629> failed: Connection refused (connect errno = 111).
3/7 22:48:45 (pid:6761) ERROR: SECMAN:2004:Failed to create security session to <169.228.130.23:9629> with TCP|SECMAN:2003:TCP connection to <169.228.130.23:9629> failed

3/7 22:48:45 (pid:6761) Failed to start non-blocking update to <169.228.130.23:9629>.
3/7 22:54:11 (pid:6761) No resources have been claimed for 1200 seconds
3/7 22:54:11 (pid:6761) Shutting down Condor on this machine.
3/7 22:54:11 (pid:6761) Got SIGTERM. Performing graceful shutdown.

 

2. Users generating MC (with Datasetpath=None)

DESIRED_Gatekeepers is empty.

Line: 104 to 86
  File "/home/cms040/globus-tmp.cmsfarm-04-10.29601.0/glide_R29835/execute/dir_30920/writeCfg.py", line 234, in ?
exit_status = main(sys.argv[1:])
File "/home/cms040/globus-tmp.cmsfarm-04-10.29601.0/glide_R29835/execute/dir_30920/writeCfg.py", line 90, in main
maxEvents = int(os.environ.get('MaxEvents', '0'))
ValueError: invalid literal for int(): /store/mc/JobRobot/QCD_pt_0_15/GEN-SIM-RAW-RECO/IDEAL_V9_JobRobot/0000/A48D5963-E5A1-DD11-83B5-001560AC7E98.root
%MSG-s CMSException: PoolSource:source{*ctor*} 04-Mar-2009 23:10:46 CET pre-events
cms::Exception caught in cmsRun
---- Configuration BEGIN
Error occured while creating source PoolSource
---- Configuration BEGIN
MissingParameter: The required parameter 'fileNames' was not specified.
---- Configuration END
---- Configuration END
Changed:
<
<
SOLUTION: Problem identified and fixed (positional parameters were wrong)
>
>
SOLUTION: Problem identified and fixed (positional parameters were wrong in the crabserver)
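
In the trace above, `int()` received a `/store/mc/...` file path where `MaxEvents` was expected, which matches the diagnosis of shifted positional parameters. The following is a minimal sketch of a stricter parser that names the offending value; `parse_max_events` is a hypothetical helper, not part of `writeCfg.py`.

```python
def parse_max_events(raw, default=0):
    """Parse a MaxEvents value, rejecting anything that is not an integer."""
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Report the bad value so a shifted argument list is easy to spot.
        raise ValueError(
            "MaxEvents must be an integer, got %r; "
            "check the order of the job arguments" % (raw,)
        )
```

A call such as `parse_max_events(os.environ.get('MaxEvents'))` would then fail with a message that points at the argument order rather than at `int()` alone.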
 

6. Stop Condor from sending the output files back to the server.

Line: 114 to 96
  and grab that one larger gzipped archive from the client. We should have a discussion about the pros and cons of this!
Changed:
<
<

7. gfactory that can work qwith multiple proxies

>
>

7. gfactory that can work with multiple proxies

  Igor's problem. We are ready to deploy a gfactory that works with multiple proxies as soon as we get one. An ideal configuration would have all of Stefano's jobRobot jobs run with his proxy only, while the rest of the user jobs use the "service proxies".

8. CRAB status does not communicate the associated CE names back.

 