List of Issues (Logbook)
2009/03/31
1. 2009-03-31 07:42:51,385:FatWorker worker_0 preparing submission
2009-03-31 07:42:51,386:FatWorker worker_0 performing list-match operation
2009-03-31 07:42:59,743:Sending TTXmlLogging.
2009-03-31 07:42:59,743:Registering information:
{'submittedJobs': None, 'SE-White': None, 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWork
er/FatWorker.py", line 147, in run\n sub_jobs, reqs_jobs, matched, unmatched = self.submissionListCreation(taskObj, newRange)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-se
rver/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 570, in submissionListCreation\n schedParam, sites = self.sched_parameter_Glidein(id_job, taskObj)\n File "/home/hpi/CRABSERVER_Deployment
/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 780, in sched_parameter_Glidein\n availCEs = listAllCEs(version, arch, onlyOSG=onlyOSG)\n File "/h
ome/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BDII/BdiiLdap.py", line 247, in listAllCEs\n ceList = filterCE(ceList, software, arch, b
dii, onlyOSG)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BDII/BdiiLdap.py", line 219, in filterCE\n ceList = getSoftware
AndArch(ceList, software, arch, bdii)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BDII/BdiiLdap.py", line 184, in getSoftwar
eAndArch\n query += buildOrQuery(\'GlueChunkKey=GlueClusterUniqueID\', [ce_to_cluster_map[h] for h in host_list])\nKeyError: \'gridce.sns.it:2119/jobmanager-lcgpbs-cms\'\n', 'skippedJobs': None, 'error': 'W
orkerError worker_0. Task spiga_crab_0_090331_164202_45iyu1. listMatch.', 'reason': 'Failure in pre-submission init', 'SE-Black': "['gridce.pg.infn.it']", 'unmatchedJobs': None, 'range': '[1, 2, 3, 4, 5, 6, 7,
8, 9, 10]', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce02.grid.sinica.edu.tw', 'lcg00125.grid.sinica
.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
2009-03-31 07:42:59,744:WorkerError worker_0. Task spiga_crab_0_090331_164202_45iyu1. listMatch.
2009-03-31 07:42:59,744:'gridce.sns.it:2119/jobmanager-lcgpbs-cms'
2009-03-31 07:42:59,744:FatWorker worker_0 performing submission
2009-03-31 07:42:59,748:Sending TTXmlLogging.
2009-03-31 07:42:59,748:Registering information:
{'submittedJobs': None, 'SE-White': None, 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWork
er/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTaskBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_
gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 372, in submitTaskBlocks\n for sub in sub_jobs: fullSubJob.extend(sub)\nTypeError: iteration over non-sequence\n', 'skipp
edJobs': None, 'error': 'WorkerError worker_0. Task spiga_crab_0_090331_164202_45iyu1.', 'reason': 'Failure during jobs submission', 'SE-Black': "['gridce.pg.infn.it']", 'unmatchedJobs': None, 'range': '[1, 2,
3, 4, 5, 6, 7, 8, 9, 10]', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce02.grid.sinica.edu.tw', 'lcg00
125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
2009-03-31 07:42:59,748:WorkerError worker_0. Task spiga_crab_0_090331_164202_45iyu1.
2009-03-31 07:42:59,748:iteration over non-sequence
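Reading the two records above together, it looks like the list-match step died on a CE that is missing from ce_to_cluster_map, and the worker then went on to submission with sub_jobs never filled, which would explain the "iteration over non-sequence". A minimal sketch of the two guards, assuming hypothetical helper names (clusters_for_hosts, flatten_sub_jobs) that are not part of the real BdiiLdap.py or FatWorker.py:

```python
def clusters_for_hosts(host_list, ce_to_cluster_map):
    """Look up the cluster for each CE without raising KeyError.

    Hypothetical helper: returns (clusters, missing) so the caller can
    log the CEs that have no cluster mapping and carry on with the rest.
    """
    clusters, missing = [], []
    for host in host_list:
        cluster = ce_to_cluster_map.get(host)
        if cluster is None:
            missing.append(host)  # CE known to the query but absent from the map
        else:
            clusters.append(cluster)
    return clusters, missing


def flatten_sub_jobs(sub_jobs):
    """Flatten the per-block job lists.

    Hypothetical helper: a failed list-match leaves sub_jobs as None,
    which is treated here as "nothing to submit" instead of iterating it.
    """
    if sub_jobs is None:
        return []
    full = []
    for block in sub_jobs:
        full.extend(block)
    return full
```

Whether the right behaviour is to skip the unmapped CE or to fail the whole task is a separate decision; the sketch only shows how to avoid the second, misleading traceback.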
2.
2009-03-31 11:32:00,441:Registering information:
{'submittedJobs': None, 'SE-White': "['grid-srm.physik.rwth-aachen.de']", 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTaskBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 397, in submitTaskBlocks\n task = self.blSchedSession.submit(task[\'id\'], sub_jobs[ii], reqs_jobs[ii])\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BossLite/API/BossLiteAPISched.py", line 129, in submit\n self.scheduler.submit( task, requirements )\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/prodcommon/PRODCOMMON_0_12_12_CRAB_1-cmp/lib/ProdCommon/BossLite/Scheduler/Scheduler.py", line 95, in submit\n job.runningJob[\'schedulerId\'] = jobAttributes[ job[\'name\'] ]\nKeyError: \'spadhi_crab_0_090331_202909_4p39kj_job114\'\n', 'skippedJobs': None, 'error': 'WorkerError worker_1. 
Task spadhi_crab_0_090331_202909_4p39kj.', 'reason': 'Failure during jobs submission', 'SE-Black': None, 'unmatchedJobs': None, 'range': '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce02.grid.sinica.edu.tw', 'lcg00125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
2009-03-31 11:32:00,441:WorkerError worker_1. Task spadhi_crab_0_090331_202909_4p39kj.
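The KeyError above comes from indexing jobAttributes with a job name the scheduler did not return. A minimal sketch of a tolerant version, assuming a hypothetical helper (assign_scheduler_ids) and plain dicts rather than the real BossLite objects:

```python
def assign_scheduler_ids(job_names, job_attributes):
    """Split jobs into those with a scheduler id and those without.

    Hypothetical helper: job_attributes maps job name -> scheduler id,
    as returned by the scheduler after a submit call.
    """
    assigned = {}
    not_submitted = []
    for name in job_names:
        if name in job_attributes:
            assigned[name] = job_attributes[name]
        else:
            not_submitted.append(name)  # scheduler returned no id for this job
    return assigned, not_submitted
```

With this shape, job114 would end up in the not-submitted list and the other jobs of the task would keep their ids.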
2009/03/21
1. Desired_SE names to be part of the jdl
2. SchedulerGrid.py:
Issues for glideins and condor-g:
txt += ' echo "SyncCE=`glite-brokerinfo getCE`" >> $RUNTIME_AREA/$repo \n'
txt += 'if [ $middleware = LCG ]; then\n'
txt += ' CloseCEs=`glite-brokerinfo getCE`\n'
The glite-specific dependencies need to be addressed.
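One way to address this would be to emit the glite-brokerinfo lines only for glite schedulers. A minimal sketch, assuming a hypothetical function name (broker_info_snippet) and an assumed list of scheduler names; the real SchedulerGrid.py may organise this differently:

```python
# Assumed scheduler names; adjust to whatever the configuration really uses.
GLITE_SCHEDULERS = ("glite", "glitecoll")


def broker_info_snippet(scheduler):
    """Return the wrapper shell text that records the CE.

    Hypothetical helper: only glite schedulers call glite-brokerinfo;
    condor-g and glidein jobs get no such line in the wrapper.
    """
    if scheduler.lower() not in GLITE_SCHEDULERS:
        return ""
    return ' echo "SyncCE=`glite-brokerinfo getCE`" >> $RUNTIME_AREA/$repo \n'
```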
3. FatWorker.py
2009-03-20 20:38:16,516:Registering information:
{'submittedJobs': None, 'SE-White': None, 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTaskBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 378, in submitTaskBlocks\n self.SendMLpre(task)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 615, in SendMLpre\n params = self.collect_MLInfo(task)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 647, in collect_MLInfo\n params = {\'tool\': \'crab\',\\\n File "/build/fvlingen/CMS_BUILD/comp-nightly-prodagent/w/slc4_ia32_gcc345/external/python/2.4.2-cmp4/lib/python2.4/UserDict.py", line 17, in __getitem__\n def __getitem__(self, key): return self.data[key]\nKeyError: \'HOSTNAME\'\n', 'skippedJobs': None, 'error': 'WorkerError worker_0. Task spadhi_crab_0_090321_043717_80rcv3.', 'reason': 'Failure during jobs submission', 'SE-Black': None, 'unmatchedJobs': None, 'range': '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce02.grid.sinica.edu.tw', 'lcg00125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
2009-03-20 20:38:16,517:WorkerError worker_0. Task spadhi_crab_0_090321_043717_80rcv3.
2009-03-20 20:38:16,517:'HOSTNAME'
2.
2009-03-20 18:04:23,639:Registering information:
{'submittedJobs': None, 'SE-White': None, 'exc': 'Traceback (most recent call last):\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/
cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 159, in run\n submittedJobs, nonSubmittedJobs, errorTrace = self.submitTa
skBlocks(taskObj, sub_jobs, reqs_jobs, matched)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/
lib/CrabServerWorker/FatWorker.py", line 378, in submitTaskBlocks\n self.SendMLpre(task)\n File "/home/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_
gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 615, in SendMLpre\n params = self.collect_MLInfo(task)\n File "/h
ome/hpi/CRABSERVER_Deployment/MYTESTAREA/slc4_ia32_gcc345/cms/crab-server/CRABSERVER_1_0_7-cmp/lib/CrabServerWorker/FatWorker.py", line 647, in collect_ML
Info\n params = {\'tool\': \'crab\',\\\n File "/build/fvlingen/CMS_BUILD/comp-nightly-prodagent/w/slc4_ia32_gcc345/external/python/2.4.2-cmp4/lib/pyth
on2.4/UserDict.py", line 17, in __getitem__\n def __getitem__(self, key): return self.data[key]\nKeyError: \'HOSTNAME\'\n', 'skippedJobs': None, 'error
': 'WorkerError worker_0. Task spadhi_crab_0_090321_020302_3vfw41.', 'reason': 'Failure during jobs submission', 'SE-Black': None, 'unmatchedJobs': None,
'range': '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77
, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]
', 'CE-White': None, 'time': None, 'notSubmittedJobs': None, 'ev': 'Submission', 'CE-Black': "['fnal.gov', 'gridka.de', 'w-ce01.grid.sinica.edu.tw', 'w-ce
02.grid.sinica.edu.tw', 'lcg00125.grid.sinica.edu.tw', 'gridpp.rl.ac.uk', 'cclcgceli03.in2p3.fr', 'cclcgceli04.in2p3.fr', 'pic.es', 'cnaf']"}
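The KeyError: 'HOSTNAME' in the two records above is raised from UserDict.__getitem__, which in Python 2.4 is consistent with a direct os.environ['HOSTNAME'] lookup in collect_MLInfo when the variable is not exported to the server process. A minimal sketch of a lookup with a fallback, assuming a hypothetical helper name (worker_hostname):

```python
import os
import socket


def worker_hostname(environ=None):
    """Return the host name without depending on $HOSTNAME being exported.

    Hypothetical helper: falls back to socket.gethostname() when the
    HOSTNAME variable is missing or empty.
    """
    if environ is None:
        environ = os.environ
    return environ.get("HOSTNAME") or socket.gethostname()
```

Exporting HOSTNAME in the environment that starts the crabserver components would be the other way around this, without touching the code.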
2009/03/07
2. Users generating MC (with Datasetpath=None)
DESIRED_Gatekeepers is empty.
2009/03/05
1. GCB Fails
/home/cms001/globus-tmp.cmsfarm-08-16.20721.0/glide_S20865/condor/sbin/gcb_broker_query: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory
Sites are:
1. CE=ce.indiacms.res.in:2119/jobmanager-lcgpbs-cms
HOST=wn104.indiacms.res.in
2. CE=oberon.hep.kbfi.ee:2119/jobmanager-lcgpbs-long
HOSTNAME=wn-b-36
3. The following CEs all belong to the same site:
CE=t2-ce-01.lnl.infn.it:2119/jobmanager-lcglsf-cms
HOSTNAME=cmsfarm-12-04
CE=t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-cms
HOSTNAME=cmsfarm-08-10
CE=t2-ce-03.lnl.infn.it:2119/jobmanager-lcglsf-cms
HOSTNAME=cmsfarm-12-02
Igor's problem to fix (FIXED)
2. Condor exit code fails
Factory submits more jobs
Igor's problem to talk to Condor to get it fixed (FIXED)
3. DESIRED_Gatekeeper string empty
This happens if the osg_bdii cannot find the CE name.
For Eric to fix. Will mock up the behaviour of glite submission, in that job submission to crabserver will fail if the CE name is not found via the bdii query.
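The planned behaviour (fail at submission time instead of sending a job with an empty string) could look roughly like this. A minimal sketch, assuming a hypothetical exception and helper name and assuming a comma-separated value; none of these are taken from the crabserver code:

```python
class NoMatchingCE(Exception):
    """Hypothetical error: the BDII query returned no usable CE name."""


def desired_gatekeepers(ce_names):
    """Build the DESIRED_Gatekeepers value, refusing to return an empty one.

    Hypothetical helper: ce_names is the list of CE names from the BDII
    query; the comma separator is an assumption.
    """
    names = [ce for ce in ce_names if ce]
    if not names:
        raise NoMatchingCE("no CE name found via bdii query, not submitting")
    return ",".join(names)
```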
4. environment GLITE_WMS_RB_BROKERINFO or EDG_WL_RB_BROKERINFO for glideins
Error:
- environment GLITE_WMS_RB_BROKERINFO or EDG_WL_RB_BROKERINFO not defined
- ./BrokerInfo file is not found
Error:
- environment GLITE_WMS_RB_BROKERINFO or EDG_WL_RB_BROKERINFO not defined
- ./BrokerInfo file is not found
Eric's problem to disable it for condor_g and glidein.
5. Traceback (most recent call last):
File "/home/cms040/globus-tmp.cmsfarm-04-10.29601.0/glide_R29835/execute/dir_30920/writeCfg.py", line 234, in ?
exit_status = main(sys.argv[1:])
File "/home/cms040/globus-tmp.cmsfarm-04-10.29601.0/glide_R29835/execute/dir_30920/writeCfg.py", line 90, in main
maxEvents = int(os.environ.get('MaxEvents', '0'))
ValueError: invalid literal for int(): /store/mc/JobRobot/QCD_pt_0_15/GEN-SIM-RAW-RECO/IDEAL_V9_JobRobot/0000/A48D5963-E5A1-DD11-83B5-001560AC7E98.root
%MSG-s CMSException: PoolSource:source{*ctor*} 04-Mar-2009 23:10:46 CET pre-events
cms::Exception caught in cmsRun
---- Configuration BEGIN
Error occured while creating source PoolSource
---- Configuration BEGIN
MissingParameter: The required parameter 'fileNames' was not specified.
---- Configuration END
---- Configuration END
SOLUTION: Problem identified and fixed (positional parameters were wrong in the crabserver)
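The traceback above shows a file name landing in MaxEvents, which is what shifted positional parameters look like from the job side. A minimal sketch of a check that would make this kind of mix-up easier to spot, assuming a hypothetical helper name (parse_max_events); it is not the writeCfg.py code:

```python
def parse_max_events(raw):
    """Convert the MaxEvents value to int with a more telling error.

    Hypothetical helper: a non-integer value here usually means the
    wrapper arguments arrived in the wrong order.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        raise ValueError(
            "MaxEvents must be an integer, got %r; check wrapper argument order"
            % (raw,)
        )
```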
6. Stop Condor from sending the output files back to the server.
This needs some thought !!! The problem is that anybody using crabserver at reasonable scale ends up having too many files to get back, each and every one of which is a separate gridftp connection to the crabserver host. This is a royal pain in the neck for the user. One way to fix it is to gzip the tgz's at the server into one, and grab that one larger gzipped archive from the client. We should have a discussion about the pros and cons of this!
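To make the bundling idea concrete, here is a minimal sketch using the standard tarfile module, assuming a hypothetical helper name (bundle_outputs) and that the per-job tgz files are already on the server's local disk:

```python
import os
import tarfile


def bundle_outputs(archive_paths, bundle_path):
    """Pack many per-job archives into one gzipped tar file.

    Hypothetical helper: each input is stored under its base name, so
    the client unpacks one file and finds the original tgz's inside.
    """
    with tarfile.open(bundle_path, "w:gz") as bundle:
        for path in archive_paths:
            bundle.add(path, arcname=os.path.basename(path))
    return bundle_path
```

Compressing already-compressed tgz files gains very little space; the benefit is one gridftp connection instead of one per job. The cost is that the user waits for the whole bundle even when only a few jobs are of interest.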
7. gfactory that can work with multiple proxies
Igor's problem. We are ready to deploy a gfactory that works with multiple proxies as soon as we get one. An ideal configuration would have Stefano's jobRobot jobs all run with his proxy only, while the rest of the user jobs use the "service proxies".
8. CRAB status does not communicate the associated CE names back.
When you do a
crab -status
via the client, you do not get the CE name where your job is running. This is not a big deal. Just listed here for completeness.
9. All sorts of dashboard related issues
Sanjay's problems
10. Crab client looks for condor daemons/commands by default
Need to disable this check in order to use the client at lxplus. We do not have condor installed at CERN.
Eric's problem
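One possible shape for this: only look for the condor commands when the client itself has to run them. A minimal sketch, assuming hypothetical function names, an assumed list of scheduler names, and an assumed pair of commands to look for; the real client logic is not shown here:

```python
import shutil

# Assumed names of schedulers that need condor on the submitting host.
LOCAL_CONDOR_SCHEDULERS = ("condor", "condor_g", "glidein")


def should_check_condor(scheduler, use_server):
    """Hypothetical rule: skip the check when submitting through crabserver."""
    if use_server:
        return False
    return scheduler.lower() in LOCAL_CONDOR_SCHEDULERS


def condor_available(commands=("condor_q", "condor_submit"), which=shutil.which):
    """Return True only if every listed command is found on PATH."""
    return all(which(cmd) is not None for cmd in commands)
```

At lxplus, a client going through the server would then never reach condor_available at all.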
--
SanjayPadhi - 2009/03/04