Difference: UCSDUserDocPCF (17 vs. 18)

Revision 18 - 2017/01/13 - Main.MartinKandes

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Line: 112 to 112
 
| +sdsc | FALSE | Comet Supercomputer | Open only to PCF users with an XSEDE allocation on Comet |
| +uc | FALSE | Open Science Grid | Open to all PCF users |
Changed:
<
<
As such, we see here that the sample submit description file is only targeted to run the job locally on PCF itself.
>
>
We see here that the sample submit description file is only targeted to run the job locally on PCF itself.
  Finally, the sample submit description file ends with the queue command, which as shown here simply places an integer number of copies (10) of the job in the HTCondor queue upon submission. If no integer value is given with the queue command, the default value is 1. Every submit description file must contain at least one queue command.
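To make this concrete, here is a minimal sketch of what such a submit description file might look like as a whole. It borrows the bash_pi.sh example used later on this page; the vanilla universe, the +local attribute, and the exact arguments and file names are illustrative assumptions rather than a verbatim copy of the PCF sample file.

# Illustrative sketch of a PCF submit description file (not the verbatim sample)
universe   = vanilla
executable = bash_pi.sh
arguments  = -b 8 -r 7          # argument list is truncated in the condor_q listings below
output     = pi.out.$(Cluster).$(Process)
error      = pi.err.$(Cluster).$(Process)
log        = pi.log.$(Cluster).$(Process)
# Target only PCF itself; +local is assumed here, +sdsc and +uc are from the table above
+local     = TRUE
+sdsc      = FALSE
+uc        = FALSE
# Place 10 copies of the job in the queue
queue 10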
Line: 170 to 170
  10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
Changed:
<
<
The status of each submitted job in the queue is provided in the column labeled ST in the standard output of the condor_q command. In general, you will only find 3 different status codes in this column, namely:
>
>
The status of each submitted job in the queue is provided in the column labeled ST in the standard output of the condor_q command. In general, you will only find 3 different job status codes in this column, namely:
 
  • R: The job is currently running.
  • I: The job is idle. It is not running right now, because it is waiting for a machine to become available.
Line: 198 to 198
  1 jobs; 0 completed, 0 removed, 0 idle, 0 running, 1 held, 0 suspended
Changed:
<
<
In this case, for some reason you purposely placed the job on hold using the condor_hold command. However, if you find a more unusual HOLD_REASON given and you are unable to resolve the issue yourself, please contact the PCF system administrators to help you investigate the problem.
>
>
In this case, you placed the job on hold yourself using the condor_hold command. However, if you find a more unusual HOLD_REASON given and are unable to resolve the issue yourself, please contact the PCF system administrators to help you investigate the problem.
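For reference, placing a job on hold yourself and later releasing it looks roughly like this. The job ID is illustrative, and the exact confirmation messages may vary between HTCondor versions:

[youradusername@pcf-osg ~]$ condor_hold 16250.0
Job 16250.0 held

[youradusername@pcf-osg ~]$ condor_release 16250.0
Job 16250.0 released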
 
Changed:
<
<
If you find that your job has been sitting idle (I) for an unusually long period of time, you can run condor_q with the -analyze (or -better-analyze) option to attempt to diagnose the problem.
>
>
If instead you find that your job has been sitting idle (I) for an unusually long period of time, you can run condor_q with the -analyze (or -better-analyze) option to attempt to diagnose the problem.
 
 [youradusername@pcf-osg ~]$ condor_q -analyze 16250.0

Line: 242 to 241
 
Changed:
<
<
mkandes@pcf-osg ~$ condor_q 16662.4 -l | less

MATCH_EXP_JOB_GLIDEIN_Entry_Name = "Unknown"
MATCH_EXP_JOB_GLIDEIN_Schedd = "Unknown"
MaxHosts = 1
MATCH_EXP_JOBGLIDEIN_ResourceName = "UCSD"
User = "mkandes@pcf-osg.t2.ucsd.edu"
EncryptExecuteDirectory = false
MATCH_GLIDEIN_ClusterId = "Unknown"
OnExitHold = false
CoreSize = 0
JOB_GLIDEIN_SiteWMS = "$$(GLIDEIN_SiteWMS:Unknown)"
MATCH_GLIDEIN_Factory = "Unknown"
MachineAttrCpus0 = 1
WantRemoteSyscalls = false
MyType = "Job"
Rank = 0.0
CumulativeSuspensionTime = 0
MinHosts = 1
MATCH_EXP_JOB_GLIDEIN_SiteWMS_Slot = "Unknown"
PeriodicHold = false
PeriodicRemove = false
Err = "pi.err.16662.4"
ProcId = 4

-analyze

>
>
Again, if you are unable to resolve the issue yourself, please contact the PCF system administrators to help you investigate the problem.
 

Job Removal

Changed:
<
<
[1514] mkandes@pcf-osg ~$ condor_rm 16662.4
Job 16662.4 marked for removal
>
>
Occasionally, you may need to remove a job that has already been submitted to the PCF queue. For example, maybe the job was misconfigured in some way or has gone held for some reason. To remove a single job from the queue, use the condor_rm command and provide both the ClusterId and ProcId of the job you would like to remove.
 
Added:
>
>
 [youradusername@pcf-osg ~]$ condor_q youradusername

 
Changed:
<
<
condor_q 16662
>
>
-- Schedd: pcf-osg.t2.ucsd.edu : <169.228.130.75:9615?...
 ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
16665.0 youradusername 1/13 08:55 0+01:24:38 R 0 122.1 bash_pi.sh -b 8 -r
16665.1 youradusername 1/13 08:55 0+01:24:38 R 0 26.9 bash_pi.sh -b 8 -r
16665.2 youradusername 1/13 08:55 0+01:24:38 R 0 26.9 bash_pi.sh -b 8 -r
16665.3 youradusername 1/13 08:55 0+01:24:38 R 0 26.9 bash_pi.sh -b 8 -r
16665.4 youradusername 1/13 08:55 0+01:24:38 R 0 26.9 bash_pi.sh -b 8 -r
16665.5 youradusername 1/13 08:55 0+01:24:38 R 0 26.9 bash_pi.sh -b 8 -r
16665.6 youradusername 1/13 08:55 0+01:24:37 R 0 26.9 bash_pi.sh -b 8 -r
16665.7 youradusername 1/13 08:55 0+01:24:37 R 0 26.9 bash_pi.sh -b 8 -r
16665.8 youradusername 1/13 08:55 0+01:24:37 R 0 26.9 bash_pi.sh -b 8 -r
16665.9 youradusername 1/13 08:55 0+01:24:37 R 0 26.9 bash_pi.sh -b 8 -r

10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended

[youradusername@pcf-osg ~]$ condor_rm 16665.0 16665.2 16665.4 16665.6 16665.8

Job 16665.0 marked for removal
Job 16665.2 marked for removal
Job 16665.4 marked for removal
Job 16665.6 marked for removal
Job 16665.8 marked for removal

 
Added:
>
>
[youradusername@pcf-osg ~]$ condor_q youradusername
-- Schedd: pcf-osg.t2.ucsd.edu : <169.228.130.75:9615?...
 ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
Changed:
<
<
16662.0 mkandes 1/12 14:51 0+00:23:04 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.1 mkandes 1/12 14:51 0+00:23:04 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.2 mkandes 1/12 14:51 0+00:23:04 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.3 mkandes 1/12 14:51 0+00:23:03 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.5 mkandes 1/12 14:51 0+00:23:03 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.6 mkandes 1/12 14:51 0+00:23:03 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.7 mkandes 1/12 14:51 0+00:23:03 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.8 mkandes 1/12 14:51 0+00:23:02 R 0 26.9 pi.sh -b 8 -r 7 -s
16662.9 mkandes 1/12 14:51 0+00:23:02 R 0 26.9 pi.sh -b 8 -r 7 -s
>
>
16665.1 youradusername 1/13 08:55 0+01:26:04 R 0 26.9 bash_pi.sh -b 8 -r
16665.3 youradusername 1/13 08:55 0+01:26:04 R 0 26.9 bash_pi.sh -b 8 -r
16665.5 youradusername 1/13 08:55 0+01:26:04 R 0 26.9 bash_pi.sh -b 8 -r
16665.7 youradusername 1/13 08:55 0+01:26:03 R 0 26.9 bash_pi.sh -b 8 -r
16665.9 youradusername 1/13 08:55 0+01:26:03 R 0 26.9 bash_pi.sh -b 8 -r
 
Changed:
<
<
9 jobs; 0 completed, 0 removed, 0 idle, 9 running, 0 held, 0 suspended
>
>
5 jobs; 0 completed, 0 removed, 0 idle, 5 running, 0 held, 0 suspended
 
Added:
>
>
However, if you need to remove an entire cluster of jobs at once, simply provide the ClusterId alone.
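For example, to remove all remaining jobs in cluster 16665 at once (the exact confirmation message may vary slightly between HTCondor versions):

[youradusername@pcf-osg ~]$ condor_rm 16665
All jobs in cluster 16665 have been marked for removal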
 

Job History
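To look up jobs after they have completed and left the queue, you can use the condor_history command. A minimal sketch (the username argument is illustrative):

[youradusername@pcf-osg ~]$ condor_history youradusername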

 