Difference: TaskList (113 vs. 114)

Revision 114 2011/06/16 - Main.ParagMhashilkar

Line: 1 to 1
 
META TOPICPARENT name="GlideinWMS"

Task List

Line: 283 to 283
 
  • Deleted:
    <
    <

    Accounting error in Factory/Frontend (Parag)

    • This happens when two or more entries share the same site name
    • Update the User Running here in rrd
  •  

    Factory reusing the old keypair after restart (Parag)

    Date: Fri, 08 Oct 2010 17:51:28 -0700

    Hi all.

    The current gfactory creates a new public/private key pair at each restart (which includes reconfigs). While this is good, as it keeps it fresh, it has a nasty side effect; any existing requests from frontends are ignored (since they use the old key).

    I propose we keep the old key around for at least 10 or 20 cycles, and accept frontend adds with either old or new key. (After 20 or so cycles, we should throw away the key and ignore any old frontend adds... they are obviously stale)

    What do you think?

    Igor
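
The proposal above can be illustrated with a minimal Python sketch. It is not the gfactory code: the class name `FactoryKeys`, its methods, and the string key ids are all hypothetical, and the grace period of 20 cycles is simply the upper figure from the email.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FactoryKeys:
    """Hypothetical holder for the factory's current and previous key ids."""
    current: str
    previous: Optional[str] = None
    grace_cycles: int = 20   # assumed; the email suggests 10 or 20 cycles
    cycles_left: int = 0     # cycles for which the previous key stays valid

    def restart(self, new_key: str) -> None:
        """On restart or reconfig, keep the old key for a grace period."""
        self.previous = self.current
        self.current = new_key
        self.cycles_left = self.grace_cycles

    def end_of_cycle(self) -> None:
        """Age the previous key and drop it once the grace period is over."""
        if self.previous is None:
            return
        self.cycles_left -= 1
        if self.cycles_left <= 0:
            self.previous = None

    def accepts(self, key_id: str) -> bool:
        """Accept a frontend request made with the new or the old key."""
        return key_id == self.current or (
            self.previous is not None and key_id == self.previous
        )
```

In this sketch only one previous key is kept, so a second restart inside the grace period discards the oldest key immediately. Whether that is acceptable is a design choice the email leaves open.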
  • Line: 300 to 295
     
  • Changed:
    <
    <

    Expand the condor tarball to include condor_kflops if it exists (Parag)

    >
    >

    Expand the condor tarball to include condor_kflops if it exists (Parag)

     
    Hi all.
    
    Line: 315 to 310
     

  • Changed:
    <
    <

    BUG: Improper termination of glidein causes condor_started=false in monitoring (Doug)

    >
    >

    BUG: Improper termination of glidein causes condor_started=false in monitoring (Doug)

     
    • SIGHUP/preemption at Purdue

  • Changed:
    <
    <

    Limit the max number of glideins per frontend (Doug)

    >
    >

    Limit the max number of glideins per frontend (Doug)

     
    • This is just specific to the frontend.

  • Changed:
    <
    <

    Use DAEMON_SHUTDOWN to shutdown glidein daemons (Doug)

    >
    >

    Use DAEMON_SHUTDOWN to shutdown glidein daemons (Doug)

     
    • Only supported in condor 7.4+
       Relevant info from Dan's email:

       • shutdown fast - disregard MaxJobRetirementTime and hard-kill jobs immediately
       • shutdown graceful - respect MaxJobRetirementTime and, when that expires, soft-kill jobs; if SHUTDOWN_GRACEFUL_TIMEOUT time passes, stop respecting MaxJobRetirementTime and elevate to a fast shutdown
       • shutdown peaceful - same as graceful, but MaxJobRetirementTime=infinity

       And recall that MaxJobRetirementTime is counted from when the job began running, not from when the eviction happened. So your policy of MaxJobRetirementTime=30 means any job that has already run for more than 30 seconds will be evicted immediately when entering graceful shutdown mode.

       I agree that what glideinWMS probably wants is peaceful shutdown, not graceful shutdown. As Igor suggested, this can be achieved by using DAEMON_SHUTDOWN. I think one would need to adjust the START expression to stop accepting new jobs after some amount of time and then adjust the STARTD.DAEMON_SHUTDOWN expression to shut down the startd once the jobs go away. The MASTER.DAEMON_SHUTDOWN expression can be set to shut down the master when the startd goes away.
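
The retirement-time arithmetic in the email can be checked with a small Python sketch. It only models the rules as quoted above and is not Condor code; the function name `remaining_retirement` and its parameters are made up for illustration, and all times are in seconds.

```python
import math


def remaining_retirement(mode, max_job_retirement_time, job_runtime,
                         graceful_timeout=math.inf):
    """Seconds a running job may keep running once shutdown starts.

    Models only the rules quoted in the email:
      fast     - jobs are killed immediately
      graceful - MaxJobRetirementTime is honored, counted from job start,
                 and capped by SHUTDOWN_GRACEFUL_TIMEOUT
      peaceful - like graceful, with MaxJobRetirementTime = infinity
    """
    if mode == "fast":
        return 0.0
    if mode == "peaceful":
        return math.inf
    if mode == "graceful":
        # The clock started when the job began running, not at eviction.
        left = max(0.0, max_job_retirement_time - job_runtime)
        return min(left, graceful_timeout)
    raise ValueError(f"unknown shutdown mode: {mode}")
```

With MaxJobRetirementTime=30, a job that has already run for 45 seconds gets no extra time under graceful shutdown, while the same job under peaceful shutdown is allowed to finish.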

  • Changed:
    <
    <

    Allow factory to specify if an entry point (CE) requires voms proxies only for pilot and user jobs (Doug)

    >
    >

    Allow factory to specify if an entry point (CE) requires voms proxies only for pilot and user jobs (Doug)

      7/1/10 from Igor Sfiligoi - Some sites (entry points) grant access to their resources only to jobs with voms proxies. The current glexec-enabled glidein requires only that user jobs have grid proxies. This needs to be expanded to allow the factories to specify, additionally, whether voms proxies are required on user jobs for an entry point, and to apply that criterion in the glidein job selection process.
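
One way to picture the requested selection rule is the Python predicate below. It is a sketch under stated assumptions, not glideinWMS code: the function name and the three boolean flags are hypothetical, and in practice the check would more likely live in the glidein's job matching expression.

```python
def job_allowed(entry_requires_voms, job_has_grid_proxy, job_has_voms_proxy):
    """Return True if a user job may run on a glidein from this entry point.

    Assumed rule: every job needs at least a grid proxy (the existing
    requirement), and a voms proxy is needed only where the factory has
    flagged the entry point as requiring one.
    """
    if not (job_has_grid_proxy or job_has_voms_proxy):
        return False
    if entry_requires_voms and not job_has_voms_proxy:
        return False
    return True
```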
  • Changed:
    <
    <

    BUG: Daylight Saving possibly messing up the factory accounting (Doug)

    >
    >

    BUG: Daylight Saving possibly messing up the factory accounting (Doug)

     
    Check Igor's two emails sent to glideinwms@fnal.gov on Sun, 07 Nov 2010 09:38:12 -0800

  • Changed:
    <
    <

    BUG: Factory reports glideins as completed several times, even after a long time (Doug)

    >
    >

    BUG: Factory reports glideins as completed several times, even after a long time (Doug)

     
    • Happens when the glidein proxies are refreshed (?)
    
    
    Line: 530 to 527
     
  • Added:
    >
    >

    Accounting error in Factory/Frontend (Parag)

    • This happens when two or more entries share the same site name
    • Update the User Running here in rrd
    • This bug was rejected in favor of smart monitoring(?)

  •  

    Provide the ability to specify the RSL on a VO-by-VO basis

    6/14/10 from Igor Sfiligoi - Several sites (i.e. most non-condor sites) require a different RSL for each VO submitting to them. Having a completely new entry for each VO at each site is annoying, since the site is functionally identical from the gfactory point of view. It would be nice to have an option to massage the RSL on a VO-by-VO basis (i.e. frontend-by-frontend basis), possibly not the whole RSL but just the relevant part.
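
A minimal Python sketch of the idea, assuming the entry keeps one shared RSL string and only a small per-frontend fragment differs. The function name, the frontend names, and the RSL attributes shown (`queue`, `project`) are illustrative assumptions, not taken from any real site or from the gfactory configuration.

```python
def rsl_for_frontend(entry_rsl, per_frontend_rsl, frontend):
    """Append the frontend-specific RSL fragment, if any, to the entry RSL."""
    return entry_rsl + per_frontend_rsl.get(frontend, "")


# Hypothetical single entry shared by every VO submitting to the site.
entry_rsl = "(queue=default)"
per_frontend_rsl = {
    "frontend_a": "(project=vo_a)",
    "frontend_b": "(project=vo_b)",
}

rsl_a = rsl_for_frontend(entry_rsl, per_frontend_rsl, "frontend_a")
rsl_other = rsl_for_frontend(entry_rsl, per_frontend_rsl, "frontend_c")
```

A frontend with no fragment configured falls back to the entry's RSL unchanged, so one entry per site would remain sufficient.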
     