Release v2.3 series

Release v2.3

  • Fix key handling security bug (IS)

    • BREAKS BACKWARDS COMPATIBILITY

Release v2.3.1

  • Document VO Front End (PM, KL)

Release v2.3.2

  • Security Bug Fix Release

FEATURES complete for v2.3 (and previous releases)


  • Better Troubleshooting Guide

    • Guide should be aimed at users who do not have much experience with the system
    • Walk through the steps that take place from a user job submission until the job starts running.
    • Say what to look at, and where, at every step. For example: 1) User submits jobs. 2) Check whether the job shows up in the queue. 3) Check whether a corresponding glidein job shows up in one of the glidein queues. And so on. Right now the users are lost and clueless.
    • It may not be possible to list everything that can go wrong. This is fine as long as the guide tells users which stage they are in and where to look for error messages at that stage.
    • Possible errors can be listed as examples and filled in as we get more feedback.
  • Change the glideinWMS protocol to prevent replay attacks

    • Implemented since snapshot_091026
    • Added a new field, called ReqEncIdentity, that must match the AuthenticatedIdentity.
    • Requires a change in the frontend config
  • The frontend should only send proxies to trusted factories; add list of trusted factory identities

    • Implemented since snapshot_091029.
    • This required changes to the frontend config format.
  • Add support for multiple Condor versions within the same factory

    • Implemented since snapshot_091019
    • Requires change in factory config
  • Disable VOMS checking in newer versions of Condor

    • We only need VOMS checking enabled for the schedd. For the other daemons it is unwanted overhead that we can get rid of. (Fixed in head, Nov 5, 2009)
  • Set ALLOW_WRITE to support condor 7.4+

    • In the condor_config files that are created, set ALLOW_WRITE to $(HOSTALLOW_WRITE) to maintain backward compatibility.
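
The replay-attack item in the list above comes down to one comparison: the ReqEncIdentity field of a request must equal its AuthenticatedIdentity. The Python sketch below illustrates that comparison only. The dict standing in for a request, the function name `request_identity_matches`, and the example identities are assumptions made for illustration; this is not the glideinWMS implementation.

```python
def request_identity_matches(request_ad):
    """Return True only if ReqEncIdentity is present and equals AuthenticatedIdentity.

    `request_ad` is a hypothetical dict standing in for a request.
    """
    claimed = request_ad.get("ReqEncIdentity")
    authenticated = request_ad.get("AuthenticatedIdentity")
    # A missing field on either side counts as a mismatch, so a request
    # without the new field is rejected instead of silently accepted.
    if claimed is None or authenticated is None:
        return False
    return claimed == authenticated


# Example identities are made up for this sketch.
good = {
    "ReqEncIdentity": "frontend@example.org",
    "AuthenticatedIdentity": "frontend@example.org",
}
replayed = {
    "ReqEncIdentity": "frontend@example.org",
    "AuthenticatedIdentity": "other@example.org",
}
print(request_identity_matches(good))
print(request_identity_matches(replayed))
```

Treating a missing field as a mismatch is a design choice of this sketch; it matches the note that the change requires a frontend config update.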

BUGS

  • If the proxy used by the glidein is valid for less than 12 hours, the glidein job does not log the error messages clearly

    • v2.2 has good logging. Also, if you need the glidein to exit faster, use the verbosity config parameter in the factory configuration. By default verbosity is set to std, so the glidein sleeps 20 min before exiting to avoid the black hole effect. Set it to fast or nodebug to shorten the sleep time.
    • Happens when the glidein proxy has less than 12 hours of validity left. The error is not logged clearly and is easy to miss.
    • Shouldn't the glidein job die gracefully? The only way to diagnose this problem is to look at stderr in the job area under .globus on the worker node.
    WARNING: Unable to verify signature! Server certificate possibly not installed. Error: Cannot verify AC signature! Proxy not valid in 12 hours!
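
A clearer failure could be produced by checking the remaining proxy lifetime up front and reporting it explicitly. The Python sketch below shows such a check under stated assumptions: the function name `check_proxy_lifetime` and the message wording are hypothetical, the 12-hour threshold is taken from the error text above, and obtaining the remaining lifetime in seconds is left to the caller.

```python
MIN_PROXY_SECONDS = 12 * 60 * 60  # 12-hour threshold, taken from the error text


def check_proxy_lifetime(seconds_left, minimum=MIN_PROXY_SECONDS):
    """Return (ok, message) for a proxy with `seconds_left` of validity remaining."""
    if seconds_left >= minimum:
        return True, "proxy lifetime OK"
    hours_left = seconds_left / 3600.0
    # Spell out both the actual and the required lifetime so the cause is
    # obvious in the log instead of being buried in stderr.
    return False, (
        "proxy valid for only %.1f hours, at least %d hours required"
        % (hours_left, minimum // 3600)
    )


ok, message = check_proxy_lifetime(6 * 3600)
print(ok, message)
```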

  • Installer sets SEC_DEFAULT_SESSION_DURATION

    • This causes a huge memory blowup in the Condor daemons.
    • The proper variable to set is SEC_DAEMON_SESSION_DURATION. (In head since Jan 26th)
  • Factory shuts down if one of the entries crashes. (Head)

  • v2_1 doesn't work with glexec enabled. (Fixed Nov 26)

    • Tested with condor 7.3.1 and condor 7.2.4. condor_procd hangs with a permissions error.
    • Fixed since snapshot_091026. (Was due to GLEXEC_BIN having glidein_publish=True)
    • On Oct 30th made GLEXEC_JOB=True the default in CVS.
  • Pseudo interactive monitoring might not work in v2_2 with glexec enabled.

    • The fix that sets glidein_publish=False for GLEXEC_BIN could be the cause. This needs to be confirmed. (Verified: it works)
  • Installer crashes if we accept the default collector port. It doesn't handle an empty string. (Fixed: Head, Nov 05, 2009)
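
The empty-string case in the collector port item above is the usual "press Enter to accept the default" path. The Python sketch below shows one way to handle it. It is not the installer's actual code: the function name `parse_port` is hypothetical, and 9618 is used only as an example default.

```python
def parse_port(answer, default):
    """Turn the text typed at a port prompt into an int.

    An empty or whitespace-only answer means "accept the default".
    """
    text = answer.strip()
    if text == "":
        return default
    port = int(text)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port


print(parse_port("", 9618))        # default accepted
print(parse_port(" 9640 ", 9618))  # explicit value, surrounding spaces ignored
```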

  • VDT client install caches wrong (Fixed in head, Nov 4, 2009)

    -- JohnWeigand - 2009/11/04 -

    In the installation of the VDT client, the install script is using the wrong caches for the current release (and actually mixing OSG and VDT releases in the minimal install.)

    The OSG 1.2 cache is http://software.grid.iu.edu/osg-1.2 which maps to VDT cache: http://vdt.cs.wisc.edu/vdt_200_cache

    The code is using the OSG 1.0 cache: OSG (http://software.grid.iu.edu)
    and the VDT cache: http://vdt.cs.wisc.edu/vdt_200_cache
    The OSG cache actually points to vdt_1101_cache, as you can see at http://software.grid.iu.edu/pacman/client.pacman

    The GOC, for whatever reason, did not create an alias for the current release. The OSG alias points to OSG 1.0 (http://software.grid.iu.edu).
    You have to fully qualify the URL for OSG 1.2 (http://software.grid.iu.edu/osg-1.2).

    • Set the VDT cache in V2.3+ explicitly. (Fixed in head, Nov 4, 2009)
  • full install never removes Condor (Resolved/now understood Dec 1, 2009)

    -- JohnWeigand - 2009/11/04 -

    When I run the same commands from the command line, I get...
    Error in package [Condor]:
    Package [/home/glidein/vdt-full: http://vdt.cs.wisc.edu/vdt_200_cache:Condor ] is required by package [VDT-Client]. Can't remove.
    You never see it in the script because stdout/err is /dev/null. I have a full condor (341 Mb) in my VDT area after the full install.

    Resolution: -- JohnWeigand - 2009/12/01 - The actual purpose of removing Condor in a full install is just to remove Condor from the PATH variable when the VDT_LOCATION/setup.sh script is run. This ensures that this version of Condor does not interfere with the Condor installed for the other glideinWMS services. This is why the script does not care about the error mentioned above.

    Full client install: VDT version 2.0.0.p11, 990Mb (Condor - 343Mb)

      Bandwidth Test Controller 1.3
      vdt-ca-manage 1.0
      vdt-update-certs 2.5
      CA Certificates 1.10 (includes IGTF 1.32 CAs)
      CGSI-gSOAP 1.2.1.2
      Condor/Condor-G 7.2.4 (removed from PATH)
      cURL 7.18.2
      Fetch CRL 2.6.6
      Grid File Access Library (GFAL) 1.11.9-1
      Globus Toolkit, pre web-services, client 4.0.8
      Globus Toolkit, web-services, client 4.0.8
      GPT 3.2-4.0.8p1
      GSI-Enabled OpenSSH 4.6
      Java 5 SDK 1.5.0_21
      lcg-info 1.11.4-1
      lcg-infosites 2.6-2
      LCG Utils 1.7.6-1
      LCG File Catalog Client 1.7.2-4
      Logrotate 3.7
      MyProxy Client 4.7
      NDT 3.5.0
      Network Path and Application Diagnosis Client 1.5.5
      One-Way Active Measurement Protocol (One-Way Ping) 3.1
      Pegasus 2.3.0
      PPDG Cert Scripts 2.7
      pyGlobus gt4.0.1-1.13
      PyGlobus URL Copy 1.1.2.11
      SRM Fermi Client 1.9.2-4
      SRM Berkeley Client 2.2.1.2.i7.p3
      UberFTP 2.4
      VOMS Client 1.8.8-2p1
      Wget 1.11.4

    Partial client install: VDT version 2.0.0.p10, 155Mb

      vdt-ca-manage 1.0
      vdt-update-certs 2.5
      CA Certificates 1.10 (includes IGTF 1.32 CAs)
      Fetch CRL 2.6.6
      Globus Toolkit, pre web-services, client 4.0.8
      GPT 3.2-4.0.8p1
      Logrotate 3.7
      MyProxy Client 4.7
      PPDG Cert Scripts 2.7
      VOMS Client 1.8.8-2p1
      Wget 1.11.4

-- ParagMhashilkar - 2010/09/08


Topic revision: r4 - 2011/05/24 - 18:16:00 - ParagMhashilkar
 