Debugging Overflow Read-Access errors from a crabserver perspective

Logical Procedure

  • log in as root on the relevant crabserver
  • use condor_history to find which jobs failed with exit code 84
    • find the usernames of the people those jobs belong to
    • use the UCSD GUMS mapping to look up their DNs
  • cd to the crabserver spool directory
  • use ls and grep to figure out which directory belongs to the user you are looking for.
    • the directory name starts with the person's hn (HyperNews) name, so guessing shouldn't be too hard.
  • inside the correct directory you can find arguments.xml, the fjr (framework job report) files, and the job's stdout/stderr inside the appropriate outfile_XXX.tgz file.
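The spool-side steps above (find the user's directory, then look inside the outfile archives) can be sketched in Python as follows. This is a minimal sketch, assuming the directory layout described on this page. The names of the stdout/stderr members inside outfile_XXX.tgz are not given here, so the archive helper simply returns every regular file it contains and leaves the choice to you.

```python
import os
import tarfile

# Spool directories listed on this page; pick the one for the crabserver you are on.
SPOOL_DIRS = {
    "glidein-2": "/var/gftp_cache/CSstoragePath",
    "submit-2": "/crabprod/CSstoragePath",
}


def find_user_dirs(spool_dir, hn_name):
    """Return spool subdirectories whose names start with the user's hn name.

    Roughly equivalent to `ls | grep ^<hn_name>` in the spool directory.
    """
    return sorted(
        entry.path
        for entry in os.scandir(spool_dir)
        if entry.is_dir() and entry.name.startswith(hn_name)
    )


def list_outfiles(task_dir):
    """Return the outfile_*.tgz archives in a task directory, sorted by name."""
    return sorted(
        os.path.join(task_dir, name)
        for name in os.listdir(task_dir)
        if name.startswith("outfile_") and name.endswith(".tgz")
    )


def read_archive_members(tgz_path):
    """Return {member name: text} for every regular file in a .tgz archive.

    The stdout/stderr member names are not documented here, so everything is
    returned; undecodable bytes are replaced rather than raising.
    """
    contents = {}
    with tarfile.open(tgz_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                contents[member.name] = data.decode("utf-8", errors="replace")
    return contents
```

For example, on submit-2 you might loop over `find_user_dirs(SPOOL_DIRS["submit-2"], "jdoe")` (where "jdoe" is a hypothetical hn name), call `list_outfiles()` on each result, and print the member names returned by `read_archive_members()` to spot the stdout/stderr files.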

Details you need for doing this

  • crabservers are:
    • glidein-2 with /var/gftp_cache/CSstoragePath as spool directory
    • submit-2 with /crabprod/CSstoragePath as spool directory
  • to find the jobs that failed with a read-access error, run "grep 8020 *fjr*" in the user's directory
    • to see which file the job failed to open, run less on the corresponding fjr; the file name appears near the top.
  • to verify for yourself that this file can't be opened, try reading it via xrootd with this user's proxy, which is also in the directory.
    • ask somebody else how to set up ROOT so that you can use the xrootd client on the crabserver while logged in as root.
      • /code/osgcode is probably mounted, and if you know what you are doing you may be able to find a CMSSW release directory there.
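The fjr search and the xrootd check can also be sketched in Python. This is a minimal sketch under stated assumptions: `find_failed_fjrs` mimics `grep -l 8020 *fjr*` (it lists matching files rather than matching lines), `head` stands in for paging through the top of an fjr with less, and the 30-line default is arbitrary. Using `xrdcp` to copy the file to /dev/null with `X509_USER_PROXY` pointing at the user's proxy is an assumed way of testing access, not the method this page prescribes; the xrootd URL and proxy file name are placeholders you must fill in.

```python
import glob
import os

ERROR_CODE = "8020"  # read-access error code searched for in the fjr files


def find_failed_fjrs(task_dir, code=ERROR_CODE):
    """Return fjr files in task_dir that contain `code` (like `grep -l 8020 *fjr*`)."""
    matches = []
    for path in sorted(glob.glob(os.path.join(task_dir, "*fjr*"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            if code in handle.read():
                matches.append(path)
    return matches


def head(path, n_lines=30):
    """Return the first n_lines of a file, without trailing newlines."""
    lines = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for i, line in enumerate(handle):
            if i >= n_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines


def xrdcp_check(url, proxy_path):
    """Build an xrdcp command plus environment for testing read access to `url`.

    Assumption: copying to /dev/null while X509_USER_PROXY points at the user's
    proxy is one way to test access. Nothing is executed here.
    """
    env = dict(os.environ, X509_USER_PROXY=proxy_path)
    return ["xrdcp", url, "/dev/null"], env
```

Once the xrootd client is set up, you could pass the returned command and environment to `subprocess.run(cmd, env=env)` and check whether the copy succeeds.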

-- FkW - 2012/01/11

Topic revision: r1 - 2012/01/11 - 03:36:07 - FkW
 