Difference: FkwSTEP09CRABserverIssues (3 vs. 4)

Revision 4 2009/06/21 - Main.FkW

Line: 1 to 1
 
META TOPICPARENT name="FkwSTEP09CRABserver"
Line: 157 to 157
  This leads to whole tasks disappearing in the crab server, without the user being informed about it.
Changed:
<
<

Problem with the way proxies are updated

>
>

Problems due to unsecured multi-threading

 
Changed:
<
<
It appears that "something" in the crabserver keeps the user proxy updated from myproxyserver. Whatever mechanism does this, it does so in a way that causes problems with glexec and condor. Basically, the proxy on disk is touched and maybe rewritten (?). This is not done as an atomic (i.e. all-or-nothing) operation. As a result, both condor and glexec sometimes find a corrupted, incomplete, or inconsistent proxy on disk when they try to access it, and both then fail.
>
>
The following is fkw's probably incomplete understanding of what we know as of today, 6/21/09.
 
Changed:
<
<
The way to do this better would be to write the new proxy into a separate file, and then mv the file to its proper place.
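The write-then-mv idea can be sketched in Python as follows. This is only an illustration, not what the crabserver actually does: the function name write_proxy_atomically and its arguments are hypothetical. It assumes the temporary file lives in the same directory as the final proxy, since a rename is only atomic within one filesystem.

```python
import os
import tempfile


def write_proxy_atomically(proxy_bytes, final_path):
    """Write proxy_bytes to final_path so that readers never see a partial file.

    Hypothetical helper for illustration; not part of the crabserver code.
    """
    target_dir = os.path.dirname(os.path.abspath(final_path))
    # mkstemp creates the file with mode 0600 and a unique name, so concurrent
    # writers never share a temporary file.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".proxy.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(proxy_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before the rename
        # The equivalent of "mv": on POSIX this atomically replaces final_path.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern a reader such as condor or glexec opens either the complete old proxy or the complete new one, never a half-written file.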

As of right now, we do not know what piece of software inside crabserver does this.

Our knowledge of this going on comes from:

  • condor core dump analysis
  • glexec error "no file is found"
>
>
The way the crabserver does the condor job submission poses severe problems because (at least) two files are overwritten by multiple crabserver worker threads. This happens because more than one worker simultaneously submits jobs from the same task via glexec to condor. The two files are:
  • the actual proxy file
  • the script glexecWrapper.sh
These two files are moved from the uid context of hpi (the username the crabserver runs under) to the uid space of the user who owns the job. If this is done by multiple threads at once, it leads to problems, as one thread attempts to use the file it has just written while a second thread is overwriting it.
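One possible way to avoid this race is to serialize the staging and submission steps per task, so that only one worker at a time touches the proxy file and glexecWrapper.sh for a given task. The following is a minimal sketch under stated assumptions: the names task_lock, submit_job, stage_files and run_submit are hypothetical and do not come from the crabserver code, and the lock only helps if all workers are threads of a single process. If the workers were separate processes, a file lock would be needed instead.

```python
import threading
from collections import defaultdict

# One lock per task name; _locks_guard protects the dictionary itself.
_locks_guard = threading.Lock()
_task_locks = defaultdict(threading.Lock)


def task_lock(task_name):
    """Return the lock shared by all workers handling task_name."""
    with _locks_guard:
        return _task_locks[task_name]


def submit_job(task_name, job_id, stage_files, run_submit):
    """Stage the shared files and submit one job while holding the task lock.

    stage_files and run_submit are hypothetical callables standing in for
    "copy the proxy and wrapper into the user's uid space" and
    "condor_submit via glexec".
    """
    with task_lock(task_name):
        stage_files(task_name)
        return run_submit(task_name, job_id)
```

Workers submitting jobs from different tasks still run in parallel; only submissions within the same task are serialized.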

Proxy renewal

There is a general problem that the uid that runs the crabserver is not the uid of the user. The submission to condor is done via glexec so that condor_submit runs within the user's uid space. This then poses a problem for proxy renewal via the myproxy mechanism. Need to ask Sanjay what exactly he has done about this. He was going to talk with the gLite folks to better understand how they deal with the same issue.

 

Where to find what logs on crabserver

Here we document the directories where you find stuff on the glidein-2 crab server.
 