Difference: WorkForAidan (1 vs. 2)

Revision 2, 2015/08/28 - Main.FkW

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Big Picture

Line: 45 to 45
 Second, not all jobs read via XRootd. I.e. you need to go through the XRootd info, find jobs that are interesting, then find those same jobs in the HTCondor classAd records.
Changed:
<
<
The Xrootd info to tag onto is app info and loosk something like this:
>
>
The Xrootd info to tag onto is app info and looks something like this:
  70_https://glidein.cern.ch/70/150823:160611:aidan:crab:20150823:RunIISpring15DR74:WZ:25ns:v1_0
Changed:
<
<
We're right now not sure how that relates to information in the classAd.
>
>
This is made up from the following pieces:
  • Task 150823_160611_aidan_crab_20150823_RunIISpring15DR74_WZ_25ns_v1
  • Job ID 70
  • Retry 0

The place on GitHub where this seems to be defined is: https://github.com/dmwm/CRABServer/blob/master/scripts/CMSRunAnalysis.py#L97

params['MonitorJobID'] = '%d_https://glidein.cern.ch/%d/%s_%d' % (myad['CRAB_Id'], myad['CRAB_Id'], myad['CRAB_ReqName'].replace("_", ":"), myad['CRAB_Retry'])
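
Going the other way, from an XRootd app info string back to the classAd fields, could look roughly like the sketch below. It assumes every string follows the format shown above and that the original CRAB_ReqName contained no ":" of its own, since otherwise the replacement cannot be undone unambiguously. The names parse_app_info and APP_INFO_RE are made up for this illustration and are not part of CRAB or XRootd.

```python
import re

# Mirrors the format string '%d_https://glidein.cern.ch/%d/%s_%d'.
# Assumption: the task part contains no "_" (they were all replaced by ":").
APP_INFO_RE = re.compile(
    r"^(?P<job_id>\d+)_https://glidein\.cern\.ch/(?P<job_id2>\d+)/"
    r"(?P<task>.+)_(?P<retry>\d+)$"
)


def parse_app_info(app_info):
    """Split an app info string into CRAB_Id, CRAB_ReqName and CRAB_Retry.

    Returns None if the string does not follow the expected layout.
    """
    match = APP_INFO_RE.match(app_info.strip())
    # The job ID appears twice in the string; both copies must agree.
    if match is None or match.group("job_id") != match.group("job_id2"):
        return None
    return {
        "CRAB_Id": int(match.group("job_id")),
        # Undo the "_" -> ":" replacement applied when the string was built.
        "CRAB_ReqName": match.group("task").replace(":", "_"),
        "CRAB_Retry": int(match.group("retry")),
    }


example = ("70_https://glidein.cern.ch/70/"
           "150823:160611:aidan:crab:20150823:RunIISpring15DR74:WZ:25ns:v1_0")
parsed = parse_app_info(example)
```

The pair (CRAB_ReqName, CRAB_Id) from the result can then serve as the key for looking up the matching end-of-job classAd.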

 

-- FkW - 2015/08/27

Revision 1, 2015/08/27 - Main.FkW

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

Big Picture

Understand the performance characteristics of jobs running on the global grid infrastructure by correlating information from HTCondor with information from XRootd.

Initial Questions to answer

  • Is the distribution of CPU/walltime different for jobs that read remotely via XRootd than for jobs that read locally at the site whose storage they consume?
  • If there is a significant difference, how does this difference compare with the difference among local reads for different sites?
  • How does it compare with the difference in local reads for different tasks?
  • How does it compare with the difference within a task?
  • How does it compare with the difference for different tasks by different people?

Tools you need to answer the initial questions

Analyzing the HTCondor ClassAd

Each job in HTCondor has an end-of-job classAd. We've put a file with a few such records here. Each classAd is about 250 lines or so. Not all have the same length.

For a given classAd, there are a few fields of particular relevance for this purpose:

  • RemoteUserCPU
  • RemoteSysCPU
  • RemoteWallClockTime
    • CPUefficiency is defined as (RemoteUserCPU + RemoteSysCPU) / RemoteWallClockTime
  • DAGNodeName
    • this identifies the job number within a task. This is a unique number within the task.
  • CRAB_ReqName
    • this identifies uniquely the task. I.e. all jobs from the same task will have this set the same way.
  • MATCH_GLIDEIN_CMSSite
    • this uniquely identifies the site. All jobs that ran at the same site will have the same string for this parameter.
  • CRAB_UserDN
    • this uniquely identifies the user. I.e. different people will have different strings. And all jobs from the same person will have the same string.
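
As a minimal sketch of the CPUefficiency definition above, assume one classAd has already been read into a Python dict (how the file is parsed is left open here). The function name cpu_efficiency is hypothetical. Keys are lowercased before lookup because ClassAd attribute names are case-insensitive, so the spelling in the actual records may differ from the one used in this page.

```python
def cpu_efficiency(class_ad):
    """Return (user CPU + system CPU) / wall-clock time, or None if undefined."""
    # ClassAd attribute names are case-insensitive, so normalise the keys.
    ad = {key.lower(): value for key, value in class_ad.items()}
    try:
        user = float(ad["remoteusercpu"])
        system = float(ad["remotesyscpu"])
        wall = float(ad["remotewallclocktime"])
    except (KeyError, TypeError, ValueError):
        # A required field is missing or not numeric.
        return None
    if wall <= 0:
        # Avoid dividing by zero for jobs with no recorded wall-clock time.
        return None
    return (user + system) / wall


# Made-up numbers, purely for illustration.
sample_ad = {
    "RemoteUserCPU": 80.0,
    "RemoteSysCPU": 10.0,
    "RemoteWallClockTime": 100.0,
}
efficiency = cpu_efficiency(sample_ad)
```

Returning None instead of raising keeps incomplete records from aborting a loop over many classAds; they can simply be skipped.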

I think this is all you need to know about classAds to answer the initial questions. Feel free to read a few classAds carefully, and see if there are other things in them that look interesting to track.

Analyzing the detailed monitoring info from XRootd

First of all, not all Xrootd records have information about what job they refer to. So ignore all that don't.

Second, not all jobs read via XRootd. I.e. you need to go through the XRootd info, find jobs that are interesting, then find those same jobs in the HTCondor classAd records.

The Xrootd info to tag onto is app info and loosk something like this:

70_https://glidein.cern.ch/70/150823:160611:aidan:crab:20150823:RunIISpring15DR74:WZ:25ns:v1_0

We're right now not sure how that relates to information in the classAd.

-- FkW - 2015/08/27

 