Banning Users

During the CMS Security Challenge, glideinWMS CRAB SERVER operators may be asked to ban a particular DN and to provide certain information about the "attack". Given a user DN, admins may be asked to take the following actions:

Initial Actions

  • hold any running jobs
  • hold any queued jobs
  • block the user from further submissions

Detailed Procedures

The procedure for banning a user starts with mapping the certificate DN to a local userid on the UCSD CRAB Servers. This can be done by looking at the list of mappings. The command condor_q can also give you the same information, but only if the user still has jobs pending or running. The local UNIX userid typically has the form uscmsxxx.

condor_q -format '%s ' Owner -format '%s\n' x509userproxysubject | sort | uniq -c
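The output of the command above can be searched for a given DN. The following Python sketch assumes each output line has the form "count Owner DN", as produced by the pipeline through sort | uniq -c, and that the DN must match exactly. The function name owners_for_dn and the sample lines are hypothetical and only serve as illustration.

```python
def owners_for_dn(lines, dn):
    """Return the sorted local userids whose proxy subject equals dn.

    Assumes each line looks like '<count> <Owner> <x509userproxysubject>',
    i.e. the condor_q output after 'sort | uniq -c'.
    """
    owners = set()
    for line in lines:
        # Split off the count and the owner; the DN itself may contain spaces.
        parts = line.strip().split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        _count, owner, subject = parts
        if subject == dn:
            owners.add(owner)
    return sorted(owners)


# Made-up sample output, for illustration only.
sample = [
    "     12 uscms001 /DC=org/DC=example/OU=People/CN=Jane Doe 12345",
    "      3 uscms002 /DC=org/DC=example/OU=People/CN=John Roe 67890",
]
print(owners_for_dn(sample, "/DC=org/DC=example/OU=People/CN=Jane Doe 12345"))
```

A DN can in principle map to more than one local userid, which is why the helper returns a list rather than a single name.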

On each of the submitter nodes (including submit-[1-4]), hold any pending or running jobs from this user by running:

condor_hold uscmsxxx

Remove the local userid from the /etc/passwd file on all submitter nodes. This will block any further submissions.

Collecting Information

Information an operator should collect:

  • find out which sites jobs ran on
  • incoming IP address from which jobs were submitted (Do we have this information in the CRAB SERVER, and if so, where?)

Detailed Procedures

Information on which sites jobs ran at is (or soon will be) recorded in the condor logs on the submission nodes, in the file /opt/glidecondor/condor_local/log/EventLog. Eventually we will have a tool to parse this log. In the meantime, the information is available keyed by condor Cluster ID (the same ID you get from condor_q or condor_history) together with JOB_Site:

letts@submit-4 /opt/glidecondor/condor_local/log$ tail -100 /opt/glidecondor/condor_local/log/EventLog |egrep '^Cluster|^JOB_Site'
Cluster = 31675
JOB_Site = "CERN"
Cluster = 31675
JOB_Site = "CERN"
Cluster = 31607
JOB_Site = "JINR"
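Until a proper parsing tool exists, the filtered output above can be grouped by cluster with a few lines of Python. This is a minimal sketch, not the planned tool. It assumes that each 'Cluster = N' line precedes the matching 'JOB_Site' line, as in the egrep output shown, and the function name sites_by_cluster is hypothetical.

```python
from collections import defaultdict


def sites_by_cluster(lines):
    """Map each condor Cluster ID to the set of JOB_Site values seen for it.

    Assumes that, within each event, the 'Cluster = N' line comes before
    the 'JOB_Site = "X"' line, as in the egrep output of the EventLog.
    """
    sites = defaultdict(set)
    cluster = None
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an attribute line
        key, value = key.strip(), value.strip()
        if key == "Cluster":
            cluster = int(value)
        elif key == "JOB_Site" and cluster is not None:
            sites[cluster].add(value.strip('"'))
    return dict(sites)


# The sample lines are copied from the output shown above.
sample = [
    "Cluster = 31675",
    'JOB_Site = "CERN"',
    "Cluster = 31675",
    'JOB_Site = "CERN"',
    "Cluster = 31607",
    'JOB_Site = "JINR"',
]
for cluster_id, site_names in sorted(sites_by_cluster(sample).items()):
    print(cluster_id, ", ".join(sorted(site_names)))
```

Collecting the sites in a set per cluster matters because an individual job can run at more than one site.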

Open question for Stefano: can we determine the IP address from which jobs were submitted?

Other actions based on information collected:

  • Notify sites where jobs ran. Note that individual jobs could have run on more than one site!
  • Report the results to CMS Security Contacts (Ian and Mine)

Compromised Pilot Certificate

The compromise of a pilot certificate is much more complicated than that of a user certificate, since there are only O(10) pilot certificates, which are cycled round-robin to run glideinWMS pilots. User jobs then connect to the startds run by these pilots for execution. If a pilot certificate is compromised, potentially every site and every user of glideinWMS for CMS analysis since the time of the compromise can be affected.

How do you know a pilot proxy was compromised? GOOD QUESTION!

Initial Actions

If a glideinWMS pilot DN is compromised, admins will have to:

  • Remove the particular pilot proxy from the rotation in the glideinWMS frontend and replace it with another of the 50 we have available.
  • Kill any running pilots with the banned proxy
  • Contact Factory Ops to kill any queued pilots

Detailed Procedures

There are two frontend instances running under the user frontend: instance_v5_4 for general usage and instance_o5_4 for xrootd overflow. These procedures can apply to either frontend.

To remove a pilot certificate, edit the configuration file frontend.xml of the CMS frontend, found in ~/frontstage/instance_[ov]5_4.cfg, in the security section. For example, the list of pilot certificates in use looks like this:

               <proxy absfname="/home/frontend/.globus/x509_pilot05_cms_prio.proxy" security_class="cmsprio"/>
               <proxy absfname="/home/frontend/.globus/x509_pilot06_cms_prio.proxy" security_class="cmsprio"/>
               <proxy absfname="/home/frontend/.globus/x509_pilot07_cms_prio.proxy" security_class="cmsprio"/>
               <proxy absfname="/home/frontend/.globus/x509_pilot08_cms_prio.proxy" security_class="cmsprio"/>
               <proxy absfname="/home/frontend/.globus/x509_pilot09_cms_prio.proxy" security_class="cmsprio"/>
               <proxy absfname="/home/frontend/.globus/x509_pilot10_cms_prio.proxy" security_class="cmsprio"/>

Remove the compromised proxy from the list and replace it with another that is not being used already in this frontend or in any other running frontend on the machine. Other certificates can be found in ~/.globus.
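The replacement step can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the wrapper elements security and proxies in the sample, the file name x509_pilot11_cms_prio.proxy, and the function name swap_proxy are all hypothetical. The check for proxies already in use covers only the configuration passed in, not other frontends on the machine.

```python
import xml.etree.ElementTree as ET


def swap_proxy(xml_text, old_path, new_path):
    """Return xml_text with the proxy old_path replaced by new_path.

    Raises ValueError if old_path is not listed, or if new_path is
    already in use in this configuration.
    """
    root = ET.fromstring(xml_text)
    proxies = list(root.iter("proxy"))
    in_use = {el.get("absfname") for el in proxies}
    if new_path in in_use:
        raise ValueError("replacement proxy is already in the rotation")
    matches = [el for el in proxies if el.get("absfname") == old_path]
    if not matches:
        raise ValueError("compromised proxy not found in configuration")
    for el in matches:
        el.set("absfname", new_path)  # security_class is left unchanged
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment; the enclosing element names are assumptions.
sample = """<security>
  <proxies>
    <proxy absfname="/home/frontend/.globus/x509_pilot05_cms_prio.proxy" security_class="cmsprio"/>
    <proxy absfname="/home/frontend/.globus/x509_pilot06_cms_prio.proxy" security_class="cmsprio"/>
  </proxies>
</security>"""
updated = swap_proxy(
    sample,
    "/home/frontend/.globus/x509_pilot05_cms_prio.proxy",
    "/home/frontend/.globus/x509_pilot11_cms_prio.proxy",
)
print(updated)
```

Re-serializing with ElementTree drops XML comments and may change whitespace, so for the real frontend.xml it is safer to keep a backup copy and review the difference before reconfiguring.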

Reconfigure the frontend:

./frontend_startup reconfig ../instance_v5_4.cfg/frontend.xml

TODO: document how to kill all pilots with DN=X.

Collecting Information

  • find out which sites pilot jobs ran on using this proxy and notify them
  • find out which users had jobs which ran on pilots with a compromised proxy

Detailed Procedures

Given the large number of pilots running at any given time, O(10000), and the small number of proxies, O(10), potentially every site and every user who ran a job in the glideinWMS analysis system since the time of the compromise of a pilot certificate will have been affected. To make this point, look at the sites where pilots are currently running using one certificate:

letts@submit-4 ~$ condor_status -const '(GLIDEIN_X509_GRIDMAP_DNS=?="/DC=org/DC=doegrids/OU=Services/,/DC=org/DC=doegrids/OU=Services/,/DC=org/DC=doegrids/OU=Services/CN=uscmspilot05/")' -l | grep ^GLIDEIN_CMSSite | sort | uniq -c
     20 GLIDEIN_CMSSite = "T1_CH_CERN"
      3 GLIDEIN_CMSSite = "T1_US_FNAL"
      5 GLIDEIN_CMSSite = "T2_BE_IIHE"
     36 GLIDEIN_CMSSite = "T2_BE_UCL"
      5 GLIDEIN_CMSSite = "T2_BR_SPRACE"
      4 GLIDEIN_CMSSite = "T2_BR_UERJ"
     11 GLIDEIN_CMSSite = "T2_CH_CERN"
      2 GLIDEIN_CMSSite = "T2_CH_CSCS"
     39 GLIDEIN_CMSSite = "T2_DE_DESY"
      6 GLIDEIN_CMSSite = "T2_DE_RWTH"
      3 GLIDEIN_CMSSite = "T2_ES_IFCA"
     30 GLIDEIN_CMSSite = "T2_FR_GRIF_LLR"
      4 GLIDEIN_CMSSite = "T2_HU_Budapest"
     37 GLIDEIN_CMSSite = "T2_IT_Bari"
     67 GLIDEIN_CMSSite = "T2_IT_Legnaro"
      9 GLIDEIN_CMSSite = "T2_IT_Pisa"
     18 GLIDEIN_CMSSite = "T2_RU_JINR"
      2 GLIDEIN_CMSSite = "T2_UA_KIPT"
      8 GLIDEIN_CMSSite = "T2_UK_London_Brunel"
      7 GLIDEIN_CMSSite = "T2_UK_London_IC"
     18 GLIDEIN_CMSSite = "T2_UK_SGrid_RALPP"
      1 GLIDEIN_CMSSite = "T2_US_Caltech"
    130 GLIDEIN_CMSSite = "T2_US_Florida"
     13 GLIDEIN_CMSSite = "T2_US_MIT"
     26 GLIDEIN_CMSSite = "T2_US_Nebraska"
      3 GLIDEIN_CMSSite = "T2_US_Purdue"
     41 GLIDEIN_CMSSite = "T2_US_UCSD"
     26 GLIDEIN_CMSSite = "T2_US_Wisconsin"
     64 GLIDEIN_CMSSite = "T3_US_Colorado"
     50 GLIDEIN_CMSSite = "T3_US_Omaha"
     10 GLIDEIN_CMSSite = "T3_US_OSU"
      1 GLIDEIN_CMSSite = "T3_US_TTU"
      7 GLIDEIN_CMSSite = "T3_US_UMD"
This is 33 out of 39 sites running glideins at this time.
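The same per-site tally can be reproduced from saved condor_status -l output, which is handy when the list of sites has to be attached to an incident report. The Python sketch below mirrors the grep | sort | uniq -c pipeline. The function name count_sites and the sample lines are hypothetical.

```python
from collections import Counter


def count_sites(lines):
    """Count glideins per CMS site from 'condor_status -l' style lines."""
    counts = Counter()
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GLIDEIN_CMSSite":
            counts[value.strip().strip('"')] += 1
    return counts


# Made-up sample input, for illustration only.
sample = [
    'GLIDEIN_CMSSite = "T2_US_UCSD"',
    'GLIDEIN_CMSSite = "T2_US_UCSD"',
    'GLIDEIN_CMSSite = "T1_US_FNAL"',
    'Name = "hypothetical-slot"',
]
counts = count_sites(sample)
for site, n in sorted(counts.items()):
    print(f"{n:7d} {site}")
print(len(counts), "distinct sites")
```

The number of distinct sites is the figure to compare against the total number of sites running glideins at the time.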


Other Actions

  • Notify the sites and users whose jobs ran with pilots with a compromised credential
  • Report the results to CMS Security Contacts (Ian and Mine)

Action Items

  1. Document to be REVIEWED BY IGOR - DONE
  2. Does CRAB log IP addresses where submissions come from? (Stefano)
  3. Implement condor logging level changes and document how the information should be used or extracted. (James/Igor?) - IGOR AGREED TO DO IT. DONE on submit-4.
  4. Document how to get a list of sites and users that ran on a pilot with a particular pilot certificate since a particular time (JAMES) - probably a complicated looking condor command.
  5. Who do we report incidents to? Oli? Ian? Any designated CMS Security contact person? Sites/users? - ASKED OLI: Ans. Ian and Mine.

-- JamesLetts - 2012/08/27

Topic revision: r3 - 2012/08/28 - 21:16:51 - JamesLetts