Frequently Asked Questions



Condor C Submission

Specifying how many CPUs you want your job to request



Site Readiness

Static Condor Information

Static Condor information updated every few minutes


Cacti Plots

Ganglia Plots


Glidein Mon

Administrative Questions

How to generate a grid host cert

After backing up the old cert files, run the following.

./globus/bin/grid-cert-request -host <hostname> 

How do I adjust the memory parameter on the kernel config line in a Rocks CD-ROM kernel roll?

The purpose of this fix is to allow Rocks to create very large RAID partitions on 64-bit machines.

Edit the following file, changing this stanza:

label internal
       kernel vmlinuz
       append ramdisk_size=150000 initrd=initrd.img devfs=nomount ks ksdevice=eth0 kssendmac selinux=0

to this (note the added mem=1024M at the end of the append line):

label internal
       kernel vmlinuz
       append ramdisk_size=150000 initrd=initrd.img devfs=nomount ks ksdevice=eth0 kssendmac selinux=0 mem=1024M

then rebuild the kernel roll:

   # cd rocks/src/roll/kernel
   # make roll 

Why does /dev not appear with the right files on RHEL4 in chroot?

On RHEL4, /dev is not a plain directory but a tmpfs mount, so it must be bind-mounted into the chroot:

mount --bind /dev /sysroot/dev


Complete Listing of OSG Configuration Variables

A complete list of OSG Configuration variables

Retrieving ERT Times for the Site

Log in to a gatekeeper, source the VDT, and run

ldapsearch -xLLL -h -b mds-vo-name=UCSDT2,mds-vo-name=local,o=grid '(&(GlueCEAccessControlBaseRule=VO:cms)(GlueCEUniqueID=*))' GlueCEStateEstimatedResponseTime

How to Output the Text form of Grid Certificates (Host Cert)

openssl x509 -in cert.pem -text

Converting PEM x509 Format Files to P12(DER) for import into Mozilla/Firefox

Log into the machine on which you have your x509 PEM files for your cert.

cd ~/.globus
openssl pkcs12 -in foo.pem -inkey bar.pem -export -out foo.p12 

It will then ask for your password; this is the same password that you would use, e.g., when you run voms-proxy-init. The resulting .p12 file can then be imported into your browser.
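If you want to sanity-check the conversion mechanics first, the round trip can be exercised with a throwaway self-signed certificate. The file names and the pass:secret passphrase below are illustrative only, not part of the real procedure.

```shell
# Generate a throwaway key and self-signed cert (stand-ins for the real
# usercert/userkey PEM files).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo" \
    -keyout bar.pem -out foo.pem 2>/dev/null
# Export to PKCS#12 exactly as in the recipe above.
openssl pkcs12 -export -in foo.pem -inkey bar.pem \
    -passout pass:secret -out foo.p12
# A bundle the browser can import prints its summary and exits 0 here.
openssl pkcs12 -in foo.p12 -info -noout -passin pass:secret 2>/dev/null && echo OK
```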

Entering a UCSD ACT Customer Service Request

Preserving the RSL file from submitted grid jobs

On the server side, edit $VDT_LOCATION/globus/etc/globus-job-manager.conf and set "-save-logfile always"

That should preserve the gram_job_mgr files in the user home dir for debugging.

Updating OSG CA Certificates


Run the following in $VDT_LOCATION

# pacman -update CA-Certificates

Fixing broken CRLs by hand

The CRLs come with the wn-client installation of OSG. This is exported from codefs to all the worker nodes. A CRL is a set of several files, one of which is updated via cron on codefs; the one that is updated has the ending .r0

Over Christmas 2009, three CRLs failed to update properly. They had zero file size and could no longer be overwritten by the CRL updater, which seems to refuse to update zero-size files. I thus had to copy by hand, from the VDT client installed on the uaf, the three .r0 files with zero file size.

Once the files were copied, all worker nodes worked again.
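The zero-size files can be found mechanically. crl_zero_size below is a helper name invented for this example; point it at your certificates directory.

```shell
# List the zero-size .r0 CRL files that the updater refuses to overwrite,
# so they can be replaced by hand.
crl_zero_size() {
    # $1: directory holding the CA/CRL files
    find "$1" -maxdepth 1 -name '*.r0' -size 0 -print
}
# usage: crl_zero_size /etc/grid-security/certificates
```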

Useful Grid Twiki for GRAM Errors

Installing Ubuntu on a Mac Mini

Using Grub

UCSD Testing and Monitoring Links

Links, tools and sites related to monitoring the UCSD Tier2.

UCSDT2 ITB RSV Monitoring Link

Query the ITB Ress and BDII

condor_status -pool -constraint 'GlueSiteName=="UCSDT2-ITB1"' -l 

ldapsearch -x -h -p 2170 -b mds-vo-name=UCSDT2-ITB1,mds-vo-name=local,o=grid   

Checking BDII publishing

Sites to check to see whether UCSD is properly reporting to the BDII:

SAM Test Page

CMS Prod Exit Code Results (CMS)

Job Robot Report

VORS Monitoring



SCRAM Error: It appears that the module "" is not installed. Please check your installation. If you are an administrator, you can find the Perl Template Toolkit at or at the web site of the author (Andy Wardley):

Fix: Install perl-Template-Toolkit and supporting packages

Purging CE Jobs

To fully purge the CE of jobs you need to

  1. Remove or move the contents of the condor home area (e.g. /state/data/condor_local)
  2. Remove or move the contents of the GRAM area $GLOBUS_LOCATION/tmp/gram_job_state/gram_condor_log.*

Installing the cert infrastructure only from the VDT

This will install the parts needed to request host certs as well as keep CRLs and CAs up to date on a machine.


mkdir -p /data/vdt
cd /data/vdt
tar zxvf pacman-3.25.tar.gz
chown root:root -R pacman-3.25
cd pacman-3.25
cd /data/vdt


pacman -pretend-platform:linux-rhel-4
pacman -get
pacman -get
pacman -get

Condor jobs cannot find /state/data/condor_local/execute/dir_XXXX

Because of the remounting, order is important: check that all underlying file systems are mounted before doing the remounts.
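A small guard can enforce that ordering before attempting the bind remounts. require_mounted is a helper name invented for this example, and the path in the usage comment is illustrative.

```shell
# Succeed (and say so) only if the argument is an active mount point;
# otherwise complain so the remount is not attempted on a bare directory.
require_mounted() {
    if mountpoint -q "$1"; then
        echo "mounted: $1"
    else
        echo "NOT mounted: $1 -- fix the mount order first" >&2
        return 1
    fi
}
# usage: require_mounted /state/data && <do the bind remount under it>
```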

WS Gram Performance Optimization

Resetting priority factors on the Condor cluster

for i in `condor_userprio -all -allusers | grep "@" | awk -F"@" '{print $1}' | grep ligo`; do
    for j in `seq 2 5`; do
        condor_userprio -setfactor ${i}@osg-gw-${j} 100
    done
done

Local Users Mappings


WS GRAM Errors

Error initializing GAHP

Check that Java is installed and that condor_config correctly points to its location

Additional CMS Config for OSG

Copy the following file from the old install to the new





Getting a slot wn-client environment on a node interactively

Log into a node as root

# chroot /chroot/cafuser1
# su - cafuser1
# source /code/osgcode/wn-client-itb/

Rocks Commands

Adding a cabinet to rocks

rocks add appliance cabinet-5 membership="Cabinet 5" short-name='c' node='cab5-compute'

OSG-RSV Commands at UCSD


$VDT_LOCATION/osg-rsv/setup/configure_osg_rsv --user rsv --init --server y  --ce-probes --ce-uri "" --srm-probes --srm-uri "" -srm-dir /pnfs/ --gridftp-probes  --gratia --grid-type "OSG"  --consumers --verbose --setup-for-apache --proxy /tmp/x509up_u59001




Testing CA Cert Probe by hand

 su rsv -c "./cacert-crl-expiry-probe -m org.osg.certificates.cacert-expiry -u -x /tmp/x509up_u59001"

Gratia Search Links

Making the RAID devices on the nodes by hand

In the event you need to do this by hand

Create the partitions on the new disk

Stop the devices

mdadm --stop /dev/md0
mdadm --stop /dev/md1
mdadm --create /dev/md0 --chunk=256 --level=0 --raid-devices=4 /dev/sda2 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md1 --chunk=256 --level=0 --raid-devices=4 /dev/sda5 /dev/sdb3 /dev/sdc3 /dev/sdd3

Make the file systems

mkfs.ext3 -i 16384 /dev/md0; mkfs.ext3 -i 16384 /dev/md1
tune2fs -m0 /dev/md0; tune2fs -m0 /dev/md1

Fixing a corrupt ext3 Journal

debugfs -w -R "feature ^has_journal,^needs_recovery" /dev/md2
fsck -y /dev/md2
tune2fs -j /dev/md2


debugfs -w -R "feature ^has_journal,^needs_recovery" /dev/md1 && fsck -y /dev/md1 && tune2fs -j /dev/md1

Bulk CA Certs for Web Browsers

TACAR keeps a repository of all the IGTF CAs. You can individually install the ones you care about directly in your browser (or try a bulk download and install)

SRM Ping

srm-ping srm://

VOMS Proxy and FTS


Error 1

Exception in thread "main" Mkdirs failed to create /cms/store/user/tmartin                        
        at org.apache.hadoop.fs.ChecksumFileSystem.create(                                 
        at org.apache.hadoop.fs.FileSystem.create(                                                 
        at org.apache.hadoop.fs.FileSystem.create(                                                 
Call to org.apache.hadoop.conf.FileSystem::create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;) failed!                 

Check to make sure the hadoop-site.xml is properly configured, or the CLASSPATH is set correctly.

Rocks Command Add Appliance at UCSD

rocks add appliance cabinet-5 membership="Cabinet 5" short-name='c' node='cab5-compute'
rocks add appliance cabinet-4 membership="Cabinet 4" short-name='c' node='cab4-compute'
rocks add appliance cabinet-6 membership="Cabinet 6" short-name='c' node='cab6-compute' 
rocks add appliance cabinet-7 membership="Cabinet 7" short-name='c' node='cab7-compute'

Memory copy of hadoop fsimage when restarting

First put hadoop into safe mode, then run

hadoop dfsadmin -metasave

Checking for black hole nodes with condor


globus-job-run /bin/sh -c 'source $OSG_LOCATION/; condor_history -constraint "RemoteWallClockTime < 120 && Owner == \"cmsprod\" && CurrentTime-EnteredCurrentStatus < 3600*24*4" -format "%s\n" LastRemoteHost' | sed 's/slot.*@//g' | sort | uniq -c | sort -r -n


condor_history -constraint "RemoteWallClockTime < 120 && Owner == \"cmsprod\" && CurrentTime-EnteredCurrentStatus < 3600*24*4" -format "%s\n" LastRemoteHost | sed 's/slot.*@//g' | sort | uniq -c | sort -r -n

Hadoop mount is responding slow

This can be caused by the hadoop namenode getting stuck in a loop. This is often obvious when the namenode is sitting at around 100% of a single CPU; under normal operating conditions it should be much lower. After carefully checking the namespace backup, restart the namenode.

If the fuse mount process is at 100%, remount it. There is possibly a memory issue; add the following to the file


export LIBHDFS_OPTS=-Xmx4096m

Remounting hadoop on UAF

umount -l /hadoop
mount /hadoop

New CRL check script

The following is the new CRL check script location and cron job on codefs. It will confirm that all of the CRLs are valid CRL files and force a re-run of the fetch-crl script if they are not.

root@codefs /code/osgcode/tmartin# cat /etc/cron.d/checkcrl
22 0,3,6,9,12,15,18,22 * * * root /code/osgcode/tmartin/
root@codefs /code/osgcode/tmartin#
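The script itself is not reproduced on this page, but a minimal sketch of the kind of validity test it can apply, using openssl, might look like this. crl_ok is a helper name invented for this example; the real script on codefs may differ.

```shell
# Succeed only if the file parses as a CRL in either PEM or DER form;
# zero-size or corrupt .r0 files fail both checks.
crl_ok() {
    openssl crl -in "$1" -inform PEM -noout 2>/dev/null ||
        openssl crl -in "$1" -inform DER -noout 2>/dev/null
}
# usage: crl_ok somefile.r0 || echo "somefile.r0 is broken; re-run fetch-crl"
```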

Adding a site to Glidein Factory

Log into the glidein 1 machine

Restarting the Glidein 1 Factory

When you have finished moving the stuff, remember to start (in this order):
1) the httpd (/etc/init.d as root)
2) Condor (/etc/init.d as root)
3) the gfactory (~/glideinsubmit/glidein_Productio3_1 as gfactory)


Converting P12 Certificates to x509


Setting up Putty with RSA keys

Glidein Factory FAQ


Kerberos Help for Mac Users

XrootD Install

Gaining access to Grub prompt in DomU

xm create -c domain

Useful SRM Commands


lcg-cp -v -b -D srmv2 file:/home/users/tmartin/ srm://

 srmcp -2 --debug=true -delegate=false srm:// file://localhost//tmp/


lcg-ls  -l -v -b -D srmv2 srm://


srmrm -2 -delegate=false srm://
Note: You need to run once per file, so you probably want to iterate over a list of files with a for loop in bash
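A minimal sketch of that loop. srm_rm_list is a name invented for this example; the SRM endpoint is elided on this page, so each input line is appended directly to srm://, and SRMRM can be overridden (e.g. SRMRM=echo) for a dry run.

```shell
# Run srmrm once per line of the input file, as the note describes.
srm_rm_list() {
    # $1: file with one SRM path per line
    cmd=${SRMRM:-srmrm}
    while read -r f; do
        "$cmd" -2 -delegate=false "srm://${f}"
    done < "$1"
}
# usage: srm_rm_list files-to-delete.txt
```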

Make a directory

srmmkdir -2 -delegate=false  srm://

Remove a directory

srmrmdir -2 -delegate=false  srm://

Creating a Xen instance

First copy the default config to the name of the instance you are creating. Edit as needed, then run:

xm create -c devg-2 extra=" init 1 xencons=xvc0"

Tuning NFS Server

Number of threads

cat /proc/net/rpc/nfsd

fh 278562 0 0 0 0
io 4244458892 3311147501
th 96 72034 108604.717 46238.689 43326.894 1563.190 980.777 1653.901 105.886 73.850 58.314 102.676
ra 192 1422103166 0 0 0 0 0 0 0 0 0 2354117


  • How many threads you have (here 96)
  • The number of times all of the threads have been in use at once (here 72034)
  • Ten histogram values: the number of seconds during which a given fraction of the threads was busy, in 10% bands (e.g. up to 10% of the threads were busy for 108604.717 seconds, 10-20% for 46238.689 seconds, and so on up to 100%)
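A small parser for the th line, under the interpretation above (as commonly documented for knfsd). nfsd_th is a helper name invented for this example.

```shell
# Decode the "th" line of /proc/net/rpc/nfsd read from stdin: field 2 is
# the thread count, field 3 the all-threads-busy count, and fields 4-13
# are seconds spent in each 10% thread-utilisation band.
nfsd_th() {
    awk '/^th / {
        printf "threads=%s all_busy_count=%s\n", $2, $3
        for (i = 4; i <= 13; i++)
            printf "up to %d%% of threads busy: %s seconds\n", (i - 3) * 10, $i
    }'
}
# usage: nfsd_th < /proc/net/rpc/nfsd
```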

GUMS Manual Account mapping instructions for RSV

Very sorry about letting this lie dormant. I haven't yet sent instructions.

The plan is to add to the Maven-generated GUMS documentation a recipe
for doing the one-to-one user mappings. I'll make sure the info makes it
into the next release.

Briefly, the recipe would be:

1) Under User Groups, create a User Group for the user, e.g.
2) Under Manual User Group Members, add the intended user's DN and
optional FQAN and email to the group created in #1
3) Under Account Mappers, create a new Account Mapper, e.g.
JohnSmithMapper of type "manual" pointed at the UNIX account you want
John Smith to go to (e.g. jsmith).
4) Under Group To Account Mappings, create a Group To Account Mapping,
e.g. JohnSmithGTAMapping using user group from #1 and Account Mapper
from #3, defining VO accounting info.
5) Under Host To Group Mappings, create or edit a relevant host to group
mapping definition and include the GroupToAccount mapping from #4.

Note that where a name is defined, I've chosen a distinct name that
includes what kind of thing it is. In theory the namespace shouldn't
matter, but it makes what you're doing clearer.

Note also that this is all quite complicated to do on a per-user basis.
That is because GUMS was never designed or intended to do manual
per-user mapping. Rather it was intended to be a Grid-ID-to-UNIX-ID
*policy* tool where you handle a whole VO with one chain.

Hope this made sense.



Apache build modules

 apxs -I /usr/include/libxml -I . -i -c mod_proxy_html.c
 apxs -I /usr/include/libxml -I . -i -c mod_xml2enc.c


Testing glexec by hand

export GLEXEC_CLIENT_CERT=/tmp/x509up_u583
export X509_USER_PROXY=/tmp/x509up_u583
/usr/sbin/glexec /usr/bin/id

Grub Installing on second raid device

# grub
Probing devices to guess BIOS drives. This may take a long time.

    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> find /grub/stage1
find /grub/stage1
grub> device (hd0) /dev/sdb 
device (hd0) /dev/sdb 
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
grub> quit


New User Registration


Banning a user

Install certificate components and fetch-crl

First remove any existing soft links to the old pacman certificate install and disable the cron-based CRL and cert updates for the pacman-based install.

To install the OSG 3 CRL infrastructure

First remove the soft link for the old pacman-based cert-inf install

rm /etc/grid-security/certificates

Then remove the cron jobs

cd /cert-inf
vdt-control --off

rpm -Uvh
rpm -Uvh

yum -y install osg-ca-certs
yum -y install fetch-crl
chkconfig fetch-crl-cron on
service fetch-crl-cron start

To grab the latest immediately run


Remove the old pacman cert area

rm -rf /cert-inf


Mtest is a process that runs once an hour from cron on the worker nodes to check for the hadoop mount. If it is not there, mtest tries to remount the hadoop filesystem at /hadoop. The process creates a lot of logs in hadoop from the nodes that run the test.

Condor Requirements change

Example of changing a condor requirements line for all jobs.

condor_cron_qedit -const 'Owner=!=UNDEFINED' Requirements '( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" )'

Create a mirrored Logical Volume

Create the physical and volume groups as normal.

lvcreate -L 223G -m1 --mirrorlog mirrored --alloc anywhere -n osg-ce-2_vol vg_osg-ce-2

Installing Condor UAF Glidein

Install, in order:

  1. condor
  2. glideinwms-userschedd

Add in /etc/condor/config.d/99_local.config


Run security stuff

glidecondor_addDN -daemon "My own DN from hostcert" /etc/grid-security/hostcert.pem condor
glidecondor_addDN -daemon "The collector of the UAF pool" '/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/' coll

Disable the second schedd that is enabled by default

glidecondor_createSecSched ""

Xrootd Testing

  xrdcp -d 2 -f root:// /dev/null
  xrdcp -d 2 -f root:// /dev/null
  xrdcp -d 2 -f root:// /dev/null


-- TerrenceMartin - 3/17/2017
Topic revision: r98 - 2017/03/17 - 15:52:27 - TerrenceMartin