Quick and Easy SRM/dCache Troubleshooting

Table of Contents

  • Restarting the SRM server and all GridFTP doors
  • Restarting all other dCache servers (except SRM and GridFTP)
  • Restarting everything -- all dCache servers, including SRM and GridFTP
  • Restarting dCache server(s) running on a known hostname
  • Mounting PNFS areas on uaf-1 or uaf-2
  • Mounting PNFS areas on phedex-1, the PhEDEx node
  • Maximizing data availability - Pool bootstrap messages in email
  • Maximizing GridFTP stability - Door bootstrap messages in email

Restarting the SRM server and all GridFTP doors

Note that this is the most commonly needed procedure. If SRM transfers have started to fail, the most likely cause is either a malfunctioning SRM server itself or one or more malfunctioning GridFTP doors. Please log in to t2data2.local and run the following script with the 'restart' option. Don't worry about finding it; it is already in the path.

[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh
Usage: restart-SRM-all-GridFTP-servers.sh {start|stop|restart}
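
For example, to perform the most common action, a full restart:

[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh restart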

Restarting all other dCache servers (except SRM and GridFTP)

If the above step has been taken yet transfers continue to fail, AND you are sufficiently confident that the cause is a malfunctioning lower-level dCache service (e.g., the PNFS server, Pool Manager, DCap server, Replica Manager, or Broadcast cell), please log in to t2data2.local and run the following script with the 'restart' option. Don't worry about finding it; it is already in the path.

[root@t2data2:~]$ restart-all-other-dCache-servers.sh
Usage: restart-all-other-dCache-servers.sh {start|stop|restart}
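
For example:

[root@t2data2:~]$ restart-all-other-dCache-servers.sh restart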

Restarting everything -- all dCache servers, including SRM and GridFTP

There is an order to be maintained while restarting dCache services. Since a higher-level service depends on lower-level services, the higher-level service needs to be stopped before, and started after, a lower-level service. Each of the two scripts mentioned above takes care of this ordering internally. However, when using both scripts together, i.e., to restart everything, run them with different options and in the following sequence. Don't worry about finding these; they are already in the path.

[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh stop
[root@t2data2:~]$ restart-all-other-dCache-servers.sh restart
[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh start

Restarting dCache server(s) running on a known hostname

If you simply want to restart the service(s) on a single known hostname, please log in to t2data2.local and use the following script with a single short hostname and the 'restart' option. Don't worry about finding it; it is already in the path.

However, you should only do this if you know what you are doing!

Example cases:

  • restarting all DCap doors on "dcopy-1"
  • restarting a GridFTP door on "gftp-1"
  • restarting Replica Manager services on "replica-1"

[root@t2data2:~]$ restart-one-dCache-server-host.sh
Usage: restart-one-dCache-server-host.sh {a single short hostname eg., dcopy-1 or gftp-1 or replica-1} {start|stop|restart}
An example: 
[root@t2data2:~]$ restart-one-dCache-server-host.sh gftp-1 restart

Mounting PNFS areas on uaf-1 or uaf-2

Two important PNFS areas are mounted on uaf-1 and uaf-2. These are:

  • /pnfs/sdsc.edu/data4/cms/userdata
  • /pnfs/sdsc.edu/data3/cms/phedex

If you suspect that one or both areas are not mounted and want to verify, please log in to uaf-1 or uaf-2 and run the following script with no option.

root@uaf-1 ~# ./PNFS-userdata-mounter
PNFS userdata is ALREADY mounted.
If needed, use --umount to unmount.

If you need to unmount, please run with the '--umount' option.

root@uaf-1 ~# ./PNFS-userdata-mounter --umount
--umount

If you need to verify again, please run with no option.

root@uaf-1 ~# ./PNFS-userdata-mounter
PNFS userdata is NOT mounted.
If needed, use --mount to mount.

If you need to mount, please run with the '--mount' option.

root@uaf-1 ~# ./PNFS-userdata-mounter --mount
--mount
root@uaf-1 ~# ./PNFS-userdata-mounter
PNFS userdata is ALREADY mounted.
If needed, use --umount to unmount.
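
Independently of the script, the mounts can also be cross-checked directly against the kernel mount table (a standard Linux check, not part of the script; no output means neither area is mounted):

root@uaf-1 ~# grep pnfs /proc/mounts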

Mounting PNFS areas on phedex-1, the PhEDEx node

In this case we distinguish between our two SEs.

To mount the areas from the dCache 1.7 / Production SE, please issue the following.

mount 192.168.65.3:/fs/usr/data4/cms/userdata /pnfs/sdsc.edu/data4/cms/userdata
mount 192.168.65.3:/fs/usr/data3/cms/phedex /pnfs/sdsc.edu/data3/cms/phedex

To mount the area from the dCache 1.8 / SRM v2.2 Testbed SE, please issue the following.

mount 192.168.4.251:/fs/usr/data/phedex /pnfs/t2.ucsd.edu/data/phedex
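
A minimal sketch of the full sequence for the Testbed SE, assuming the mount point may not yet exist (the mkdir and the verification step are standard Linux additions, not part of the original procedure):

mkdir -p /pnfs/t2.ucsd.edu/data/phedex
mount 192.168.4.251:/fs/usr/data/phedex /pnfs/t2.ucsd.edu/data/phedex
grep pnfs /proc/mounts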

Maximizing data availability - Pool bootstrap messages in email

A Bootstrap process runs on every pool and monitors its status. If a pool is found alive, no action is taken. However, if a pool is found offline, it is auto-restarted, i.e., 'bootstrapped', and an email is sent to the relevant persons. It is useful to simply note the Subject header of such emails. If the Subject says "... All's well", the restart went normally and (most likely) no manual action is needed. If the Subject says "... Failed", further manual intervention on this pool can be deemed necessary.

Examples:

  • Subject: dCache pool report from cabinet-6-6-11 - All's well. 
  • Subject: dCache pool report from cabinet-6-6-22 - Failed. 
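
For reference, a minimal sketch of the kind of logic such a pool Bootstrap could implement, assuming hypothetical check_pool_alive and restart_pool helpers and the standard 'mail' utility (the actual Bootstrap code is not reproduced here; all names are illustrative):

#!/bin/bash
# Illustrative sketch only -- not the actual Bootstrap script.
POOL="cabinet-6-6-11"            # pool hostname (example from above)
ADMIN_MAIL="admins@example.edu"  # hypothetical recipient list

# check_pool_alive is a hypothetical helper that returns 0 if the
# pool's dCache process responds; substitute the real check in use.
if check_pool_alive "$POOL"; then
    exit 0    # pool is alive: take no action
fi

# Pool is offline: auto-restart it, i.e., 'bootstrap' it.
if restart_pool "$POOL"; then    # restart_pool is also hypothetical
    SUBJECT="dCache pool report from $POOL - All's well."
else
    SUBJECT="dCache pool report from $POOL - Failed."
fi
echo "Pool $POOL was bootstrapped." | mail -s "$SUBJECT" "$ADMIN_MAIL"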

Maximizing GridFTP stability - Door bootstrap messages in email

A Bootstrap process runs on every GridFTP door and monitors its status. If a GridFTP door is found alive and under normal load, no action is taken. However, if the door is overloaded or otherwise found offline, it is auto-restarted, i.e., 'bootstrapped', and an email is sent to the relevant persons. It is useful to simply note the Subject header of such emails. If the Subject says "... All's well", the restart went normally and (most likely) no manual action is needed. If the Subject says "... Failed", further manual intervention on this door can be deemed necessary. The reason for the bootstrap action is also given in the Subject itself, as either "was dead" or "was overloaded". If applicable, the offending thread count is given in the body of the email.

Examples:

  • Subject: dCache GridFTP report from gftp-6, was dead, All's well.
  • Subject: dCache GridFTP report from gftp-4, was overloaded, All's well. 
  • Subject: dCache GridFTP report from gftp-2, was dead, Failed. 
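
Likewise, a minimal sketch of the door check, assuming the door runs as a locally visible process findable with pgrep, that 'overloaded' means the thread count in /proc/PID/status exceeds some threshold, and a hypothetical restart_door helper (the threshold and all names are illustrative, not the actual Bootstrap code):

#!/bin/bash
# Illustrative sketch only -- not the actual Bootstrap script.
DOOR="gftp-6"                    # door hostname (example from above)
THRESHOLD=500                    # illustrative thread-count limit
ADMIN_MAIL="admins@example.edu"  # hypothetical recipient list

PID=$(pgrep -f gridftp | head -n 1)   # assumes the door process matches 'gridftp'
if [ -z "$PID" ]; then
    REASON="was dead"
else
    THREADS=$(awk '/^Threads:/ {print $2}' /proc/"$PID"/status)
    [ "$THREADS" -le "$THRESHOLD" ] && exit 0    # alive and under normal load
    REASON="was overloaded"
fi

# restart_door is a hypothetical helper; substitute the real restart command.
if restart_door "$DOOR"; then
    RESULT="All's well."
else
    RESULT="Failed."
fi
echo "Thread count: ${THREADS:-n/a}" | \
    mail -s "dCache GridFTP report from $DOOR, $REASON, $RESULT" "$ADMIN_MAIL"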

-- AsRana - 14 May 2007
