Difference: DCacheLink (4 vs. 5)

Revision 5 2007/05/18 - Main.AsRana

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Quick and easy SRM DCache Troubleshooting

Line: 8 to 8
 

Restarting SRM server and all GridFTP doors

Changed:
<
<
Note that this is the most commonly needed procedure. If SRM transfers have started to fail, the most likely cause is a malfunctioning SRM server or one or more malfunctioning GridFTP doors. Please log in to t2data2.local and run the following script with the 'restart' option.
>
>
Note that this is the most commonly needed procedure. If SRM transfers have started to fail, the most likely cause is a malfunctioning SRM server or one or more malfunctioning GridFTP doors. Please log in to t2data2.local and run the following script with the 'restart' option. Don't worry about finding it; it is already in the path.
 
Changed:
<
<
[root@t2data2:~]$ ./restart-SRM-all-GridFTP-servers.sh
Usage: ./restart-SRM-all-GridFTP-servers.sh {start|stop|restart}
>
>
[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh
Usage: restart-SRM-all-GridFTP-servers.sh {start|stop|restart}
 

Restarting all other dCache servers (except SRM and GridFTP)

Changed:
<
<
If transfers continue to fail after the above step AND you are sufficiently confident that the cause is a malfunctioning lower-level dCache service (e.g., PNFS server, Pool Manager, DCap server, Replica Manager, Broadcast cell, etc.), please log in to t2data2.local and run the following script with the 'restart' option.
>
>
If transfers continue to fail after the above step AND you are sufficiently confident that the cause is a malfunctioning lower-level dCache service (e.g., PNFS server, Pool Manager, DCap server, Replica Manager, Broadcast cell, etc.), please log in to t2data2.local and run the following script with the 'restart' option. Don't worry about finding it; it is already in the path.
 
Changed:
<
<
[root@t2data2:~]$ ./restart-all-other-dCache-servers.sh
Usage: ./restart-all-other-dCache-servers.sh {start|stop|restart}
>
>
[root@t2data2:~]$ restart-all-other-dCache-servers.sh
Usage: restart-all-other-dCache-servers.sh {start|stop|restart}
 

Restarting everything -- all dCache servers, including SRM and GridFTP

Changed:
<
<
There is an order to be maintained while restarting dCache services. Since a higher-level service depends on lower-level services, the higher-level service needs to be stopped before, and needs to be started after, a lower-level service. Each of the two scripts mentioned above takes care of this order internally. However, when using both scripts together, i.e., to restart everything, run the scripts with different options in the following sequence.
>
>
There is an order to be maintained while restarting dCache services. Since a higher-level service depends on lower-level services, the higher-level service needs to be stopped before, and started after, a lower-level service. Each of the two scripts mentioned above takes care of this order internally. However, when using both scripts together, i.e., to restart everything, run the scripts with different options in the following sequence. Don't worry about finding these; they are already in the path.
 
Changed:
<
<
[root@t2data2:~]$ ./restart-SRM-all-GridFTP-servers.sh stop
[root@t2data2:~]$ ./restart-all-other-dCache-servers.sh restart
[root@t2data2:~]$ ./restart-SRM-all-GridFTP-servers.sh start
>
>
[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh stop
[root@t2data2:~]$ restart-all-other-dCache-servers.sh restart
[root@t2data2:~]$ restart-SRM-all-GridFTP-servers.sh start
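The stop, restart, start sequence above could be wrapped in a small helper so that the order is never mistyped. This is only a sketch: the function names `run` and `restart_everything` and the `DRY_RUN` variable are hypothetical, and it assumes that each restart script exits with a non-zero status on failure, which this page does not confirm.

```shell
#!/bin/sh
# Sketch of a wrapper for the full restart sequence described above.
# Assumption: both restart scripts are in the PATH on t2data2 and
# return a non-zero exit status when they fail.

# run CMD [ARGS...]
# With DRY_RUN=1 the command is only printed, not executed
# (DRY_RUN is a hypothetical convenience, not part of the site scripts).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the higher-level services first, restart the lower-level ones,
# then start the higher-level services again. The && chain stops at
# the first step that reports failure.
restart_everything() {
    run restart-SRM-all-GridFTP-servers.sh stop &&
    run restart-all-other-dCache-servers.sh restart &&
    run restart-SRM-all-GridFTP-servers.sh start
}
```

Setting `DRY_RUN=1` before calling `restart_everything` prints the three commands in order without touching any service, which is a cheap way to check the sequence before running it for real.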
 

Restarting dCache server(s) running on a known hostname

Changed:
<
<
If you simply want to restart the service(s) on a single known host, please log in to t2data2.local and run the following script with a single short hostname and the 'restart' option.
>
>
If you simply want to restart the service(s) on a single known host, please log in to t2data2.local and run the following script with a single short hostname and the 'restart' option. Don't worry about finding it; it is already in the path.

However, you should only do this if you know what you are doing!

 Example cases:
  • restarting all DCap doors on "dcopy-1"
  • restarting a GridFTP door on "gftp-1"
  • restarting Replica Manager services on "replica-1"
Changed:
<
<
[root@t2data2:~]$ ./restart-one-dCache-server-host.sh
Usage: ./restart-one-dCache-server-host.sh {a single short hostname eg., dcopy-1 or gftp-1 or replica-1} {start|stop|restart}
>
>
[root@t2data2:~]$ restart-one-dCache-server-host.sh
Usage: restart-one-dCache-server-host.sh {a single short hostname eg., dcopy-1 or gftp-1 or replica-1} {start|stop|restart}
 An example:
Changed:
<
<
[root@t2data2:~]$ ./restart-one-dCache-server-host.sh gftp-1 restart
>
>
[root@t2data2:~]$ restart-one-dCache-server-host.sh gftp-1 restart
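Since this script should only be used with care, one option is to check the two arguments before calling it. The sketch below is an illustration only: `valid_restart_args` is a hypothetical helper, and it assumes that a "short hostname" simply means a name without dots.

```shell
#!/bin/sh
# Sketch: sanity-check the arguments for restart-one-dCache-server-host.sh.
# Assumption: a short hostname is non-empty and contains no dots.

# valid_restart_args HOST ACTION
# Returns 0 if HOST looks like a short hostname and ACTION is one of
# start, stop, or restart; returns 1 otherwise.
valid_restart_args() {
    case "$1" in
        ""|*.*) return 1 ;;   # empty, or looks fully qualified
    esac
    case "$2" in
        start|stop|restart) return 0 ;;
        *) return 1 ;;
    esac
}
```

A call such as `valid_restart_args gftp-1 restart && restart-one-dCache-server-host.sh gftp-1 restart` would then refuse to run the script when the hostname contains a domain part or the action is misspelled.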
 

Mounting PNFS areas on uaf-1 or uaf-2

Line: 87 to 90
 

Maximizing data availability - Pool bootstrap messages in email

Changed:
<
<
There is a Bootstrap process that runs on every pool and monitors its status. If the pool is found to be online, no action is taken. However, if the pool is found offline, it is automatically restarted, i.e., 'bootstrapped', and an email is sent to the relevant persons. It is usually enough to note the Subject header of such emails. If the Subject says "... All's well", the restart went normally and (most likely) no manual action is needed. If the Subject says "... Failed", further manual intervention on this pool is needed.
>
>
There is a Bootstrap process that runs on every pool and monitors its status. If a pool is found alive, no action is taken. However, if a pool is found offline, it is automatically restarted, i.e., 'bootstrapped', and an email is sent to the relevant persons. It is usually enough to note the Subject header of such emails. If the Subject says "... All's well", the restart went normally and (most likely) no manual action is needed. If the Subject says "... Failed", further manual intervention on this pool is needed.
  Examples:
  • Subject: dCache pool report from cabinet-6-6-11 - All's well. 
  • Subject: dCache pool report from cabinet-6-6-22 - Failed. 
Added:
>
>

Maximizing GridFTP stability - Door bootstrap messages in email

There is a Bootstrap process that runs on every GridFTP door and monitors its status. If a GridFTP door is found alive and under normal load, no action is taken. However, if the door is overloaded or found offline, it is automatically restarted, i.e., 'bootstrapped', and an email is sent to the relevant persons. It is usually enough to note the Subject header of such emails. If the Subject says "... All's well", the restart went normally and (most likely) no manual action is needed. If the Subject says "... Failed", further manual intervention on this door is needed. The reason for the bootstrapping action is also listed in the Subject itself, as either "was dead" or "was overloaded". If applicable, the high thread count is specified in the body of the email.

Examples:

  • Subject: dCache GridFTP report from gftp-6, was dead, All's well.
  • Subject: dCache GridFTP report from gftp-4, was overloaded, All's well. 
  • Subject: dCache GridFTP report from gftp-2, was dead, Failed. 
  -- AsRana - 14 May 2007
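For anyone who filters these bootstrap emails automatically, the Subject conventions described above can be matched with simple shell patterns. This is a minimal sketch based only on the example Subject lines on this page; the function names `classify_subject` and `subject_reason` are hypothetical, and the real emails may vary in wording.

```shell
#!/bin/sh
# Sketch: classify a bootstrap email by its Subject line.
# Assumption: subjects follow the examples above and contain either
# "All's well" or "Failed", and for GridFTP doors also "was dead"
# or "was overloaded".

# classify_subject SUBJECT -> prints ok, failed, or unknown
classify_subject() {
    case "$1" in
        *"All's well"*) echo ok ;;
        *"Failed"*)     echo failed ;;
        *)              echo unknown ;;
    esac
}

# subject_reason SUBJECT -> prints dead, overloaded, or none
# (pool reports carry no reason, so they yield "none")
subject_reason() {
    case "$1" in
        *"was dead"*)       echo dead ;;
        *"was overloaded"*) echo overloaded ;;
        *)                  echo none ;;
    esac
}
```

Only subjects classified as `failed` would need manual follow-up; the `dead` or `overloaded` reason can help decide where to look first.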
 