Category:Sysadmin

From DISI

Revision as of 16:35, 31 March 2014

These pages are for docking.org sysadmins, and are also useful for anyone who wants to install, run, and manage a docking.org site. Sysadmin pages are relevant *only* if you have sudo on a docking.org-type cluster. For other roles, see the Roles category. Once the docking lab has been set up, it must be maintained. This guide covers everything it takes to be a docking.org sysadmin.

For security reasons, documents pertaining to authentication and access are kept in Google Docs.

Google Docs docs

Some documents do not belong on the wiki. For access, contact the sysadmins.

  • Lab Software Status
  • Lab IP addresses
  • Sysadmin Secrets


There are two kinds of maintenance: reactive and pro-active. Pro-active maintenance is classified by schedule (how often a task recurs); reactive maintenance always responds to a problem happening now.

Pro-active

Reactive

  • Create a new user
  • Retire a user
  • RAID disk failure

System down/hung/crashed/offline

This section has two parts. In the first, Diagnosis, we enumerate the possible problems and what the symptoms might look like. In the second part, we rehearse scenarios of how to proceed. There are so many different kinds of failure that it is difficult to anticipate every one. Still, we have tried to write down the most common failure modes and sensible ways to proceed.

Diagnosis

  • system up but df hangs -> disk is off, hung, or unmounted. Solution ->
  • cannot ping head node.
  • no home directory
  • web server down or does not respond
  • jobs don't start in queuing system
  • disk full
  • kernel panic
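Two of the symptoms above (a hanging df and a full disk) can be checked from a script. The following is a minimal sketch, not the lab's actual tooling: the function names, the 10-second timeout, and the 90% threshold are assumptions chosen for illustration. Note that a df stuck in uninterruptible I/O on a dead mount may not exit even after it is killed, so the sketch does not wait for it.

```python
import shutil
import subprocess


def command_responds(cmd, timeout_s=10.0):
    """Return True if cmd exits within timeout_s seconds, False if it hangs.

    Hypothetical helper: a df that never returns is the symptom of a disk
    that is off, hung, or unmounted.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Ask the kernel to kill it, but do not block waiting for it to die.
        proc.kill()
        return False
    return True


def disk_nearly_full(path, threshold=0.90):
    """Return True if the filesystem holding path is at or above threshold.

    threshold is a fraction of total capacity (0.90 is an assumed default).
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= threshold


def diagnose(paths, df_timeout_s=10.0):
    """Return a list of human-readable findings for the given mount points."""
    findings = []
    if not command_responds(["df", "-P"], df_timeout_s):
        findings.append("df hangs: a disk may be off, hung, or unmounted")
        return findings  # per-path checks could hang too, so stop here
    for path in paths:
        if disk_nearly_full(path):
            findings.append("disk nearly full: " + path)
    return findings
```

Calling `diagnose(["/", "/home"])` on a node returns an empty list when df answers promptly and neither filesystem is over the threshold; the paths here are examples only.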


Scenarios

  • Install new software by request

After power failure

  • check that Mailman came back up properly
  • Cluster 0: check that the XML-RPC services came back up properly
  • check that the Pipeline Pilot server came back up properly
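The checks above can be partly automated by testing whether each service accepts TCP connections again. This is a sketch under stated assumptions: the host names and port numbers in the usage note are hypothetical placeholders, and an open port shows only that something is listening, not that the service is healthy.

```python
import socket


def port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(services, timeout_s=3.0):
    """Return the names of services whose port does not accept connections.

    services maps a label to a (host, port) pair; all entries are supplied
    by the caller, since the real hosts and ports are site-specific.
    """
    return [
        name
        for name, (host, port) in services.items()
        if not port_open(host, port, timeout_s)
    ]
```

For example, `check_services({"web front end": ("head.example.org", 80), "xml-rpc": ("head.example.org", 8080)})` would return the labels that still need attention; both the host and the ports in this call are invented for illustration.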

When someone leaves the lab

  • back up their data or move to proust as appropriate
  • reduce disk footprint as much as possible
  • offer them portable USB disks for backups
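Before backing up or trimming a departing user's data, it helps to know where the space is going. The sketch below is one possible way to measure it, not an established lab procedure; the function names are hypothetical. It sums apparent file sizes, so sparse files and hard links may make the total differ from what du reports.

```python
import os


def directory_size_bytes(root):
    """Return the total apparent size of all files under root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are not double counted
                total += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(root, top_n=5):
    """Return up to top_n (size_bytes, name) pairs for root's subdirectories,
    largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((directory_size_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top_n]
```

Running `largest_subdirs` on the user's home directory gives a short list of the subdirectories worth archiving or cleaning first.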


  • Add new hardware to the cluster





Procedures

  • How to run backups
  • How to restore
  • How to set up a new computer
  • Monthly tasks
  • Security


Updating Software

  • Delphi
  • AMSOL
  • DOCK
  • dockenv
  • mol2db
  • molinspiration
  • OpenEye
  • Cactvs
  • Daylight
  • Marvin/JChem

Troubleshooting Services

  • MySQL
  • Perl
  • Apache, mod_perl
  • Python
  • Mailman
  • Condor
  • sendmail


Pages in category "Sysadmin"

The following 151 pages are in this category, out of 151 total.
