Set up a Server
This page describes how to install CentOS 7 / Rocky Linux 8 and how to set up and troubleshoot Puppet.
Getting a Bootable USB stick
You can borrow one from the sysadmin or make one yourself (4.4 GB+ storage) following the instructions here
- Download the ISO
Rocky Linux Minimal : https://rockylinux.org/download/
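If you make the stick yourself, the download can be checked before it is written. The following is a minimal sketch that assumes a Linux workstation with GNU coreutils. The expected SHA-256 is whatever the Rocky download page lists for your ISO, the ISO name and /dev/sdX are placeholders, and dd overwrites the whole target device, so confirm the device name with lsblk first. With DRY_RUN=1, write_usb only prints the dd command.

```shell
# Sketch only: the ISO path and target device are supplied by you.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# verify_iso ISO EXPECTED_SHA256: succeed only if the ISO's SHA-256 matches.
verify_iso() {
    actual=$(sha256sum < "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# write_usb ISO DEVICE: copy the ISO onto the whole USB device (not a partition)
# and flush it to the device before dd exits.
write_usb() {
    run dd if="$1" of="$2" bs=4M conv=fsync
}
```

Typical use: `verify_iso <iso> <sha256> && write_usb <iso> /dev/sdX`.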
Change Boot Order
1. Insert the USB stick and connect the monitor to the machine
2. Reboot the machine
3. Get to the boot menu. There are two ways:
a. Bring up the BIOS menu by pressing the Del key while the machine is booting. If that doesn't work, try F2 or F10.
- In Boot, change the boot order so that the USB stick boots first
- Save changes and reboot
b. Press F11 while the machine is booting and pick the USB drive
Install CentOS 7/Rocky Linux 8
Adapted from this guide: https://phoenixnap.com/kb/how-to-install-centos-7
Select "Test this media and install <OS>".
Step 1: Choose Keyboard and Language
Step 2 : Network Configuration
Select NETWORK & HOSTNAME
1. Switch on the Ethernet
2. Change Host name at the bottom
3. Select Configure
4. Select IPv4 Settings and set:
DNS Servers: [alpha private ip address]
Search domains: cluster.ucsf.bkslab.org, ucsf.bkslab.org, bkslab.org, compbio.ucsf.edu, ucsf.edu
Check "Require IPv4 addressing for this connection to complete", then save.
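The same settings can also be applied after installation with NetworkManager's command-line tools. This is a minimal sketch, run as root, assuming you know the connection name (often the interface name); the connection name, hostname and DNS address are arguments you supply, not values from this page. `ipv4.may-fail no` is the nmcli property behind the "Require IPv4 addressing..." checkbox. With DRY_RUN=1 the commands are only printed.

```shell
# Sketch: NETWORK & HOSTNAME settings via hostnamectl and nmcli.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

SEARCH_DOMAINS="cluster.ucsf.bkslab.org,ucsf.bkslab.org,bkslab.org,compbio.ucsf.edu,ucsf.edu"

# configure_network CONNECTION HOSTNAME DNS_IP
configure_network() {
    run hostnamectl set-hostname "$2"
    # connection.autoconnect yes: bring the Ethernet connection up at boot.
    # ipv4.may-fail no: "Require IPv4 addressing for this connection to complete".
    run nmcli connection modify "$1" connection.autoconnect yes \
        ipv4.dns "$3" ipv4.dns-search "$SEARCH_DOMAINS" ipv4.may-fail no
    run nmcli connection up "$1"
}
```

For example, `DRY_RUN=1 configure_network eno1 n-1-999 10.0.0.1` prints the three commands without changing anything (eno1, n-1-999 and 10.0.0.1 are made-up values).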
Step 3: Set Date and Time
Turn on Network Time and select the local timezone.
Step 4: Partitioning
Select INSTALLATION DESTINATION.
Option 1: Automatic Partitioning
Under the Other Storage Options heading, select the Automatically configure partitioning checkbox. The selected destination disk is then partitioned automatically with / (root), /home and swap partitions, created as LVM logical volumes formatted with XFS.
If you do not have enough free space, you can reclaim disk space by instructing the installer to delete existing data.
When finished, click the Done button.
Option 2: Manual Partitioning
Select the I will configure partitioning checkbox and choose Done.
Use this option if you want other file systems (such as ext4 and vfat) or a partitioning scheme other than LVM, such as btrfs. It opens a configuration pop-up where you can set up your partitions manually.
Step 5: Software Selection
Select Compute Node on the left menu, then select Add-Ons on the right menu.
Step 6: Enable KDUMP
Double-check that KDUMP is enabled.
Step 7: Start installation Process
Hit Begin Installation
Step 8: Setup Root Password & User
During installation, you will see two items at the top:
Root Password
Set the usual root password.
User Creation
Create a local administrator account:
User name: survival
Check "Make this user administrator"
Check "Require a password for this account"
Password: [Hint: it starts with G and has t somewhere in the middle]
Reboot when the installation is complete.
Install Puppet and Create Puppet Certificate
Packages Installation
Log in as the root user.
- Install the EPEL release. EPEL (Extra Packages for Enterprise Linux) is an additional package repository for enterprise distributions.
yum install epel-release -y
This enables the public EPEL repository and provides the GPG key used to verify that packages from it are genuine.
- Update the installed packages
yum update -y
- Install Puppet
yum install puppet -y
- Install sssd
yum install sssd -y
- Install perl libraries
yum install perl-DBD-Pg -y
- Install nss-pam-ldapd
yum install nss-pam-ldapd -y
- Install oddjob-mkhomedir and start the oddjobd service
yum install oddjob-mkhomedir -y
systemctl start oddjobd
systemctl enable oddjobd
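The package steps above can be collected into one function so that none is skipped on a new machine. This is a sketch, not an existing script: it runs the same commands in the same order (EPEL first, since later packages may come from it), and DRY_RUN=1 prints them instead of running them.

```shell
# Sketch: the package installation steps above, run as root on the new machine.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

install_base_packages() {
    run yum install -y epel-release      # enable EPEL before anything else
    run yum update -y
    run yum install -y puppet sssd perl-DBD-Pg nss-pam-ldapd oddjob-mkhomedir
    run systemctl start oddjobd          # oddjobd creates home directories on login
    run systemctl enable oddjobd
}
```

The separate `start` and `enable` calls are kept because older systemd releases (such as the one in CentOS 7) may not support `systemctl enable --now`.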
Edit Puppet configuration on foreman.ucsf.bkslab.org
- Search for the host to see whether it already exists.
- Edit its Puppet settings.
- If the machine is brand new, click 'New Host', choose 'Testing' as the Host Group and replicate the settings of the other existing desktops.
- In Parameters, click "Override" next to "variant" and set its value to "cluster" at the bottom.
- In Puppet classes, choose:
* nfs-mounts.*
* ssd*
Issue new Puppet Certificate
In a second terminal, log in as root.
- Log into alpha to create a new Puppet certificate for the new computer. First, list all current Puppet certificates and check whether one already exists for this machine:
$ sudo puppet cert list -a | grep <hostname>.cluster.ucsf.bkslab.org
- To clean out an existing certificate:
$ sudo puppet cert clean <hostname>.cluster.ucsf.bkslab.org
BEFORE PROCEEDING TO THE NEXT STEP, MAKE SURE that you have two terminals open: one logged in as root on the new computer (client) and the other logged in as s_ on alpha (server).
1. On the client side:
$ puppet agent --test --waitforcert=10
"puppet agent --test" performs the initial Puppet integration for a new computer, or reintegrates an existing one. Without this step, the machine will not have access to /mnt/nfs, /nfs/* and /nfs/soft. "--waitforcert=10" makes the agent keep asking the server every 10 seconds until its certificate has been signed.
2. On the server (alpha) side, sign the certificate:
$ sudo puppet cert sign <hostname>.cluster.ucsf.bkslab.org
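The certificate exchange above can be sketched as three small functions, one per step, with a helper that expands a short hostname to the cluster domain. This is a sketch under the page's setup (Puppet with the `puppet cert` subcommands). On Puppet 6 and later those subcommands were removed in favour of `puppetserver ca`, so the server-side calls would need adjusting there. DRY_RUN=1 prints the commands instead of running them.

```shell
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# fqdn NAME: append the cluster domain unless NAME already contains a dot.
fqdn() {
    case $1 in
        *.*) echo "$1" ;;
        *)   echo "$1.cluster.ucsf.bkslab.org" ;;
    esac
}

# On alpha, before the client asks for a certificate: drop any stale one.
server_clean_cert() { run sudo puppet cert clean "$(fqdn "$1")"; }

# On the client: request a certificate, polling every 10 s until it is signed.
client_request_cert() { run puppet agent --test --waitforcert=10; }

# On alpha, while the client is still waiting: sign the pending request.
server_sign_cert() { run sudo puppet cert sign "$(fqdn "$1")"; }
```

The order matters: start `client_request_cert` first, then run `server_sign_cert <hostname>` on alpha while the agent is still polling.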
Testing puppet
$ id <user_name>
If this fails, run the following commands and try again:
$ systemctl restart sssd
$ systemctl enable sssd
$ authconfig-tui
This opens the authconfig-tui screen. Use the spacebar to change settings.
1. Uncheck "Use Shadow Passwords".
2. Uncheck "Use Fingerprint reader" so that it does not raise fingerprint errors later. Then click "Next".
3. Under "LDAP Settings", make sure it says:
[*] Use TLS
Server: ldaps://ds.ucsf.bkslab.org/
Base DN: dc=bkslab, dc=org
$ systemctl start oddjobd
$ systemctl enable oddjobd
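The `id` test can be wrapped so that several accounts are checked at once after Puppet and sssd are set up. This is a minimal sketch: it relies only on `id`, which resolves users through whatever NSS sources are configured (local files, and sssd/LDAP once they work). The account names you pass are your choice.

```shell
# check_user NAME...: report whether each NAME resolves; fail if any does not.
check_user() {
    status=0
    for name in "$@"; do
        if id "$name" >/dev/null 2>&1; then
            echo "ok: $name -> $(id "$name")"
        else
            echo "missing: $name (is sssd running and enabled?)" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_user root <user_name>` should print two "ok" lines on a correctly configured machine; a "missing" line for an LDAP account points back at the sssd and authconfig-tui steps.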
GPU
Nouveau is the open-source driver for NVIDIA cards, and it is enabled by default. For the proprietary NVIDIA driver to work, nouveau must be disabled.
How to check whether nouveau is loaded:
$ lsmod | grep nouveau
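`lsmod | grep nouveau` also matches lines where nouveau only appears in the "used by" column. A slightly stricter check is to look at the first field of /proc/modules, which is the data lsmod reads. This is a sketch; the optional file argument exists only so the check can be tried against a saved copy.

```shell
# nouveau_loaded [MODULES_FILE]: succeed if a module named exactly "nouveau" is loaded.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}
```

Run `nouveau_loaded && echo "nouveau is still loaded"` before and after the steps below.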
How to disable nouveau:
$ vim /etc/default/grub
Append 'rd.driver.blacklist=nouveau nouveau.modeset=0' at the end of GRUB_CMDLINE_LINUX.
$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
$ echo "blacklist nouveau" > /etc/modprobe.d/nouveau-blacklist.conf
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)
$ reboot
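The vim step can be done non-interactively, which helps when several GPU nodes need it. This is a sketch, assuming GNU sed and a GRUB_CMDLINE_LINUX="..." line in the usual double-quoted form. It appends the two options once, so running it a second time leaves the file unchanged.

```shell
NOUVEAU_OPTS='rd.driver.blacklist=nouveau nouveau.modeset=0'

# add_nouveau_blacklist [GRUB_DEFAULTS]: append NOUVEAU_OPTS to GRUB_CMDLINE_LINUX once.
add_nouveau_blacklist() {
    file=${1:-/etc/default/grub}
    if grep -q '^GRUB_CMDLINE_LINUX=.*rd\.driver\.blacklist=nouveau' "$file"; then
        return 0   # options already present
    fi
    # Insert the options just before the closing quote of GRUB_CMDLINE_LINUX="..."
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 $NOUVEAU_OPTS\"/" "$file"
}
```

Note that /etc/default/grub is only read when the GRUB configuration is regenerated. On a BIOS-booted CentOS 7 machine that is typically `grub2-mkconfig -o /boot/grub2/grub.cfg`; UEFI machines keep grub.cfg under /boot/efi/EFI/<distro>/.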
Troubleshooting
Puppet SSL issue
- Datetime mismatch
http://wiki.docking.org/index.php/Troubleshooting_-_Puppet_Failed_to_generate_additional_resources_using_%27eval_generate:_SSL_connect_returned%3D1%27
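Because certificates carry validity dates, a client whose clock is far from alpha's can fail the SSL handshake. A rough way to spot this is to compare Unix timestamps from both machines. This is a sketch, assuming you can run `date +%s` on alpha (for example over ssh); the 60-second default tolerance is an arbitrary choice, not a Puppet setting.

```shell
# clock_skew_ok LOCAL_EPOCH REMOTE_EPOCH [MAX_SECONDS]
# Print the skew and succeed if the timestamps differ by at most MAX_SECONDS (default 60).
clock_skew_ok() {
    skew=$(( $1 - $2 ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( -skew ))
    fi
    echo "clock skew: ${skew}s"
    [ "$skew" -le "${3:-60}" ]
}
```

For example: `clock_skew_ok "$(date +%s)" "$(ssh alpha date +%s)"`. A large skew suggests fixing time synchronization before retrying Puppet.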
The following issues came up on n-5-34/5, with proposed solutions.
- Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid tag "" on node
This error happens because Puppet uses a cached version of the node instead of creating a new one. Clean all traces of the node on alpha before reissuing a certificate:
[root@alpha tmp]# puppet node clean samekh.cluster.ucsf.bkslab.org
- To reissue Puppet on a machine:
1. Revoke the Puppet certificate on alpha:
$ sudo puppet cert clean <hostname>.cluster.ucsf.bkslab.org
2. On the client, remove this directory:
$ rm -rf /var/lib/puppet/ssl
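The reissue procedure splits into a server half and a client half. This is a sketch with DRY_RUN=1 support. The server side uses `puppet node clean` as in the Error 400 fix; on Puppet 3 that command is documented to remove the node's signed certificate along with its cached data, but if a certificate remains, run `puppet cert clean` as above. The client side finishes by requesting a new certificate, which then has to be signed on alpha as in the certificate section.

```shell
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# On alpha: forget everything the server knows about the node (FQDN argument).
server_forget_node() {
    run sudo puppet node clean "$1"
}

# On the client: delete the old SSL directory, then request a new certificate.
# /var/lib/puppet/ssl is the path used on this page; `puppet config print ssldir`
# shows the directory your Puppet version actually uses.
client_reissue() {
    ssldir=${1:-/var/lib/puppet/ssl}
    run rm -rf "$ssldir"
    run puppet agent --test --waitforcert=10
}
```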
Other Issues
1. Network configuration (/etc/resolv.conf)
Issue 1: DNS and nameserver are empty (the Ethernet connection was not configured during installation)
What I did:
$ nmtui
Edited the connection in the NetworkManager text UI, following the example from n-1-136.
Issue 2: nameserver 127.0.0.1
What I did:
- Commented out all items in the [main] section of /etc/NetworkManager/NetworkManager.conf
- Changed the nameserver to 10.20.1.1
$ systemctl restart NetworkManager.service
$ systemctl restart network
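Both symptoms above (no nameserver at all, or one pointing at 127.0.0.1) can be checked directly in /etc/resolv.conf. This is a minimal sketch using awk and grep; it only reads the file and changes nothing.

```shell
# list_nameservers [FILE]: print nameserver addresses from a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# resolv_conf_ok [FILE]: fail if there is no nameserver or one of them is 127.0.0.1.
resolv_conf_ok() {
    ns=$(list_nameservers "${1:-/etc/resolv.conf}")
    [ -n "$ns" ] || return 1
    ! printf '%s\n' "$ns" | grep -qx '127\.0\.0\.1'
}
```

After restarting NetworkManager, `resolv_conf_ok && list_nameservers` should succeed and print 10.20.1.1.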
2. Yum not working (http://yum/centos/7/contrib/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found)
Issue: Puppet overwrote the existing CentOS-Base.repo (CentOS 7) with CentOS 6's CentOS-Base.repo file.
What I did:
- Overwrote /etc/yum.repos.d/CentOS-Base.repo with a copy of the correct version from n-1-136.
3. Machine not recognizing users
Issue 1: sssd was not installed
What I did:
$ yum install sssd
$ systemctl start sssd
$ systemctl enable sssd
Issue 2:
$ id s_khtang
uid=xxxx(s_khtang) gid=1000(n-5-34) groups=1000(n-5-34)
This means the machine resolves GID 1000 to the local group n-5-34 instead of sysadmin.
What I did:
$ vim /etc/group
Changed n-5-34:x:1000:n-5-34 to sysadmin:x:1000:n-5-34
$ authconfig-tui
Unchecked 'Shadow Password'
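A quick way to spot a GID clash like the one above is to ask which local groups own the GID. This is a minimal sketch that only reads /etc/group (or a copy passed as the second argument). For the fix itself, editing /etc/group by hand works; `groupmod -n sysadmin n-5-34` from shadow-utils performs the same rename and also updates /etc/gshadow.

```shell
# group_for_gid GID [FILE]: print the name of each local group that has this GID.
group_for_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1 }' "${2:-/etc/group}"
}
```

On the affected machine, `group_for_gid 1000` printed n-5-34 before the fix and should print sysadmin afterwards.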