Local system administration of Facilitized hosts
There are well over 1000 Unix/Linux workstations within the CMU School of Computer Science. In order to better manage an environment of this size, SCS Facilities has made several modifications and additions to the "standard" vendor software. This document describes the major changes that we have made to OS-level software, and how to make local modifications to this software on Facilities-supported Unix/Linux workstations. It also describes some Facilities policies for local system administration. People making configuration changes to their hosts, or installing network-aware software, should also be aware of SCS network use policies.
On this page:
- Danger!!! Danger!!! Danger!!!
- Getting Root Access
- Creating Accounts
- Security and Restricting Access
- Daemons & Services
- Configuring KPOP
- The Mail System
- Updating Local Software: Depot & SUP
- AFS Issues and Problems
- Non-Facilitized Machines
- For Further Information
Introduction
DANGER!!! DANGER!!! DANGER!!!
Some of the procedures described in this document involve becoming root on your machine and modifying system configuration files. You can seriously damage your machine's functionality if you don't know what you are doing or if you make a mistake. Unix tends to be very unforgiving of typos and other "minor" errors (e.g. the classic mistake of typing rm * .o instead of rm *.o). In addition, please do not make modifications that prevent Facilities staff from logging in to your workstation. In all cases, if you aren't sure how to perform some procedure, you can contact SCS Facilities by sending e-mail to help@cs.cmu.edu or by calling the Help Center (x8-4231; 9-5, M-F), and we will either do it for you or help you do it yourself.
Getting Root Access
If you need root access to your machine, contact SCS Facilities. The usual way we provide root access is to add an entry of the form
username.root@CS.CMU.EDU
in the file .klogin.local in root's home directory (note: this is not necessarily the same as /). This entry will allow you to su using your username.root Kerberos instance password, and will also allow you to log in as username.root. If you do not have a .root Kerberos instance, you can create one for yourself by using Jeeves. If you forget your .root instance password, you will have to contact SCS Facilities to have it changed.
Because the above method of root access depends on Kerberos, and thus upon the network working, you may want to have a local root password. You can set a local root password by becoming root and using the command:
passwd -l root
Facilities generally does not make use of local root passwords, so feel free to set this password to whatever you wish (though please pick a secure password). You can bypass Kerberos authentication on login entirely by logging in as root:local. This also works for other usernames. You can log in as username -f to avoid running username's rc scripts upon login. Please do not change root's .klogin file, since it contains the entries that Facilities staff members need to access the machine. Also, please do not change root's home directory, since .klogin needs to be in root's home directory in order to enable remote access.
If you have a Linux host, you can boot it into single user mode by typing:
linux single
at the text lilo prompt (you may have to type a ^X to get a text prompt). If your machine asks for a local root password during boot (for example, because it needs fscking) and you don't know (or haven't set) a local root password, you may be able to boot directly to a shell by typing:
linux init=/bin/sh
at the lilo prompt.
Creating Accounts
If you are at all uncertain about the procedures involved in creating accounts on your machine, please contact SCS Facilities and we will do it for you. Facilities attempts to keep usernames and user IDs (i.e. the number that corresponds to a username) synchronized across machines. We do not do any synchronization of group IDs. If you are creating a local account for somebody who already has an SCS identity, you should use the correct username and UID. If a person has an AFS identity in the CS cell, you can find out their UID by typing
pts examine <username>
A master list of all SCS UIDs (including those that don't have AFS IDs for one reason or another) can be found in the file /afs/cs/data/admin/Names (UIDs are in the second field of lines in this file). When you create an account for an existing SCS user, also be sure to use the right full name (e.g. Harry Bovik) as it appears in the SCS white pages (you can find that out by doing a lookup username+ ); otherwise the mail system could become confused if mail is sent directly to that user at the local machine. It is suggested that the password entry for existing SCS user accounts be a "*". This will allow Kerberos logins to the account, and will help head off some of the security problems that local passwords can cause. Note that on some operating systems, you must modify other system files besides /etc/passwd and /etc/group to create an account (in particular, /etc/shadow on Solaris and recent versions of Redhat). If you have questions on how to create an account, please contact SCS Facilities.
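As a rough illustration of the suggestions above, the following sketch only prints a candidate /etc/passwd line with "*" in the password field; it does not edit any system file. The function name passwd_entry and the username, UID, GID, home directory, and shell shown are hypothetical placeholders. The real username and UID should come from pts examine or the Names file, and the full name from the white pages.

```shell
# Hypothetical helper: print an /etc/passwd-style line with "*" as the
# password field, so that only Kerberos logins work for the account.
# usage: passwd_entry USERNAME UID GID "FULL NAME" HOME SHELL
passwd_entry() {
    printf '%s:*:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values only; substitute the user's real SCS username and UID.
passwd_entry bovik 12345 100 'Harry Bovik' /usr0/bovik /bin/sh
```

On systems that also keep /etc/shadow, a matching entry is needed there as well; that step is not covered by this sketch.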
If you are creating a local account for somebody who does not have an official SCS identity (such as a spouse or friend), you can use one of the UIDs that we've set aside for this purpose: 1515, 1516, 1518, 1519, 1520. It is suggested that you choose a username for such accounts that is different from any existing SCS username, since otherwise there may be login problems caused by, for example, having the same username as somebody whose Kerberos account has expired (this problem may be avoided by logging in as username:local). Note that the mail system will deliver mail addressed to username@machine to the username in the global whitepages, if it exists, not the local user. As a result, if the username you picked gets allocated to a real SCS user, mail sent to that local account will not go there anymore.
If an account you create for a guest or friend (i.e. somebody who isn't already an SCS user) is abused in any way, you are responsible. SCS Facilities can provide little or no assistance to people who do not have valid SCS Kerberos identities, or in creating accounts for such people.
Security and Restricting Access
You should assume that our network is hostile, and that any traffic on it may be monitored by potential intruders. For this reason, you should use SSH or Kerberized telnet when connecting to machines. You should also assume that anybody who can log in to a machine can become root on it by exploiting some vulnerability (we install patches for all known remote exploits, but we may not install patches for all local-only exploits, and patch availability often lags well behind the most recent known exploits). For this reason, you should be aware that there is a risk in typing passwords at any machine that other people have accounts on. If you are an administrator for a group of machines, it is suggested that you give yourself root access on those machines by adding your root instance to root's .klogin.local. By su-ing on your local machine, and doing:
telnet -ax remote-host
you will be able to telnet in as root to machines you administer without typing a password on them.
There are several methods by which you can control access to services on a machine:
- To have a machine only accept encrypted (Kerberized) telnet connections, add the line:
telnetd_force_encrypt = on
to the file /etc/quirk.local (create the file if it doesn't already exist). Recently deployed hosts should have this feature turned on already (via an entry in /etc/quirk.system, which is a file that you should not modify yourself). If your host does not have mandatory telnet encryption enabled, it is suggested that you enable it. Entries in quirk.local can override entries in quirk.system.
- All facilitized Unix machines run tcp wrappers (tcpd). The files /etc/hosts.allow and /etc/hosts.deny control the addresses that are allowed to connect to various services. See the man pages for tcpd(8) and hosts_access(5) for details on how to configure tcpd. Facilities does not deploy hosts with any default entries in hosts.deny or hosts.allow.
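The quirk.local change described above could be scripted along the following lines. This is a minimal sketch, not a Facilities tool: the function name enable_forced_telnet_encryption is made up for illustration, and it assumes quirk.local is a plain text file with one setting per line. It creates the file if needed and adds the line only when it is not already present.

```shell
# Hypothetical helper: make sure the telnet encryption setting is present
# in a quirk.local-style file, creating the file if it does not exist.
# usage: enable_forced_telnet_encryption [FILE]   (default: /etc/quirk.local)
enable_forced_telnet_encryption() {
    quirk_file=${1:-/etc/quirk.local}
    setting='telnetd_force_encrypt = on'
    # Create an empty file if there is none yet.
    [ -e "$quirk_file" ] || : > "$quirk_file"
    # Append the setting only if that exact line is not already there.
    grep -qxF "$setting" "$quirk_file" || printf '%s\n' "$setting" >> "$quirk_file"
}
```

Running the function against a scratch copy first (for example, a file in your home directory) is a low-risk way to check the result before touching /etc/quirk.local as root.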
Daemons & Services
We have made many modifications to the standard set of services that system vendors ship by default:
- telnetd has been replaced with a Kerberized version.
- ftpd (on most machines) has been replaced with a locally-modified version of wu-ftpd that uses username.ftp instance passwords, so that you can FTP to a machine without typing your Kerberos password in the clear. For information on configuring ftpd, see the man page for ftpd(8), the man page for ftpaccess(5) (the ftpaccess file lives in /usr/local/lib/ftpd), and our local ftpd documentation. By default, anonymous FTP is disabled on deployed hosts, since the ftp user's home directory does not exist by default.
- rshd has been replaced with a Kerberized version (and a Kerberized version of rsh is in /usr/local/bin).
- fingerd is a local version that uses the same white pages information that the mail system does. It is not possible to separate finger lookups from your global mail forwarding information.
- named has been replaced by a local caching named.
- All machines run sshd.
- All machines run an NNTP server (which uses DNNTP to contact the real news servers). The file /etc/dnntp_access contains the list of hosts (one per line) that are allowed to have access to the NNTP server on that machine. You should use your local workstation in preference to dnntp.srv if you need NNTP access for a particular machine. If that machine is not a .cs, .ri, or .edrc host, then it will also need to be added to an access list on one of our main news servers. Contact SCS Facilities to have that done.
- The vendor mail system has been totally replaced by an MMDF-based system. See the section below on mail for more details.
- All machines run krcpd, which is a Kerberized copy program. Note that this means that any Kerberos-authenticated user on remote machines has read access to files readable by anon on any machine that runs krcpd. You can modify /etc/krcp.anon.allow to change who is allowed to anonymously krcp to a machine. See the krcpd man page for details.
- rlogind has been removed. Use Kerberized telnet or SSH instead.
- We have removed many of the other (often insecure) services that are sometimes found in the inetd.conf that is shipped by the OS vendor.
- lcladmd, opshell, kopshell, and terad are services that we have added (depending on system type) for remote account administration and the backup system. Please do not remove these.
Local changes to /etc/inetd.conf are made through the file /etc/inetd.conf.local. After editing that file, run
/usr/adm/fac/bin/newinetd.conf
to have the changes take effect. Services that you add to /etc/inetd.conf.local will be added to /etc/inetd.conf. Services that you preface with a "!" in /etc/inetd.conf.local will be removed from /etc/inetd.conf. If you wish to make local changes to /etc/services, you can add these changes to the file /etc/services.local and run
/usr/adm/fac/bin/newservices
to merge the files. If you believe that a new entry in /etc/services would be of general use, then send mail to help@cs.cmu.edu and we will add it to the global services file.
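To make the add/remove rule for /etc/inetd.conf.local concrete, here is a small sketch of how such a merge could behave. It is not the actual newinetd.conf script, whose internals are not described here. It assumes that a "!" line names the service by its first field, that the global entries live in a file such as /etc/inetd.conf.global, and that comments and blank lines can be dropped. The function name merge_inetd is hypothetical.

```shell
# Hypothetical illustration of the merge rule (not the real newinetd.conf).
# usage: merge_inetd GLOBAL_FILE LOCAL_FILE   (merged result on stdout)
merge_inetd() {
    awk '
        # Pass 1 (local file): remember removals and additions.
        pass == 1 {
            if ($0 ~ /^!/) {
                name = $0
                sub(/^![ \t]*/, "", name)   # drop the "!" marker
                sub(/[ \t].*$/, "", name)   # keep only the service name
                removed[name] = 1
            } else if ($0 !~ /^[ \t]*(#|$)/) {
                added[++n] = $0
            }
            next
        }
        # Pass 2 (global file): skip comments, blanks, and removed services.
        /^[ \t]*(#|$)/ { next }
        !($1 in removed) { print }
        # Finally, append the locally added services.
        END { for (i = 1; i <= n; i++) print added[i] }
    ' pass=1 "$2" pass=2 "$1"
}
```

Calling merge_inetd with copies of the global and local files prints what the merged file would contain under these assumptions, which can be a convenient sanity check on a local file before running the real script.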
Many services that are not run by inetd run under nanny. Nanny oversees a list of servers, and restarts them if they stop running for some reason. See the nanny man page for more information. Files specifying the servers that are run by nanny are located in /usr/local/lib/nannyconfig/.
Crontabs on facilitized systems are generated by a script in a similar manner to /etc/inetd.conf and services. If you wish to make local additions to root's crontab, you can add them to the file /usr/lib/crontab.local and then run:
/usr/adm/fac/bin/newcrontab
to have the changes take effect.
Configuring KPOP
If the host is running one of the facilities-supported vendor-supplied UNIX operating systems, then it can be configured to run a KPOP daemon if needed. Typically, only general purpose and project servers are configured by default to run a KPOP daemon. To add KPOP to your system, copy the commented-out "kpop" entry (and the "pop" entry, if you are using non-Kerberized pop3) from /etc/inetd.conf.global to /etc/inetd.conf.local and remove the comment markers. See the instructions above to make this change take effect.
The Mail System
The standard vendor mail system on facilitized machines has been replaced by an MMDF-based mail system. Some of the consequences of this change are:
- The mail spool file format is different. Messages in a mail spool are separated by four ^A's at the beginning and end of each message. A result of this difference is that many standard vendor mail programs (like mail) won't work. Users may change their individual mail spool format by using lclmail and a .maildelivery file (see the on-line help pages and the maildelivery and lclmail man pages for more details).
- Mail delivery is based upon a global white pages database. Hence, mail addressed to user@local_workstation will be forwarded on to that user's global mail drop machine (if the user is in the whitepages), not the local spool file on local_workstation. Note that mail sent to root@local_workstation will automatically be forwarded to help@cs. There currently is no way to change this behavior.
- The mail system will not accept mail addressed to a user at an IP address instead of a hostname.
- .forward files do not work. You can get many of the same effects with a .maildelivery file. Be careful, since you can easily cause a mail loop with a .maildelivery file (since mail sent to your_username@any_facilitized_machine will just be forwarded back to your maildrop machine). Use Jeeves if you want to change your global e-mail forwarding.
- sendmail on facilitized machines is a symlink to /usr/local/lib/sendmail, which is a sendmail that comes with MMDF, not the vendor sendmail. This sendmail may not support all the options that the vendor sendmail does.
- You do not have to worry about the latest sendmail security exploit compromising your machine.
Mail relaying (injection of mail from a non-CS/RI/EDRC/ICES host that is addressed to a non-CS/RI/EDRC/ICES host) has been disabled by default in order to prevent our hosts from being used as spam relays. To change how relaying is handled (for example, to allow mail relaying from a non-CMU host that you use), you can modify one or more of the files /usr/adm/mmdf/table/auth, /usr/adm/mmdf/table/myfriends, or /usr/adm/mmdf/table/myguests. Directions for how to modify each file are located within the file in question.
The MMDF mail system works by running multiple copies of /usr/local/etc/deliver, each one of which takes care of a particular channel. A channel corresponds to a particular set of delivery modes (such as mail to the local host, mail to cmu hosts, mail to a list, etc). Copies of deliver run under nanny. You can run:
/usr/local/etc/nanny -status
to see channels that are running and their status. The mail system will back off and retry within a few hours if it cannot deliver a message at first.
Occasionally, you may need to debug problems with the mail system. You can run:
/usr/local/bin/mailq
or
/usr/local/bin/checkmail
to see what mail is in the queue. To remove a message from the queue, run:
/usr/local/bin/checkmail -k message-id
If mail seems to be stuck in the mail queue for a long period of time, you can try the following:
- Look at the relevant mail.log file in /usr/adm/syslog (or syslog.dated depending on the OS) to see if there is some obvious error.
- Run
/usr/local/bin/checkque -h
to see what channels mail is stuck in.
- Run
/usr/local/etc/nanny -status
to see if the mail channels are all running. Sometimes restarting a channel with
/usr/local/etc/nanny -restart name-of-channel
will help. Occasionally, restarting named (via nanny) will fix a problem.
- Run a deliver by hand for that channel by running
/usr/local/etc/deliver -w -c<name-of-channel>
This will often give an explicit error message to help in debugging, or immediately clear out the queue. Note that the channel names as given by nanny are different from the ones understood by deliver (e.g. cmuchan vs smtpcmu). You can also run additional delivers for a channel if the mail queue is very backed up, and you wish to process it faster.
- If none of the above works, contact Facilities.
The mail system looks at the gecos field in /etc/passwd when deciding how to deliver mail to users not in the global white pages database. For example, if local_machine has an entry in /etc/passwd like:
abc:*:1515:100:Local abc mailbox:/usr0/abc:/local/bin/tcsh
then mail addressed to local_abc_mailbox@local_machine would be put in abc's mail spool on local_machine (it would not go to abc's global maildrop), unless there was a real user in the white pages with the name "Local abc mailbox". This feature of the mail system is useful when setting up local accounts that would like to receive mail on the local machine. You can also use it to set up a secondary maildrop. For example, suppose user abc, whose main maildrop is on UX7, wanted an additional copy of their mail sent to their local workstation. They could create a local passwd entry like the one above, and use a .maildelivery to resend abc's mail on UX7 to local_abc_mailbox@local_machine. One thing to be careful about: do not have the home directory for local abc mailbox be the same as the home directory for abc on UX7. If you do, then the .maildelivery will create a mail loop. Also, do not resend abc's mail to abc@local_machine, since local_machine will send it right back to UX7 and create a mail loop.
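The relationship between the gecos field and the local mail address in the example above can be sketched as follows. The transformation shown (lowercase the name and replace spaces with underscores) is inferred from that single example and is an assumption, not a statement of how MMDF matches names internally. The function name gecos_to_local_part is hypothetical.

```shell
# Hypothetical helper: turn a gecos full name into the local part of the
# address, following the pattern of the example in the text.
gecos_to_local_part() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# The passwd entry from the example; field 5 is the gecos field.
passwd_line='abc:*:1515:100:Local abc mailbox:/usr0/abc:/local/bin/tcsh'
gecos=$(printf '%s\n' "$passwd_line" | cut -d: -f5)
gecos_to_local_part "$gecos"    # prints local_abc_mailbox
```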
Updating Local Software: Depot & SUP
Software on facilitized machines is automatically updated by two methods: depot and SUP. These programs are run nightly by /usr/local/bin/dosupdepot. Occasionally, a machine will fail to depot/SUP because of some problem (such as a full disk, or AFS problems). If you suspect that your machine is not getting software updates, you can look at /usr/adm/fac/log/depot.log to see when it last successfully ran dosupdepot. At any time, you can run dosupdepot by hand to, for example, force an update after you've subscribed a machine to a different release of a software collection. Automatic dosupdepot upgrades can be disabled entirely by creating a file called /etc/disableupgrade. (Note: don't do this. You will not get security fixes and other important patches and software upgrades if you do.) You can force a depot run even if /etc/disableupgrade exists by running dosupdepot with the -force option.
Every machine has a particular release level of software that it is subscribed to by default (individual collection release levels may be overridden by entries in /usr/local/depot/depot.pref.local). The machine-wide software release level is controlled by the file /etc/releaselevel. By default (if that file does not exist), the release level is omega. You can subscribe a machine to alpha or beta release levels of software by putting a single line reading alpha or beta in /etc/releaselevel.
Depot controls the contents of /usr/local/, and does not modify any files outside of that directory. If you make any local modifications to the contents of /usr/local, you must modify local depot configuration files or your modifications may be lost. Information on how to make local changes to depot configuration files can be found in the on-line depot documentation. Note that by default depot does not remove old files from /usr/local that no longer have a corresponding AFS collection; it only adds or replaces files. If you wish to run depot such that it removes such files, you can run
dosupdepot -depotargs '-vV-skipremove=false'
Be careful, since there is no guarantee that doing so won't remove files you'd rather have kept. For most non-essential files, depot does not copy the file to your host; rather, it makes symlinks into AFS. See the depot documentation for instructions on how to have depot copy files onto your local disk. Note that some recently deployed hosts have been configured to have some popular non-essential collections copied instead of linked. If you find yourself short of space on /usr, you may wish to check the contents of /usr/local/depot/depot.pref.local and perhaps free up (by making them non-local) some of the space that is being taken up by these local collections.
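The release-level rule described in this section (a single line in /etc/releaselevel, with omega as the default when the file is missing) can be checked with a small helper like the one below. This is only a sketch: the function name release_level is hypothetical, and treating an empty or unreadable file the same as a missing one is an assumption.

```shell
# Hypothetical helper: report the machine-wide release level.
# usage: release_level [FILE]   (default: /etc/releaselevel)
release_level() {
    file=${1:-/etc/releaselevel}
    level=''
    if [ -r "$file" ]; then
        # Take the first non-blank line and strip surrounding whitespace.
        level=$(sed -n '/[^[:space:]]/{p;q;}' "$file" | tr -d '[:space:]')
    fi
    # Fall back to omega when no level is recorded.
    printf '%s\n' "${level:-omega}"
}

release_level
```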
SUP is responsible for updating various system configuration files that are maintained by Facilities. A list of the facilities-provided files that have been upgraded by SUP can be found in the file last in a subdirectory of /usr/adm/fac/sup/. The exact path will depend on the OS you are running. For example, on Linux FC5 the full path is /usr/adm/fac/sup/fac.i386_fc5/last. If you wish to not have SUP upgrade a particular facilities-provided file you can create a file called refuse in the SUP directory (/usr/adm/fac/sup/fac.i386_fc5/ in the case of Linux FC5) and in that file put a list of files and directories (one per line, without the leading /) that should not be upgraded.
Important note: if you put down a refuse file, you will not get bug/security fixes for those files, and things may break. For example, if a new version of program A needs a new version of file B to work, and you refuse upgrades for B and not for A, then A will stop working. We recommend not refusing upgrades unless there is an overwhelming reason to do so. Facilities staff will not debug problems caused by refusing some portion of (or all) Facilities software updates.
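If, despite the warning above, a refuse file is genuinely needed, the entries must be written without the leading /. The sketch below just performs that conversion and prints the result; it does not write a refuse file. The function name to_refuse_entries and the example paths are hypothetical, and whether a given path is actually SUP-managed should be checked against the file last.

```shell
# Hypothetical helper: print paths in refuse-file form
# (one per line, without the leading "/").
to_refuse_entries() {
    for path in "$@"; do
        printf '%s\n' "${path#/}"
    done
}

# Placeholder paths for illustration only.
to_refuse_entries /etc/example.conf /usr/example/dir
```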
AFS Issues and Problems
All fully-facilitized machines run the Andrew File System. AFS provides a wide-scale, shared file system with reasonable security features (unlike NFS). Most of the software that is run out of /usr/local is actually symlinked to AFS. The basic AFS configuration files are usually located in /usr/vice/etc/. The only file there that you may want to modify is /usr/vice/etc/cacheinfo. In particular, this file controls the default AFS cache size. You may wish to increase the size of your AFS cache if you will frequently access large files out of AFS. You should never make your AFS cache size so large that there is a risk that the partition it is on will run out of space. If that happens, unpredictable behavior may result.
The cacheinfo file also specifies the default location of the AFS cache, but on many machines the location it specifies is often a symlink to the real cache directory. Sometimes you may wish to move your vice cache to another partition. You should be sure to ascertain exactly where your vice cache is really located before changing its location (follow the symlinks). Do not remove currently-used cache files, as this can panic your machine. Instead, create the new cache directory, change the default cache location (either by changing the entry in cacheinfo or by having the location specified in cacheinfo be a symlink to the new location of the cache), and reboot your machine. Then remove the old cache directory.
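Before moving the cache, it can help to print where it really lives. The sketch below assumes the standard AFS cacheinfo layout of one line with three colon-separated fields (AFS mount point, cache directory, cache size in 1 KB blocks); the function names are hypothetical. It follows symlinks by changing into the directory and asking for the physical path.

```shell
# Hypothetical helper: read the three colon-separated cacheinfo fields
# into the variables afs_mount, cache_dir, and cache_size_kb.
# usage: parse_cacheinfo [FILE]   (default: /usr/vice/etc/cacheinfo)
parse_cacheinfo() {
    IFS=: read -r afs_mount cache_dir cache_size_kb < "${1:-/usr/vice/etc/cacheinfo}"
}

# Hypothetical helper: print the physical location of a directory,
# with all symlinks resolved.
real_cache_dir() {
    ( cd "$1" 2>/dev/null && pwd -P )
}

# Only report if this machine actually has a cacheinfo file.
if [ -r /usr/vice/etc/cacheinfo ]; then
    parse_cacheinfo
    printf 'configured: %s  real: %s  size: %s KB\n' \
        "$cache_dir" "$(real_cache_dir "$cache_dir")" "$cache_size_kb"
fi
```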
Occasionally, the AFS cache on a machine may become corrupted. Symptoms of this problem include an inability to access certain files or directories in AFS, or being unable to run a binary located on AFS without an immediate core dump. If these symptoms occur, you should verify that it is a local problem (as opposed to an AFS server problem) by seeing if it occurs on other machines of that type, or by comparing checksums for the binaries between the machine having the problem and other machines. You can use the command
/usr/local/bin/fs checkservers
to check on the status of AFS servers that your local machine's cache manager has recently contacted. The command
fs checkvolumes
will check the status of alternate locations of volumes for replicated collections, and may fix some problems. If the problem is local to a particular machine, then it is very possibly caused by AFS cache corruption. There are a few things to try to fix the problem. If it is a single file or volume, you can run
fs flush path-to-file
or
fs flushv path-to-volume
Alternatively, you can try interactively reducing the cache size, and then increasing it back to the default. To do so, run
fs setcachesize 20
(or some other small number), wait until that command completes, and then run
fs setcachesize -reset
to reset it to its original size. Sometimes, in cases of severe corruption, the above procedures may not fix the problem. In order to completely clear the cache, remove the CacheItems file in the cache directory and reboot (instead of rebooting, you could try manually stopping and starting AFS using the appropriate rc script and arguments, but that does not always work).
With very few exceptions, all SCS hosts should be members of the system:friendlyhost AFS group. If you have trouble accessing files from your machine that should be accessible by system:friendlyhost, contact Facilities, and we will add that machine to system:friendlyhost. One consequence of being a friendlyhost is that, if you are running a webserver on your machine, you should not allow the server to access files in /afs/cs, as that would circumvent the purpose of the system:friendlyhost access controls.

