The text of and illustrations in this document are licensed by Predrag Punosevac under a Creative Commons Attribution-Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. The original author of this document designates the Auton Lab as the "Attribution Party" for purposes of CC-BY-SA. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
A version control system (VCS), continuous integration (CI), and regression testing tools are fundamental building blocks of the software development infrastructure of any software company. A VCS is the software used to manage and track changes to computer programs. CI is the practice of merging all developer working copies into a shared software project and making sure that these merges don't break builds. Finally, regression testing tools are used to identify possible unexpected consequences for the existing code base when changes or modifications are committed.
The Auton Lab, part of Carnegie Mellon University's School of Computer Science, researches new approaches to Statistical Data Mining. We are not a software company, but writing lots of useful software is a byproduct of our research effort. Interestingly, the code we write, which usually starts as research-grade software, ends up being the code base for licensed enterprise application software (EAS) used by many government agencies and private entities.
Historically, the first version control system used in the Auton Lab was the Concurrent Versions System (CVS). An attempt to migrate to Subversion in 2008 failed due to the tight integration of our custom Gmake-Magic build scripts with CVS. Aside from the inability of CVS to deal with atomic commits, over the past several years the existence of GitHub has completely revolutionized the way developers work together and was a major factor in the adoption of the Git distributed version control system. Over the years we noticed that our students and research programmers alike started avoiding our CVS server and created private Git repositories all over our NFS shares. We also noticed a disturbing trend of putting pieces of our proprietary code base on GitHub or, to a much lesser extent, Bitbucket. The decision was made to move aggressively towards adopting Git for our version control needs and to try to emulate the GitHub model privately in our internal environment.
This short article is not about the technical merits of Git compared to other version control systems. It also doesn't address the philosophical and practical differences between using Git and CVS for software development. It is merely a set of our notes and how-tos which will help you build a similar development environment in your small academic lab or start-up company.
Operating system diversity has an adverse effect on system administration productivity and user satisfaction. At the time of my arrival at the Auton Lab in June of 2013 there was only forensic evidence that FreeBSD had been used in the laboratory, and Linux was ruling the computing landscape. However, the lab had been orphaned for almost two years prior to my arrival, so there was a great opportunity to rethink some of the solutions. By training I am a research mathematician whose only major exposure to system administration came through my life-long UNIX hobby. It was way easier to rebuild our entire network infrastructure (firewalls, VPNs, DNS, LDAP) using OpenBSD, which I use every day at home, than to try to figure out how those things work on Linux. My second decision was to consolidate the various flavors of Linux which we used for our scientific computing and desktop needs into Springdale Linux, the Princeton University clone of Red Hat Enterprise Linux. That was all easy. The most difficult decision was the selection of the storage OS. Namely, we are dealing with lots of big data. Did I say a lot? Red Hat is quite happy being a storage OS using either hardware RAID or software RAID. It would have been much easier to stick to hardware RAID and Linux as a storage solution and not introduce a third OS into the lab, but the lack of a modern file system on Linux really bothered us. Namely, XFS is robust but shows its age, and the only modern options are ZFS and HAMMER. We tested both and decided to adopt ZFS (software RAID) for our storage needs. During all this time OpenBSD was used as a CVS server.
Upon completing the migration of all our data to a ZFS storage pool I started experimenting with jails on top of ZFS, and iocage in particular, and I really liked what I saw. We also wanted to take advantage of ZFS in creating our new software development environment, which would revolve around Git.
But before we go any further let me tell you why we picked PC-BSD (TrueOS server) over vanilla FreeBSD for our original installations.
At the time of adoption PC-BSD (TrueOS, as its server version was called) had several major advantages over the newly released FreeBSD 10.0:
With the release of FreeBSD 11.0 and the uncertain future of TrueOS, which is based on 12.0-CURRENT (even worse, there are no updates for the PC-BSD 10 release branch), we are re-evaluating the situation.
FreeBSD 11.0 is now fully capable of booting from a ZFS mirror. We ended up not taking full advantage of pc-sysinstall. beadm is usable on vanilla FreeBSD, and even TrueOS abandoned GRUB as its boot loader. The update/upgrade manager is worthless if there are no new builds in the PC-BSD repos. Life Preserver turned out to be buggy and unusable, so we went the way of zfsnap. Warden was unexpectedly dropped in favor of iocage, which in turn became abandonware when the original developer started rewriting it in a different language.
We still believe that TrueOS has saner defaults (LibreSSL instead of OpenSSL being one) but we were bitten by PC-BSD custom configuration files multiple times.
As promised earlier in this section, we will describe how we built our infrastructure using PC-BSD, ZFS, jails, and iocage.
A FreeBSD jail is a glorified chroot(2) (change root directory). It creates a "safe" environment, separate from the rest of the system at least from the system administration perspective if not from the security point of view. A similar, FreeBSD-jail-inspired user-land virtualiser called sysjail was attempted on OpenBSD and NetBSD but abandoned after the discovery of fundamental security flaws in the design of systrace. FreeBSD jails are not free of security flaws but are a relatively safe and cheap (compared to full-blown OS virtualization) way to separate and run multiple applications on the same hardware host. Jails alone probably would not have been sufficient reason for us to use them to run a Git server, but combining them with ZFS offered an advantage over any other available solution.
The advantage of using ZFS datasets for jails is pretty obvious to seasoned ZFS users. ZFS is both a logical volume manager and a file system in one. As a logical volume manager ZFS offers some measure of data protection and high availability in the face of simple hardware failures like dead hard drives. As a file system it includes protection against data corruption but, most importantly from the end user's point of view, offers the ability to roll back recent changes to the file system and data via snapshots. It is also easy to back up using remote replication. Using ZFS for hot migration of a guest to a different server and for creating failover mirrors is also pretty straightforward. Finally, ZFS cloning features offer a safe and painless way to experiment with and upgrade complex software.
This step was needed only before Warden was abandoned:
pkg install iocage
In the next step we tell iocage to download an image of FreeBSD:
iocage fetch
Supported releases are:
  10.2-RELEASE
   9.3-RELEASE
Please select a release [10.3-RELEASE]:
We are now ready to create a jail which will host our Git repositories:
iocage create tag=git.int.autonlab.org ip4_addr="igb0|192.168.6.23"
Before we start the jail we want to set some properties:
iocage set hostname=git.int.autonlab.org git.int.autonlab.org
We also want to bypass some known issues with long jail names, which is very important if you are going to recover files from ZFS snapshots:
iocage set hack88=1 git.int.autonlab.org
To allow iocage to start your jails at boot time
echo 'iocage_enable="YES"' >> /etc/rc.conf
and to start this particular jail automatically at boot:
iocage set boot=on git.int.autonlab.org
Since we run multiple jails on this host we can prioritize the order in which they start:
iocage set priority=1 git.int.autonlab.org
We are finally ready to fire up our jail and start working on Git server
iocage start git.int.autonlab.org
To access the jail we simply run
iocage console git.int.autonlab.org
Just add the package
pkg install git
and add the user
pw useradd git -m
Our Git repositories will live in /home/git. The reason for this choice will become obvious once it is clear that we want to recreate a private GitHub installation. At this point we could create repos with
su - git
git init
but please don't do that. It would only complicate later integration with the web interface.
As we said in the introduction, GitHub is an amazing platform but it comes with a caveat. It can't be self-hosted. It is true that you can have private repositories, but in our case that is not an option. Putting cutting-edge research code used by some of the major U.S. agencies on a third-party server is a quick one-way ticket to a federal prison.
Browsing Git repositories on the web is not a very tall order. What we wanted in the Auton Lab was integrated bug tracking, a wiki, and possibly continuous integration. All that is possible with the real GitHub.
There are two GitHub alternatives which can be self-hosted: GitLab and Gogs. We tested both. GitLab essentially requires installing all the Ruby gems and uses multiple databases, one of which is Redis. That is not something anybody in our lab is familiar with. Our first attempt to install and configure GitLab, which was not in the ports tree at the time due to multiple issues, made me cry. I spent several days and gave up. We then tested a TurnKey Linux image, which actually worked pretty well but was not light on resources. GitLab seemed at the time to be a more mature project than Gogs, which is written in Go, and it also has better integration with Jenkins. However, Gogs is significantly simpler to install and configure. After careful consideration and talking to the Gogs developers we decided to go that route.
First we will install some software packages. The palette of software we are installing reflects our design choices. We also have no special needs, so no compiling from ports is necessary. However, upgrading the installed packages and repository catalogues first is desirable:
pkg upgrade
pkg install go git gcc postgresql94-client postgresql94-server nginx
Running PostgreSQL in a jail is tricky. There are shared memory issues which should have been resolved by setting
echo 'jail_sysvipc_allow="YES"' >> /etc/rc.conf
echo 'security.jail.sysvipc_allowed=1' >> /etc/sysctl.conf
on the host machine or possibly adding
echo 'security.jail.sysvipc_allowed=1' >> /etc/sysctl.conf
in the jail itself. I could not get this to work, even after putting an additional
echo 'jail_example_parameters="allow.sysvipc=1"'>> /etc/rc.conf
on the host machine. Finally I gave up and ended up doing
jail -m jid=3 allow.sysvipc=1
on the jail host, where jid=3 should match the jail ID. We are now ready to run PostgreSQL in the jail.
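The jail ID is assigned when the jail starts, so a hard-coded jid=3 can go stale after a restart. Below is a minimal sketch of looking the ID up by hostname instead. It assumes that `jls jid host.hostname` prints one "JID hostname" pair per line; the helper name `jid_for_host` is our own invention and not part of jls or iocage.

```shell
#!/bin/sh
# jid_for_host: read "JID HOSTNAME" pairs on stdin and print the JID
# whose hostname equals $1. Hypothetical helper, not part of any tool.
jid_for_host() {
    awk -v host="$1" '$2 == host { print $1; exit }'
}

# Intended use on the jail host, as root:
#   jid=$(jls jid host.hostname | jid_for_host git.int.autonlab.org)
#   [ -n "$jid" ] && jail -m jid="$jid" allow.sysvipc=1
```

Keep in mind that jail -m changes only the running jail, so the setting most likely has to be reapplied after every restart of the jail.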
To have PostgreSQL start up when the server does you need to add postgresql_enable="YES" to the /etc/rc.conf file:
echo 'postgresql_enable="YES"' >> /etc/rc.conf
Before we start using PostgreSQL we need to initialize the database. By default the cluster will be placed in the data folder in the /usr/local/pgsql directory.
service postgresql initdb
One should see output like this:
The files belonging to this database system will be owned by user "pgsql".
This user must also own the server process.
The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /usr/local/pgsql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /usr/local/pgsql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
    /usr/local/bin/postgres -D /usr/local/pgsql/data
or
    /usr/local/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
Before we start the database we edit /usr/local/pgsql/data/pg_hba.conf
# PostgreSQL Client Authentication Configuration File
# ===================================================
#
# Refer to the "Client Authentication" section in the PostgreSQL
# documentation for a complete description of this file. A short
# synopsis follows.
#
# This file controls: which hosts are allowed to connect, how clients
# are authenticated, which PostgreSQL user names they can use, which
# databases they can access. Records take one of these forms:
#
# local      DATABASE  USER  METHOD  [OPTIONS]
# host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
host         all       all   192.168.6.23 255.255.255.255 trust
# hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
# hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
#
# (The uppercase items must be replaced by actual values.)
#
# The first field is the connection type: "local" is a Unix-domain
# socket, "host" is either a plain or SSL-encrypted TCP/IP socket,
# "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a
# plain TCP/IP socket.
#
# DATABASE can be "all", "sameuser", "samerole", "replication", a
# database name, or a comma-separated list thereof. The "all"
# keyword does not match "replication". Access to replication
# must be enabled in a separate record (see example below).
#
# USER can be "all", a user name, a group name prefixed with "+", or a
# comma-separated list thereof. In both the DATABASE and USER fields
# you can also write a file name prefixed with "@" to include names
# from a separate file.
#
# ADDRESS specifies the set of hosts the record matches. It can be a
# host name, or it is made up of an IP address and a CIDR mask that is
# an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that
# specifies the number of significant bits in the mask. A host name
# that starts with a dot (.) matches a suffix of the actual host name.
# Alternatively, you can write an IP address and netmask in separate
# columns to specify the set of hosts. Instead of a CIDR-address, you
# can write "samehost" to match any of the server's own IP addresses,
# or "samenet" to match any address in any subnet that the server is
# directly connected to.
#
# METHOD can be "trust", "reject", "md5", "password", "gss", "sspi",
# "ident", "peer", "pam", "ldap", "radius" or "cert". Note that
# "password" sends passwords in clear text; "md5" is preferred since
# it sends encrypted passwords.
#
# OPTIONS are a set of options for the authentication in the format
# NAME=VALUE. The available options depend on the different
# authentication methods -- refer to the "Client Authentication"
# section in the documentation for a list of which options are
# available for which authentication methods.
#
# Database and user names containing spaces, commas, quotes and other
# special characters must be quoted. Quoting one of the keywords
# "all", "sameuser", "samerole" or "replication" makes the name lose
# its special character, and just match a database or username with
# that name.
#
# This file is read on server startup and when the postmaster receives
# a SIGHUP signal. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect. You can
# use "pg_ctl reload" to do that.
# Put your actual configuration here
# ----------------------------------
#
# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.
# CAUTION: Configuring the system for local "trust" authentication
# allows any local user to connect as any PostgreSQL user, including
# the database superuser. If you do not trust all your local users,
# use another authentication method.
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     pgsql                                trust
#host    replication     pgsql        127.0.0.1/32            trust
#host    replication     pgsql        ::1/128                 trust
We log into the pgsql account created when we installed PostgreSQL.
su - pgsql
in order to create the PostgreSQL user account and database which will be used by Gogs. Connect to the PostgreSQL database:
$ psql -h localhost -p 5432 -d template1
When logged into the database:
$ psql -h localhost -p 5432 -d template1
psql (9.4.9)
Type "help" for help.
# Create a user for Gogs
template1=# CREATE USER git CREATEDB;
CREATE ROLE
# Create the Gogs production database & grant all privileges on database
template1=# CREATE DATABASE gogs_production OWNER git encoding='UTF8';
CREATE DATABASE
# Quit the database session
template1=# \q
Then type exit to drop back to the root user. Try connecting to the new database with the git user and set the password:
su - git
$ psql -d gogs_production
psql (9.4.9)
Type "help" for help.
gogs_production=> ALTER USER git PASSWORD 'yEfa8pes';
ALTER ROLE
gogs_production=> \q
Gogs has still not hit the ports tree and it is under very active development. We will compile Gogs from the sources. The fact that we opted for PostgreSQL instead of SQLite complicates our installation somewhat but also has certain benefits.
First we set up the Go environment (for the current session and permanently):
GOPATH=$HOME/go; export GOPATH
echo 'GOPATH=$HOME/go; export GOPATH' >> ~/.profile
We now fetch the Gogs sources using go get. Notice that setting CC=gcc is required to force the build to use GCC instead of the system LLVM compiler (you can use CC=gcc54 if you want to try a newer version of GCC). I have not attempted building Gogs with LLVM.
CC=gcc go get -u --tags postgresql github.com/gogits/gogs
Note that my tag reflects my choice of PostgreSQL for our database. For most small installations with a privately accessible Gogs server, sqlite will be sufficient. PostgreSQL does offer some measure of protection for public instances.
Note that a new directory named go is created in /home/git. The Gogs sources sit pretty deep inside it, so I decided to create a symbolic link:
ln -s go/src/github.com/gogits/gogs gogs
Building Gogs is now as easy as
cd gogs
CC=gcc go build --tags postgresql
The default configuration file is /home/git/gogs/conf/app.ini. We do not want to alter that file, so we override it by creating custom/conf/app.ini:
mkdir -p custom/conf
vi custom/conf/app.ini
Note that if we don't create this file, the first time we run gogs web the web interface will run an installation script which populates the file from our input. We prefer to write it by hand. This is what our configuration file looks like:
APP_NAME = Auton Lab Go Git Service
RUN_USER = git
RUN_MODE = prod
[database]
DB_TYPE = postgres
HOST = 127.0.0.1:5432
NAME = gogs_production
USER = git
PASSWD = yEfa8pes
SSL_MODE = disable
PATH = data/gogs.db
[repository]
ROOT = /home/git/gogs-repositories
[server]
DOMAIN = git.int.autonlab.org
HTTP_PORT = 3000
ROOT_URL = http://git.int.autonlab.org/
DISABLE_SSH = false
SSH_PORT = 2222
OFFLINE_MODE = false
[mailer]
ENABLED = true
; Buffer length of channel, keep it as it is if you don't know what it is.
SEND_BUFFER_LEN = 100
; Name displayed in mail title
SUBJECT = %(APP_NAME)s
; Mail server
HOST = smtp.autonlab.org:587
; Disable HELO operation when hostname are different.
DISABLE_HELO = true
; Mail from address, RFC 5322. This can be just an email address, or the `"Name" <email@example.com>` format
FROM = `"Predrag Punosevac" <gogsmaster@autonlab.org>`
; Mailer user name and password
USER = gogsmaster
PASSWD = XXXXXXXXX
[service]
REGISTER_EMAIL_CONFIRM = false
ENABLE_NOTIFY_MAIL = true
DISABLE_REGISTRATION = false
ENABLE_CAPTCHA = false
REQUIRE_SIGNIN_VIEW = true
[picture]
DISABLE_GRAVATAR = false
[session]
PROVIDER = file
[log]
MODE = file
LEVEL = Info
ROOT_PATH = /usr/home/git/go/src/github.com/gogits/gogs/log
[security]
INSTALL_LOCK = true
SECRET_KEY = mPa644nQpSIMqfm
The option DISABLE_REGISTRATION = false should not be changed to DISABLE_REGISTRATION = true until we create at least one user account. The first user account will automatically have administrative privileges. For production purposes in my organization we don't allow registration.
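Values such as SECRET_KEY and the database PASSWD in the configuration file should be unique to each installation rather than copied from a how-to. Here is a minimal sketch of generating a random hexadecimal string from /dev/urandom. The function name `random_hex` and the default length are our own choices; we are assuming, not asserting, that Gogs is happy with any reasonably long string here.

```shell
#!/bin/sh
# random_hex: print 2*N random hexadecimal characters (N bytes, default 16).
# Hypothetical helper; nothing in Gogs requires this particular format.
random_hex() {
    bytes="${1:-16}"
    # od dumps N bytes as hex pairs; tr strips the spaces and newlines
    od -An -N"$bytes" -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# Example: paste the output of "random_hex 16" after "SECRET_KEY = "
```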
Since Gogs is not in ports and I didn't want to create a daemon for it, we decided simply to use tmux to start Gogs manually:
cd /home/git/gogs
tmux
./gogs web
Then detach your tmux session:
[detached (from session 0)]
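The manual start above can also be done in one step by asking tmux to create the session already detached. The following is only a sketch: the session name "gogs" and the function name `start_gogs` are our own choices, and we have not used this in production.

```shell
#!/bin/sh
# start_gogs: launch "gogs web" in a detached tmux session unless a
# session with that name already exists. Hypothetical helper.
start_gogs() {
    session="${1:-gogs}"
    dir="${2:-/home/git/gogs}"
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "tmux session '$session' is already running"
        return 0
    fi
    # -d: do not attach, -s: session name, -c: working directory
    tmux new-session -d -s "$session" -c "$dir" './gogs web'
}

# Run as the git user:   start_gogs
# Reattach later with:   tmux attach -t gogs
```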
Here is a short description of how to create Gogs repositories. Earlier I mentioned that it is easier to create Git repos from the Gogs interface than to use git init and then import them into Gogs.
Backing up and recovering ZFS datasets is pretty easy. This was a major reason we opted to run our Git and Gogs services in a FreeBSD jail.
Obviously the thing we care about the most is our code. Our code is stored in /home/git/gogs-repositories and a simple
iocage snapshot git.int.autonlab.org
of our jail instance is sufficient to create a snapshot which can be rolled back or sent with the zfs send command. On the other hand, the user info is stored in the PostgreSQL database. It is easy to recreate for a medium-sized organization like ours, but why waste the time? We dump the PostgreSQL database via cron just before we take the snapshot:
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
# Order of crontab fields
# minute hour mday month wday command
# Added by Predrag Punosevac
# Database dump
30 4 * * * /usr/local/bin/pg_dump --username=git gogs_production > /db-dump/`date`
35 4 * * * /usr/local/bin/detox /db-dump/*
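The crontab above names each dump after the raw output of date, which contains spaces and colons, and then relies on detox to clean the names up. An alternative, sketched below, is a small wrapper script that builds a sortable file name itself. The function names and the name pattern are hypothetical. A wrapper is convenient because a literal % has to be escaped inside a crontab line.

```shell
#!/bin/sh
# dump_path: print a dump file name of the form DIR/DB-YYYY-MM-DD_HHMM.sql
dump_path() {
    dir="${1:-/db-dump}"
    db="${2:-gogs_production}"
    printf '%s/%s-%s.sql\n' "$dir" "$db" "$(date +%Y-%m-%d_%H%M)"
}

# dump_gogs: write one dump; remove the partial file if pg_dump fails.
dump_gogs() {
    dir="${1:-/db-dump}"
    db="${2:-gogs_production}"
    out=$(dump_path "$dir" "$db")
    if ! /usr/local/bin/pg_dump --username=git "$db" > "$out"; then
        rm -f "$out"
        return 1
    fi
}

# A script that ends with a call to dump_gogs could replace both
# crontab entries above with a single line.
```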
I am taking multiple ZFS snapshots throughout the day, essentially versioning the Git repositories and Gogs themselves. On the machine which hosts the jail I have the following in cron:
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
# Order of crontab fields
# minute hour mday month wday command
# Added by Predrag Punosevac
# Backup Gogs and Git repos
15 1 * * * /usr/local/sbin/iocage snapshot git.int.autonlab.org
45 2 * * * /root/replicate-incremental-git.sh
45 4 * * 0 /root/replicate-full-git.sh
15 10 * * * /usr/local/sbin/iocage snapshot git.int.autonlab.org
15 14 * * * /usr/local/sbin/iocage snapshot git.int.autonlab.org
45 18 * * * /usr/local/sbin/iocage snapshot git.int.autonlab.org
# Delete expired snapshots
15 19 * * * /root/snapremove-one-old-git.sh
15 20 * * * /root/snapremove-one-old-git.sh
15 21 * * * /root/snapremove-one-old-git.sh
15 22 * * * /root/snapremove-one-old-git.sh
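The snapshot removal script itself is not reproduced in this article. As an illustration only, and not the script we actually run, here is a sketch of the selection step: given snapshot names sorted oldest first, print the oldest one if more than a chosen number exist. The helper name, the retention count, and the dataset path in the usage comment are all placeholders.

```shell
#!/bin/sh
# oldest_expired: read snapshot names on stdin, oldest first, and print the
# oldest one only when more than KEEP ($1) snapshots exist.
# Hypothetical helper, not the author's snapremove script.
oldest_expired() {
    awk -v keep="$1" 'NR == 1 { first = $0 } END { if (NR > keep) print first }'
}

# Intended use on the jail host; POOL/iocage/jails/UUID is a placeholder:
#   snap=$(zfs list -H -d 1 -t snapshot -o name -s creation POOL/iocage/jails/UUID \
#          | oldest_expired 28)
#   [ -n "$snap" ] && zfs destroy -r "$snap"
```

Removing at most one snapshot per run matches the way the crontab calls the removal script several times each evening.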
I will show later the simple scripts I use for remote replication and address the issue of incremental versus full ZFS send. In our lab we use one remote host for incremental ZFS sends and another one which is used for full ZFS replication, but only once a week.
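Until those scripts are written up, the difference between the two kinds of send can be sketched as follows. A full send transfers everything in a snapshot. An incremental send (zfs send -i) transfers only what changed between two snapshots and needs the older snapshot to already exist on the receiving side. The functions below only print the pipelines so they can be reviewed first. Every dataset, snapshot, and host name is a placeholder, and these are not our production scripts.

```shell
#!/bin/sh
# Print (do not run) the pipeline for a full ZFS send.
# $1 = dataset@snapshot, $2 = remote host, $3 = target dataset
full_send_cmd() {
    printf 'zfs send %s | ssh %s zfs receive -F %s\n' "$1" "$2" "$3"
}

# Print (do not run) the pipeline for an incremental ZFS send.
# $1 = dataset@older, $2 = dataset@newer, $3 = remote host, $4 = target dataset
incremental_send_cmd() {
    printf 'zfs send -i %s %s | ssh %s zfs receive -F %s\n' "$1" "$2" "$3" "$4"
}

# Example with placeholder names:
#   incremental_send_cmd pool/jail@monday pool/jail@tuesday backuphost backup/jail
```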
My ZFS targets are the jail datasets themselves. The only thing I need to do to have a hot spare mirror is to adjust the network interface and IP address (I have a separate internal DNS record for the mirror) and, most importantly, the hostid of the machine on which the jail runs. Once I have that I can automatically start such a jail. I will write up the details.
One reason why we are still not doing this in the lab is the problem of running PostgreSQL in the jail (SQLite would be easier) and of automatically starting the gogs web daemon. Once there is a port of Gogs that should be trivial.
I have done this. These are just notes for me; the section is still to be written. Clone the jail which runs the Gogs server. It doesn't even have to be shut down. Make sure you adjust permissions so that you can start the PostgreSQL server. Get into the cloned jail. Remove the PostgreSQL pid file if you cloned a running jail. Make sure you edit the PostgreSQL configuration files to match the new host. Start PostgreSQL.
Instead of upgrading the existing Gogs installation, do a fresh installation as follows:
unlink /home/git/gogs
mv /home/git/go /home/git/go-old
CC=gcc go get -u --tags postgresql github.com/gogits/gogs
ln -s go/src/github.com/gogits/gogs gogs
cd gogs
CC=gcc go build --tags postgresql
Before you can run Gogs again you have to recover the following directories from the old functional installation
custom
data
log
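Copying these three directories across can be sketched as a small function. It assumes the old tree was moved to /home/git/go-old as in the commands above; the function name `restore_gogs_state` is hypothetical.

```shell
#!/bin/sh
# restore_gogs_state: copy custom, data and log from the old Gogs tree ($1)
# into the new one ($2), preserving ownership and timestamps where possible.
restore_gogs_state() {
    old="$1"
    new="$2"
    for d in custom data log; do
        if [ -d "$old/$d" ]; then
            cp -Rp "$old/$d" "$new/" || return 1
        else
            echo "warning: $old/$d not found, skipping" >&2
        fi
    done
}

# Example, run as the git user:
#   restore_gogs_state /home/git/go-old/src/github.com/gogits/gogs /home/git/gogs
```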
Now you can
cd /home/git/gogs
tmux
./gogs web
Then detach your tmux session:
[detached (from session 0)]
If you like your new installation you can adjust the hostname and IP address, promote this jail to master, and safely destroy the original jail. Because our ZFS snapshot scripts are not portable, we are still not doing it that way.
Gogs comes with a built-in web server which listens on port 3000 by default. Instead we use Nginx as a reverse proxy to enable web access throughout our lab on port 80.
Running nginx is fairly trivial:
echo 'nginx_enable="YES"' >> /etc/rc.conf
Edit /usr/local/etc/nginx/nginx.conf. This is what ours looks like:
user nobody;
worker_processes 8;
#error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
#pid logs/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    #log_format main '$remote_addr - $remote_user [$time_local] "$request" '
    #                '$status $body_bytes_sent "$http_referer" '
    #                '"$http_user_agent" "$http_x_forwarded_for"';
    #access_log logs/access.log main;
    sendfile on;
    #tcp_nopush on;
    #keepalive_timeout 0;
    keepalive_timeout 650;
    #gzip on;
    server {
        listen 80;
        server_name git.int.autonlab.org;
        #charset koi8-r;
        #access_log logs/host.access.log main;
        location / {
            proxy_pass http://localhost:3000;
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
            client_max_body_size 40M;
            client_body_buffer_size 256k;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
We are now ready to start the nginx daemon:
service nginx start
We have not tried this. For now all our accounts are locally created.
We are using Jenkins for continuous integration services. Jenkins runs in a different jail instance.
This is trivial. Just create another jail instance and
pkg install jenkins
and then start the Jenkins daemon. Explain configuration options here. The really hard thing is getting the various plugins to do things for us. The good thing is that we are using a limited number of plugins.
After I wrote the first draft of this article, another self-hosted GitHub alternative, Gitblit, was brought to my attention. Since it is written in Java I had no interest in looking at it.
As I wrote this how-to I wanted to test each step one more time. However, I had a hard time reproducing the build, as I got
modules/setting/setting.go:24:2: code in directory /home/git/go/src/github.com/strk/go-libravatar expects import "strk.kbt.io/projects/go/libravatar"
This is when I discovered that Gogs has already been forked due to internal politics and fighting among developers. I am not sure I have the guts to try Gitea.