If you are just looking for an immediate hands-on experience, skip to Okay, I want to try the DistGit package out. How?. Everything else is an interesting background.
Almost everyone in the programming world knows Git. It’s an alpha and omega of Source Content Management in the Linux environment and most of the programmers there cannot live without it. Everything is being tracked and code is being distributed all around the world to be built upon by more and more well-defined units of human work called Git commits.
As opposed to that, nobody knows DistGit. Well…almost nobody. There are a few humanoids in Delta quadrant called “packagers” that collect all the work of the primarily application-focused programmers and make neat packages out of them that serve as a basis of any Linux operating system. These people create an operating system so they should not be taken lightly even though their work is probably less creative than of the people creating these cool Vims, Gimps, and Sims.
For those few humanoids and their work, Git is actually not the most suitable tool. Well…it is…but not without a few tweaks. And Git with those tweaks included is nothing else than DistGit.
Why not Git?
We have said that Git is not the weapon of choice for packagers but why exactly? Well, the reason is that a packager does not actually care all that much about the actual application sources (even though he needs to know them well in the end). What he cares most about are meta-data about the package such as build or run-time dependencies on other packages or installation steps that make it possible for the package to be installed and actually used by the end user.
Therefore, in a package, sources are usually packed into a nice little (or huge) tarball with a file next to it called .spec file containing all those additional package information that are needed for a package to be a useful piece of the target Linux distribution. Note that here we are talking about so-called “source packages” and not about “binary packages” that are created from the source packages by a building process.
So why not Git on its own? It’s just because the tarballs (containing the source files) are not suitable to be tracked for content changes. They are binary and any small change in the original source might cause huge (and binary) changes in the resulting tarball. Packagers are interested in the .spec files and other optional files called “downstream patches”, which are all in human-readable text and, hence, their diffs against their older version are comprehensible and useful.
So what to do with that (big) dirty binary blob that cannot be put into our tiny, neat package Git repo…that’s where DistGit comes in!
DistGit alias “Distributed Git” solves the problem by storing the source tarball outside of the Git repository in the so-called lookaside cache, while keeping a “link” in the Git repository next to the .spec and patches (if any). This “link”, literally called “sources”, is nothing else than another text file with an information about location of the tarball. This location is represented in a very specific way as you can see on the following example “sources” file:
$ cd somerepo $ cat sources SHA512 (prunerepo-1.10.tar.gz) = e8335bada83cc8c77050ece3b0f23871941c39a4bc8d2d7b320fac3cda32fda22d64292bcd51b0e994fcc0ae0c7ddc2ed1e249b91fe995b71e6e78699e26e66b
The format might seem a bit cryptic but the hash is really nothing more than a part of an otherwise known file-system path and the “lookaside cache” is then nothing more than a directory at a predetermined path in that filesystem.
There is not much else to be seen around here and really, in its gist, DistGit is just Git with few additional helper scripts for setting up new (Dist)Git repos and maintaining tarballs in the lookaside cache.
Why so much fuss around it then?
DistGit was not always this simple. The previous version 0.13 available in Fedora distribution contained a lot more including an ssh-based ACL system called Gitolite and also an integration with a Fedora Package database. It was quite tightly bound to a specific use-case deployed in Fedora while also being outdated with respect to the wildly volatile scripts it was originally packaged from.
DistGit 1.0, now released for Fedora and EPEL7, attempts to take much smaller bite by employing only the most fundamental parts that makes DistGit what it is while trimming off all the rest. Both Gitolite and integration with a package db can be added back easily if needed and that’s because DistGit 1.0 is a really very simple package consisting of just a few scripts and configuration files. What matters the most are the data being maintained (Git repos + tarballs) and here you are completely free to:
- use a fancier tool for Git authorization than just default ssh (e.g. gitolite)
- init and maintain Git repos according to remote db records
- use Kerberos for uploading tarballs into the lookaside cache
- do anything else upon the data and the access to them
I am talking about these possibilities because they are essential for the DistGit package to be a preferred option for maintaining large package collections. Currently, it’s being integrated into Fedora package collection system and we hope, the labor will be successful in the end (looks like it will).
Fedora basically uses DistGit already (at least scripts and the configs are all very similar to the ones in the actual DistGit package) but in a non-packaged and highly one-purposed manner. Extraction of the essential and general DistGit’s parts into a standalone package can be useful because other distributions may then start using it as well.
Is DistGit 1.0 already used somewhere?
Yes, it is used on copr-dist-git machine in COPR build system stack.
Are there alternatives?
There are. One of them is Git-Annex that enables storage of the binaries in multiple locations as oppossed to
DistGit where they are stored in a single place only. Another difference is in client-side tooling. With Git-Annex, a well-known
git command line tool is used
as a client to manage the local repositories (with
git annex subcommand to handle the binaries) and there is really not a difference between a client and server.
All nodes working with the data are essentially the same. In DistGit world, there is a difference between a client pulling/pushing by using
rpkg (or a derived tool like
fedpkg) and a server on which the initial repository setup and authorization takes place.
Then, there is also Git-LFS working in a similar spirit to Git-Annex.
The main advantage of the DistGit package that we develop is in its focus on package collections. The
rpkg client tool provides lots of useful subcommands such as
rpkg copr-build that enables packagers to immediately send the new package content to a build system to obtain a final installable
binary package and there is more (see
man rpkg for other useful commands like
rpkg lint or
Okay, I want to try the DistGit package out. How?
EDIT: See up-to-date version of this tutorial at https://clime.github.io/2018/09/18/starting-your-own-distro.html
Normally, you would need a server machine and a client machine but for playing around, you can just use localhost for both.
First, install the dist-git package on the DistGit server (=localhost):
$ sudo dnf install dist-git
or on CentOS, RHEL with EPEL7 enabled:
$ sudo yum install dist-git
Then, you need to setup Apache on the server (=localhost) for uploading source tarballs.
/etc/httpd/conf.d/dist-git/lookaside-upload.conf.example provided by the package
for ssl uploading with client certificates for authentication but we will use something
more simple for the demonstration. Put the following into
<VirtualHost _default_:80> # This alias must come before the /repo/ one to avoid being overridden. ScriptAlias /repo/pkgs/upload.cgi /var/lib/dist-git/web/upload.cgi Alias /repo/ /srv/cache/lookaside/ ServerName pkgs02.stg.phx2.fedoraproject.org ServerAdmin email@example.com LogLevel trace8 # provide this manually for upload.cgi SetEnv SSL_CLIENT_S_DN_CN distgit_user <Location /repo/pkgs/upload.cgi> Options +ExecCGI Require all granted </Location> </VirtualHost>
and reload the httpd server. Note that we fake ssl authentication purely for demonstration purposes here because the upload.cgi script (central core of DistGit) needs some authentication (SSL or Kerberos).
Then you also need to create the
distgit_user and add it to
packager group on the server.
$ sudo useradd distgit_user -G packager
Then on the client install the
rpkg package and put the following configuration into
[rpkg] lookaside = http://localhost/repo/pkgs lookasidehash = sha512 lookaside_cgi = http://localhost/repo/pkgs/upload.cgi gitbaseurl = ssh://%(user)s@localhost/var/lib/dist-git/git/%(module)s anongiturl = git://localhost/%(module)s branchre = f\d$|f\d\d$|el\d$|olpc\d$|master$ kojiconfig = /etc/koji.conf build_client = koji clone_config = bz.default-component %(module)s
If the DistGit server is different from ‘localhost’, change it appropriately.
Finally, you need to create
distgit_user on the client for the test purpose
(note that you don’t need to do this if you use localhost for both DistGit server and a client):
$ sudo useradd distgit_user
An example DistGit workflow might then look like this:
root@server $ /usr/share/dist-git/setup_git_package foo # creates Git repo on the server root@server $ ls /var/lib/dist-git/git foo.git distgit_user@client $ rpkg clone foo # clones remote foo.git repo distgit_user@client $ cd foo distgit_user@client $ rpkg import foo.src.rpm # uploads src.rpm into dist-git and modifies local repo accordingly distgit_user@client $ git commit -m "DistGit test update" -a # commit changes to the local Git repo distgit_user@client $ git push # push changes to the Git repos on the DistGit server
You don’t need to use
root user for the repository creation at the server. Any user in the
packager group is suitable.
The DistGit upstream is hosted at https://github.com/release-engineering/dist-git. Please, send us patches and requests there.