10 years of pkgsrc - pkgsrc and the concepts of package management 1997-2007 (part 1)

On October 3rd 1997, the pkgsrc software management system was created by Alistair Crooks and Hubert Feyrer. The NetBSD Packages Collection was intended primarily as a packaging system for NetBSD. Derived from the FreeBSD Ports system, pkgsrc became a success story. Today, pkgsrc is a cross-platform framework, running on the BSDs, Linux, Solaris, Mac OS X, many Unix derivatives, and even on QNX and Windows.

Ports- and pkgsrc-like software build frameworks are today standard on the BSDs and quite popular on some newer Linux distributions. In 2005 pkgsrc was adopted as the package management system for DragonFly BSD.

The pkgsrc project is based on active development and continuous improvement. The developers meet once a year in Europe at the pkgsrcCon conference to discuss problems and innovations and to share experiences and new concepts.

In the following interviews, developers and users of pkgsrc and of related systems talk about the history, the concepts, the problems and the future directions of packaging systems. This is the first part; a follow-up with more interviews is also available.


The interviews


Alistair Crooks about the history, present state and future of pkgsrc

Hi Alistair, please introduce yourself

Ah, right, starting off with the difficult questions? I'm Alistair Crooks, and live in the UK with my wife and children. I got involved with BSD back in the 4.1c days (like 1985), installed 386BSD 0.0 on a PC I had, moved to NetBSD 0.8 when it came about, became a NetBSD developer in 1997 to set up pkgsrc, joined the NetBSD core team in 1999, was a director when the NetBSD Foundation was formed in 2001, and have been President of the NetBSD Foundation since early 2005.

First of all, congratulations on 10 years of pkgsrc! What are your feelings?

I'm very pleasantly surprised, and thankful to all of the people who have used pkgsrc, and contributed to its development, growth and success.

I really didn't expect it to take off in quite such a spectacular fashion, and I think a lot of that success can be laid at the feet of the developers who steered it in the right direction over the last 10 years, and the people who used it, and told us what they liked, and what they didn't like.

As with all large software systems, there are still things that we'd like to change, and I hope that we'll get around to addressing some of those soon.

As a founder of pkgsrc ten years ago, could you give us a brief summary of the project's history?

pkgsrc was derived from the FreeBSD Ports system. NetBSD already had a "port" in its naming scheme - that's a specific architecture or platform to which NetBSD is ported - so NetBSD grew a packaging system related to packages. We already had src, gnusrc, xsrc, and so we grew a pkgsrc system, too - so the correct pronunciation is thus "package source".

I imported the pkg_install routines first, along with the main Makefile, and then I imported some packages piecemeal. We then realized the power that such a system held for us (and, to be fair, FreeBSD had realized this a while before), and the pkgsrc system's depth grew as more and more packages were added.

In 1999, I was working at an investment bank in London, and needed a packaging system to manage third-party software on a fairly large network of Solaris 2.6 machines - pkgsrc fitted the bill, and so the first non-NetBSD platform was Solaris, followed closely by Linux. In 2001, I was asked to port pkgsrc to Mac OS X, and during that exercise, it became apparent that we needed to get our act together with portability. Up until that time, we'd been using a fairly heavy compatibility layer which made every platform look like NetBSD - Christos Zoulas developed it, and I called it Zoularis. But around this time, it was apparent that we needed a more portable way of doing this, and so I ported pkgsrc and its tools to a POSIX interface. The rest of the 14 platforms of pkgsrc mainly followed from that.

I'm particularly happy with a number of the technical innovations we made in pkgsrc - by checking for abstraction, rather than implementation of the abstraction, we made pkgsrc a cleaner place to work, and so we were able to port to more platforms easily - scalability just happened. And early on, mainly because I was told it wasn't possible, we implemented a consistent package numbering scheme whereby we could tell if a package was older or newer than another instance of that package. This has allowed us to provide auditing on every pkgsrc system for vulnerable packages.
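As an editorial illustration of what an ordered package numbering scheme makes possible, here is a minimal Python sketch. It is not pkgsrc's actual comparison algorithm, which handles many more cases; the function names `parse_version` and `is_newer` and the simplified version format (dotted integers with an optional `nb<N>` package-revision suffix) are assumptions made for this example.

```python
import re


def parse_version(version):
    """Split a version such as '2.10.1nb3' into a comparable tuple.

    Simplifying assumption: dotted integers, optionally followed by an
    'nb<N>' package-revision suffix. Real version strings are messier.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:nb(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    revision = int(match.group(2) or 0)
    return numbers, revision


def is_newer(candidate, installed):
    """Return True if `candidate` sorts after `installed`."""
    return parse_version(candidate) > parse_version(installed)


# Components are compared as numbers, so 2.10.1 is newer than 2.9.4.
print(is_newer("2.10.1", "2.9.4"))
# A package revision bump on the same upstream version counts as newer.
print(is_newer("1.4nb2", "1.4"))
```

With a total order like this, an auditing tool can ask whether an installed version falls below the first fixed version of a known vulnerability.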

There are a number of technical innovations that Johnny C. Lam made in pkgsrc itself, though, that make it stand out above all others. The main one is, I think, Johnny's buildlink mechanism, whereby the correct pre-requisite packages are used when building a package. Before we had this system, if a machine had ncurses installed, for example, ncurses was used in preference to the system curses when building and linking. Or if the system already provided openssl functionality, the packaging system would still build its own openssl libraries, to make sure that the required functionality was available. Johnny's buildlink system made sure that we could use pkgsrc or system functionality, if preferred, along with sane defaults, and version checking. It also made sure that we could link the package with the desired libraries, rather than whatever was installed on the system at the time. This allows us to perform bulk builds to make binary packages available in a sane and easy manner.
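The choice between system-provided and pkgsrc-provided prerequisites can be pictured with a small Python sketch. This is only an illustration of the decision described above, not the buildlink implementation; the function `choose_provider`, its parameters, and the representation of versions as integer tuples are all assumptions.

```python
def choose_provider(required_minimum, system_version, prefer="system"):
    """Pick 'system' or 'pkgsrc' as the provider of a prerequisite.

    Versions are tuples of integers (hypothetical representation).
    `system_version` is None when the base system does not ship the
    library at all.
    """
    system_ok = system_version is not None and system_version >= required_minimum
    if prefer == "system" and system_ok:
        return "system"
    # Fall back to the packaged copy when the system copy is missing,
    # too old, or the user prefers the packaged one.
    return "pkgsrc"


# The system library is new enough, so it is used.
print(choose_provider((0, 9, 7), (0, 9, 8)))
# The system library is too old, so the packaged one is used instead.
print(choose_provider((0, 9, 8), (0, 9, 7)))
```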

Finally, others recognize the benefits of pkgsrc - Sun donated some build equipment to the pkgsrc project some years ago, and the DragonFly BSD project has also adopted pkgsrc as its packaging system, and provided some excellent developers to help us with this, most notably Joerg Sonnenberger.

What are the main benefits of the pkgsrc system? 

Any packaging system should provide you with an easy way to install third-party software. The BSD packaging systems, and Gentoo's, provide you with a way to do this from source, so that you can be sure that the software that you install on your machines is the same as that released by the package authors - it has not been modified in any way, and that it is built and linked with the software that you have previously built and linked, and by your compilers. It will also be built and linked with the options that you chose.

pkgsrc itself is extremely portable, and its buildlink system is unique. People point out to me that pkgsrc has only 7,300 packages, whilst others have many more. That's true, but

  1. that's 7,300 packages on multiple systems, so that if you are suddenly gifted an HP/UX or Solaris or AIX system, and told to administer it, you have an easy starting point,

  2. pkgsrc doesn't tend to split packages up into multiple -libs, -docs, etc sub-packages like some other packaging systems, so the package numbers are not artificially inflated,

  3. we don't automatically target CPAN or CTAN, but prefer to incorporate packages that people need, so that we don't accumulate hubris in the system,

  4. the pkgsrc developers tend to use pkgsrc for their own machines - we drink our own champagne, and so any cruft is likely to be noticed fairly soon, and suggestions made how to streamline things, and

  5. pkgsrc has been able to sign binary packages (and verify the signature of binary packages as part of the pkg_add process) but not many people know about it. I see that this feature was added to pkgsrc in late September 2001, so most people were otherwise occupied at the time. This signature verification was used as part of the NetBSD update facility that I talked about at Usenix in 2004, but NetBSD update hasn't really progressed since then - that's more because I haven't pushed this functionality rather than anything else.

  6. pkgsrc has its own ecosystem, its breeding pool, called pkgsrc-wip on SourceForge. It's a fantastic way for people new to packaging systems to find their feet with the pkgsrc conventions, and also an excellent way for pkgsrc to gain new packages, rather than people being forced to send a problem report into NetBSD.

Where and how do you use pkgsrc? 

I have numerous machines at home upon which I run pkgsrc, some non-NetBSD, some virtual. I use it in my work whenever I can. I'm a strong believer that by using things whenever you can, these things will succeed - people who use it are motivated to make it as efficient and useful as possible. That's why pkgsrc has succeeded so well in providing an extremely portable and useful third-party packaging system.

How does the pkgsrc release engineering work? 

pkgsrc makes releases four times a year. Branches are cut at the end of every quarter, and each branch or release is named for the quarter that has just ended, for example pkgsrc-2007Q2 or pkgsrc-2007Q3.
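The quarterly naming convention is easy to express in code. The following Python sketch is an editorial illustration; the function name `quarterly_branch_name` is an assumption, and it simply names the quarter that contains a given date.

```python
from datetime import date


def quarterly_branch_name(day):
    """Return the branch name for the quarter that contains `day`."""
    quarter = (day.month - 1) // 3 + 1
    return f"pkgsrc-{day.year}Q{quarter}"


print(quarterly_branch_name(date(2007, 6, 30)))  # pkgsrc-2007Q2
print(quarterly_branch_name(date(2007, 9, 30)))  # pkgsrc-2007Q3
```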

When a new release is made, sustaining engineering takes place on that branch by the pkgsrc releng team. The previous branch is deprecated. Fixes for security vulnerabilities are pulled up to the branch. We have been making releases like this now for a number of years, and have found that it is just the right frequency.

Preparing for a release is done differently to NetBSD, and, I suspect, to other projects - we institute a "freeze" for new functionality on the HEAD, and enforce this ourselves. No new packages are allowed, and changes to the infrastructure are severely discouraged. This takes two weeks, and allows us to concentrate on the individual packages themselves. We monitor the number of packages which fail to build, install and deinstall properly by using bulk build results on a number of platforms. These results are posted to the pkgsrc-bulk mailing list, and differences from previous bulk build runs are also posted there.
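Comparing one bulk build run with the previous one amounts to a set difference over the lists of failing packages. The Python sketch below is an editorial illustration, not the tooling used by the pkgsrc team; the function name `compare_bulk_runs` and the package names are made up.

```python
def compare_bulk_runs(previous_failures, current_failures):
    """Classify failing packages relative to the previous bulk build."""
    previous = set(previous_failures)
    current = set(current_failures)
    return {
        "newly_broken": sorted(current - previous),
        "fixed": sorted(previous - current),
        "still_broken": sorted(current & previous),
    }


# Hypothetical failure lists from two consecutive runs.
report = compare_bulk_runs(
    previous_failures=["editors/foo", "graphics/bar"],
    current_failures=["graphics/bar", "net/baz"],
)
for category, packages in report.items():
    print(category, packages)
```

During a freeze, the "newly_broken" list is the one that needs attention first, since it shows regressions introduced since the last run.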

Having branched pkgsrc, sustaining engineering takes over - these guys do a wonderful job looking after the branch, and making sure that packages on the branch are kept as secure as possible. From me, and anyone who uses the pkgsrc branches, to Lubomir Sedlacik and the rest of the guys, a massive "thanks!".

Apart from the release engineering, are there any other pkgsrc projects you are currently working on? 

I have just picked up an interesting project, along with Alan Barrett, to bring system packages to NetBSD - that is a little bit pkgsrc-related. The situation that we have now is that the system packages are able to be built from NetBSD's build.sh, but we lack the ability to install them properly - right now, our pkg_install tools do not allow us to install into a ${DESTDIR} from the command line, and so there is some work under way in that area.

I have a list of other projects which I'd like to see implemented, such as

  1. finishing off the pkg_install rewrite

  2. finishing off DESTDIR builds

  3. an implementation of pkgviews

  4. documenting pkgsrc for users a bit better

  5. some high-level tools, such as some of the Linux distributions have, for using and managing binary packages, for those of us who prefer not to use source and to trust others' compilation systems (again, see the digitally-signed binary discussion earlier), and

  6. I have some long-term research I'd like to do using the fanout file system which I committed as part of the refuse work, which implements a specialized union file system using the ReFUSE library, and which could be used to great effect to provide package addition in a controlled, transaction-based fashion.

If you analyse the current state of pkgsrc, which improvements and changes do you wish for the future? 

I'm very happy with the current state of pkgsrc, and look forward to some of the projects which are in progress coming to fruition. Adrian Portelli's rewrite of the audit-packages utility in C brings some welcome speed to the process of keeping our systems secure, Joerg's rewrite of pkg_install is coming along slowly but nicely, and will be a great benefit. The whole pkgsrc infrastructure has been broken out of one massive Makefile into a number of snippets, and that has made a difference in terms of speed, I suspect. We also need some good documentation on all aspects of pkgsrc - our documentation is good, but we could always do with some more - and every so often, I think it would be nice to rationalize our directory hierarchy to make it easier to find things. However, nowadays, if I want to find out if pkgsrc has an entry for a piece of software, the good folks at pkgsrc.se provide an excellent web-based search engine, which also covers the pkgsrc-wip project. I'd also like to see more packages move from pkgsrc-wip to pkgsrc, and for us to take advantage of some of the talent out there that provided all the entries in pkgsrc-wip.

At this year's pkgsrcCon in Barcelona (pkgsrcCon 2007), Christoph Badura talked about problems with pkgsrc for ordinary users. So is pkgsrc great for developers and system builders but too heavy to use for ordinary users?

I wasn't at pkgsrcCon in Barcelona, but I've read Christoph's presentation. The message I took out of it wasn't that pkgsrc is only for the tech-savvy (it's not), but that pkgsrc is used by a wide variety of people, and that there would be gains to be made if we were to structure a higher-level tool on top of pkgsrc which could deal with some of the higher level issues.

I fully support Christoph's views on things - I've known him a long time, and he and I are great friends - and I'd like to think that we've already addressed some of the issues that he talks about, and that we're thinking about the others.

I see pkgsrc being used by anyone who can type make install - definitely not just the technically-aware people. I also think we could do with a high-level front-end for the pkgsrc system, so that I could just type "pkg x11-drm" and it would do the necessary magic itself, and provide you with a DRI-enabled X11 setup. That's one area we can learn from the Linux distributions (which, with the exception of Gentoo, almost exclusively use binary packages). But remember that we can already add binary packages just by giving pkg_add a URL - it's just that some people prefer to compile things themselves, and that the binary package route isn't to everyone's tastes.

I'm a firm believer in learning from what other people do - at the recent EuroBSDcon in Copenhagen, I learned that Gentoo's packaging system has a --pretend argument, which simply tells you the actions it will carry out, without actually doing them. I often find myself thinking "would that be useful for pkgsrc to have?", and, in this case, I think it would - there are a number of times when I come to a machine and I wonder what vintage the gtk2 package is, or whether I am about to tie up a box for hours of compilation if I try to upgrade a package now - pkgsrc has a pkg_rolling-replace which will run through all of a package's pre-requisites, working out whether the pre-req needs to be updated or not, and doing an update in place, if it does. Incredibly useful functionality; occasionally, it ties up the machine for a while.
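A "pretend" mode boils down to computing the plan without executing it. The Python sketch below is an editorial illustration of that idea and is not how pkg_rolling-replace or Gentoo's tools are implemented; the function `plan_update`, the dependency table, and the set of outdated packages are hypothetical, and the sketch only looks at prerequisites, not at packages that depend on the updated ones.

```python
def plan_update(package, depends_on, outdated, seen=None):
    """Return the outdated packages to rebuild, prerequisites first.

    Nothing is built here: the result is only a list of names, which is
    what a dry run would report to the user.
    """
    if seen is None:
        seen = set()
    if package in seen:
        return []
    seen.add(package)
    plan = []
    for prerequisite in depends_on.get(package, []):
        plan.extend(plan_update(prerequisite, depends_on, outdated, seen))
    if package in outdated:
        plan.append(package)
    return plan


# Hypothetical dependency data for illustration only.
depends_on = {
    "gimp": ["gtk2", "png"],
    "gtk2": ["glib2", "png"],
}
outdated = {"glib2", "gtk2"}

for name in plan_update("gimp", depends_on, outdated):
    print("would rebuild", name)
```

Seeing the length of that list before starting is exactly what tells you whether the machine is about to be tied up for hours.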

There was also a presentation about the pkgjam project, a "radically re-engineered" version of pkgsrc, in development by James K. Lowden and Johnny C. Lam. What do you think about it? 

It's an interesting piece of research. In some ways it addresses the implication behind your previous question, but there are also elements of package views about pkgjam which I find interesting, too. I don't see pkgjam as a competitor to pkgsrc - I see it as adding something to pkgsrc, of being complementary, and of addressing some different areas to the core pkgsrc ones.

A lot of the Linux distributions have low-level packaging tools, and higher-level tools built on top of them, which are able to work out pre-requisite packages, and queue them up to be downloaded, etc. I'd love to see pkgjam or some other system do the same for pkgsrc.

[See also the interview with James K. Lowden for more details about pkgjam]

Do you have any practical tips to share with the pkgsrc users? 

Sure.

  • I always find it best to remember my wife's birthday.

  • The pkgsrc.se search facility, which covers both pkgsrc and pkgsrc-wip, is very, very useful. If you can't see a package entry in pkgsrc, try pkgsrc.se - it may be under a different name, or in an unusual category, and their searching and indexing facility is very, very good.

  • I like to build up new package sets in a pkg_comp chroot, and then deploy en masse. I find that if I do this when I build new NetBSD sets, then I have consistent packages across my machines. This may not suit everyone's way of working, but I find that this suits my needs very well. There is a fairly high bar to start using pkg_comp, but it's well worth it.

  • As far as useful packages go, I use the usual suspects fairly extensively - screen, xcb, and a number of editors and environments which aren't really interesting for anyone else. I have some personal packages which I carry around, and which I use for setting consistent environments on all types of machines. I'd make them available, but I doubt anyone would be interested in them, as they're highly environment-dependent. That's my environment.

Thank you very much for taking the time to answer the questions!

Source-based package building systems on Linux distributions

Gentoo is not the only Linux distribution with a source-based packaging system. Other Linux projects have adopted the concept as well:


Hubert Feyrer: The history, failures and prospects of pkgsrc

Hi Hubert, please introduce yourself

I'm currently living in Regensburg, which is in southern Germany. The fact that I'm from Bavaria but don't like beer (and wine) always puzzles people, but well, that's me. I studied Computer Science, have used NetBSD from its very early days on the Amiga, ran the master FTP site for NetBSD/amiga for many years starting in 1993, and became a NetBSD developer in 1997. The NetBSD/amiga site had a strong focus on providing precompiled binary packages, I had written my own packaging system back then, and when NetBSD decided to get some "official" packaging system, I was happy to help out. As a funny side note, the official reason for my developer account was that I had sent a slew of problem reports after I installed NetBSD/i386 for the first time, and my official working area always was "misc bug fixing". ;) [Hubert's NetBSD CV]

10 years of pkgsrc! What are your feelings? 

"Wow, it still lives!" - my feelings are mixed about it. On the one hand, I'm massively impressed by what pkgsrc is today: cross platform, providing many packages, on a number of non-NetBSD platforms that I use (Debian Linux, Mac OS X, Solaris). On the other hand, the original goal of providing binary packages for NetBSD is not really in sight, and doesn't seem to be as much in focus as I wanted. pkgsrc has evolved from the NetBSD Packages Collection to a much wider focus, and I guess that's the consequence -- a win for many, but with some drawbacks for the original system.

As a founder of pkgsrc ten years ago, could you give us an overview of the project's history?

I've already told you what happened for NetBSD/amiga above. NetBSD's core group decided to get a packaging system, and when I met Jason Thorpe (who was in the NetBSD Core Team then) during the Munich IETF meeting in 1997, it seems that I provided some input that was good enough to get me on board. Besides me, Alistair Crooks was in the game. He had experience with the FreeBSD Ports system, and also worked on software packaging for other Unix systems. A third person was on the list of people to work on it, but that guy never showed up.

First steps were getting the build system going, adapting it to NetBSD and the various hardware platforms, and then starting to get a few packages compiled. In the beginning we tried to keep things in sync with the FreeBSD Ports system, but that didn't work out somehow, and we forked off pkgsrc. If someone wonders: the NetBSD CVS repository was split up into a few chunks back then, so that the cryptosrc could be split out between the USA and the rest of the free world. Other parts were gnusrc, xsrc (which is still there today) and probably others. So "pkgsrc" did fit into the naming scheme. Most of the other parts were merged again later.

After running our own show for a while, Alistair and I started to realize that we needed more manpower. We wondered whom to bring on board, and Matthias 'tron' Scheler was the first one. I knew Matthias from the Amiga scene, and as he had a lot of Unix knowledge and NetBSD enthusiasm, we made the right choice in having him join pkgsrc. Many others followed, and I've since lost track of which people are working on pkgsrc - there are more than 80 developers on the internal packages developers list today.

Milestones in pkgsrc that I participated in were adding wildcards for handling package versions, the pkgsrc guide, and the bulk build framework. An article about the first of these is available.

A major focus in my work back then was to make sure that wildcards also work for installing binary packages, either from local storage or over FTP. The pkgsrc guide is less well maintained today than it used to be, but still available. The bulk build framework was developed during the NetBSD 1.3 release cycle, when Alistair and I made sure that packages did build. I needed a tool that allowed me to verify that all packages that were depending on a changed package still built or were rebuilt (i.e. that updated binary packages were available -- remember my original goal to make binary packages available!), and as none existed I made it. The bulk build framework is still in use today; an alternative was made only recently by Joerg Sonnenberger, but I don't know about it. "My" bulk build framework was later heavily pounded on by Dan McMahill and others, and documentation on its use is available.
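Wildcard matching of package names against the set of available binary packages can be illustrated with shell-style globbing. The following Python sketch is an editorial illustration using the standard `fnmatch` module; it is not the matching code in pkg_install, it ignores relational patterns such as "name>=version", and the package names and the function `matching_packages` are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_packages(pattern, available):
    """Return the available package names that match a glob pattern."""
    return [name for name in available if fnmatchcase(name, pattern)]


# Hypothetical list of binary packages on a local disk or FTP server.
available = ["png-1.2.18", "png-1.2.20", "pngcrush-1.6.4", "perl-5.8.8"]

# 'png-[0-9]*' matches any version of png, but not pngcrush.
print(matching_packages("png-[0-9]*", available))
```

A package can then declare that it needs "any version of png" rather than one exact file name, and the installer can pick a match from whatever is on offer.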

The management structure of pkgsrc was very shallow back then, with Alistair Crooks doing most of the "administrative" tasks, and developers picking up items that they felt needed doing. One issue that started to come up in pkgsrc maintenance was security. Of course software has problems, and fixing them is important. pkgsrc grew some people who adopted pkgsrc security as their working area. This is still in effect today, and a lot of effort goes into the stable branches that were introduced later on, to get those security updates for people who want to use an otherwise stable version of pkgsrc.

Personally, I started doing marketing, advocacy and documentation for the NetBSD operating system around then, after pkgsrc started to run on its own. I still ran the bulk builds for NetBSD/i386 back then, but after unclear policies on how to handle security updates with respect to binary packages, I passed that hat on. That was more than two years ago, in 2005. I admit that I haven't followed a lot of pkgsrc development since then.

From my personal perspective today, items that I think are lacking are:

  1. making binary packages available for as many platforms as possible, and

  2. upgrading of installed packages, either by compiling or from a set of precompiled binary packages. Crosscompiling is probably of help there, and with some recent development this may actually change, eventually.

What were milestones of the project? 

  • ~1997 import from FreeBSD

  • decide that we fork pkgsrc

  • first pkgsrc developers besides Alistair Crooks and me

  • 1998 package wildcards

  • 1999 bulk build framework

  • 2001 pkgsrc agreed as base for the Open Packages project, which aimed at making one packages collection for all BSDs. The missing parts were adding whatever FreeBSD and OpenBSD offered at the time to pkgsrc to have the best of all worlds, but it never happened.

I've missed much of the more recent developments, so I think it's better to ask Alistair Crooks about them. He's still working on pkgsrc; I have considered myself out of that game for some years now, really.

Where and how do you use pkgsrc? 

Systems that I use pkgsrc on currently:

  • The Virtual Unix Lab is based on NetBSD, and packages are installed via pkgsrc, of course. That's my PhD project that I'm spending most of my time on right now.

  • I give a lecture on "Open Source" at the University of Applied Sciences Regensburg and when talking about source-based packaging systems I show pkgsrc as example.

  • I've moved my mail setup to a friend's Vserver Debian Linux box, and on that I use pkgsrc without root privileges to install related software: alpine, spamassassin, spamprobe, procmail.

  • I replaced my desktop machine at home a few months ago. It used to be a PC running NetBSD; it's a Mac mini running Mac OS X now. pkgsrc there is a huge help to give me the working environment I'm used to.

What are the benefits of the pkgsrc system? 

It's portable across many operating systems and hardware platforms. This makes it really unique. Of course if you know a system from the inside somewhat, you also know a lot of the issues, and I could probably talk a lot more about the drawbacks than about the benefits, but I've done so enough above already. ;)

If you analyse the current state of pkgsrc, which improvements and changes do you wish for the future? 

See above: availability of binary packages and updating of installed packages. Documentation. And one thing that I've stumbled across a lot recently: stability. Stable branches don't cut it for latest versions, and pkgsrc-current is an ever-changing base system that has failed many times recently. I hope there will be some convergence at some near point in the future. (But I've been observing pkgsrc for some years now, waiting to see that convergence...)

In 2005 Alistair Crooks gave a talk about the implementation of pkgsrc. He said that he regrets nothing. What would you do differently today, if you had to start over again?

Not much... maybe a bit more "steering" and goal-setting, making it clear what we do and do not want. The focus between what pkgsrc is/does today, and what NetBSD needs is different/conflicting there.

Last question: Do you have any practical tips to share with the pkgsrc users? 

Contribute!

Besides work on the infrastructure (pkgsrc/mk/*) and packages, there is a lot of other work to do, and this is of equal importance: Advocacy.

  • write articles in papers

  • talk about pkgsrc on your pet platform at conferences

  • update the pkgsrc guide

  • go to conferences and trade shows, and show people where pkgsrc is today, and that it's of real use!

Thank you very much for taking the time to answer the questions!


Jordan K. Hubbard: The invention of the FreeBSD Ports system, its drawbacks and the merits of MacPorts

Hi Jordan, what are you currently doing? 

I work for Apple, Inc., managing several groups who are responsible for various bits of the "core OS" portions of Mac OS X. This includes a lot of the open source portions of it.

Do you still follow the development of the BSD systems? Is there anything you like especially? 

I do try to follow what the various groups are doing, though I can't say I'm always as up to date on that as I would like to be. I still have a strong sentimental attachment to FreeBSD, for obvious reasons, though all the *BSDs are definitely still doing good work and BSD "satellite projects" like OpenSSH have clearly made a very positive impact in their own right. If I had to say that I liked anything "especially" about the BSD systems, it would have to be their level of integration (not just from the "consumer" level, but as a developer trying to fetch, modify and build all the necessary bits without having to first go on an easter-egg hunt) and, of course, the liberal and easily understood BSD license.

Please tell us about your ideas to implement the FreeBSD Ports system. How did it all start?

As much as I'd like to say that it all suddenly came to me in a blinding flash of inspiration one day, the truth is somewhat more prosaic. I was simply getting more and more frustrated with the fact that every new FreeBSD system I built and installed (and I was building quite a few PCs in those days) needed to be customized with pretty much the same "extra bits" every time (bash, emacs, MH, and so on) and every time was also basically the same old drill - remember where to find the bits, fetch and unpack the bits, configure/patch the bits as necessary, build and install. As some folks will remember, FreeBSD also didn't have a binary package management system back then (I wrote pkg_install sometime after the ports collection) so you had to build all the add-ons yourself by hand.

Any repetitive task like this positively screams "automate me!" to a software engineer, of course, and given that I was pretty handy with Berkeley make(1), it didn't take me long to quickly whip up what became bsd.port.mk and create a couple of dozen "ports" to test and refine the concept. In a couple of weeks I had all my favorite bits of software under the new system and, in August of 1994, I felt it had reached the stage where it would be useful enough to others that I checked it into the FreeBSD CVS repository. It proved popular enough that, with the help of a number of other folks, I quickly got the collection (and bsd.port.mk refined) to the point where we had several hundred ports and a growing conviction that we had created something pretty useful.

By then it was also becoming clear to me that we were going to need some sort of package management system which allowed end-users to have all the benefits of "make install" in the ports collection without having to actually go through all the intervening steps, so I dragooned Satoshi Asami (who had proved himself to be a highly capable and productive creator of ports) into the role of Ports Meister and went off to write pkg_install(1) as the logical counterpart to ports. The rest, as they say, is history.

You invented the Ports system in 1994. Ports-like systems are still very popular, there was even a boom on new Linux projects. Why are the source-based package management systems so popular? 

Because they're so necessary? :-)

Any significant collection of binary packages has to come from somewhere, obviously, and the only real decision to be made is whether you want to hide the machinery in the sausage factory or polish it up to the extent that you're comfortable with other people seeing it (and using it to make their own sausages). I'm sure there are more attractive analogies, but the process of keeping lots of software up to date and continuously impedance-matched to your target platform is always an ugly, labor intensive process at best so it's a reasonably accurate one.

You also have to consider the fact that both the add-on software and the target platforms are slowly but constantly changing so your "labor costs" are ongoing - you can't just build the sausage factory and then walk away and trust the automation to take care of itself. This means you can either pay a small army of release engineers to manage the process, as the commercial OSes do, or you can expose the inner workings and rely on volunteers to help you keep the machinery in order.

From a pure engineering standpoint, source-based package management also has a number of advantages that you just can't get with static binary package management systems, the most obvious and significant being the degree to which you can customize the software during the process of building it. A lot of software also simply cannot be customized any other way (a somewhat lazy decision for a developer to make, but hardly rare).

Did you follow the development of pkgsrc? Anything you liked about it?

I haven't followed it all that closely, to be honest, but my initial inspection of it back when it first came out left me with the impression that they'd taken a number of my ideas and run further with them, cleaning up both the design and implementation of the system to a fair extent.

Are you still working on ports-like systems? 

I helped start MacPorts (and argued many aspects of its design with the primary implementor(s)) so I guess that'd be a yes, though I did a lot less of the engineering work this time around. I'm in management now. :-)

What about the MacPorts project? How is it different from, or better than, FreeBSD Ports or pkgsrc?

Well, one of the mistakes I made with FreeBSD ports was using make(1). That is a fine tool for creating "linear recipes" for doing things, don't get me wrong - that's why I used it. What became apparent as the ports collection evolved, however, was that the Makefile syntax was clearly less flexible and powerful than, say, an actual programming language and also less than easy to mine data out of or apply batch changes to as the system evolved and chunks of oft-duplicated "code" got swept into the base system itself.

MacPorts uses an actual scripting language, namely TCL, in creating port descriptions. This gives the port writer a lot more flexibility in dealing with some of the more complex bits of software out there and the whole notion of action hooks (pre/post fetch, build, install, etc) is much cleaner in MacPorts, as is the notion of versioning the MacPorts infrastructure itself such that it can substantially evolve without breaking old port descriptions (a port description can "link" with a specific version of the MacPorts system, much as a dynamic library can have multiple versions of itself installed simultaneously). I'm not sure we solved the data mining and batch modification problem, per se, but using XML as a description format had its own problems (it was something we tried early on) so I'm not sure the trade-offs for that would have been worth it in any case.

I'm not enough of an expert with pkgsrc to really do a comparison between it and MacPorts justice, but I'll simply say that MacPorts learned a lot from the systems that had come before and tried a different (though nonetheless parallel) approach. Whether or not the end result is "better" is a highly subjective question and I'll leave that for the users to decide.

What are the problems of ports-like systems? What do you think about their future? 

I think the biggest problem all ports systems face is scaling. As the number of port descriptions increases, the maintenance costs go up proportionately (some might even say exponentially, in some of the systems out there :-) ). More ports means more testing burden, more volunteers to manage them, more version skew across the collection as a whole, and the number of broken ports as a percentage of the whole collection can get quickly out of hand without some pretty rigorous process automation. Think of it like raising sheep: 3 or 4 sheep are relatively easy to take care of and almost any 4H student can manage it, but 10,000 sheep is another matter entirely - it takes someone with real skills in animal (or software) husbandry to cope with that many successfully and not simply have thousands of dead sheep, errr, ports in a year's time.

I also think that we have to increasingly consider the end-user impact of having that much software potentially added to one's system. Most software added via a well designed/maintained port or package management system is reasonably well-behaved (often because it's forced to be, through automated checks for unintentional overlaps / conflicts) but even the most professionally run ports collections miss odd corner case conflicts or fail to test every possible combination of software. I'd like to see some new ideas in the future about how to manage the filesystem namespace and make it easier for packages to be grouped together into "views" of some sort, where the filesystem hierarchy the user saw might depend on the software they wanted to run at any particular time. Individual processes could also have their own views, preventing unintended behavior in packages which sometimes attempt to be too clever in enabling optional (and possibly unwanted) features when other packages are detected.

This is never more evident than it is during build time, when the popular autoconf system (which was designed with the needs of a different era of computing in mind) sniffs around your system and finds things you may or may not want it to. Building software in very carefully controlled "sandboxes" is clearly necessary for creating reproducible builds (and controlling dependencies) but there is still no runtime equivalent. We may be a ways off from truly needing one, but it's still a problem worth thinking about while talking "futures."

Last question: What would you do differently today if you had to start over again (package management on FreeBSD with more than 17.000 packages)?

That's a tough question. Clearly, the FreeBSD ports collection has been highly successful - 17.000 is an almost staggering number of ports - so they've clearly gotten around the structural limitations of make(1) and it's even a plausible argument that using something which pretty much everyone already understood was a significant part of that success. I know that requiring people to learn at least some TCL as a prerequisite to joining the MacPorts project has been a bit off-putting to some and the barrier to entry is certainly higher there.

The package management system is an easier call - I wrote that very quickly under the assumption that we'd throw it away relatively soon once we found a better one, but somehow that never happened. Starting over today, I'd look at source code management systems like Subversion or Git as a better conceptual model for creating and permuting what's currently thought of as "the binary base system." Installation would basically consist of selecting one or more tags (which could be branches) and saying "please make my system look like this", the appropriate bits then flowing over the wire and getting laid down on disk. This would make it very easy to create customized versions of the OS for specific needs, or just for experimental purposes, and all the flexibility that SCM gives you for code development could be applied to the OS and package installation process. This isn't even a particularly new idea - systems like Conary (started by some of the guys behind RPM) have been doing this for a while in the Linux space.
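
The "please make my system look like this" idea can be sketched as a comparison between two manifests. The following Python fragment is only an illustration under stated assumptions, not a description of Conary or of any existing tool: the manifest layout (a path mapped to a content digest), the function name plan_sync and the sample entries are all hypothetical.

```python
def plan_sync(installed, target):
    """Compare two manifests (path -> content digest).

    Returns (fetch, remove): paths whose bits must flow over the wire,
    and paths that exist locally but are absent from the target.
    """
    fetch = sorted(path for path, digest in target.items()
                   if installed.get(path) != digest)
    remove = sorted(path for path in installed if path not in target)
    return fetch, remove


# Hypothetical manifests: what is on disk now, and what the selected tag describes.
installed = {"bin/sh": "a1", "bin/ls": "b1", "lib/libold.so": "c1"}
target = {"bin/sh": "a1", "bin/ls": "b2", "bin/newtool": "d1"}

fetch, remove = plan_sync(installed, target)
print("fetch:", fetch)
print("remove:", remove)
```

A real system would also have to order these operations and handle failures part-way through; the sketch covers only the selection step.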

Of course, if I wanted to get really radical, I would also challenge Unix's fundamental notion that you can have any filesystem namespace organization you like, just so long as it's hierarchical and system global, since things like search and package management are really penalized by that notion if you have any desire to "do them right", but that's probably beyond the scope of what most *BSD projects would even care to grapple with.

Thank you very much for taking the time to answer the questions! 

My pleasure! I hope this helps to establish some historical context for how things came to be the way they are today.

Recommendation: more interviews about pkgsrc

  • NetBSD and pkgsrc developer Jeremy Reed published an interview with non-NetBSD pkgsrc users in 2006: "Pkgsrc on non-NetBSD interview"

  • pkgsrc developer Johnny Lam gave an audio interview about the merits and drawbacks of pkgsrc. He also talked about the pkgsrcCon conference. The podcast is available on bsdtalk (http://bsdtalk.blogspot.com).


Erwin Lansing about the FreeBSD Ports

Hi Erwin, please introduce yourself

I have been a FreeBSD ports committer since 2003 and joined the portmgr team in 2005.

The 10th anniversary of pkgsrc seems to be an impressive affirmation of the ports concept, invented on FreeBSD in 1994. What do you think? 

Indeed it is. Not only is the concept thriving both in FreeBSD ports and pkgsrc, it has also been adopted by, e.g., Gentoo Portage, clearly showing that this is a strong concept that scales not only in number of applications, but also across different operating systems and platforms.

What role does pkgsrc play as an inspiration for the developers of the FreeBSD ports system?

My impression from talking to pkgsrc developers at conferences is that there is a lot of activity going on in the architecture of pkgsrc itself. While FreeBSD ports outdoes pkgsrc in numbers, pkgsrc shines in features. FreeBSD developers certainly can get inspiration for new features to implement in FreeBSD ports, but you can imagine that it is not a trivial task to do so with over 17.000 ports, each of which can break in new and interesting ways.

One outstanding feature of the FreeBSD ports system is the number of buildable packages - more than 17000 are currently available! A few years ago, Jordan K. Hubbard, the inventor of the ports system, predicted that the system would become very heavy to maintain with more than 10000 packages. It seems the FreeBSD ports developers and maintainers have proved him wrong. How was this problem solved?

Blood, sweat and tears. And did I mention: testing, testing, testing? We can thank a large number of our dedicated users and committers for adding all these packages and keeping them up-to-date. However, keeping high quality on such a vast number of packages could not have been achieved without many hours of testing whether they can build, install and deinstall by the portmgr team, especially Kris Kennaway.

Thank you very much for taking the time to answer the questions! 

Thanks, and congratulations to pkgsrc!


Marc Espie: Comparison of pkgsrc and the OpenBSD Ports system

Hi Marc, please introduce yourself

[I'm] living in Paris, France, and liking that town. I am currently a teacher/advisor for students at a computing and managing school.

I've been part of the OpenBSD project for about 10 years now. I started with small corrections and well... these things grow. Since we have a culture of fixing things that don't work like we want, I ended up being responsible for our make, our m4, the redesign of our ports tree infrastructure and the complete rewrite of our pkgtools.... among other things.

Which role does pkgsrc play as an inspiration for your work on OpenBSD Ports? 

I use it as a source of inspiration. I read the corresponding mailing-lists, and I look at what you guys add to it, taking ideas from it. It's probably no longer possible for us to take actual code, since we have diverged so much. But I've read about your update ideas, the pkgviews stuff, and the buildlink stuff.

I think that the last idea I lifted in a visible manner was the xpkgwedge, which was a very nice idea.

Actually, it is a very important source for me. Writing code is more or less trivial. Having the right design ideas is harder. So, taking code, extracting the gems in it, and re-injecting it in our code makes it better. Not being able to lift the code straight away is also a good thing actually, because I cannot be lazy, I have to think about it, and get only the good parts. I can leave the clunky stuff behind. ;)

What do you like about pkgsrc? 

Well, the fact that it exists. It's good to have competition. And also a kind of biodiversity. I can see some of your experiments, how they turn out, the ones that work out, and the ones that don't work out.

What are the main differences between pkgsrc and OpenBSD ports? 

I think we're better. ;) No, kidding aside, we have totally different design goals, so it's quite normal we have very different implementations, even though we tend to converge on ideas we share. If I judge the OpenBSD ports by the design goals we have, it is better than pkgsrc (which is good for my sanity), but there are things we do not even try to do.

Our main goals are binary packages, and reproducibility. With our paranoia, we don't like endless options that make it hard to reproduce issues, so we have a lot less configurability and portability than pkgsrc. It would probably be possible to take the OpenBSD framework and make it run on something else, but that's not a goal, so you would probably have to take a lot of OpenBSD with you. We also have just a few knobs.

Some specific points about us:

  • we made definite choices about our toolset, and we decided to base the binary tools on Perl. As a result we now have an extensive collection of tools that work all together. Instead of snippets of code in sed/python/ruby/whatever that do the same job again and again, we have ONE single routine that can read and write packing-lists, and a standard simple way to manipulate that. So, any addition to packing-lists syntax can be done in one single location (the packing-list entities all inherit from OpenBSD::PackingElement, so you don't even have to write new behavior if you don't need it).

  • we focused on reproducible binary packages over five years ago. We developed the fake framework at that point, and I tweaked our make to be more POSIX-compliant (any variable passed on the command-line ends up in sub-makes through .MAKEFLAGS. Case in point: DESTDIR). As a result, the install stage does not work directly. We do it in three steps:

    1. make fake installs the package in a separate tree

    2. make package builds the binary package

    3. make install just calls pkg_add

    So we are certain that binary packages are tested, because they get installed all the time. This is also very good to catch mistakes in packing-lists.

  • we removed a lot of extra luggage in bsd.port.mk. It was easier for us when we did it, because we were a lot smaller (each mechanism you remove, you need to sweep the tree to kill it first, which is very hard to do when you have >4.000 ports). I've also made systematic use of shell fragments: anything that could be duplicated ends up in an internal variable. There's also a complete manpage that documents over 99% of bsd.port.mk visible variables, instead of having them in comments. I've noticed NetBSD has finally cut up the huge pkgsrc.mk into small pieces, which is good. I'll admit to not having looked at your documentation recently. I think we should steal a big chunk of the FreeBSD porting handbook, and some of the NetBSD supplements. ;)

  • we have an integrated mechanism to handle options (FLAVORS) and to be able to build several packages from one single port (MULTI_PACKAGES). This was fairly hard to get to work as it does today, but it is very useful.

  • we're a bit farther along the road where semi-automatic updates are concerned. Mostly by having one single way of handling it, instead of trying several approaches.
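
The three-step install described above (stage into a separate tree, build the binary package from that tree, install only from the package) can be sketched in a few lines. This is a toy Python model, not the OpenBSD implementation: the function names, the tar-based package format and the sample files are all assumptions made for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def fake_install(files, stage):
    """Step 1 (like `make fake`): lay the files down in a separate staging tree."""
    for rel, content in files.items():
        dest = stage / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(content)


def make_package(stage, archive):
    """Step 2 (like `make package`): build the binary package from the staging tree."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(stage.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(stage).as_posix())


def install_package(archive, prefix):
    """Step 3 (like `make install`): install from the package only, never from the build."""
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                dest = prefix / member.name
                dest.parent.mkdir(parents=True, exist_ok=True)
                dest.write_bytes(tar.extractfile(member).read())
    return sorted(p.relative_to(prefix).as_posix()
                  for p in prefix.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    stage, prefix = root / "fake", root / "prefix"
    stage.mkdir()
    prefix.mkdir()
    # Hypothetical package contents.
    fake_install({"bin/hello": "#!/bin/sh\necho hello\n",
                  "man/man1/hello.1": "hello manual\n"}, stage)
    archive = root / "hello-1.0.tgz"
    make_package(stage, archive)
    installed = install_package(archive, prefix)

print(installed)
```

Because the final step only ever reads the archive, every source install also exercises the binary package, which is the property the list above describes.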

On the down side:

  • we don't have anything like pkgviews. It looks like an interesting tweak, but there's no way we can test it along what we already have.

  • we don't have anything like buildlink. This is on my todo list, once we have solved another interesting issue. We want to take control of libtool and friends, so that we can build ports without needing to install dependencies `for real' (libtool and pkgconfig suck, because they're hard-coded for one specific usage pattern, which is definitely NOT OpenBSD ports building). And the huge work that was done on buildlink will probably be needed by us for this to work (unless my laziness forces me to have a brighter idea so I can get away without it). The good news is that you guys already did 3 redesigns of it, so I won't have to go through all the initial mistakes you lived through ;)

  • we have a few less ports than you do, though I dare say we now have more or less everything that matters.

And a final note: we are probably more `toolbox-oriented'. By this, I mean that I tried really hard to give people a basic toolset that they can combine in interesting ways, and it works, I'm surprised by some combinations my fellow developers use. For instance, we have lots of basic introspection commands you can combine in interesting ways, such as passing a list of pkgpaths directly to make, and manipulating them directly. Like, if you want to know the dependency tree of a port, you can just do:

SUBDIR=x11/kde/libs3 make all-dir-depends

and you get a list suitable for tsort, among other things.
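
To show what "suitable for tsort" buys you, here is a small Python sketch that turns dependency pairs into a build order. It assumes the list consists of "dependency dependent" pairs, as tsort(1) expects; the package paths below are made up for the example and are not real output from the ports tree. It needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical (dependency, dependent) pairs in tsort(1) style.
pairs = [
    ("x11/qt3", "x11/kde/arts3"),
    ("x11/qt3", "x11/kde/libs3"),
    ("x11/kde/arts3", "x11/kde/libs3"),
    ("devel/gmake", "x11/qt3"),
]


def build_order(pairs):
    """Return the package paths ordered so that every dependency comes first."""
    sorter = TopologicalSorter()
    for dependency, dependent in pairs:
        # add(node, *predecessors): the dependent needs the dependency built first.
        sorter.add(dependent, dependency)
    return list(sorter.static_order())


order = build_order(pairs)
print(order)
```

With these sample pairs the order is fully determined, since each package depends on the one before it in the chain.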

And what should be adopted from the OpenBSD ports? 

Some design ideas, probably. You are currently starting to look at writing down build options in packages. I think that encoding that directly in the pkgpath, and making all the tree work that way has merit.

I believe that you are facing some hard decisions some way down the road. pkgsrc tries very hard to be everything for everyone. At some point, you have to drop some tweaks so that you can concentrate on some basic stuff correctly... some of the infrastructure has become too complicated. I believe you should try to get rid of embedded shell-scripts in packages. This is the exact same mistake GNU has made, by the way. Meta-programming is very cool, but shipping the result is a huge mistake, because bugs get reproduced a hundred times, and fixed in a hundred different ways. You should ship the stuff you need to reproduce this behavior instead (or to rebuild the scripts), so that everything benefits from bug-fixes.

In my opinion, this is the same `mistake' Debian made: they have loads and loads of scripts that handle all kinds of behavior. This breaks all the time, because they cannot prepare for every contingency. But Debian has a huge user base, and a huge pool of developers, so they can deal with this kind of mistake. Engineering-wise, in `smaller' projects, that time is better spent dealing with other stuff.

pkgsrc should finally start killing some old stuff. Catering to every one and every scenario is very politically correct, but at some point, you have to break some stuff to move forward.

I could have some more specific advice, but that's not my place. You're free to take pkgsrc in whichever direction you want. If you want some useful advice, I think you should take a walk on the OpenBSD side. I don't know if your top designers did, but I'm certain you can do the same thing I do every day: take some good ideas from us. Get rid of the `not invented here' syndrome.

I know that there is some bad blood between some NetBSD oldtimers, and some OpenBSD people, but you can definitely get above that.

And stop thinking OpenBSD is just NetBSD plus a few tweaks. Especially in the package/ports areas. I'm usually rather modest, but heck, I've rewritten our pkgtools from scratch, and spent even more time modernizing our infrastructure.

At some point, the new pkgtools was l'arlesienne, I spent over one year poring over the design until I was satisfied, and I've sunk hundreds of hours of development into it (plus all the tests and help I've had from my fellow developers). I'm damn proud of the result. It's not even slow: having a nicer language than C means I could integrate more pieces, and the resulting commands perform faster than their C counterparts on most tasks. Even if you don't want Perl in your base system, you can definitely borrow ideas!

Thank you very much for taking the time to answer the questions!

The pkgsrcCon conference

pkgsrcCon is the annual conference for developers and users of pkgsrc. It has taken place in Europe since 2004. pkgsrcCon 2007 was held in Barcelona. As a novelty this year, the talks were recorded on video and made public. The slides of the presentations are also available at http://www.pkgsrccon.org/

NetBSD developer Lubomir Sedlacik talked to Will Backman about pkgsrcCon 2007. The podcast of the interview is available on bsdtalk.


Christoph Badura: Problems and drawbacks of the pkgsrc system

Hi Christoph, please introduce yourself

I started using Unix in the mid 80s. I have been using NetBSD in production since 0.8. I started to use pkgsrc pretty much when Alistair Crooks created it in 1997. I use NetBSD daily on the laptops that I use as workstations.

How do you use pkgsrc? 

I use pkgsrc to manage all third-party software on my NetBSD machines. I always build binary packages and distribute them to my various machines. I have taken to updating my machines via binary packages with the help of pkg_chk.

Are you currently working on any pkgsrc-related projects? 

No. I try to keep the packages that I maintain and still use up to date. And I try to fix bugs that I notice along the way.

At this year's pkgsrcCon in Barcelona you gave a presentation about problems with pkgsrc for ordinary users and system administrators. What are the problems?

My idea for the talk wasn't to present a list of problems. I wanted to give feedback to the pkgsrc developers from the point of view of an ordinary user who doesn't have commit rights to the pkgsrc CVS tree.

There are a lot of things pkgsrc does right and I did mention a number of them in my talk. There are also a number of things that pkgsrc does not do so well and I pointed some of them out in my talk.

I think pkgsrc developers spend most of their time working with pkgsrc by adding and updating packages, extending the infrastructure to make it more portable and easier to use, and, of course, fixing bugs in the packages and the infrastructure. I believe that that set of activities leads to very different expectations from pkgsrc than what non-developers have.

As a non-developer I expect from pkgsrc that it helps me to install and maintain all the third-party software that doesn't come with the base operating system but which I need for the system to be useful to me. And I want to do that with the minimum amount of time and hassle.

I believe that means using binary packages and good frontends to manage the packages. In my opinion tools like aptitude and the YaST package are a step in the right direction. I think we don't have anything equivalent yet.

We also have issues with keeping the binary package collections for the quarterly branches complete when we have to deal with security fixes. Sometimes pre-requisite packages are removed and no replacement is distributed.

We regularly move packages to a different category directory or change the directory names for packages in pkgsrc. But we do not have tools that deal with these re-namings automatically when one wants to update the installed packages on the system.

These are the sort of issues that a non-developer will trip over.

Why is pkgsrc sometimes hard to use and what needs to be changed? 

I don't think that pkgsrc is hard to use. Quite the contrary. In my opinion it is very easy to use for what it does!

For example I was recently asked to upgrade subversion on a Linux machine for which newer binary packages were not available. With pkgsrc it was a snap:

  1. download and extract pkgsrc

  2. cd pkgsrc/bootstrap; ./bootstrap

  3. cd pkgsrc/devel/subversion-base; make install

I think that is extremely easy to use!

Thanks to the permanent bulk builds that are going on, it is very reliable, too. And it works not only on NetBSD but on other systems too: Linux, Mac OS X, Solaris, IRIX, HP-UX, AIX, MS Windows, you name it. I am not aware of any other system that does the same.

Of course, that assumes you want to and can build from source. When one wants to just install precompiled binary packages pkgsrc isn't as comfortable as other systems that are widely used today.

If pkgsrc wants to do better in the binary packages discipline, I think we need more pkgsrc developers. Maintaining a working, stable and complete set of precompiled packages is a lot of hard and time-consuming work. Much more work than we are currently able to handle. So we need more people who help with the security pullups to the quarterly branches, build binary packages for all the platforms and OS versions afterwards, etc. Every little bit helps.

If someone wants to join and help with the effort, if only by maintaining 2 or 3 packages, I think we should welcome them.

Your presentation confronted the pkgsrc developers with a long list of problems. How was the feedback? 

I think it went rather well. I didn't notice anyone falling asleep during the talk. :-) And we had some good and lively discussions afterwards.

Again, the idea wasn't to confront the pkgsrc developers with a long list of problems. But rather to show them a view of pkgsrc from a different perspective. Hopefully some of my ideas will help them to improve the situation.

But this is a volunteer effort and most developers work on what they care about. And it is difficult to force developers to work on things they don't care about. Certainly I don't expect them to suddenly start working on my pet peeves.

Did any of the recent changes to pkgsrc improve the situation?

I don't know. I haven't been paying attention, though. pkgsrc is constantly improving. So some of my issues have very likely been addressed. I find it truly amazing how pkgsrc keeps improving and restructuring even after ten years of existence. I don't see the same vitality in other similar systems.

I believe we do not yet officially support a comfortable binary package manager. But there seems to be some progress by non-developers in pkgsrc-wip in that area.

Last question: Do you have any practical tips to share with the pkgsrc users? 

I recommend always building binary packages and keeping them around for a while. Being able to downgrade to the previous version if an updated package has critical bugs is extremely helpful. Saving the daily insecurity reports that log the changes to the installed packages helps with that, too.

I found pkg_chk by David Brownlee to be very helpful for updating my systems with newer binary packages and for building updated binary packages. make update UPDATE_TARGET=package SPECIFIC_PKGS=1 is very useful for updating the binary packages with newer versions. I keep the settings of the package option variables and the list of SPECIFIC_PKGS in a file in /usr/pkgsrc and include that file from /etc/mk.conf.

And I always save a log file with the output of compilation, installation, and update runs. Adding |& tee logfile has become almost automatic.

Thank you very much for taking the time to answer the questions!


Joerg Sonnenberger about pkgsrc on DragonFly BSD and his pkgsrc development projects

Hi Joerg, please introduce yourself. 

I'm a 23-year-old student of math with a very broad interest in Computer Science. I've been programming and working in various areas since high school and joined Open Source development when starting university. When I don't program or study, I'm likely to either sleep, read, eat or play chess (in that order).

You were the driving force behind the port of pkgsrc to DragonFly BSD. Why did you prefer pkgsrc over alternative systems?

I think there were only two options. I had been fighting to keep the support for DragonFly on top of FreeBSD ports for quite a while. This was getting harder and harder as DragonFly started to diverge from FreeBSD 4. It also took more work because the support for FreeBSD 4 was already declining in FreeBSD ports at that time. By then Todd Willey had done enough work to get the bootstrap and some major packages working with pkgsrc on DragonFly. When I took a more detailed look I liked it a lot, especially as it helped me avoid a lot of redundant work. I started to focus on fixing things in pkgsrc, kept Thomas, Jeremy and a few other developers busy with patches, and many of the critical items were working after a while. I don't regret that choice and I am confident that I chose the technically better system. The biggest complaint at that time about pkgsrc (compared to FreeBSD ports) was the lack of portupgrade or equivalent. pkgmanager in wip and pkg_rolling-replace somewhat provide that now, at least enough to stop many of the louder voices.

In 2006 you said that porting pkgsrc to DragonFly was like "Fighting the Windmills". What were the problems? Was pkgsrc not portable enough? 

A mix of both. The biggest problem with pkgsrc itself is (and was) that DragonFly does a number of things differently, e.g. pthread linkage. Hunting down all those instances took a while and they still pop up occasionally. There are also a huge number of portability issues in programs, like the infamous *bsd* pattern in configure scripts. If you haven't read the slides from the presentation, there are quite a number of funny problems I mentioned back then.

But it seems that pkgsrc works quite well on DragonFly these days. 

Yeah, I'm trying to keep the breakage down to a minimum. There was a time when DragonFly was between NetBSD 3.1/i386 and NetBSD 4/i386 in terms of broken packages, which is quite nice. I think the overall state is quite good: setting the right PKG_PATH and using pkg_add is enough to get a full-scale system in a short time.

How did the pkgsrc system benefit from the porting efforts? 

DragonFly helped to sort out imake issues, a number of missing libtool overrides, GCC 3.4 issues, pthread linkage etc. It is also very important for making the portability aspect of pkgsrc a reality, not just lip service.

You are very active in developing for pkgsrc. Please tell us about your pkgsrc projects so far

The single biggest item so far was the addition of modular Xorg support without breaking the tree for anyone else. The work in that area started early this year and until the official announcement mails no-one would have known without reading the CVS commit mails. That is quite different from how it was handled e.g. in FreeBSD ports with a long ports freeze and a lot of pushing to get everyone to switch at once.

It is hard to say which project I've been working on as so much is going on. At least one other major change I would like to get credit for. We had a system to use different Apache and PHP versions for a long time. Until recently, it was impossible for a binary package user to freely select his or her choice, because only one set was built and even worse, the name was identical. Inspired by the way our Python modules deal with this problem, almost all Apache and PHP modules were changed to be prefixed with the version and have explicit dependencies as well. In combination with pbulk this makes it possible to choose Apache 1.3 and PHP 4, as well as Apache 2.2 and PHP 5 -- without building from source.

In the more recent past, the Summer of Code has kept me busy. Both the new bulk build framework and the cross compiling work are in the tree now and ready for use. They still need polishing, but what doesn't need that? :-)

What are you currently working on? 

I'm actively working on re-factoring the pkg_install code with the goal of making pkg_add and pkg_info use libarchive like pkg_create already does. This is non-trivial work as the code has a huge number of side effects and it is very easy to break things one way or the other. Some of the regressions are just exposed bugs that were never that critical before, others are misunderstandings of how the code was meant to work. The benefit for the user is that pkg_add will be much faster. If the prototyping Tim Kientzle once did in FreeBSD can be used as a foundation, a speedup by a factor of 2 or more is possible. This is also very useful for bulk builds as some parts are currently IO bound, like building many of the smaller packages.

And the future plans?  http://www.pkgsrccon.org/2007/

More pkg_install work. We had some long discussions in Barcelona and I want to adopt some of the ideas from pkg_jam. A better pkgdb is one of the biggest problems for handling updates more gracefully etc. I try to not plan ahead too much though.

One of your current projects is the "Cross-compiling of modular Xorg". What's the goal, the state and what will this mean for the NetBSD users. Will there be no more xsrc as part of the base system? 

The goal of the cross-compiling support was to create a migration path from xsrc to modern Xorg. It was a hotly discussed topic and only two options were really sensible. The first was to extend the reach-over framework to deal with modular Xorg as well. This was started, but never finished. It would also create some maintenance issues as pkgsrc has to maintain a second set anyway. The second option was to always use pkgsrc and for that it has to allow cross-compilation. That was the goal for this part of the Summer of Code project. The code is integrated into pkgsrc and the tests were quite successful. Remaining issues are documentation and integration with either pbulk or build.sh. For xsrc the server component for all the non-standard hardware on Sparc and older platforms has to be packaged so that we can build them from pkgsrc. Once that is done, xsrc can be retired.

A longtime goal was to make pkgsrc a cross-compilable system like NetBSD. Is this goal now abandoned? 

In general, yes. There are a number of components that simply can't be cross-compiled in a sane manner. Anything with a large complex build system for example. I'm not sure if it is possible to cross-compile Perl or Python for example. Even more important would be the upstream interest in such patches -- this is definitely something we don't want to maintain in pkgsrc. Having said that, we can support cross-compilation for many packages, both smaller and larger ones. Modular Xorg is one specific set in which NetBSD has an interest; others are possible. If there's demand, it can be extended to cover more.

For the ordinary pkgsrc user: What were the most important changes in the last two years? 

The addition of modular Xorg and the on-going efforts to improve binary packages and build consistency based on the check framework. The former is just a requirement for modern hardware and the latter makes the risks of using current quite a bit lower overall.

The DESTDIR support could have quite some impact, but it needs some more work to be supported on a broader scale and be advertised as such.

Thank you very much for taking the time to answer the questions!


James K. Lowden: The pkgjam project

Hi James, please introduce yourself. 

I'm old enough that the computer I use nowadays is better in all dimensions by 4 orders of magnitude, far superior to the clay tablets we had in school. I make my living as a quant on Wall Street. Two beautiful daughters, each smarter and more accomplished than the other. NetBSD user since 1999, maintainer for the FreeTDS project, and insufferable optimist.

Together with pkgsrc developer Johnny C. Lam you started the pkgjam project (project homepage), intended as the successor of the pkgsrc system. Why is a re-design of pkgsrc necessary?

The weaknesses of pkgsrc -- and I speak as a fan and user -- are well known. It's hard if not impossible to replace installed packages without disabling your system for a while, sometimes for days, because installed packages are normally deinstalled prior to even building their replacements. Also, it's difficult to interrogate the system about installed (or available) packages beyond what's provided by pkg_info(1).

The basic pkgsrc architecture was designed when 8 MB of RAM was a lot. Nowadays pkgsrc has over 7000 packages and it's certainly not uncommon to have hundreds of packages installed. My main machine -- clearly the prototypical case, obviously -- has 356 installed packages. That's a lot of interdependency. Our systems are more complex than they used to be, and they're more powerful, but the basic tools of pkgsrc -- make and shell, and the Berkeley database -- are unchanged.

No one pretends that make(1) is the be-all and end-all of complexity management tools, but it lives at the center of pkgsrc. (At the pkgsrcCon in Barcelona last April, it was estimated that 50% of the time building packages was spent not in compiling and linking, but in processing Makefiles.) To take make(1) out of its starring role in pkgsrc is to redesign it, so that's where we headed.

pkgsrc is no developer's picnic, either. Anyone who works on the project will tell you it requires a lot of knowledge about the tools and about the project's conventions. Johnny and I both believe that a lot of pkgsrc's internal complexity results from using make(1) for things it was never meant for. We hope pkgjam will be easier for developers to grok and join.

What are the main concepts and technologies that pkgjam is based on? 

We can reduce package complexity and manage the remaining complexity better. Let's apply some of the lessons learned from 10 years of pkgsrc and some of the tools now available to make the system faster, more transparent, and more robust.

To that end, pkgjam has three main features:

  1. Dependency Independence: If packages B and C both rely on package A, and you want to upgrade package B, which requires upgrading package A, need you upgrade package C, too? Logically, no. Technically, no, too, as long as you're prepared to have two versions of package A installed (one for B and one for C). We reduce one kind of complexity, interdependence, by introducing another, package proliferation. But the system becomes more modular, more robust. Especially if you can install B 2.2 while B 2.1 remains installed and working.

  2. Relational Database for package metadata: This is how we attack the remaining complexity. All package metadata -- build-time options and dependencies, licences, everything you need to know to build a package -- is stored in the database. So is information about packages you installed and the options you built them with (and their dependencies, of course). The database also has all your preferred options, settings that influence how packages are built. pkgjam uses the database to track dependencies and to generate build plans. Users and developers can query the database with SQL to answer questions not supported by pkg_info(1).

  3. New Build Tool: The new build tool is what's invoked by the user to build and install packages. It relies on the database to decide what to do, and relies on make(1) to do what it does best: build and install packages.
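
To make the first two features concrete, here is a minimal sketch in Python with SQLite. It is not pkgjam code: the interview does not describe the real schema, so the table names (package, depends), the columns, and the helper functions are all assumptions chosen for illustration. It shows two versions of package A recorded side by side, one used by B and one by C, and an SQL query that reports who uses which.

```python
import sqlite3

# Hypothetical schema: pkgjam's actual tables are not described in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    license TEXT,
    UNIQUE (name, version)          -- several versions of one name may coexist
);
CREATE TABLE depends (
    pkg_id INTEGER NOT NULL REFERENCES package(id),
    dep_id INTEGER NOT NULL REFERENCES package(id),
    PRIMARY KEY (pkg_id, dep_id)
);
""")


def add_package(name, version, license=None):
    """Record one installed package version and return its row id."""
    cur = conn.execute(
        "INSERT INTO package (name, version, license) VALUES (?, ?, ?)",
        (name, version, license),
    )
    return cur.lastrowid


def add_dependency(pkg_id, dep_id):
    """Record that pkg_id depends on one specific version (dep_id)."""
    conn.execute("INSERT INTO depends (pkg_id, dep_id) VALUES (?, ?)",
                 (pkg_id, dep_id))


# B was upgraded and needs the new A; C was left alone and keeps the old A.
a_old = add_package("A", "1.0", "bsd")
a_new = add_package("A", "2.0", "bsd")
b = add_package("B", "2.2", "mit")
c = add_package("C", "1.4", "gpl")
add_dependency(b, a_new)
add_dependency(c, a_old)


def users_of(name):
    """Return (dependency version, dependent name, dependent version) rows."""
    rows = conn.execute(
        """
        SELECT d.version, p.name, p.version
        FROM depends
        JOIN package AS p ON p.id = depends.pkg_id
        JOIN package AS d ON d.id = depends.dep_id
        WHERE d.name = ?
        ORDER BY d.version, p.name
        """,
        (name,),
    )
    return rows.fetchall()


print(users_of("A"))
```

Because each dependency row points at a specific version, upgrading B never touches the row that ties C to the older A.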

One feature of pkgjam is the "Dependency Independence": every package is installed in its own directory. Is this concept inspired by the "Application Directories" from RISC OS or the "Bundle" concept from NEXTSTEP? 

I have no experience with NEXTSTEP or RISC OS and didn't even pretend to try to research prior art. It was clear to both of us from the start that what I call "the one true tree" made using pkgsrc harder than it had to be. Johnny has been a pkgsrc developer for years; I approached the package metadata problem as a user. Johnny also had experience from working on pkgviews. You could say, informally, that he's the problem domain expert and I'm the data analyst.

When we started to sketch out our ideas to each other, we looked a little into how OS X does things. OS X has a slightly different notion of how an application finds its shared libraries: approximately, it permits the RPATH to have a relative path. That lets an OS X user pick up his application and drop it anywhere. The whole subtree moves, and the application looks only down its tree for its shared objects.

NetBSD and, as far as I know, all the free operating systems we see pkgjam supporting one day, use ELF binaries and expect absolute RPATHs. On such systems, you can move the executable, but not its shared objects. I'm told that this is a kernel property and not particularly hard to change.

And what about the handling of shared libraries, is every package installed with its own libs? 

Basically, yes. A shared library is first installed in its canonical directory. Dependent applications do not refer to it there, however. Instead, a link to the shared object is created in the application's tree. The application's RPATH points there.

We use a hard link to the shared object because we want to be able to delete the original and replace it with an updated (or just differently configured) version. We want to be able to do anything with a shared library package without disturbing its dependents.
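
The effect of the hard link can be sketched in a few lines of Python. This is an illustration under assumptions, not pkgjam's implementation: the directory layout, the file name libfoo.so.1, and the helper names are invented, and plain byte strings stand in for real shared objects.

```python
import os
import tempfile


def link_library(canonical_path, app_lib_dir):
    """Hard-link a shared object into an application's private lib directory."""
    os.makedirs(app_lib_dir, exist_ok=True)
    target = os.path.join(app_lib_dir, os.path.basename(canonical_path))
    os.link(canonical_path, target)
    return target


def replace_library(canonical_path, new_contents):
    """Replace the canonical copy with a new file.

    The old name is unlinked first, so the new file gets a new inode and
    existing hard links keep referring to the old contents.
    """
    os.unlink(canonical_path)
    with open(canonical_path, "wb") as f:
        f.write(new_contents)


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: one directory per package version.
        canonical = os.path.join(root, "pkg", "libfoo-1.0", "lib", "libfoo.so.1")
        os.makedirs(os.path.dirname(canonical))
        with open(canonical, "wb") as f:
            f.write(b"libfoo 1.0")

        linked = link_library(canonical, os.path.join(root, "pkg", "app-2.1", "lib"))
        replace_library(canonical, b"libfoo 1.1")

        with open(linked, "rb") as f:
            app_view = f.read()
        with open(canonical, "rb") as f:
            canonical_view = f.read()
        return app_view, canonical_view


print(demo())
```

Note that the replacement has to unlink and recreate the file. Overwriting it in place would write to the shared inode and change what the dependent application sees as well.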

In practice, there are some constraints. Some libraries, for example, refer to a database whose location is relative to the install directory, or otherwise insist on using a relative path to find some resource. Such packages can't be freely deleted and replaced. But that's a feature of the package, not of the package management system.

How will the use of pkgjam be different from pkgsrc? 

Johnny and I both like the simplicity of pkgsrc's cd <pkg> && make && make install. There'll be more separation between building and installing. The method of expressing system-wide preferences and package-specific options will be very different, because they'll be kept in the database instead of make variables.

Which new and better solutions will pkgjam provide for the ordinary users and system administrators? 

Today, to install package X version N, you have to first deinstall N-1. pkgjam will let you build version N while using N-1. It will let you install N without erasing N-1. Reverting is just a matter of removing the links to N and recreating the links to N-1.
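
Switching between versions by relinking might look roughly like the following Python sketch. The layout is an assumption made for illustration (one directory per version and a "current" symbolic link per package); the interview does not say which links pkgjam actually manages or how.

```python
import os
import tempfile


def activate(pkg_root, version):
    """Point <pkg_root>/current at the given version directory."""
    if not os.path.isdir(os.path.join(pkg_root, version)):
        raise FileNotFoundError(version)
    current = os.path.join(pkg_root, "current")
    tmp = current + ".new"
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(version, tmp)   # relative link to the version directory
    os.replace(tmp, current)   # rename over the old link in one step


def active_version(pkg_root):
    """Return the version that the 'current' link points at."""
    return os.readlink(os.path.join(pkg_root, "current"))


def demo():
    with tempfile.TemporaryDirectory() as root:
        pkg_root = os.path.join(root, "B")
        for version in ("2.1", "2.2"):
            os.makedirs(os.path.join(pkg_root, version))

        activate(pkg_root, "2.1")
        activate(pkg_root, "2.2")          # install N alongside N-1
        upgraded = active_version(pkg_root)

        activate(pkg_root, "2.1")          # revert: just relink
        reverted = active_version(pkg_root)

        installed = sorted(e for e in os.listdir(pkg_root) if e != "current")
        return upgraded, reverted, installed


print(demo())
```

Both version directories stay on disk throughout; only the link changes, which is why reverting is cheap.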

Dependencies are all in the database, making it possible to view what would be affected by an action before it's taken. I don't know of any package system that does anything like that in a way that's comprehensible to a human being.
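
A "what would be affected?" query of this kind can be sketched with a recursive SQL query. The example below uses Python's sqlite3 module; the single depends table, its column names, and the sample package names are assumptions for illustration and say nothing about pkgjam's real schema.

```python
import sqlite3


def make_db(edges):
    """Build an in-memory database from (package, dependency) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE depends (pkg TEXT NOT NULL, dep TEXT NOT NULL)")
    conn.executemany("INSERT INTO depends (pkg, dep) VALUES (?, ?)", edges)
    return conn


def affected_by(conn, name):
    """List every package that depends on `name`, directly or indirectly."""
    rows = conn.execute(
        """
        WITH RECURSIVE affected(name) AS (
            SELECT pkg FROM depends WHERE dep = ?
            UNION                       -- UNION drops duplicates, so cycles end
            SELECT depends.pkg
            FROM depends JOIN affected ON depends.dep = affected.name
        )
        SELECT name FROM affected ORDER BY name
        """,
        (name,),
    )
    return [row[0] for row in rows]


# Sample data: gimp -> gtk2+ -> png -> zlib, plus an unrelated package.
conn = make_db([
    ("png", "zlib"),
    ("gtk2+", "png"),
    ("gimp", "gtk2+"),
    ("vim", "ncurses"),
])

print(affected_by(conn, "png"))
```

Running the query before an upgrade or removal gives the list of dependents without changing anything on the system.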

You presented pkgjam at this year's pkgsrcCon in Barcelona. How was the feedback from the developer community?

Some mild skepticism, but generally great enthusiasm. So much so that I changed direction, focusing more on making pkgjam ready for others to join and use, and less on automatically generating "easy" packages.

About the current state of pkgjam: is it ready to use? 

It's alpha software. If you're interested in developing a packaging system and you like the pkgjam approach, then we have a skeleton you can use that we're building on. I said in Barcelona we'd have something ready by July, and we do have something. We took a break over the summer. Stay tuned.

Please tell us about your roadmap and about future perspectives. 

We're using an iterative approach: build something, use it, tear it down, do it again. We want the interface to the database to be well defined for the build tool, such that there could be other build tools (say, a GUI or a web-based one). We want the packager's job to be as declarative as possible. The packager shouldn't have to become a database expert.

As our package list grows, the database and its interface will stabilize, and people will, unless we build the next Edsel, start using pkgjam. That will create a need for canned reports, something like pkg_info(1). With luck, early adopters will have SQL knowledge and will be able to contribute and collaborate on new ways to use the database.

pkgjam will be "done" when every useful pkgsrc package is also a pkgjam package. To get there, we'll need packagers and users. To attract them, we need to make packaging easy and package management better. Which is fine, because that was our goal in the first place. Because, you know, I'm not just a pkgjam developer. I'm also a client.

Thank you very much for taking the time to answer the questions!

The concepts of Application Directories

Self-contained application directories first appeared on the NEXTSTEP and RISC OS operating systems: All files needed by an application are grouped together into one directory. Application directories are known as Application Bundles under NEXTSTEP, GNUstep, Mac OS 9 and Mac OS X.

Various open source projects use similar concepts: The ROX Desktop adopted the Application Directories from RISC OS. The klik software install system for Linux and the PBI system for PC-BSD are independent implementations. Nix is an advanced, "purely functional package manager" for Unix systems in which each version of a package is stored in its own directory. On GoboLinux, a Linux distribution with an alternative directory structure, each program gets its own directory tree.


Feedback, corrections and additions are always welcome; please contact me via email.

Many thanks to all participants!

Please note that the expressed opinions and the selection of topics do not represent the official views and directions of the pkgsrc project.

- Mark Weinem