Conclusions and related work

It is important to note that these counts roughly represent the whole collection of stable free software packages available for GNU/Linux at the time of the Debian 2.2 release (August 2000). Of course, there is free software not included in Debian, but most popular, stable and usable packages have been packaged by a Debian developer and included in the Debian distribution. Therefore, with some care, it could be said that this kind of software amounted to about 60,000,000 SLOC around the summer of 2000. Using the COCOMO model, this implies a cost (under traditional, proprietary software development models) close to 2,000 million USD and an effort of more than 170,000 person-months.
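As an illustration of how a COCOMO-style effort estimate is obtained from a SLOC count, the following minimal sketch applies the Basic COCOMO formula in organic mode (effort = 2.4 * KSLOC^1.05 person-months). The coefficients are an assumption, since this section does not state which COCOMO variant or parameters were used, and the input is limited to the approximate sizes of the four largest packages quoted in this section. It is therefore not expected to reproduce the totals given above. It does show why the way packages are grouped matters: because the exponent is greater than 1, estimating each package separately and adding the results gives a smaller figure than treating all the code as a single project.

```python
# Basic COCOMO, organic mode: effort (person-months) = 2.4 * KSLOC ** 1.05.
# These coefficients are an assumption; the text does not state them.
COCOMO_A = 2.4
COCOMO_B = 1.05


def effort_person_months(sloc: int) -> float:
    """Estimated effort for a single project of `sloc` source lines."""
    return COCOMO_A * (sloc / 1000) ** COCOMO_B


# Approximate sizes of the four largest packages, as quoted in this section.
largest_packages = {
    "Mozilla": 2_000_000,
    "Linux kernel": 1_800_000,
    "XFree86": 1_250_000,
    "PM3": 1_100_000,
}

# Estimate each package on its own, then add the estimates.
per_package = {
    name: effort_person_months(sloc) for name, sloc in largest_packages.items()
}
summed = sum(per_package.values())

# Treat the same code as one monolithic project. The result is larger,
# because the COCOMO exponent is greater than 1.
monolithic = effort_person_months(sum(largest_packages.values()))

for name, pm in per_package.items():
    print(f"{name:13s} {pm:10.0f} person-months")
print(f"{'sum':13s} {summed:10.0f} person-months")
print(f"{'monolithic':13s} {monolithic:10.0f} person-months")
```

A cost figure would follow by multiplying the effort by an assumed salary and overhead factor, which are further parameters not given in this section.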

We can also compare this count to that of other Linux-based distributions, notably Red Hat. Roughly speaking, Debian 2.2 is about twice the size of Red Hat 7.1, which was released about eight months later. It is also larger than the latest Microsoft operating systems (although, as discussed in the corresponding section, these comparisons could be misleading).

Looking at the details, some interesting data emerge. For instance, the most popular language in the distribution is ANSI C (more than 70%), followed by C++ (close to 10%), LISP and Shell (about 5%), and Perl and FORTRAN (about 2%). The largest packages in Debian 2.2 are Mozilla (about 2,000,000 SLOC), the Linux kernel (about 1,800,000 SLOC), XFree86 (about 1,250,000 SLOC), and PM3 (more than 1,100,000 SLOC).

There are not many detailed studies of the size of modern, complete operating systems. Of them, the work by David Wheeler, counting the size of Red Hat 6.2 and Red Hat 7.1, is the most comparable. Another interesting paper that partly overlaps with this one is "Evolution in Open Source Software: A Case Study", a study of the evolution of the Linux kernel over time. Some other papers, already referenced, provide total counts for some Sun and Microsoft operating systems, but they give no detail beyond estimates for the system as a whole.

To finish, it is important to repeat once more that we are offering only estimates, not actual numbers. The figures depend heavily on the selection of the software to measure, and on some other factors which were already discussed. But we believe they are accurate enough to draw some conclusions, and to compare with other systems.