Mining Large Software Compilations over Time: Another Perspective of Software Evolution

  • Gregorio Robles, Universidad Rey Juan Carlos
  • Jesus M. Gonzalez-Barahona, Universidad Rey Juan Carlos
  • Martin Michlmayr, University of Cambridge
  • Juan Jose Amor, Universidad Rey Juan Carlos

Abstract

With the success of libre (free, open source) software, a new type of software compilation has become increasingly common. Such compilations, often referred to as ‘distributions’, group hundreds, if not thousands, of software applications and libraries written by independent parties into an integrated system. Software compilations raise a number of questions that software evolution research, which usually focuses on the evolution of single applications, has not addressed so far. The challenges that software compilations face clearly differ from those found in single software applications. Nevertheless, it is reasonable to assume that the evolution of applications and that of software compilations share similarities and depend on each other.

In this sense, we identify a dichotomy, analogous to the one in economics, between software evolution in the small (micro-evolution) and in the large (macro-evolution). The goal of this paper is to study the evolution of a large software compilation by mining the publicly available repository of a well-known Linux distribution, Debian. We therefore investigate changes involving hundreds of millions of lines of code over seven years. The aspects covered in this paper are size (in terms of number of packages and number of lines of code), use of programming languages, maintenance of packages, and file sizes.

Reference

Robles, G., Gonzalez-Barahona, J. M., Michlmayr, M., Amor, J. J. (2006). Mining Large Software Compilations over Time: Another Perspective of Software Evolution. In: Proceedings of the International Workshop on Mining Software Repositories (MSR 2006), pp. 3–9.