It has been repeatedly reported that most of the cost of a software project is spent on maintaining the software, not on its initial development. With the widespread availability of open source software repositories (like SourceForge) that store the various releases of the source code and associated information (e.g. about bugs), academic researchers have increasingly studied how software evolves, in order to better understand the maintenance process and to extract lessons from it. For example, some researchers have looked at how the complexity of functions has evolved over time and whether developers have invested time in restructuring activities to decrease such complexity. Other researchers have looked at whether the amount of cloned code has increased over time. They aim to find general rules or tendencies that can help managers and developers better understand the project (e.g. to see which modules are problematic because of the many changes they are undergoing). Some researchers take this further and aim to predict the future from the past history of a project, e.g. to predict which modules are likely to have more bugs in the future based on past bug reports.
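As a rough illustration of the kind of analysis involved, the sketch below counts how many commits touched each module, which is one simple way to spot modules that are undergoing many changes. The commit data, the module names and the convention that the top-level directory is the module are all assumptions made for this example; a real study would extract the change history from a repository.

```python
from collections import Counter


def changes_per_module(commits):
    """Count how many commits touched each top-level module.

    `commits` is a list of commits, each given as a list of changed file paths.
    Assumption: the first path component identifies the module.
    """
    counts = Counter()
    for changed_files in commits:
        # A commit counts once per module, even if it changes several files in it.
        modules = {path.split("/", 1)[0] for path in changed_files}
        counts.update(modules)
    return counts


# Hypothetical change history, invented for illustration only.
history = [
    ["parser/lexer.c", "parser/grammar.c"],
    ["parser/lexer.c", "ui/window.c"],
    ["net/socket.c"],
    ["parser/grammar.c"],
]

# Modules ordered from most to least frequently changed.
hotspots = changes_per_module(history).most_common()
```

In this invented history the parser module is changed most often, so it would be the first candidate for closer inspection.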
Many tools have been built to extract the necessary information from historical software repositories, process and analyze it, and then visualize the results (e.g. Moose). Unfortunately, many of those tools are not publicly available, or they use their own internal representations and input/output formats, which makes it difficult to reuse them for different purposes. Some researchers have advocated a common tool platform and a common repository representation format (see the TA-RE paper), but this goal remains largely unaccomplished. A possible alternative is to ‘glue’ together existing tools via scripts that translate one tool’s output into the next tool’s input (see FETCH in the References section).
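A glue script of this kind can be very small. The sketch below converts the CSV output of a hypothetical fact extractor into the JSON input of a hypothetical visualizer. Both formats, the column names and the function name are assumptions for illustration only; they are not the actual formats used by FETCH, Moose or any other tool.

```python
import csv
import io
import json


def csv_facts_to_json(csv_text):
    """Translate CSV rows (file, function, complexity) into a JSON document.

    Both the CSV columns and the JSON layout are hypothetical formats.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    entities = [
        {
            "name": row["function"],
            "file": row["file"],
            # CSV values are strings; the assumed target format wants a number.
            "complexity": int(row["complexity"]),
        }
        for row in reader
    ]
    return json.dumps({"entities": entities}, indent=2)


# Invented extractor output, standing in for a file written by the first tool.
extractor_output = (
    "file,function,complexity\n"
    "lexer.c,next_token,12\n"
    "window.c,redraw,4\n"
)

visualizer_input = csv_facts_to_json(extractor_output)
```

In a real tool chain the string would be read from the first tool's output file and the result written to wherever the second tool expects its input.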
Many projects are possible within this topic; some generic suggestions are:
- Use an existing tool to answer a specific research question (like the evolution of cloned code), analyzing various repositories
- Repeat one of the experiments reported in the literature, but for proprietary code from your organization, and compare the results with open source software
- Repeat one of the experiments reported in the literature for an agile development project and compare results
- Develop a new visualization for a certain class of evolution analyses and add it to an existing tool (or tool chain)
- Improve some prediction algorithm based on historical data
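To give a flavour of the last suggestion, the sketch below shows a deliberately naive baseline that a project could try to improve on: it ranks modules by their past bug reports, giving less weight to older reports. The decay weighting, the release numbering and the sample data are assumptions for this example and are not taken from any of the referenced papers.

```python
from collections import defaultdict


def rank_by_bug_history(bug_reports, current_release, decay=0.5):
    """Rank modules by a recency-weighted count of past bug reports.

    `bug_reports` is a list of (module, release_number) pairs.
    Each report contributes decay ** age, where age is the number of
    releases since the report, so recent bugs count more than old ones.
    """
    scores = defaultdict(float)
    for module, release in bug_reports:
        age = current_release - release
        scores[module] += decay ** age
    # Highest score first: these modules are guessed to be the most bug-prone.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical bug reports, invented for illustration only.
reports = [("parser", 1), ("parser", 2), ("ui", 3), ("net", 1), ("parser", 3)]

ranking = rank_by_bug_history(reports, current_release=3)
```

A project on prediction would compare such a baseline against more refined algorithms, using the bugs actually reported in later releases as the ground truth.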
Skills and Background required
The candidate should have good programming skills. Some knowledge of statistics or the willingness to learn about statistics may be required, depending on the project. Off-campus students should have a fast PC with a large disk to store and analyze several large repositories. On-campus students will be provided with such a machine.
References
For further software evolution papers, see e.g. the proceedings of the International Workshop on Principles of Software Evolution (IWPSE) or the ERCIM Workshop on Software Evolution.
- Capiluppi et al., Exploring the Relationship between Cumulative Change and Complexity in an Open Source System, 9th European Conf. on Software Maintenance and Reengineering (CSMR), 2005
- Greevy et al., Analyzing software evolution through feature views, J. of Software Maintenance and Evolution: Research and Practice 18(6):425-456, Nov./Dec. 2006
- Kagdi et al., A survey and taxonomy of approaches for mining software repositories in the context of software evolution, J. of Software Maintenance and Evolution: Research and Practice 19(2):77-131, 2007
- Kim et al., TA-RE: An exchange language for mining software repositories, Proc. MSR, 2006
- Lozano and Wermelinger, Assessing the effect of clones on changeability, Proc. ICSM, 2008
- Madhavji, Fernandez-Ramil and Perry (eds.), Software Evolution and Feedback: Theory and Practice, Wiley, 2006
- Mens and Demeyer (eds.), Software Evolution, Springer, 2008
- Zimmermann et al., Mining Version Histories to Guide Software Changes, Proc. 26th Intl. Conf. on Software Engineering, 2004
- Tools: Fact Extractor Tool Chain (FETCH) and the integrated reverse engineering environment Moose