Identifier Quality and Code Quality

Our PhD student Simon Butler has been investigating whether poor identifiers (e.g. ones that are too short or don't use dictionary words) are related to code quality. The hypothesis is that poor identifiers may reflect a less than optimal understanding of the problem or of the domain, and may in turn hinder understanding of the program during maintenance. All this may lead to code quality problems: potential bugs, convoluted code, etc.

Simon had a short paper at the Working Conf. on Reverse Engineering (WCRE) last October, presenting his first study, on open source Java applications. The paper was quite well received, from what he told us: Simon's talk was the only non-invited talk the WCRE steering committee chair tweeted about, and Nicolas Anquetil mentioned it the next day during his most influential paper talk.

Simon has meanwhile extended the depth and breadth of the study: he did the analysis at method level (previously it was at class level), compared naming quality with more code quality metrics, and also compared it with a readability metric. He also did a statistical analysis to see whether some naming problems (e.g. not using dictionary words) could serve as a lightweight test to point to methods with readability and quality problems. The paper we submitted to the European Conf. on Software Maintenance and Reengineering (CSMR) was accepted. Today we uploaded the camera-ready version to our institutional open access repository; from past experience, it will take some months to show up in IEEE Xplore.
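
To give a rough idea of what a lightweight naming test of this kind could look like, here is a minimal Python sketch. It is not Simon's tooling or the method from the paper: the tiny word list, the length threshold and the function names (split_identifier, naming_flaws, flag_methods) are all assumptions made up for illustration. It only checks the two example flaws mentioned above, identifiers that are too short and identifiers containing non-dictionary words.

```python
import re

# Hypothetical stand-in for a real dictionary word list; the dictionary
# used in the study is not described in this post.
SAMPLE_DICTIONARY = {
    "get", "set", "user", "name", "count", "total", "index", "file", "reader",
}

# Assumed threshold: identifiers shorter than this are flagged as too short.
MIN_LENGTH = 3


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    tokens = []
    # Underscores and digits act as separators between parts.
    for part in re.split(r"[_\d]+", identifier):
        # Match either an acronym run (e.g. "XML") or a capitalised/lowercase word.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [token.lower() for token in tokens]


def naming_flaws(identifier, dictionary=SAMPLE_DICTIONARY, min_length=MIN_LENGTH):
    """Return the list of (assumed) naming flaws found in one identifier."""
    flaws = []
    if len(identifier) < min_length:
        flaws.append("too short")
    if any(token not in dictionary for token in split_identifier(identifier)):
        flaws.append("non-dictionary word")
    return flaws


def flag_methods(methods):
    """Return the names of methods that contain at least one flawed identifier.

    `methods` maps a method name to the identifiers declared in that method.
    """
    return [
        name
        for name, identifiers in methods.items()
        if any(naming_flaws(identifier) for identifier in identifiers)
    ]
```

For example, flag_methods({"load": ["fileReader"], "calc": ["tmpCnt"]}) would flag only "calc", because "tmp" and "cnt" are not in the sample word list. Whether such flags actually point to methods with readability and quality problems is the question the statistical analysis addresses; the sketch says nothing about that.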