Visualizing Software Evolution with Code Clones
Revision control systems offer a vast amount of information that can help to understand (the evolution of) a source code base. The number, location and span of code clones are a reliable measure when assessing the quality of (the source code of) a software project. More interestingly, changes in code clones reveal the dynamics of a code repository, giving insight into the evolution of a project.
Mining changes in large projects
We exploit these ideas in ClonEvol, a tool for analyzing code changes in large software repositories using clone change data. The tool focuses on
- scalability (in time and space) for data acquisition, processing, and visualization;
- genericity in terms of supported programming languages;
- simplicity of use: point the tool to a repository URL and press 'Go';
- utilization of third-party open source components.
Scalability is achieved by limiting data acquisition and fact extraction to only differences between code base versions. Where other tools take days or even weeks to mine a real-life project, ClonEvol can process hundreds of revisions per hour. ClonEvol supports all languages that are supported by its components (Doxygen and Simian), including C, C++ and Java.
Tool-chain approach
Subversion change-logs, static analysis and code clone detection are combined to obtain information about clone evolution. The following clone-related events are detected:
- addition: software is modified or added, thereby yielding a new clone;
- deletion: a software fragment is removed, thereby removing a clone;
- drift: software code is moved around, thereby causing a clone to drift;
- split: a software fragment is cut into two parts, which are then moved to different places;
- merge: two software fragments are moved to the same place.
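As an illustration only, the five event types above can be modelled as a small classifier over the locations of one tracked fragment in two consecutive revisions. The location strings, the `classify` function and the counting rule are assumptions made for this sketch; the page does not describe how ClonEvol decides between events internally.

```python
from enum import Enum
from typing import Optional, Set


class CloneEvent(Enum):
    ADDITION = "addition"
    DELETION = "deletion"
    DRIFT = "drift"
    SPLIT = "split"
    MERGE = "merge"


def classify(before: Set[str], after: Set[str]) -> Optional[CloneEvent]:
    """Classify one tracked fragment from its locations in two revisions.

    `before` and `after` hold hypothetical scope identifiers (for example
    "a.cpp::Widget::draw") where the fragment occurs in the previous and
    in the current revision. Returns None when nothing changed.
    """
    if not before and not after:
        return None
    if not before:
        return CloneEvent.ADDITION   # fragment appears for the first time
    if not after:
        return CloneEvent.DELETION   # fragment is gone
    if len(before) == 1 and len(after) > 1:
        return CloneEvent.SPLIT      # one place became several
    if len(before) > 1 and len(after) == 1:
        return CloneEvent.MERGE      # several places became one
    if before != after:
        return CloneEvent.DRIFT      # same count, different place(s)
    return None
```

For example, `classify({"a.cpp::f"}, {"b.cpp::g"})` yields `CloneEvent.DRIFT` under this rule, while identical sets yield `None`.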
To detect such events, we proceed as follows:
- revisions are mined from a Subversion repository;
- clones are searched between modified files in one revision and all files in the previous revision, using Simian;
- code structure is extracted using lightweight static analysis provided by Doxygen;
- clones are lifted to structural level;
- clone-related events are detected.
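The step of lifting clones to the structural level can be pictured as mapping each clone's line range onto the scopes found by static analysis. The sketch below assumes that clone detection yields (file, first line, last line) fragments and that scope extraction yields named line ranges; the `Scope` and `CloneFragment` types and the `lift` function are hypothetical and do not reflect the actual output formats of Simian or Doxygen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Scope:
    file: str
    name: str
    first_line: int
    last_line: int


@dataclass(frozen=True)
class CloneFragment:
    file: str
    first_line: int
    last_line: int


def lift(fragment: CloneFragment, scopes: List[Scope]) -> Optional[Scope]:
    """Return the smallest scope in the same file that fully contains the fragment.

    Returns None when no scope encloses the fragment (e.g. file-level code).
    """
    enclosing = [
        s for s in scopes
        if s.file == fragment.file
        and s.first_line <= fragment.first_line
        and fragment.last_line <= s.last_line
    ]
    # The innermost scope is the enclosing one spanning the fewest lines.
    return min(enclosing, key=lambda s: s.last_line - s.first_line, default=None)


# Made-up example data: a class and one of its methods.
scopes = [
    Scope("a.cpp", "Widget", 10, 200),
    Scope("a.cpp", "Widget::draw", 50, 80),
]
inner = lift(CloneFragment("a.cpp", 55, 70), scopes)
outer = lift(CloneFragment("a.cpp", 150, 160), scopes)
```

Here `inner` resolves to the method scope and `outer` to the class scope. Choosing the innermost enclosing scope is a design choice of this sketch; a fragment that straddles two scopes would need an overlap-based rule instead.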
Visualization with HEB
The visualization is achieved with a (mirrored) radial tree to show the file and scope structures, complemented with hierarchically bundled edges that indicate the clone relations. The user can scroll through time to search for particular events and apply three color maps to highlight structure (files, classes, functions), change-log differences and code activity. This technique is similar to the one used in SolidSDD.
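In hierarchical edge bundling, an edge between two nodes is generally routed along the hierarchy: up from the source to the lowest common ancestor, then down to the target, with the visited nodes serving as spline control points. The following minimal sketch computes that node sequence for path-like scope identifiers. The `/`-separated naming and the `bundle_path` function are assumptions for illustration, not ClonEvol's own code.

```python
from typing import List


def bundle_path(source: str, target: str, sep: str = "/") -> List[str]:
    """List the hierarchy nodes an edge passes through.

    The path climbs from `source` to the lowest common ancestor (LCA)
    and then descends to `target`. If the two nodes share no prefix,
    the LCA is a virtual root represented by the empty string.
    """
    a, b = source.split(sep), target.split(sep)
    common = 0
    while common < min(len(a), len(b)) and a[common] == b[common]:
        common += 1
    # Ascend from the source up to and including the LCA.
    up = [sep.join(a[:i]) for i in range(len(a), common - 1, -1)]
    # Descend from the child of the LCA down to the target.
    down = [sep.join(b[:i]) for i in range(common + 1, len(b) + 1)]
    return up + down


path = bundle_path("src/ui/a.cpp/f", "src/core/b.cpp/g")
```

In this example `path` runs from `src/ui/a.cpp/f` up to `src` and back down to `src/core/b.cpp/g`. Edges whose endpoints share a deep common ancestor get short control paths, and edges crossing the hierarchy share long stretches, which is what produces the visual bundles.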
ClonEvol can be used to track changes performed on file and syntactic scope level (e.g. methods, classes, and namespaces). The image below shows the evolution of TortoiseSVN trunk/src/TortoiseProc/LogDlg.cpp in revisions 10000 to 10500. Each block represents a scope.
The video below shows the evolution of the FileZilla repository between revisions 1 and 5165. The left radial tree shows the structure; code changes, movement and clone addition/removal are shown in the center ring; code and clone activity are depicted on the right side.
(:html:) <iframe width="540" height="304" src="https://www.youtube-nocookie.com/embed/P2Uyy7BYfIU?rel=0" frameborder="0"></iframe> (:htmlend:)
ClonEvol was used to generate 3 sequences of 4089 PNG files, 20.1GB in total. These sequences represent:
- 4,119 SVN revisions of FileZilla containing 1.1GB of relevant changes;
- 28,207 modified source-code files;
- 591,663 (changed) scopes, i.e. functions, attributes, etc.;
- 275,478 scope clones.
This resulted in an SQLite database of approximately 226MB.
Mining the FileZilla repository took approximately 11:45 hours: 4:00 hours for file acquisition, 5:30 hours for scope extraction and 2:15 hours for clone extraction.
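As a quick check of these figures, the three phase durations can be summed and related to the 4,119 mined revisions. The snippet only uses numbers stated on this page; the variable names are arbitrary.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' duration into minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Phase durations as reported above.
phases = {
    "file acquisition": "4:00",
    "scope extraction": "5:30",
    "clone extraction": "2:15",
}
revisions = 4119  # SVN revisions of FileZilla reported above

total_minutes = sum(to_minutes(t) for t in phases.values())
revisions_per_hour = revisions / (total_minutes / 60)

print(f"total: {total_minutes // 60}:{total_minutes % 60:02d}")
print(f"throughput: {revisions_per_hour:.0f} revisions/hour")
for name, duration in phases.items():
    share = 100 * to_minutes(duration) / total_minutes
    print(f"{name}: {share:.0f}% of mining time")
```

The phases add up to the reported 11:45 hours, and the resulting rate is on the order of 350 revisions per hour, in line with the "hundreds of revisions per hour" mentioned earlier.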
Implementation
ClonEvol is implemented in C++, using Qt for graphics, SQLite for storing the clone information, Doxygen for lightweight parsing, and Simian for clone detection.
Availability
The binaries for ClonEvol for the Windows platform are available below:
- Minimal distribution. You will need to install Doxygen, Simian, and svn along with it. See the README file in the distribution.
- Full distribution. This should run without any third-party package installations. Please note that this distribution is for research use only. Also, if you use this code, you should acknowledge its developer (see publications below).
A list of interesting repositories you can use ClonEvol on is given below:
- FileZilla (https://svn.filezilla-project.org/svn/FileZilla3/trunk/src/)
- TortoiseSVN (http://tortoisesvn.googlecode.com/svn/trunk/src/)
- Apache: any C/C++/Java project from http://projects.apache.org/indexes/language.html
Publications
A. Hanjalic. Visualizing Software Evolution with Code Clones. MSc thesis, JBI Institute, Univ. of Groningen, 2014.
A. Hanjalic. ClonEvol: Visualizing Software Evolution with Code Clones. Proc. IEEE VISSOFT 2013.
A. Hanjalic. ClonEvol: Visualizing Software Evolution with Code Clones. Tool demo poster, IEEE VISSOFT 2013.