Visual Basic static analysis, clone detection, and automatic refactoring
Visual Basic (VB) is one of the fastest growing programming languages in the industry. Although it is often regarded as less powerful than C++, it provides constructs that support the development of large applications, comparable in complexity to those written in Java and C#.
Hence, large industrial VB code bases face the same challenges as code bases in other programming languages. One such challenge is the presence of code clones, i.e. code fragments with identical semantics and (nearly) identical syntax. Finding and removing such fragments automatically in large code bases is a difficult process.
In this project, we developed a framework for syntactic detection and automatic removal of code clones in VB. The framework consists of the following components.
VB syntactic and semantic analyzer
The first step required for automatic refactoring is syntactic and semantic analysis. We have built a full syntactic analyzer and a relatively complete semantic analyzer for VB as a separate component of our framework. The analyzer covers the full VB grammar and builds a complete abstract syntax tree (AST) directly from VB project files, handling both source code and libraries (assemblies). It also records source location information and supports AST editing and pretty printing of the code for refactoring purposes.
The semantic analyzer implements a large part of the VB language specification. For every node in the AST, it records semantic information such as:
- the AST node where each symbol is declared
- full scope information
- declaration details (e.g. members of compound types)
- usage of variables
- whether parameters are passed by value or by reference
- type matching
- the resolved types of expressions
Semantic information is saved in a separate type system. Both syntax (AST) information and semantic information are queryable by means of APIs after a project is analyzed. For full details, see the documentation below.
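To illustrate what such a queryable type system could look like, here is a minimal Python sketch of a scoped symbol table. It is not the DejaVu API: the names Symbol, Scope, declare, and resolve are hypothetical and chosen only for this example. It stores, per symbol, the declaration location, the resolved type, the pass-by-reference flag, and the usage sites, and it resolves names through nested scopes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]  # (line, column) in the source file


@dataclass
class Symbol:
    """Hypothetical record of the semantic facts kept for one declared name."""
    name: str
    type_name: str                 # resolved type, e.g. "Integer"
    decl_location: Location        # where the declaring AST node is located
    by_ref: bool = False           # only meaningful for parameters
    uses: List[Location] = field(default_factory=list)


class Scope:
    """A lexical scope that can look names up in its enclosing scopes."""

    def __init__(self, name: str, parent: Optional["Scope"] = None):
        self.name = name
        self.parent = parent
        self.symbols: Dict[str, Symbol] = {}

    def declare(self, symbol: Symbol) -> Symbol:
        # VB identifiers are case-insensitive, so keys are lower-cased.
        self.symbols[symbol.name.lower()] = symbol
        return symbol

    def resolve(self, name: str) -> Optional[Symbol]:
        # Walk outwards from the innermost scope until the name is found.
        scope: Optional[Scope] = self
        while scope is not None:
            found = scope.symbols.get(name.lower())
            if found is not None:
                return found
            scope = scope.parent
        return None


# Usage: a module-level variable seen from inside a procedure.
module_scope = Scope("Module1")
module_scope.declare(Symbol("total", "Integer", (3, 9)))
sub_scope = Scope("AddItem", parent=module_scope)
sub_scope.declare(Symbol("price", "Double", (5, 21), by_ref=True))

total = sub_scope.resolve("TOTAL")   # found in the enclosing module scope
total.uses.append((7, 9))            # record one usage site
```

Keeping this information separate from the AST, as described above, means a client can answer questions such as "where is this variable declared and used?" without walking the tree again.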
The syntactic and semantic analyzer handles large projects efficiently. On a typical PC, analyzing a project of 100,000 lines of code (LOC) takes under 1 minute.
Code clone detector
Using both syntactic and semantic information, the detector finds clones: contiguous code fragments that have the same semantics and (nearly) the same syntax. Fragments that differ only in variable names, or in expressions that have different syntax (and possibly different types) but the same semantics, are also reported as clones.
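The idea of finding contiguous fragments that match up to variable renaming can be sketched in a few lines of Python. This is a deliberately simplified, token-based illustration and not the detector used in the framework, which works on the AST and the semantic information. The function names, the tiny keyword list, and the fixed window size are all assumptions made for this example. Identifiers are replaced by placeholders numbered in order of first occurrence, so two fragments match only if their variables are renamed consistently.

```python
import re
from collections import defaultdict

# Tiny illustrative subset of VB keywords (an assumption, not the full list).
KEYWORDS = {"dim", "as", "if", "then", "else", "end", "for", "to", "next",
            "return", "integer", "double", "string"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(lines):
    """Rename identifiers to ID0, ID1, ... in order of first occurrence."""
    names = {}
    result = []
    for line in lines:
        tokens = []
        for tok in TOKEN.findall(line):
            low = tok.lower()
            if re.match(r"[A-Za-z_]", tok) and low not in KEYWORDS:
                tokens.append(names.setdefault(low, f"ID{len(names)}"))
            else:
                tokens.append(low)
        result.append(" ".join(tokens))
    return tuple(result)


def find_clones(lines, window=3):
    """Group the start indices of all windows with the same normalized text."""
    groups = defaultdict(list)
    for start in range(len(lines) - window + 1):
        groups[normalize(lines[start:start + window])].append(start)
    return [starts for starts in groups.values() if len(starts) > 1]


code = [
    "total = 0",
    "total = total + price",
    "count = count + 1",
    "Console.WriteLine(total)",
    "sum = 0",
    "sum = sum + cost",
    "n = n + 1",
]
clones = find_clones(code, window=3)  # [[0, 4]]: lines 0-2 match lines 4-6
```

The sketch does not cover the harder case mentioned above, namely expressions with different syntax but the same semantics; that requires the type and semantic information produced by the analyzer. It also does not handle string literals or overlapping matches.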
Program transformation for clone removal
After clones are detected, their instances are refactored by method extraction: each clone instance is replaced by a call to a newly generated method that contains the common code. The refactoring automatically inserts the correct code to call the new method, including parameter passing and return values.
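The method extraction step can be illustrated with the following Python sketch, which operates on plain lists of source lines rather than on the AST. It is only an approximation of the transformation described above: the function extract_method and its parameters are hypothetical, the variables that differ between instances are supplied by hand, and every parameter is passed ByRef as a simplification instead of choosing between by-value parameters and return values.

```python
def extract_method(lines, clone_starts, length, method_name, params, args_per_clone):
    """Replace each clone instance by a call; return (new_lines, method_lines).

    params         -- (name, type) pairs, using the names of the first instance
    args_per_clone -- for each instance, the variables to pass at its call site
    """
    first = clone_starts[0]
    body = lines[first:first + length]
    signature = ", ".join(f"ByRef {name} As {typ}" for name, typ in params)
    method_lines = ([f"Sub {method_name}({signature})"]
                    + ["    " + line.strip() for line in body]
                    + ["End Sub"])

    new_lines = list(lines)
    # Replace from the last instance backwards so earlier indices stay valid.
    for start, args in sorted(zip(clone_starts, args_per_clone), reverse=True):
        new_lines[start:start + length] = [f"{method_name}({', '.join(args)})"]
    return new_lines, method_lines


source = [
    "total = 0",
    "total = total + price",
    "Console.WriteLine(total)",
    "sum = 0",
    "sum = sum + cost",
]
refactored, method = extract_method(
    source,
    clone_starts=[0, 3],
    length=2,
    method_name="Accumulate",
    params=[("total", "Integer"), ("price", "Integer")],
    args_per_clone=[["total", "price"], ["sum", "cost"]],
)
# refactored: ["Accumulate(total, price)", "Console.WriteLine(total)", "Accumulate(sum, cost)"]
```

Passing every variable ByRef keeps the sketch short while preserving the effect of assignments inside the extracted body. A real refactoring would use the semantic information to decide which values are only read, which are written, and which are better returned.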
DejaVu tool
DejaVu is a tool that integrates the VB syntactic and semantic analyzer, the clone detector, and the refactoring described above. The tool allows the user to
- select a VB project
- perform the syntactic and semantic analysis and the clone detection
- see the clones as highlighted text on top of the source code
- select clones of interest and perform the refactoring
All of the above operations can be performed in a GUI front end with just a few clicks. A screenshot of the tool is shown below.
The tool is available for download here.
Documentation
Documentation on building, installing, and using DejaVu will be available here soon.
Papers
The static analyzer, code clone detector, and program transformation framework are described in detail in the MSc thesis of Liewe Kwakman, available here.