A Study of Static Code Analysis Tools

As software engineering students, we studied how two tools, Understand from SciTools and Clang, help in analyzing large code bases. In this blog we present the analysis these tools produced on four code bases covering a variety of languages: C, C++, Python, JavaScript, Objective-C, CSS and HTML. By applying reverse engineering tools, we intend to extract different types of design views such as formal specifications, structure charts, data flow diagrams, inter-modular data flows, control flow graphs, entity-relationship diagrams and slices of those views. We looked at a number of tools and zeroed in on Understand and Clang.

About Understand:

Understand is a static code analyzer for a code base which gleans metrics like:

Understand has a built-in reverse engineering feature that provides graphical representations of the code, which help the user understand how the code was developed.

There are two methods to mine objects from procedural code using Understand:

1. Identifying global structures which apparently represent the state of some object

2. Identifying dependencies between data types in a program

Clang: Embedded Static Analyzer in Understand

Clang Static Analyzer is a source code analysis tool that finds bugs in C, C++, and Objective-C programs. It performs a number of checks, such as:

Detection of memory leaks

Checks for virtual function calls during construction and destruction

Checks for dead code, by looking for idempotent operations and unreachable code blocks

Checks for dereferences of null pointers

Checks for division by zero and for logical errors in function calls
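To make one of these checks concrete, here is a minimal sketch of a virtual function call made during construction. The `Shape` and `Circle` classes are hypothetical and are not taken from any of the analyzed code bases. The behaviour is well defined in C++, but it often surprises people, which is why an analyzer may point it out.

```cpp
#include <string>

// Hypothetical classes, used only to illustrate the pattern.
struct Shape {
    std::string constructedAs;  // records which version of name() ran in the constructor

    Shape() {
        // Virtual call during construction: the object is still only a Shape
        // at this point, so this resolves to Shape::name(), never to an override.
        constructedAs = name();
    }
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};
```

Constructing a `Circle` leaves `constructedAs` set to "Shape", even though calling `name()` on the finished object returns "Circle".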

Time for some real analysis

After installing Understand and Clang, we picked the following open source projects from GitHub to analyze statically.

1. Audacity:

Possible Bugs/Code Checks

Code Standards Check using Clang

We tested Audacity for coding standard violations and for the likelihood of bugs.
Our results are as follows:
1. 281 files had control flow violations (Dangling Else, Single Exit Point at End, Unreachable Code).

2. 344 files contained violations related to memory allocation, in the form of dynamic heap allocation.
Example code (src/commands/commandType.cpp):
if (mName != NULL) {
    delete mName;
}
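In C++, applying `delete` to a null pointer has no effect, so the guard in the snippet above is not strictly needed. One common way to avoid manual heap management altogether is to let a smart pointer own the memory. The sketch below assumes a hypothetical `CommandInfo` class. It is not the Audacity code and only borrows the member name `mName` for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class; it is not taken from the Audacity sources.
class CommandInfo {
public:
    explicit CommandInfo(std::string name)
        : mName(new std::string(std::move(name))) {}

    // No user-written destructor: unique_ptr frees the string automatically.
    const std::string& name() const { return *mName; }

private:
    std::unique_ptr<std::string> mName;  // sole owner of the heap-allocated name
};
```

With this design there is no `delete` left for a memory allocation rule to flag, and the object cannot be copied by accident in a way that would free the same pointer twice.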

Understand calculated the cyclomatic complexity of functions and listed a number of functions which qualified as the most complex ones in the code.
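As a rough illustration of what cyclomatic complexity measures, the sketch below estimates it as one plus the number of decision points in a function body. This is a teaching aid under simplifying assumptions (it does not skip comments or string literals, and it counts `&&` and `||`, which some definitions leave out). It is not the algorithm Understand uses, and the function name is our own.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Rough estimate: 1 + number of decision points found in the text of a function.
int estimateCyclomaticComplexity(const std::string& body) {
    int decisions = 0;
    std::size_t i = 0;
    while (i < body.size()) {
        const unsigned char c = static_cast<unsigned char>(body[i]);
        if (std::isalpha(c) || c == '_') {
            // Read a whole identifier or keyword.
            const std::size_t start = i;
            while (i < body.size() &&
                   (std::isalnum(static_cast<unsigned char>(body[i])) || body[i] == '_')) {
                ++i;
            }
            const std::string word = body.substr(start, i - start);
            if (word == "if" || word == "for" || word == "while" ||
                word == "case" || word == "catch") {
                ++decisions;
            }
        } else if ((c == '&' || c == '|') && i + 1 < body.size() && body[i + 1] == body[i]) {
            ++decisions;  // short-circuit operator && or ||
            i += 2;
        } else if (c == '?') {
            ++decisions;  // conditional operator
            ++i;
        } else {
            ++i;
        }
    }
    return decisions + 1;
}
```

A straight-line function scores 1, and every additional branch adds one more path that a test suite would need to cover.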

Most Complex Functions

The metrics summary generated was as follows:

Inferences:

The code base is pretty large, as can be seen from the lines of code. The code also appears to be well documented, since the comment to line ratio is 33%. However, we cannot guarantee that the comments are appropriate: there is no golden rule, and the number of comments depends greatly on the inherent complexity of the code.
Possibility of refactoring: some files need to be refactored, which will make them easier to debug and maintain.
We can also add more test cases for the files containing the complex functions and make sure these functions are explained with meaningful comments, so that the code base becomes easier to understand and maintain.
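The comment ratio mentioned above can be approximated with a few lines of code. The sketch below assumes the ratio means comment lines divided by code lines and only recognises `//` comments at the start of a line. The exact definition used by Understand is not stated here, and the names `LineCounts`, `countLines` and `commentToCodeRatio` are our own.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct LineCounts {
    int comment = 0;  // lines whose first non-blank characters are "//"
    int code = 0;     // all other non-blank lines
};

// Classify each line of a source text; block comments are not handled.
LineCounts countLines(const std::string& source) {
    LineCounts counts;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) {
            continue;  // blank line
        }
        if (line.compare(first, 2, "//") == 0) {
            ++counts.comment;
        } else {
            ++counts.code;
        }
    }
    return counts;
}

// Comment lines divided by code lines; 0 when there is no code at all.
double commentToCodeRatio(const LineCounts& c) {
    return c.code == 0 ? 0.0 : static_cast<double>(c.comment) / c.code;
}
```

A high ratio only says that comments exist, not that they are useful, which is the caveat raised in the inferences above.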

2. RethinkDB:

RethinkDB is built to store JSON documents and to scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and it is easy to set up and learn.

An important component of process improvement is the ability to measure the process.

Analysis

The object-oriented design approach is concerned with modeling the world in terms of objects rather than adopting a function-oriented view.

The four steps involved in the Object Oriented Design process are:

a. Identification of classes and objects

b. Semantics of classes and objects

c. Relationships between classes and objects

d. Implementation of classes and objects

The metrics developed can be listed as follows:

1. Weighted Methods Per Class:

The following viewpoints have been developed in relation to gleaning OOD metrics from the classes and objects:

a. The number of methods and the complexity of methods involved indicate the time and effort required to maintain the code.

b. The larger the number of methods in a class, the greater the potential impact on its children, since the children inherit all the methods in the class.

c. Classes with a larger number of methods tend to be application specific, which limits the possibility of reuse.
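The viewpoints above can be made concrete with a small sketch of the Weighted Methods Per Class computation, taken here as the sum of the complexities of a class's methods. The `Method` struct and the function name are hypothetical, and the complexity values would come from a tool such as Understand.

```cpp
#include <numeric>
#include <string>
#include <vector>

struct Method {
    std::string name;
    int complexity;  // for example, the cyclomatic complexity of the method
};

// WMC: sum of the complexities of all methods defined in the class.
int weightedMethodsPerClass(const std::vector<Method>& methods) {
    return std::accumulate(methods.begin(), methods.end(), 0,
                           [](int total, const Method& m) { return total + m.complexity; });
}
```

If every method is given a complexity of 1, the metric reduces to a plain count of methods.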

Largest Function

The above figure shows the functions in order of their cyclomatic complexity. By listing the complex functions, we can refactor the code and develop test cases to ensure all code flows work properly.

2. Depth of Inheritance Tree:

Viewpoints:

a. The deeper a class is in the hierarchy, the greater the number of methods it will inherit.

b. Deeper trees entail greater design complexity due to the number of classes and functions involved.

c. The deeper a class is in the hierarchy, the greater the potential reuse of inherited methods.
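As a minimal sketch of how this metric can be computed, the function below walks up a class-to-parent map and counts the ancestors. It assumes single inheritance and a map without cycles. `ParentMap` and `depthOfInheritance` are hypothetical names.

```cpp
#include <map>
#include <string>

// Maps each class name to its direct base class; root classes have no entry.
using ParentMap = std::map<std::string, std::string>;

// Depth of inheritance: number of ancestors between a class and its root.
int depthOfInheritance(const ParentMap& parents, const std::string& cls) {
    int depth = 0;
    auto it = parents.find(cls);
    while (it != parents.end()) {
        ++depth;                        // one more ancestor found
        it = parents.find(it->second);  // continue from the parent
    }
    return depth;
}
```

A root class has depth 0, and each level of subclassing adds one.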

Architectural Dependencies Using Understand

3. Number of children:

Viewpoints:

a. Since inheritance is a form of reuse, the greater the number of children, the greater the reuse.

b. A large number of children may indicate misuse of subclassing, since it raises the likelihood of improper abstraction of the parent class.

c. More testing of methods will be needed if a class has a lot of child classes.
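The number of children of a class can be sketched as a count of the classes that name it as their direct base. As before, this assumes single inheritance, and `ParentMap` and `numberOfChildren` are hypothetical names.

```cpp
#include <map>
#include <string>

// Maps each class name to its direct base class; root classes have no entry.
using ParentMap = std::map<std::string, std::string>;

// Number of children: classes whose direct base class is `cls`.
int numberOfChildren(const ParentMap& parents, const std::string& cls) {
    int children = 0;
    for (const auto& entry : parents) {
        if (entry.second == cls) {
            ++children;
        }
    }
    return children;
}
```

Only immediate subclasses are counted, so grandchildren do not add to the metric.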

Architectural Browser

Inferences about the code:

Certain coding standards were violated: the code contains commented-out code and goto statements, which ideally should not be present. Certain functions were also too long, and there were instances of unreachable code and unused functions.
The metrics summary gives us an idea of the lines of code, classes and files involved. The architectural browser gives us an idea of the languages and the directory structure. This makes it easier for maintainers to find the expertise they need to maintain and support the code.
The lines of code are few in comparison to the previous two code bases explored, so this is a relatively small application. The comment to line ratio is 0.57, which may indicate well documented code.

3. JSON Parser:

A fresh approach to JSON loading that speeds up web applications by providing the parsed objects before the response completes.

Dependency Graph

The source code of the JSON parser is entirely JavaScript. We started the analysis by applying reverse engineering with dependency graphs, to understand the code flows and learn how the different classes were associated with one another. We also attempted to generate UML diagrams for single files to analyze the structural properties of the class.

Internal Architectural Dependencies

Understand generates graphs to visualize architectural dependencies between systems constructed in heterogeneous environments using different programming languages. The UML diagram below, generated with Understand, helped in representing the interactions between collaborating objects.

UML diagram using Understand

We ran a few code checks for violations of standards using Clang and got the following result log:

Result Log using Clang

Metrics Summary using Understand

Inferences about the code:

None of the prescribed coding standards checked with Clang were violated, which suggests that the code is well written.
The metrics summary gives us an idea of the lines of code, classes and files involved. Since this is a JavaScript project, we can expect a number of script files which are either inline or accessed through a call to their location. The architectural dependency graph helps the developer or maintainer see which JavaScript files are accessed, and makes the code easier to debug by giving a proper understanding of the control flow.
The lines of code are few in comparison to the previous two code bases explored, so this is a relatively small application. The comment to line ratio is 0.57, which may indicate well documented code.

4. Fast Image Cache:

Fast Image Cache is an efficient, persistent, and fast way to store and retrieve images in any iOS application.

Metrics Treemap using Understand

Architecture Browser using Understand

We began the analysis of the Fast Image Cache code by generating a Metrics Treemap. A Metrics Treemap visualizes the code as a hierarchical package structure of nested rectangles, with parent packages encompassing child packages. It helps us understand how the code is structured, whether there are any major issues, and whether they are localized or spread throughout the code base. It also helps us spot complex classes, with the color showing the sum of the cyclomatic complexity of all the methods in the class. Usually file size and complexity are directly proportional, so larger rectangles indicate larger file sizes.

The Architecture Browser helps us visualize the different classes in Fast Image Cache and the programming languages used to build them.

Summary

We have evaluated the two static analysis tools, Understand and Clang, by running them over the selected code bases mentioned above. Understand helped us glean metrics and graphical representations that make the code structure easier to understand. Clang, on the other hand, checked the code bases for violations of coding standards and helped detect memory leaks, dangling pointers and null pointer dereferences. We have learned that although the tools are really helpful in finding bugs, they can be resource intensive, and running them takes more time than compilation. There are also instances of false positives, i.e., cases where a tool claims there are bugs in the program although the code behaves correctly.


Monday, November 4, 2013
