You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

29 KiB

title description
AST Language (AL) Declarative linting framework over the Clang AST.

Declarative linting framework over the Clang AST.

***DEPRECATED*** On end-of-life support, may be removed in the future.

Activate with --linters.

Supported languages:

  • C/C++/ObjC: Yes
  • Java: No

For C/C++ and Objective-C languages, we provide a linters framework. These are checks about the syntax of the program; it could be about a property, or about code inside one method, or that a class or method have certain properties. We provide a few checks by default and we have developed a domain specific language (DSL) to make it easier to write checks.

AL: A declarative language for writing linters in Infer

One of the major advantage of Infer when compared with other static analyzers is the fact it performs sophisticated inter-procedural/inter-file analysis. That is, Infer can detect bugs which involve tracking values through many procedure calls and the procedures may live in different files. These may be very subtle bugs and designing static analyses to do that is quite involved and normally requires deep static analysis expertise.

However, there are many important software bugs that are confined in the code of a single procedure (called intra-procedural). To detect these bugs simpler analyses may suffice which do not require deep technical expertise in static analysis. Often these bugs can be expressed by referring to the syntax of the program, or the types of certain expressions. We have defined a new language to easily design checkers which identify these kind of bugs. The language is called AL (AST Language) and its main feature is the ability to reason about the Abstract Syntax Tree of a program in a concise declarative way. AL's checkers are interpreted by Infer to analyze programs. Thus, to detect new kind of bugs in Infer one can just write a check in AL. We will see in more detail later, that for writing AL formulas we also need predicates: simple functions that check a property of the AST. Predicates are written in OCaml inside Infer, thus it requires a bit of OCaml knowledge and getting familiar with the OCaml data structure for the clang AST.

Getting the clang AST

When you write a linter that traverses the AST of some programs to check some property, you probably need to understand what the AST looks like. You can get the AST of programs using clang directly, or using Infer.

If you have a clang command clang <clang arguments> File.m then you can get the AST with

clang <clang arguments> -Xclang -ast-dump -fsyntax-only File.m

You can also get the AST using Infer. One advantage of this is that you don't need to know the speicifc clang command, just the general build command. Moreover, what you get here is exactly the form of the AST that Infer has as input.

For this you need to install an OCaml package biniou with opam install biniou. See the opam website for instructions on how to install opam.

Then, the AST can be created by Infer in debug mode. Call Infer with

infer --debug -- <build command>

This will, among other things, generate a file /path/to/File.m.ast.sh for every file /path/to/File.m that is being analyzed. Run this script with bash File.m.ast.sh and a file /path/to/File.m.ast.bdump will be generated, that contains the AST of the program in bdump format (similar to json). If you get an error about bdump not being found you may need to run eval $(opam env) to get the bdump executable (provided by the biniou opam package) into your PATH.

For general info on the clang AST, you can check out clang's website.

Using AL to write linters

Let's start with an example. Suppose we want to write the following Objective-C's linter:

"a property containing the word 'delegate', but not containing the word 'queue' should not be declared strong".

We can write this property in the following way:

DEFINE-CHECKER STRONG_DELEGATE_WARNING = {

    LET name_contains_delegate =
        declaration_has_name(REGEXP("[dD]elegate"));

    LET name_does_not_contain_queue =
        NOT declaration_has_name(REGEXP("[qQ]ueue"));

    SET report_when =
        WHEN
           name_contains_delegate
           AND name_does_not_contain_queue
           AND is_strong_property()
        HOLDS-IN-NODE ObjCPropertyDecl;

    SET message = "Property or ivar %decl_name% declared strong";
    SET suggestion = "In general delegates should be declared weak or assign";
    SET severity = "WARNING"
  };

The linter definition starts with the keyword DEFINE-CHECKER followed by the checker's name. The first LET clause defines the formula variable name_contains_delegate using the predicate declaration_has_name which return true/false depending whether the property's name contains a word in the language of the regular expression [dD]elegate. In general a predicate is a simple atomic formula evaluated on an AST node. The list of available predicates is in the module cPredicates.mli (this list is continuously growing and if you need a new predicate you can add it in ocaml). Formula variables can be used to simplify other definitions. The SET report_when is mandatory and defines a formula that, when evaluates to true, will tell Infer to report an error. In the case above, the formula is saying that we should report when visiting an ObjCPropertyDecl (that is the AST node declaring a property in Objective-C) where it holds that: the name contains "delegate/Delegate" (name_contains_delegate) and the name doesn't contain "queue/Queue" (name_does_not_contain_queue) and the node is defining a "strong" property (is_strong_property()).

The SET message clause defines the error message that will be displayed to the user. Notice that the message can include placeholders like %decl_name%. Placeholders are evaluated by Infer and substituted by their current value when the error message is reported. In this case the name of the declaration. The SET suggestion clause define an optional hint to give to programmer on how to fix the problem.

The general structure of a checker is the following:

DEFINE-CHECKER id_of_the_checker = {

     LET formula = <formula definition>;
     LET ….

     SET report_when = <formula definition>;

     SET name = <optional name>;
     SET message = <error message to show the user>;
     SET suggestion = <optional suggestion to the user>;
     SET doc_url = <optional URL to documentation of the issue>;
     SET severity = INFO | LIKE | ADVICE | WARNING | ERROR;
     SET mode = ON | OFF
     SET whitelist_path = {path1, path2, ..., pathn };
     SET blacklist_path = {path1, path2, ..., pathn };
  };

The default severity is WARNING and the default mode is ON, so these are optional. If the check is OFF it will only be available in debug mode (flags --debug or --linters-developer-mode). INFOs are generally also not reported, except with some specialzed flags. name and doc_url are used only for CI comments at the moment (in Phabricator).

Defining Paths

whitelist_path and blacklist_path are optional, by default the rule is enabled everywhere. For specifying paths, one can use either string constants ("File.m") or regexes (REGEXP("path/to/.*")) or variables. The variables stand for a list of paths, and are defined in a separate block:

 GLOBAL-PATHS {
     path1 = {"A.m", REGEXP("path/to/.*")};
  };

Defining Macros

It is possible to define macros that can be used in several checkers. This is done in the following way:

GLOBAL-MACROS {
  LET is_subclass_of(x) =
        is_class(x) HOLDS-IN-SOME-SUPERCLASS-OF ObjCInterfaceDecl;
 };

GLOBAL-MACROS is the section of an AL specification where one can define a list of global macros. In the example we are defining the macro is_subclass(x) which can now be used in checkers instead of its complex definition.

It is possible to import a library of macros and paths with the following command:

#IMPORT <library.al>

In an AL file, the command above import and make available all the macros and paths defined in the library.al file.

AL Predicates

The simplest formulas we can write are predicates. They are defined inside Infer. We provide a library, but if the predicate that you require is not available, you will need to extend the library. Here are the some of the currently defined predicates:

call_class_method ("class_name", "method_name")
call_function ("method_name")
call_instance_method ("class_name", "method_name")
call_method ("method_name")
captures_cxx_references ()
context_in_synchronized_block ()
declaration_has_name ("decl_name")
declaration_ref_name ("decl_ref_name")
decl_unavailable_in_supported_ios_sdk ()
has_cast_kind("cast_kind") // useful in a cast node
has_type ("type") // only builtin types, pointers and Objective-C classes available at the moment
isa ("class_name")
is_assign_property ()
is_binop_with_kind ("kind")
is_class ("class_name")
is_const_var ()
is_global_var ()
is_ivar_atomic ()
is_method_property_accessor_of_ivar ()
is_node ("node_name")
is_objc_constructor ()
is_objc_dealloc ()
is_objc_extension ()
is_objc_interface_named ("name")
is_property_pointer_type ()
is_strong_property ()
is_weak_property ()
is_unop_with_kind ("kind")
method_return_type ("type") // only builtin type, pointers, and Objective-C classes available at the moment
objc_method_has_nth_parameter_of_type("type")
using_namespace("namespace")
within_responds_to_selector_block ()

In general, the parameters of predicates can be constants, or variables, or regular expressions. Variables are used in macros, see below. The syntax for using regexes is REGEX("your_reg_exp_here").

NOTE: The predicates that expect types, such as has_type or method_return_type or objc_method_has_nth_parameter_of_type also accept regexes, but the syntax is a bit different: REGEX('your_reg_exp_here'), and this regex can be embedded inside another string, for example: has_type("REGEXP('NS.+')*" ) which stands for pointer to a class of name starting with NS.

If you need to add a new predicate, write the predicate in cPredicates.ml and then register it in CTL.ml.

AL Formulas

Formulas are defined using a variation of the CTL temporal logic. CTL is a logic expressing properties of a tree model. In the case of AL, the tree is the AST of the program. Formulas are defined according to the following grammar:

formula ::= predicate
          | NOT formula
          | formula1 OR formula2
          | formula1 AND formula2
          | formula1 IMPLIES formula2
          | formula1 HOLDS-UNTIL formula2
          | formula1 HOLDS-EVERYWHERE-UNTIL formula2
          | formula HOLDS-EVENTUALLY
          | formula HOLDS-EVERYWHERE-EVENTUALLY
          | formula HOLDS-NEXT
          | formula HOLDS-EVERYWHERE-NEXT
          | formula HOLDS-ALWAYS
          | formula HOLDS-EVERYWHERE-ALWAYS
          | WHEN formula HOLDS-IN-NODE node-name-list
          | IN-NODE node-name-list WITH-TRANSITION transition-name
               formula HOLDS-EVENTUALLY

The first four cases (NOT, OR, AND, IMPLIES) are classic boolean operators with the usual semantics. The others are temporal operators describing how the truth-value of a formula is evaluated in a tree. Let's consider case by case.

Formula Semantic meaning
F1 HOLDS-UNTIL F2 from the current node, there exists a path where F1 holds at every node until F2 becomes true

An example is depicted in the following tree. When F1 or F2 hold in a node this is indicated between square brackets. The formula F1 HOLDS-UNTIL F2 holds in the green nodes.


Formula Semantic meaning
F HOLDS-EVENTUALLY from the current node there exists a path where at some point F becomes true

In the picture below, as F holds in n10, then F HOLDS-EVENTUALLY holds in the green nodes n1, n7, n10. This is because from these nodes there is a path reaching n10 where F holds. Note that it holds for n10 as well because there exists a trivial path of length 0 from n1 to itself.


Formula Semantic meaning
F HOLDS-EVERYWHERE-EVENTUALLY in every path starting from the current node at some point F becomes true

For example, in the tree below, the formula holds in every green node because every paths starting from each of them eventually reaches a node where F holds.


Formula Semantic meaning
F HOLDS-NEXT from the current node (we are visiting) there exists a child where F is true

In the tree below, the formula F HOLDS-NEXT it is true only in n1 as it's the only node with a child where F holds (node n3). In AL, NEXT is synonym of child as, in terms of a path in the tree, a child is the next node.


Formula Semantic meaning
F HOLDS-EVERYWHERE-NEXT from the current node in every existing child F is true

In the tree below, the formula F HOLDS-EVERYWHERE-NEXT it is true in n1 as it's the only node for which in every child F holds (node n2, n3, and n7).


Formula Semantic meaning
F HOLDS-ALWAYS from the current node there exists a path where F holds at every node

In the tree below F HOLDS-ALWAYS holds in n1, n2, n8 because for each of these nodes there exists a path where F holds at each node in the path.


Formula Semantic meaning
F HOLDS-EVERYWHERE-ALWAYS from the current node, in every path F holds at every node

F HOLDS-EVERYWHERE-ALWAYS holds in n2, n4, n5, and n8 because when we visit those nodes in every path that start from them F holds in every node.


Formula Semantic meaning
WHEN F HOLDS-IN-NODE node1,…,nodeK we are in a node among node1,…,nodeK and F holds

WHEN F HOLDS-IN-NODE n2, n7, n6 holds only in node n2 as it is the only node in the list n2, n7, n6 where F holds.

Let's consider an example of checker using formula WHEN F HOLDS-IN-NODE node1,…,nodeK for checking that a property with pointer type should not be declared "assign":

DEFINE-CHECKER ASSIGN_POINTER_WARNING = {

      SET report_when =
          WHEN
            is_assign_property() AND is_property_pointer_type()
          HOLDS-IN-NODE ObjCPropertyDecl;

      SET message = "Property `%decl_name%` is a pointer type marked with the `assign` attribute";
      SET suggestion = "Use a different attribute like `strong` or `weak`.";
      SET severity = "WARNING";
  };

The checker uses two predefined predicates is_assign_property() and is_property_pointer_type() which are true if the property being declared is assign and has a pointer type respectively. We want to check both conditions only on nodes declaring properties, i.e., ObjCPropertyDecl.


Formula Semantic meaning
IN-NODE node1,…, nodeK WITH-TRANSITION t F HOLDS-EVENTUALLY from the current node there exists a path which eventually reaches a node among “node1,…,nodeK” with a transition t reaching a child where F holds

The following tree explain the concept:

The concept of transition is needed because of the special structure of the clang AST. Certain kind of nodes, for example statements, have a list of children that are statements as well. In this case there is no special tag attached to the edge between the node and the children. Other nodes have records, where some of the fields point to other nodes. For example a node representing a function declaration will have a record where one of the fields is body. This is pointing to a statement representing the function's body. For records, sometimes we need to specify that we need a particular node reachable via a particular field (i.e., a transition).

Hint A good way to learn how to write checkers is looking at existing checkers in the file linters.al.

Example checks

In the following we show a few examples of simple checks you may wish to write and the corresponding formulas:

  • A check for flagging a Objective-C class that inherits from a class that shouldn't be subclassed.
DEFINE-CHECKER SUBCLASSING_TEST_EXAMPLE = {
   SET report_when = is_class("A") HOLDS-IN-SOME-SUPERCLASS-OF ObjCInterfaceDecl;
   SET message = "This is subclassing A. Class A should not be subclassed.";
};
  • A check for flagging an Objective-C instance method call:
DEFINE-CHECKER CALL_INSTANCE_METHOD = {
  SET report_when = call_instance_method("A", "foo:");
  SET message = "Do not call this method";
};
  • A check for flagging an Objective-C instance method call of any method of a class:
DEFINE-CHECKER CALL_ANY_INSTANCE_METHODS = {
  SET report_when = call_instance_method(A, REGEXP("*"));
  SET message = "Do not call any method of class A";
};
  • A check for flagging an Objective-C class method call:
DEFINE-CHECKER CALL_CLASS_METHOD = {
  SET report_when = call_class_method("A", "foo:");
  SET message = "Do not call this method";
};
  • A check for flagging an Objective-C method call of a method with int return type:
DEFINE-CHECKER TEST_RETURN_METHOD = {
  SET report_when = WHEN method_return_type("int")
        HOLDS-IN-NODE ObjCMethodDecl;
  SET message = "Method return int";
};
  • A check for flagging a variable declaration with type long
DEFINE-CHECKER TEST_VAR_TYPE_CHECK = {
  SET report_when = WHEN has_type("long")
	    HOLDS-IN-NODE VarDecl;
  SET message = "Var %name% has type long";
};
  • A check for flagging a method that has a parameter of type A*
DEFINE-CHECKER TEST_PARAM_TYPE_CHECK = {
  LET method_has_a_parameter_with_type(x) =
        WHEN HOLDS-NEXT WITH-TRANSITION Parameters (has_type(x))
        HOLDS-IN-NODE ObjCMethodDecl;
  SET report_when =
           method_has_a_parameter_with_type("A*" );
  SET message = "Found a method with a parameter of type A";
};
  • A check for flagging a method that has all the parameters of type A* (and at least one)
DEFINE-CHECKER TEST_PARAM_TYPE_CHECK2 = {
  LET method_has_at_least_a_parameter =
      WHEN HOLDS-NEXT WITH-TRANSITION Parameters (TRUE)
       HOLDS-IN-NODE ObjCMethodDecl;

   LET method_has_all_parameter_with_type(x) =
        WHEN HOLDS-EVERYWHERE-NEXT WITH-TRANSITION Parameters (has_type(x))
        HOLDS-IN-NODE ObjCMethodDecl;

  SET report_when = method_has_at_least_a_parameter AND
                    method_has_all_parameter_with_type("int");
  SET message = "All the parameters of the method have type int";
};
  • A check for flagging a method that has the 2nd parameter of type A*
DEFINE-CHECKER TEST_NTH_PARAM_TYPE_CHECK = {
  SET report_when =
    WHEN objc_method_has_nth_parameter_of_type("2", "A*")
    HOLDS-IN-NODE ObjCMethodDecl;
  SET message = "Found a method with the 2nd parameter of type A*";
  SET severity = "LIKE";
};
  • A check for flagging a protocol that inherits from a given protocol. HOLDS-EVENTUALLY WITH-TRANSITION Protocol means follow the Protocol branch in the AST until the condition holds.
DEFINE-CHECKER TEST_PROTOCOL_DEF_INHERITANCE = {
  LET is_subprotocol_of(x) = declaration_has_name(x) HOLDS-EVENTUALLY WITH-TRANSITION Protocol;
  SET report_when =
    WHEN is_subprotocol_of("P")
    HOLDS-IN-NODE ObjCProtocolDecl;
  SET message = "Do not inherit from Protocol P";
};
  • A check for flagging when a constructor is defined with a parameter of a type that implements a given protocol (or that inherits from it). HOLDS-NEXT WITH-TRANSITION Parameters means, starting in the ObjCMethodDecl node, follow the Parameters branch in the AST and check that the condition holds there.
DEFINE-CHECKER TEST_PROTOCOL_TYPE_INHERITANCE = {

  LET method_has_parameter_subprotocol_of(x) =
            WHEN
             HOLDS-NEXT WITH-TRANSITION Parameters
                (has_type_subprotocol_of(x))
             HOLDS-IN-NODE ObjCMethodDecl;

  SET report_when =
      WHEN
        declaration_has_name(REGEXP("^newWith.*:$")) AND
        method_has_parameter_subprotocol_of("P")
      HOLDS-IN-NODE ObjCMethodDecl;

  SET message = "Do not define parameters of type P.";
};
  • A check for flagging a variable declaration of type NSArray applied to A.
DEFINE-CHECKER TEST_GENERICS_TYPE = {
  SET report_when =
      WHEN has_type("NSArray<A>*")
      HOLDS-IN-NODE VarDecl;
  SET message = "Do not create arrays of type A";
};
  • A check for flagging using a property or variable that is not available in the supported API. decl_unavailable_in_supported_ios_sdk is a predicate that works on a declaration, checks the available attribute from the declaration and compares it with the supported iOS SDK. Notice that we flag the occurrence of the variable or property, but the attribute is in the declaration, so we need the transition PointerToDecl that follows the pointer from the usage to the declaration.
DEFINE-CHECKER UNAVAILABLE_API_IN_SUPPORTED_IOS_SDK = {
  SET report_when =
	WHEN HOLDS-NEXT WITH-TRANSITION PointerToDecl
	 (decl_unavailable_in_supported_ios_sdk() AND
	HOLDS-IN-NODE DeclRefExpr;
  SET message = "%name% is not available in the required iOS SDK version";
};
  • A check for flagging using a given namespace
DEFINE-CHECKER TEST_USING_NAMESPACE = {
  SET report_when = using_namespace("N");
  SET message = "Do not use namespace N";
};
  • A check for flagging the use of given enum constants
DEFINE-CHECKER ENUM_CONSTANTS = {
  SET report_when = is_enum_constant(REGEXP("MyName.*"));
  SET message = "Do not use the enum MyName";
};

AST info in messages

When you write the message of your rule, you may want to specify which particular AST items were involved in the issue, such as a type or a variable name. We have a mechanism for that, we specified a few placeholders that can be used in rules with the syntax %placeholder% and it will be substituted by the correct AST info. At the moment we have %type%, %child_type% and %name% that print the type of the node, the type of the node's child, and a string representation of the node, respectively. As with predicates, we can add more as needed.

Testing your rule

To test your rule you need to run it with Infer. If you are adding a new linter you can test it in a separate al file that you can pass to Infer with the option --linters-def-file file.al. Pass the option --linters-developer-mode --linter <LINTER_NAME> to Infer to print debug information and only run the linter you are developing, so it will be faster and the debug info will be only about your linter.

To test your code, write a small example that triggers the rule. Then, run your code with

infer --linters-developer-mode --linters-def-file file.al -- clang -c Test.m

the bug should be printed in the screen, like, for instance:

infer/tests/codetoanalyze/objcpp/linters/global-var/B.mm:34: warning: GLOBAL_VARIABLE_INITIALIZED_WITH_FUNCTION_OR_METHOD_CALL
  Global variable kLineSize is initialized using a function or method call at line 34, column 1. If the function/method call is expensive,
  it can affect the starting time of the app.
  32.   static float kPadding = [A bar] ? 10.0 : 11.0; // Error
  33.
  34. > static const float kLineSize = 1 / [A scale]; // Error
  35.
  36.   static const float ok = 37;
  37.

Moreover, the bug can be found in the file infer-out/report.json where infer-out is the results directory where Infer operates, that is created in the current directory. You can specify a different directory with the option -o.

Debugging

If there are syntax errors or other parsing errors with your al file, you will get an error message when testing the rule, remember to use linters-developer-mode when you are developing a rule. If the rule gets parsed but still doesn't behave as you expect, you can debug it, by adding the following line to a test source file in the line where you want to debug the rule: //INFER_BREAKPOINT. Then run infer again in linters developer mode, and it will stop the execution of the linter on the line of the breakpoint. Then you can follow the execution step by step. It shows the current formula that is being evaluated, and the current part of the AST that is being checked. A red node means that the formula failed, a green node means that it succeeded.

Demo

Command line options for linters

The linters are run by default when you run Infer. However, there is a way of running only the linters, which is faster than also running Infer. This is by adding the option --linters to the analysis command as in this example:

infer run --linters -- clang -c Test.m

There are a few other command-line options that are useful for using or developing new linters in Infer. Read about them in the infer capture manual.

List of Issue Types

The following issue types are reported by this checker: