--- id: adding-models title: Adding models --- ## Why do we need models When analyzing projects with call dependencies between functions, Infer follows the call graph to decide in which order to analyze these functions. The main goal is to use the analysis summary of a function wherever this function is called. On the following example: ```c int foo(int x) { if (x < 42) { return x; } else { return 0; } } int bar() { return foo(24); } int baz() { return foo(54); } ``` Infer starts with the analysis on `foo` and detect that this function either returns `0` if the argument is greater than or equal to `42`, or returns the value of the argument otherwise. With this information, Infer detects that `bar` always returns `24` and `baz` always returns `0`. Now, it may happen that the code of some function is not available during the analysis. For example, this happens when a project uses pre-compiled libraries. The most typical case is the use of the standard library like in the following example: ```c #include int* create() { int *p = malloc(sizeof(int)); if (p == NULL) exit(1); return p; } void assign(int x, int *p) { *p = x; } int* my_function() { int *p = create(); assign(42, p); return p; } ``` Here, Infer will start with the analysis of `create` but will not find the source code for `malloc`. To deal with this situation, Infer relies on models of the missing functions to proceed with the analysis. The function `malloc` is internally modeled as either returning `NULL`, or returning a valid and allocated pointer. Similarly, the function `exit` is modeled as terminating the execution. Using these two models, Infer detects that `create` always returns an allocated pointer and that `my_function` is safe. At this point, it is important to note that missing source code and missing models do not make the analysis fail. Missing functions are treated as having no effect. However, even though skipping these missing functions is fine in most cases, there can be cases where it affects the quality of the analysis. For example, missing models can lead to incorrect bug reports. Consider the case of a function `lib_exit` having the same semantics as `exit` but defined in an pre-compiled library not part of the project being analyzed: ```c void lib_exit(int); int* create() { int *p = malloc(sizeof(int)); if (p == NULL) lib_exit(1); return p; } ``` In this case, Infer will not be able to know that the return statement is only possible in the case where `p` is not null. When analyzing `my_function`, Infer will consider the null case and report a null dereference error in the call to `assign(42, p)`. Similarly, considering a function `lib_alloc` equivalent to `malloc`, and the function `create` now defined as: ```c int* lib_alloc(int); int* create() { int *p = lib_alloc(sizeof(int)); return p; } ``` Then Infer will not report any null dereference in `my_function`. ## Examples of models ### Some models for C Adding new models is easy. The models for C can be found in [`infer/models/c/src/`](https://github.com/facebook/infer/tree/master/infer/models/c/src). The file [`libc_basic.c`](https://github.com/facebook/infer/blob/master/infer/models/c/src/libc_basic.c) contains models for some of the most commonly encountered functions from the C standard library. For example, the function `xmalloc`, which is essentially the same function as `create` defined above, is modeled by: ```c void *xmalloc(size_t size) { void *ret = malloc(size); INFER_EXCLUDE_CONDITION(ret == NULL); return ret; } ``` The function `xmalloc` is modeled using `malloc` to create an allocated object and the macro `INFER_EXCLUDE_CONDITION` used to eliminate the case where `malloc` can return null. The list of helper functions and macros for writing models can be found in [`infer_builtins.c`](https://github.com/facebook/infer/blob/master/infer/models/c/src/infer_builtins.c). For a slightly more complex example, `realloc` is modeled as: ```c void *realloc(void *ptr, size_t size) { if(ptr==0) { // if ptr in NULL, behave as malloc return malloc(size); } int old_size; int can_enlarge; old_size = __get_array_size(ptr); // force ptr to be an array can_enlarge = __infer_nondet_int(); // nondeterministically choose whether the current block can be enlarged if(can_enlarge) { __set_array_size(ptr, size); // enlarge the block return ptr; } int *newblock = malloc(size); if(newblock) { free(ptr); return newblock; } else { // if new allocation fails, do not free the old block return newblock; } } ``` This model is based on existing models for `malloc` and `free` and three helper functions: - `__get_array_size(ptr)` which allows to manipulate with a model what Infer knows about the size of the allocated memory - `__set_array_size(ptr, size)` to modify the information about the size of the allocated memory - `__infer_nondet_int()` to create a variable which can have any possible integer value ### For Java The models for Java are following the same approach and the list of helper functions is in: [`infer/models/java/src/com/facebook/infer/models/InferBuiltins.java`](https://github.com/facebook/infer/blob/master/infer/models/java/src/com/facebook/infer/models/InferBuiltins.java) [`infer/models/java/src/com/facebook/infer/models/InferUndefined.java`](https://github.com/facebook/infer/blob/master/infer/models/java/src/com/facebook/infer/models/InferUndefined.java) For example, Infer treats Java hash maps using a recency abstraction model: Infer remembers the last two keys being added by `put` and checked by `containsKey`, which can be used to make sure that no null pointer exceptions are coming from the fact that `get(key)` returns null when `key` is not in the map. This behavior can just be implemented via a model written in Java with the help of few helper functions understood by Infer. These models can be found in: [`infer/models/java/src/java/util/HashMap.java`](https://github.com/facebook/infer/blob/master/infer/models/java/src/java/util/HashMap.java) and just rely on these two methods: - `InferUndefined.boolean_undefined()` to create a non-deterministic choice - `(V)InferUndefined.object_undefined()` to create a non null undefined object of type `V` ## How to add new models Let's look at a toy example in Java. As explained above, models for C, Objective-C and Java are all following the same approach. ```java import lib.Server; public class Test { enum Status { SUCCESS, FAILURE, PING_FAILURE, CONNECTION_FAILURE } Status convertStatus(Server s) { switch (s.getStatus()) { case 0: return Status.SUCCESS; case 1: return Status.FAILURE; case 2: return Status.FAILURE; default: // should not happen return null; } } String statusName(Server s) { Status status = convertStatus(s); return status.name(); } } ``` Assuming that the class `lib.Server` is part of a pre-compiled library, Infer will report a null pointer exception in `statusName`. This happens whenever `s.getStatus()` returns a value greater that `3`, in which case the default branch of the switch statement is taken and `convertStatus` returns `null`. However, we know from the documentation that the method `lib.Server.getStatus` can only return `0`, `1`, or `2`. A possible approach would be to use an assertion like the Guava `Preconditions.checkState` to inform Infer about the invariant: ```java Status convertStatus(Server s) { int serverStatus = s.getStatus(); Preconditions.checkState(serverStatus >= 0 && serverStatus < 3); switch (s.getStatus()) { ... } } ``` However, in the case where adding preconditions is not possible, we can then write a model for `getStatus()` in order to make the analysis more precise. To create a model for `getStatus()`, we need to add a class with the name and the same package as for the original method. In this example: - create a file `infer/models/java/src/infer/models/Server.java` with the following content: ```java package infer.models; import com.facebook.infer.models.InferBuiltins; import com.facebook.infer.models.InferUndefined; public class Server { public int getStatus() { int status = InferUndefined.int_undefined(); InferBuiltins.assume(status >= 0 && status < 3); return status; } } ``` - recompile infer: ```bash make -C infer ``` - run the analysis again: ```bash infer run -- javac Test.java ``` Now it should no longer report a null pointer exception.