Full Evaluation

Launch the docker container:

$ cd /path/to/the/artifact/
$ ./start-container.py

Note

All of the following commands are executed inside the Docker container.

Warning

The full evaluation can take up to 20 hours on a 32-core machine. We therefore recommend starting it inside a tmux session, leaving it running in the background, and coming back when the experiments have finished.

Create a tmux session.

$ tmux new -s eval   # create a tmux session.

To leave the job running in the background:

Ctrl + b
d

To resume the session:

$ tmux at -t eval   # resume a tmux session.
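
Before resuming, it can help to check whether the session still exists. The following is a minimal sketch; the helper name `session_state` is hypothetical and not part of the artifact. It relies only on `tmux has-session`, which exits with a non-zero status when no session of the given name is found.

```shell
# Hypothetical helper (not part of the artifact): report whether a tmux
# session with the given name currently exists.
session_state() {
    # tmux has-session exits non-zero if the session (or tmux itself) is absent.
    if tmux has-session -t "$1" 2>/dev/null; then
        echo "running"
    else
        echo "not running"
    fi
}

# "eval" is the session name used in the commands above.
session_state eval
```

If the helper prints "not running", the session has been closed (for example after a reboot) and the experiments need to be restarted.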

Bug finding (Section 4.2)

In the paper, we claimed that “Overall, we reported 31 bugs…” and showed the bug details in Table 3. You can find the evidence in our Bug Report Table.

We also include all bug-triggering testcases in /artifact/eval-bugs/. You can see the list of testcases by running

$ tree /artifact/eval-bugs/

Figures 7 to 10 were obtained mainly through manual analysis of each bug report.

UB Program Generator (Section 4.3)

We used 1,000 seed programs for the experiments. All seed programs are in /artifact/eval-generator/seeds/.

TL;DR: To generate UB programs with UBFuzz, MUSIC, and Csmith-NoSafe at the same time, execute

$ cd /artifact/eval-generator/
$ ./generate_ub_all.py --cpu 32

Note

With 32 cores (--cpu 32), the script takes roughly 7 hours to finish.

This command will use 32 CPU cores to (1) generate all supported UB programs from the 1,000 seed programs, and (2) analyze each of them with our Crash-Site Mapping test oracle. The analysis meta-data are saved in ./UBFuzz/log_mutate.txt, ./MUSIC/log_mutate.txt, and ./Csmith-NoSafe/log_mutate.txt. These data are not directly readable but will later be used for analysis. Use a different --cpu number to specify the number of CPU cores.
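
If your machine has fewer than 32 cores, one way to choose the `--cpu` value is to read the core count visible inside the container. This is a sketch only: the helper `pick_cpus` is hypothetical, and the cap of 32 simply mirrors the example above.

```shell
# Hypothetical helper (not part of the artifact): print the number of cores
# to pass to --cpu, capped at the given maximum (default 32).
pick_cpus() {
    cap="${1:-32}"
    # nproc is part of GNU coreutils; fall back to 1 if it is unavailable.
    avail="$(nproc 2>/dev/null || echo 1)"
    if [ "$avail" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$avail"
    fi
}

CPUS="$(pick_cpus 32)"
echo "Using $CPUS cores"
# Then, from /artifact/eval-generator/:
#   ./generate_ub_all.py --cpu "$CPUS"
```

With fewer cores, expect the run to take longer than the 7 hours quoted above.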

The generated UB programs will be in ./UBFuzz/mutants/, ./MUSIC/mutants/, and ./Csmith-NoSafe/mutants/.
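
As a quick sanity check after generation, you can count the files under each mutants directory. The sketch below assumes it is run from /artifact/eval-generator/. The helper `count_mutants` is hypothetical, and it counts every regular file because the internal layout of the mutants directories is not described here.

```shell
# Hypothetical helper (not part of the artifact): print the number of regular
# files below the given directory, or 0 if the directory does not exist.
count_mutants() {
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Directory names are taken from the paragraph above.
for tool in UBFuzz MUSIC Csmith-NoSafe; do
    echo "$tool: $(count_mutants "./$tool/mutants")"
done
```

A count of 0 for any tool suggests that generation did not run or did not finish for that tool.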

To get Table 4, execute

$ ./analyze_all.py

This script will report the statistics of UB programs generated by each approach.

If you want to generate UB programs with a single tool such as UBFuzz, go to Generate UB programs.

Crash-Site Mapping (Section 4.4)

In the paper, we claimed that “Out of the 6,567 discrepancies, our crash-site mapping selected 58 and dropped the rest 6,505 as invalid.”

To verify this result, execute

$ ./UBFuzz/analyze_crash_site_mapping.py

This script will read the meta-data from the last section and print out the Crash-Site Mapping result.
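
Because this step depends on the meta-data written during generation, it may be worth confirming that the log files exist and are non-empty first. The sketch below assumes it is run from /artifact/eval-generator/. The helper `missing_logs` is hypothetical, and the log paths are the ones listed in the previous section.

```shell
# Hypothetical helper (not part of the artifact): print every expected
# log_mutate.txt under the given base directory that is missing or empty.
missing_logs() {
    for tool in UBFuzz MUSIC Csmith-NoSafe; do
        log="$1/$tool/log_mutate.txt"
        # -s is true only if the file exists and has a size greater than zero.
        [ -s "$log" ] || echo "$log"
    done
}

# Prints nothing when all three logs are present and non-empty.
missing_logs .
```

If any path is printed, rerun the generation step for that tool before analyzing the Crash-Site Mapping result.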

Coverage (Section 4.5)

The code for analyzing coverage is in /artifact/eval-cov/. To get the GCC and LLVM coverage of Seeds, MUSIC, Csmith-NoSafe, and UBFuzz, execute

$ cd /artifact/eval-cov/
$ ./collect_cov_all.py --cpu 32

Note

With 32 cores (--cpu 32), the script takes roughly 3 hours to finish.

This script will run UB programs from each approach and analyze the line, function, and branch coverage.

Congratulations! You have successfully finished all the main experiments of UBFuzz.

If you want to try out UBFuzz to generate new UB programs, go to UBGen.