LLFI: LLVM-based Fault Injection Tool

About This Tutorial

This tutorial targets to provide a consise instruction on using LLFI to perform fault injection experiments. Also, it is used in UIowa:CS:4980:0002, Topics in Computer Science II: Dependable System Design instructed by Dr. Guanpeng Li). The main contents of this tutorial are shown as below:

  • How to install LLFI?

  • What is LLFI? Why we use it? What is its basic usage?

  • How to use LLFI for analyzing program resilience?

The video of this document can be found at [1].

LLFI Quick Installation

Before you install LLFI, make sure your environment has python3 and its package pyyaml. For pyyaml, version 4x is okay and 4.2b1 is highly recommended. Other dependencies can be found at [2].

Following commands show how to install LLFI on a Ubuntu 16.04 machine.
$ git clone https://github.com/DependableSystemsLab/LLFI.git                           # download LLFI repository
$ cd LLFI/installer                                                                    # change directory to quick installation folder
$ python3 InstallLLFI.py --noGUI                                                       # quickly install command-line-version LLFI with python driver

python3 InstallLLFI.py --noGUI contains building LLVM, so it might be slow. If you want to speed up this process, please replace InstallLLFI.py:line 328 with p = subprocess.call(["make", "-j16"]). “16” indicates the maximum thread number per user in your OS. You can use command nproc to check it out.

Setting up local environment for LLFI
$ echo "export PATH=YOUR-LOCAL-PATH/LLFI/installer/llfi/bin:\$PATH" >> ~/.bashrc       # adding LLFI executable binary path variables to local environment
$ source ~/.bashrc                                                                     # activate environment
Checking if installation is successful.
$ cd $YOUR-LOCAL-PATH$/LLFI/installer/llfi/bin && ls                                   # list all compiled LLFI executable binaries
batchInjectfault  batchProfile  cmake_install.cmake      injectfault       instrument  Makefile  SoftwareFailureAutoScan
batchInstrument   CMakeFiles    HardwareFailureAutoScan  InjectorAutoScan  llfi-gui    profile
$ # It indicates LLFI is successfully installed to your machine :)

LLFI

What is LLFI?

LLFI (Low-Level Fault Injector) is an LLVM based fault injection tool, which injects faults into the LLVM IR (human-readable one) of a specified program. LLFI is mainly implemented by a collection of LLVM 3.4 passes and driven by python script. The fault injection (FI) campaign of LLFI is defined by input.yaml and very flexible.

Why we use LLFI?

We have already known FI is commonly used for testing program resilience against soft errors. But why we use LLFI? (or why we injection faults on LLVM IR) There are several reasons:

  • Hardware FI is hard to pinpoint a certain location (imagine you want to flip a certain bit in a hardware device >_<). The analysis is also very hard in hardware. Beam test is even radio active.

  • Architecture FI can be conducted by simulator, but it is very slow. Some architecture designs (such as Intel) are not publicly available, making it hard to develop simulator.

  • Thus, application-level FI is flexible and easy to develop and post-analyze. Besides, Wei et al [3] have demonstrated FIs on binary (PINFI) and LLVM (LLFI) can produce similar results.

  • In LLVM-level FIs, because of its Optimizer and standard IR, we can easily perform a lot of code transformations and program analysis by developing LLVM Passes. Here tracking error propagation is also possible.

LLFI Basic Usage

There are three major steps to use LLFI. and we use pathfinder [3] for example. Note that these commands can only be executed with input.yaml under the same path with compiled pathfinder.ll. More information of input.yaml format can be checked at the [4].

1. Instrumentation: instrumenting functional code in given program IR
$ # command example
$ # note that you should have LLFI configure file called input.yaml in the same path before you run this command.
$ # details can be found in LLFI Usage section.
$ instrument --readable pathfinder.ll
2. Profile: a fault-free program execution
$ # command example
$ # note that this command should be executed after "instrument"
$ profile ./llfi/pathfinder-profiling.exe 1000 10                                      # "1000 10" is the program input of pathfinder
3. Fault Injection: program executions with faults
$ # command example
$ # note that this command should be executed after "profile"
$ injectfault ./llfi/pathfinder-faultinjection.exe 1000 10                             # "1000 10" is the program input of pathfinder

Making sure the program input for Profile phase is the same as Fault Injection phase. In this way we compare the results are meaningful.

Analyze Program Resilience with LLFI

We refer instruction in this section as LLVM IR instructions since LLFI injects faults to LLVM IR level.

Preprocess and Compile

To analyze the program resilience, we make to make sure the program results are deterministic under each input. So that comparing the results between fault-free (i.e. profile phase) and fault-injected program executions is meaningful. In order to make the results deterministic, we have to remove all randomenss in the raw code, such as CPU time and random numbers/arrays.

After preprocessing the codes, we can compile the program source code into a readable LLVM IR (what is LLVM IR? [5]). In studying program resilience against soft errors, the programs we evaluate are always implemented via C/C++. Here we provide an example to show how to compile a C project into a readable LLVM IR. The code can be found at [6]. Specifically, its compile commands can be found as below:

$                                                                                      # Compile commands:
$ clang++ -S -emit-llvm *.cpp                                                          #     Compile all .cpp file into .ll
$ llvm-link -S -o hpccg.ll *.ll                                                        #     Link all .ll into a single IR
                                                                                       # Optional:
$ clang++ hpccg.ll -o hpccg                                                            #     Compile to executable binary
$ ./hpccg 64 64 64                                                                     #     check if its execution is normal

You can follow this pattern to preprocess and compile other C/C++ projects before evaluating resilience.

Per-Instruction FI

We provide an example of performing per-instruction FI experiments via LLFI. The code of this example can be found at link [7]. Note that the “instruction” here we mentioned are LLVM IR instruction.

1. Download and Prepare
$ git clone https://github.com/hyfshishen/LLFI-Quick-Start.git                         # download code from github
$ cd perInstFI                                                                         # change directory to target folder
$ clang -emit-llvm -S *.c                                                              # compile demo C code to readable LLVM IR
2. Check input.yaml
defaultTimeOut: 500                                                                    # if program with injected fault execute more than 500 seconds
                                                                                       #     it will be terminated and recorded as "hang".
compileOption:
    instSelMethod:                                                                     # how we select instruction to inject fault in each program execution
      - customInstselector:
          include:
            - llfiindex
          options:
            - -injecttoindex=77                                                        # we inject faults to 77th instruction (indexed by llfiindex)

    regSelMethod: regloc
    regloc: dstreg

runOption:
    - run:
        numOfRuns: 100                                                                 # 100 program executions with injected fault (1 fault for 1 program execution)
        fi_type: bitflip                                                               # fault type is bit-flip (i.e. randomly flip one bit of target instruction's return value)
3. Execute LLFI Commands (Instrument-Profile-InjectFault)
$ instrument --readable sqrt.ll
$ profile ./llfi/sqrt-profiling.exe                                                    # no input augments in current program sqrt
$ injectfault ./llfi/sqrt-faultinjection.exe                                           # no input augments in current program sqrt

Random FI

We provide an example of performing random FI experiments via LLFI. The code of this example can be found at link [7]. Note that the “instruction” here we mentioned are LLVM IR instruction.

1. Download and Prepare
$ git clone https://github.com/hyfshishen/LLFI-Quick-Start.git                         # download code from github
$ cd randomFI                                                                          # change directory to target folder
$ clang -emit-llvm -S *.c                                                              # compile demo C code to readable LLVM IR
2. Check input.yaml
defaultTimeOut: 500                                                                    # if program with injected fault execute more than 500 seconds
                                                                                       #     it will be terminated and recorded as "hang".
compileOption:
    instSelMethod:                                                                     # how we select instruction to inject fault in each program execution
      - insttype:
          include:
            - all                                                                      # the fault may occur at any instructions in current program
          exclude:                                                                     # also, we don't inject fault to instructions with type:
            - ret                                                                      #     ret
            - alloca                                                                   #     alloca
            - call                                                                     #     call
            - phi                                                                      #     phi

    regSelMethod: regloc
    regloc: dstreg

runOption:
    - run:
        numOfRuns: 100                                                                 # 100 program executions with injected fault (1 fault for 1 program execution)
        fi_type: bitflip                                                               # fault type is bit-flip (i.e. randomly flip one bit of target instruction's return value)

The reasons we routinely exclude “ret”, “alloca”, “call”, and “phi” instructions during random FI are listed as below:

  • Those instructions do not have return values. In our commonly used fault model, we only inject fault to return value of an instruction.

  • Some instructions appear only one time (or several times) for one program execution, such as “alloca” allocates value to target parameter.

  • Some instructions are defined only at LLVM IR instruction level, but not in assembly code, such as “phi”.

  • Some instructions will occupy too much space of main memory and will lead to system failure, such as “alloca”.

3. Execute LLFI Commands (Instrument-Profile-InjectFault)
$ instrument --readable sqrt.ll
$ profile ./llfi/sqrt-profiling.exe                                                    # no input augments in current program sqrt
$ injectfault ./llfi/sqrt-faultinjection.exe                                           # no input augments in current program sqrt

Post-analysis

The output structure of an IR that is injected with faults via LLFI can be checked as below. We still use sqrt.ll here for example. Before talking this, we have to make sure these following concepts are clear:

  • Numerical Program Input: the input of a program is numerical and can be executed via command line (like “./hello-world -i 100”).

  • File Program Input: the input of a program is a file (like “./hello-world -i ../yafan.txt”).

  • Standard Program Output: program outputs that are printed in the screen (like printf(“hello world!”) in C).

  • File Program Output: program outputs that are written into a file (routinely via file I/O stream).

# More details about the following content can be checked at LLFI wiki [4].
┌── sqrt.c                                  # Original C source code of target program.
├── sqrt.ll                                 # Readable IR compiled from C source code. This is also the input of instrumentation phase.
├── input.yaml                              # Defines what fault to inject, where/when to inject fault, and how many faults we inject in total.
├── llfi.config.compiletime.txt             # Output of LLFI's llvm passes. We typically DON'T CHECK THIS in FI experiments.
├── llfi.config.runtime.txt                 # Run-time option of LLFI.      We typically DON'T CHECK THIS in FI experiments.
├── llfi.log.compilation.txt                # Compilation log with LLFI.    We typically DON'T CHECK THIS in FI experiments.
├── llfi.stat.prof.txt                      # The total number of dynamic instruction counts of potential injection points.#     If we perform random FI, that will be dynamic instruction count of the whole program.#     If we perform per-instruction FI, that will be dynamic instruction count of the target instruction.
├── llfi.stat.totalindex.txt                # The total number of static instructions labeled as candidates of fault injection.
└── llfi
    ├── baseline           # ->  This folder contains results (both standard output and file output) generated from profile phase (fault-free program execution).
    ├── error_output       # ->  This folder contains error information (if program is "hang" or "crash") of each fault-injected program execution.
    ├── llfi_stat_output   # ->  This folder contains detaled injected fault information (e.g. instruction type where fault is injected) of each fault-injected program execution.
    ├── prog_input         # ->  If program has file inputs, they will be backed up in this folder.
    ├── prog_output        # ->  This folder contains file output for each fault-injected program execution.
    ├── std_output         # ->  This folder contains standard output for each fault-injected program execution.
    ├── sqrt-llfi_index.ll                  # All its function is the same as sqrt.ll and each instruction in it is labeled with an LLFI index.
    ├── sqrt-profiling.ll                   # Readable IR for profiling phase.
    ├── sqrt-profiling.exe                  # Executable binary for profiling phase (compiled from above .ll).
    ├── sqrt-faultinjection.ll              # Readable IR for fault-injection phase.
    └── sqrt-faultinjection.exe             # Executable binary for fault-injection phase (compiled from above .ll).

Sometimes, program will have both file output and standard output. We have to compare the meaningful outputs between profile and fault-injection phases. Imagine there is one sorting program called bubble-sort.c, its outputs after one execution can be checked as below:

  • File output: the sorting results can be checked at result.txt.

  • Standard output: “Finish sorting!” will be printed through commandline.

Clearly, to analyze the program's resilience, we have to choose the main results that are more closely related to its functionality. So we choose to compare file output in that scenario. Note that this is very important for post-analysis after experiments with LLFI. Otherwise, we may bring totally useless resilience analysis results.

There are 4 types of results after comparing the outputs:

  • SDC: Silent data corruption (SDC) means program exits normally but produces wrong results.

  • Benign: Program exits normally and produces correct results.

  • Crash: Program exits abnormally with exceptions.

  • Hang: Program halts and executes longer than defaultTimeOut set in input.yaml.

LLFI provides an official Python3 script to measure those results, which can be found at: [8].

Contributing

This document is written by Yafan Huang. Besides, this document has not been iterated yet, so some descriptions may be a bit confused. If you have any questions after reading this document, please feel free to email him at yafan-huang@uiowa.edu.

References

[1] Yafan Huang: LLFI Tutorial Video: [Link]
[2] Github Repo: LLFI: [Link]
[3] DSN-14: Quantifying the accuracy of High-Level Fault injection Techniques for Hardware Faults: [Link]
[4] LLFI WIKI: [Link]
[5] Yafan Huang: LLVM Tutorial: [Link]
[6] Github Repo: LLFI-Complicated-Project: [Link]
[7] Github Repo: LLFI-Quick-Start: [Link]
[8] LLFI Result Measurement Script: [Link]