LLFI: LLVM-based Fault Injection Tool

This tutorial provides concise instructions for using LLFI to perform fault injection experiments. It is also used in UIowa:CS:4980:0002, Topics in Computer Science II: Dependable System Design, taught by Dr. Guanpeng Li. It covers quick installation, basic usage, and program resilience analysis with LLFI.

LLFI Quick Installation

Before you install LLFI, make sure your environment has python3 and the pyyaml package. For pyyaml, any 4.x version works, and 4.2b1 is highly recommended. Other dependencies are listed at [2].

The following commands show how to install LLFI on an Ubuntu 16.04 machine.

$ git clone https://github.com/DependableSystemsLab/LLFI.git                           # download LLFI repository
$ cd LLFI/installer                                                                    # change directory to quick installation folder
$ python3 InstallLLFI.py --noGUI                                                       # quickly install command-line-version LLFI with python driver

The command python3 InstallLLFI.py --noGUI also builds LLVM, so it can be slow. To speed it up, replace InstallLLFI.py:line 328 with p = subprocess.call(["make", "-j16"]). Here “16” is the number of parallel make jobs; choose a value that suits your machine. The nproc command prints the number of processing units available.
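Instead of hard-coding 16, the job count can be derived from the machine. The following is a minimal sketch, not the actual contents of InstallLLFI.py: the helper name make_command is hypothetical, and os.cpu_count() is used here as a rough stand-in for nproc.

```python
import os


def make_command(jobs=None):
    """Build a `make -jN` argument list (hypothetical helper, not part of LLFI)."""
    if jobs is None:
        # os.cpu_count() can return None, so fall back to a single job.
        jobs = os.cpu_count() or 1
    return ["make", f"-j{jobs}"]


if __name__ == "__main__":
    # The list has the same shape as the one passed to subprocess.call in the text.
    print(make_command())
```

The returned list can be passed to subprocess.call in place of the literal ["make", "-j16"].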

Setting up local environment for LLFI

$ echo "export PATH=YOUR-LOCAL-PATH/LLFI/installer/llfi/bin:\$PATH" >> ~/.bashrc       # adding LLFI executable binary path variables to local environment
$ source ~/.bashrc                                                                     # activate environment

Checking if installation is successful.

$ cd YOUR-LOCAL-PATH/LLFI/installer/llfi/bin && ls                                     # list all compiled LLFI executable binaries
batchInjectfault  batchProfile  cmake_install.cmake      injectfault       instrument  Makefile  SoftwareFailureAutoScan
batchInstrument   CMakeFiles    HardwareFailureAutoScan  InjectorAutoScan  llfi-gui    profile
$ # If these binaries are listed, LLFI has been installed successfully on your machine :)
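The PATH setup can also be checked from Python. This is a small sketch rather than an official LLFI check: the helper missing_tools is hypothetical, and the three tool names are taken from the directory listing above.

```python
import shutil

# Tool names taken from the directory listing above.
LLFI_TOOLS = ["instrument", "profile", "injectfault"]


def missing_tools(tools=LLFI_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All LLFI tools found on PATH")
```

If any tool is reported as missing, re-check the export line in ~/.bashrc and open a new shell.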

LLFI

What is LLFI?

LLFI (Low-Level Fault Injector) is an LLVM-based fault injection tool that injects faults into the human-readable LLVM IR of a specified program. LLFI is mainly implemented as a collection of LLVM 3.4 passes driven by Python scripts. A fault injection (FI) campaign is defined in input.yaml, which makes LLFI very flexible.

Why do we use LLFI?

FI is commonly used to test program resilience against soft errors. But why do we use LLFI, or in other words, why do we inject faults at the LLVM IR level? There are several reasons:

LLFI Basic Usage

Using LLFI takes three major steps, and we use pathfinder [3] as an example. Note that these commands can only be executed when input.yaml is in the same directory as the compiled pathfinder.ll. More information on the input.yaml format can be found at [4].

1. Instrumentation: instrumenting functional code in given program IR

$ # command example
$ # note that the LLFI configuration file, input.yaml, must be in the same directory before you run this command.
$ # details can be found in LLFI Usage section.
$ instrument --readable pathfinder.ll

2. Profile: a fault-free program execution

$ # command example
$ # note that this command should be executed after "instrument"
$ profile ./llfi/pathfinder-profiling.exe 1000 10                                      # "1000 10" is the program input of pathfinder

3. Fault Injection: program executions with faults

$ # command example
$ # note that this command should be executed after "profile"
$ injectfault ./llfi/pathfinder-faultinjection.exe 1000 10                             # "1000 10" is the program input of pathfinder

Make sure the program input for the Profile phase is the same as for the Fault Injection phase. Only then is comparing their results meaningful.
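The three steps can be driven from a short Python script that builds all command lines from one argument list, so the Profile and Fault Injection phases cannot drift apart. This is a sketch under the naming pattern shown in the commands above; llfi_commands is a hypothetical helper, not part of LLFI.

```python
def llfi_commands(name, program_args):
    """Return the instrument, profile and injectfault command lines for <name>.ll."""
    args = [str(a) for a in program_args]
    return [
        ["instrument", "--readable", f"{name}.ll"],
        # The same argument list is reused for both phases so outputs stay comparable.
        ["profile", f"./llfi/{name}-profiling.exe", *args],
        ["injectfault", f"./llfi/{name}-faultinjection.exe", *args],
    ]


if __name__ == "__main__":
    for cmd in llfi_commands("pathfinder", [1000, 10]):
        print(" ".join(cmd))
        # To actually run a step: subprocess.run(cmd, check=True)
```

Each list can be handed to subprocess.run from the directory that contains pathfinder.ll and input.yaml.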

Analyze Program Resilience with LLFI

In this section, “instruction” refers to an LLVM IR instruction, since LLFI injects faults at the LLVM IR level.

Preprocess and Compile

To analyze program resilience, we need to make sure the program results are deterministic for each input, so that comparing the results of fault-free (i.e. profile phase) and fault-injected program executions is meaningful. To make the results deterministic, we have to remove all randomness from the original code, such as CPU time measurements and random numbers/arrays.
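One way to confirm that the randomness is really gone is to run the preprocessed program a few times and compare its standard output. The sketch below assumes the program writes its results to stdout; is_deterministic is a hypothetical helper, and the demo command is only a stand-in for the real binary.

```python
import subprocess
import sys


def is_deterministic(cmd, runs=3):
    """Run cmd several times and report whether stdout is identical every time."""
    outputs = set()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.add(result.stdout)
    return len(outputs) == 1


if __name__ == "__main__":
    # Stand-in command; replace it with the program under test and its input.
    demo = [sys.executable, "-c", "print(42)"]
    print(is_deterministic(demo))
```

If the program also writes output files, those would need the same comparison; this sketch does not cover them.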

After preprocessing the code, we can compile the program source code into readable LLVM IR (see [5] for an introduction to LLVM IR). In studies of program resilience against soft errors, the programs we evaluate are implemented in C/C++. Here we provide an example that shows how to compile a C/C++ project into readable LLVM IR. The code can be found at [6]. Its compile commands are as follows:

$                                                                                      # Compile commands:
$ clang++ -S -emit-llvm *.cpp                                                          #     Compile all .cpp file into .ll
$ llvm-link -S -o hpccg.ll *.ll                                                        #     Link all .ll into a single IR
                                                                                       # Optional:
$ clang++ hpccg.ll -o hpccg                                                            #     Compile to executable binary
$ ./hpccg 64 64 64                                                                     #     check if its execution is normal

You can follow this pattern to preprocess and compile other C/C++ projects before evaluating resilience.
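One detail to watch when repeating the commands above: on a second run, the shell pattern *.ll also matches a previously generated hpccg.ll, so the old output would be passed back in as an input. The sketch below builds the llvm-link command line while leaving the output file out of the inputs; link_command is a hypothetical helper.

```python
import glob


def link_command(ll_files, output="hpccg.ll"):
    """Build the llvm-link command, leaving the output file out of the inputs."""
    inputs = sorted(f for f in ll_files if f != output)
    return ["llvm-link", "-S", "-o", output, *inputs]


if __name__ == "__main__":
    print(" ".join(link_command(glob.glob("*.ll"))))
```

Deleting the old hpccg.ll before recompiling achieves the same thing.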

Per-Instruction FI

We provide an example of performing per-instruction FI experiments with LLFI. The code of this example can be found at [7]. Note that “instruction” here means an LLVM IR instruction.

1. Download and Prepare

$ git clone https://github.com/hyfshishen/LLFI-Quick-Start.git                         # download code from github
$ cd LLFI-Quick-Start/perInstFI                                                        # change directory to the target folder inside the cloned repository
$ clang -emit-llvm -S *.c                                                              # compile demo C code to readable LLVM IR

2. Check input.yaml

defaultTimeOut: 500                                                                    # if a fault-injected run executes for more than 500 seconds,
                                                                                       #     it is terminated and recorded as "hang".
compileOption:
    instSelMethod:                                                                     # how we select instruction to inject fault in each program execution
      - customInstselector:
          include:
            - llfiindex
          options:
            - -injecttoindex=77                                                        # we inject faults to 77th instruction (indexed by llfiindex)

    regSelMethod: regloc
    regloc: dstreg

runOption:
    - run:
        numOfRuns: 100                                                                 # 100 program executions with injected fault (1 fault for 1 program execution)
        fi_type: bitflip                                                               # fault type is bit-flip (i.e. randomly flip one bit of target instruction's return value)
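To make the bitflip fault type concrete, the following sketch flips a single bit of an integer value. It only illustrates the fault model and is not LLFI's injection code; the function names and the 32-bit default width are assumptions made for this example.

```python
import random


def flip_bit(value, bit, width=32):
    """Flip one bit of an unsigned integer that is `width` bits wide."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def random_bitflip(value, width=32, rng=random):
    """Flip one uniformly chosen bit, mimicking a single-bit fault."""
    return flip_bit(value, rng.randrange(width), width)


if __name__ == "__main__":
    print(flip_bit(0b1010, 0))   # lowest bit flipped: 0b1011 == 11
    print(flip_bit(16, 31))      # a high-order flip changes the value drastically
```

Flipping a low-order bit barely changes the value, while flipping a high-order bit changes it drastically, which is one reason the same instruction can produce very different outcomes across the 100 runs.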

3. Execute LLFI Commands (Instrument-Profile-InjectFault)

$ instrument --readable sqrt.ll
$ profile ./llfi/sqrt-profiling.exe                                                    # the sqrt program takes no input arguments
$ injectfault ./llfi/sqrt-faultinjection.exe                                           # the sqrt program takes no input arguments

Random FI

We provide an example of performing random FI experiments with LLFI. The code of this example can be found at [7]. Note that “instruction” here means an LLVM IR instruction.

1. Download and Prepare

$ git clone https://github.com/hyfshishen/LLFI-Quick-Start.git                         # download code from github
$ cd LLFI-Quick-Start/randomFI                                                         # change directory to the target folder inside the cloned repository
$ clang -emit-llvm -S *.c                                                              # compile demo C code to readable LLVM IR

2. Check input.yaml

defaultTimeOut: 500                                                                    # if a fault-injected run executes for more than 500 seconds,
                                                                                       #     it is terminated and recorded as "hang".
compileOption:
    instSelMethod:                                                                     # how we select instruction to inject fault in each program execution
      - insttype:
          include:
            - all                                                                      # the fault may occur at any instructions in current program
          exclude:                                                                     # also, we don't inject fault to instructions with type:
            - ret                                                                      #     ret
            - alloca                                                                   #     alloca
            - call                                                                     #     call
            - phi                                                                      #     phi

    regSelMethod: regloc
    regloc: dstreg

runOption:
    - run:
        numOfRuns: 100                                                                 # 100 program executions with injected fault (1 fault for 1 program execution)
        fi_type: bitflip                                                               # fault type is bit-flip (i.e. randomly flip one bit of target instruction's return value)
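The include/exclude selection in this input.yaml can be pictured as a filter over instruction opcodes followed by a random choice. The sketch below is only a static picture with a made-up instruction list; it is not LLFI's implementation and does not model how LLFI chooses among executed instruction instances at run time.

```python
import random

# Opcodes excluded in the input.yaml above.
EXCLUDED = {"ret", "alloca", "call", "phi"}


def injection_candidates(instructions, excluded=EXCLUDED):
    """Keep (index, opcode) pairs whose opcode is not excluded."""
    return [(idx, op) for idx, op in instructions if op not in excluded]


def pick_target(instructions, rng=random):
    """Choose one candidate uniformly at random."""
    return rng.choice(injection_candidates(instructions))


if __name__ == "__main__":
    # Made-up instruction list; real indices come from LLFI's indexing.
    program = [(1, "alloca"), (2, "load"), (3, "add"), (4, "call"), (5, "ret")]
    print(injection_candidates(program))
    print(pick_target(program))
```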

The reasons we routinely exclude “ret”, “alloca”, “call”, and “phi” instructions during random FI are listed below:

3. Execute LLFI Commands (Instrument-Profile-InjectFault)

$ instrument --readable sqrt.ll
$ profile ./llfi/sqrt-profiling.exe                                                    # the sqrt program takes no input arguments
$ injectfault ./llfi/sqrt-faultinjection.exe                                           # the sqrt program takes no input arguments

Post-analysis

The output structure of an IR after fault injection with LLFI is shown below. We again use sqrt.ll as the example. Before going through it, make sure the following concepts are clear:

# More details about the following content can be checked at LLFI wiki [4].
┌── sqrt.c                                  # Original C source code of target program.
├── sqrt.ll                                 # Readable IR compiled from C source code. This is also the input of instrumentation phase.
├── input.yaml                              # Defines what fault to inject, where/when to inject fault, and how many faults we inject in total.
├── llfi.config.compiletime.txt             # Output of LLFI's llvm passes. We typically DON'T CHECK THIS in FI experiments.
├── llfi.config.runtime.txt                 # Run-time option of LLFI.      We typically DON'T CHECK THIS in FI experiments.
├── llfi.log.compilation.txt                # Compilation log with LLFI.    We typically DON'T CHECK THIS in FI experiments.
├── llfi.stat.prof.txt                      # The total number of dynamic instruction counts of potential injection points.
│                                           #     If we perform random FI, that will be dynamic instruction count of the whole program.
│                                           #     If we perform per-instruction FI, that will be dynamic instruction count of the target instruction.
├── llfi.stat.totalindex.txt                # The total number of static instructions labeled as candidates of fault injection.
└── llfi
    ├── baseline           # ->  This folder contains results (both standard output and file output) generated from profile phase (fault-free program execution).
    ├── error_output       # ->  This folder contains error information (if program is "hang" or "crash") of each fault-injected program execution.
    ├── llfi_stat_output   # ->  This folder contains detailed injected fault information (e.g. instruction type where fault is injected) of each fault-injected program execution.
    ├── prog_input         # ->  If program has file inputs, they will be backed up in this folder.
    ├── prog_output        # ->  This folder contains file output for each fault-injected program execution.
    ├── std_output         # ->  This folder contains standard output for each fault-injected program execution.
    ├── sqrt-llfi_index.ll                  # Functionally identical to sqrt.ll, with each instruction labeled with an LLFI index.
    ├── sqrt-profiling.ll                   # Readable IR for profiling phase.
    ├── sqrt-profiling.exe                  # Executable binary for profiling phase (compiled from above .ll).
    ├── sqrt-faultinjection.ll              # Readable IR for fault-injection phase.
    └── sqrt-faultinjection.exe             # Executable binary for fault-injection phase (compiled from above .ll).

Sometimes a program has both file output and standard output. We have to compare the meaningful outputs of the profile and fault-injection phases. Imagine a sorting program called bubble-sort.c; its outputs after one execution are shown below:

Clearly, to analyze the program's resilience, we have to choose the main results that are most closely related to its functionality, so in that scenario we compare the file output. This choice is very important for post-analysis after experiments with LLFI; otherwise, the resilience analysis results may be useless.

LLFI provides an official Python3 script to measure those results, which can be found at [8].
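Independently of that script, the comparison itself can be illustrated in a few lines. The sketch below labels each fault-injected run relative to the fault-free baseline; the function names and the four labels are assumptions made for this example, and how crash or hang information is read from the error_output folder is deliberately left out.

```python
def classify_run(baseline_output, faulty_output, crashed=False, hung=False):
    """Label one fault-injected run relative to the fault-free baseline."""
    if crashed:
        return "crash"
    if hung:
        return "hang"
    if faulty_output == baseline_output:
        return "match"
    return "mismatch"


def summarize(labels):
    """Count how often each label occurs."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    baseline = "1 2 3 4\n"
    runs = [
        classify_run(baseline, "1 2 3 4\n"),
        classify_run(baseline, "1 3 2 4\n"),
        classify_run(baseline, "", crashed=True),
    ]
    print(summarize(runs))
```

Here baseline_output and faulty_output stand for whichever output was chosen as meaningful, for example the contents of the output file of a sorting program.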

Contributing

This document was written by Yafan Huang. It has not been revised yet, so some descriptions may be unclear. If you have any questions after reading this document, please feel free to email him at yafan-huang@uiowa.edu.

References

[1] Yafan Huang: LLFI Tutorial Video: [Link]
[2] Github Repo: LLFI: [Link]
[3] DSN-14: Quantifying the accuracy of High-Level Fault injection Techniques for Hardware Faults: [Link]
[4] LLFI WIKI: [Link]
[5] Yafan Huang: LLVM Tutorial: [Link]
[6] Github Repo: LLFI-Complicated-Project: [Link]
[7] Github Repo: LLFI-Quick-Start: [Link]
[8] LLFI Result Measurement Script: [Link]