This tutorial gives a brief, basic introduction to LLVM as used in UIowa:CS:4980, Topics in Computer Science II: Dependable System Design, taught by Dr. Guanpeng Li. Specifically, it aims to give concise descriptions of the following: installing LLVM, the LLVM architecture, LLVM IR, writing an LLVM Pass, and common LLVM commands.
The following commands show how to install LLVM 3.4 on an Ubuntu 16.04 machine.
$ # Note that LLVM source code of other versions can be found at https://github.com/llvm/llvm-project.
$ git clone https://github.com/zjuacompiler/llvm.git # Download LLVM 3.4 source code
$ cd llvm # Change into the cloned source folder
$ ./configure --enable-optimized --disable-assertions --enable-targets=host --with-python="/usr/bin/python2"
$ # The configure step above sets up dependencies before you build LLVM
$ mkdir build && cd build # The "build" folder holds the build output
$ cmake .. # Generate the build files; ".." is the path to the source code
$ make -j$(nproc) # Build LLVM in parallel, one job per CPU core
Check whether the installation succeeded.
$ cd $YOUR-LOCAL-PATH$/llvm/build/bin/ && ls # list all compiled LLVM executable binaries
bugpoint FileUpdate lli-child-target llvm-bcanalyzer llvm-c-test llvm-dwarfdump llvm-lit llvm-mcmarkup llvm-ranlib llvm-size llvm-tblgen obj2yaml yaml-bench
count llc llvm-ar llvm-config llvm-diff llvm-extract llvm-lto llvm-nm llvm-readobj llvm-stress macho-dump opt
FileCheck lli llvm-as llvm-cov llvm-dis llvm-link llvm-mc llvm-objdump llvm-rtdyld llvm-symbolizer not yaml2obj
$ # If these binaries are listed, LLVM 3.4 was built successfully on your machine :)
Low-Level Virtual Machine (LLVM) is a compiler architecture [1], as shown in the picture above. Its main use is to serve as the optimizer (language-agnostic optimization) and backend (machine code generation) for multiple programming languages.
Compared with a traditional compiler (frontend, optimizer, backend), the LLVM design is flexible across programming languages: supporting a new language only needs a new frontend, and supporting a new hardware device only needs a new backend. The LLVM optimizer is a general module; it provides a common standard, and its optimizations operate on well-structured LLVM IR (Intermediate Representation).
Besides, LLVM contains many sub-projects, including Clang (a lightweight compiler frontend for C, C++, Objective-C, and Objective-C++; you may frequently see it on your MacBook).
LLVM IR is the low-level intermediate representation used by the LLVM compiler framework [2], as shown in the picture above. An LLVM-based compiler can be split into three components: front end, middle end, and back end. Each has a specific task that takes IR as input and/or produces IR as output.
You can think of LLVM IR as a platform-independent assembly language with an infinite number of function-local registers. However, LLVM IR is not machine code; it sits one step above assembly. Some things look more like a high-level language (e.g. functions and strong typing), while others look more like low-level assembly (e.g. branching, basic blocks, instructions).
LLVM IR has both a human-readable version (.ll) and a binary version (.bc, bitcode format).
The two formats can be converted into each other easily, and we mainly focus on .ll here.
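To make the difference between the two formats concrete, here is a minimal C++ sketch that tells them apart by looking at the first bytes of a file's contents. It relies on raw LLVM bitcode starting with the magic bytes 'B', 'C', 0xC0, 0xDE. The function name looksLikeBitcode is made up for this illustration, and bitcode stored behind the optional wrapper header is deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the buffer starts with the raw LLVM bitcode magic
// bytes 'B' 'C' 0xC0 0xDE. Anything else (for example textual .ll) is
// reported as "not bitcode". Hypothetical helper, not part of LLVM.
bool looksLikeBitcode(const unsigned char *data, std::size_t size) {
    assert(data != nullptr || size == 0);  // a null buffer must be empty
    static const unsigned char kMagic[4] = {0x42, 0x43, 0xC0, 0xDE};
    if (size < 4) return false;            // too short to hold the magic
    for (std::size_t i = 0; i < 4; ++i) {
        if (data[i] != kMagic[i]) return false;
    }
    return true;
}
```

In practice you would read the first four bytes of the file and pass them in; for real work, llvm-as and llvm-dis remain the tools for converting between the two formats.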
LLVM IR has three major components: function, basic block (BB), and instruction.
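The containment between these components (a module holds functions, a function holds basic blocks, a basic block holds instructions) can be sketched with plain C++ types. The Toy* types and both helper functions below are stand-ins invented for this illustration; they are not the real LLVM classes, only a model of how the levels nest and how a traversal visits them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the LLVM hierarchy; NOT the real LLVM types.
struct ToyInstruction { std::string opcode; };
struct ToyBasicBlock  { std::vector<ToyInstruction> instructions; };
struct ToyFunction    { std::string name; std::vector<ToyBasicBlock> blocks; };
struct ToyModule      { std::vector<ToyFunction> functions; };

// Build a function that has a single basic block with the given opcodes.
ToyFunction makeSingleBlockFunction(const std::string &name,
                                    const std::vector<std::string> &opcodes) {
    ToyBasicBlock entry;
    for (const std::string &op : opcodes) {
        entry.instructions.push_back(ToyInstruction{op});
    }
    ToyFunction fn;
    fn.name = name;
    fn.blocks.push_back(entry);
    return fn;
}

// Walk module -> function -> basic block -> instruction and count the
// instructions whose opcode name equals `opcode`.
std::size_t countOpcode(const ToyModule &m, const std::string &opcode) {
    assert(!opcode.empty());  // an empty opcode name is never meaningful
    std::size_t n = 0;
    for (const ToyFunction &f : m.functions)
        for (const ToyBasicBlock &bb : f.blocks)
            for (const ToyInstruction &inst : bb.instructions)
                if (inst.opcode == opcode) ++n;
    return n;
}
```

The three nested loops are the point of the sketch: any analysis over IR tends to follow this same module, function, basic block, instruction order.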
Here is a toy example of compiling C code to LLVM IR.
This IR has only one function, main().
This function has only one basic block.
This basic block has four instructions, whose types are load, mul, store, and ret.
Original C code.
int variable = 21;
int main()
{
    variable = variable * 2;
    return variable;
}
LLVM IR (readable format).
@variable = global i32 21 ; define a global variable; in LLVM IR, global names start with '@'
define i32 @main() {
  %1 = load i32, i32* @variable ; load the global variable; in LLVM IR, local names start with '%'
                                ; (this explicit-type load syntax is from LLVM 3.7+; LLVM 3.4 writes "load i32* @variable")
  %2 = mul i32 %1, 2 ; multiply the loaded value by 2
  store i32 %2, i32* @variable ; store instruction writes the result back to the global variable
  ret i32 %2 ; return the result
}
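To see how closely the IR follows the C source, the four instructions can be mirrored one by one in C++. This is a hand-written sketch, not compiler output: the function name runMainLikeIR and the locals r1 and r2 are invented here, and each local is assigned exactly once to mimic how %1 and %2 behave like single-assignment registers.

```cpp
#include <cassert>
#include <climits>

// Stand-in for the IR global `@variable = global i32 21`.
static int variable = 21;

// Hand-translated mirror of the four IR instructions in @main.
int runMainLikeIR() {
    const int r1 = variable;  // %1 = load  (read the global)
    // A plain `mul i32` wraps on overflow, but signed overflow is undefined
    // in C++, so this sketch refuses inputs that would overflow.
    assert(r1 <= INT_MAX / 2 && r1 >= INT_MIN / 2);
    const int r2 = r1 * 2;    // %2 = mul   (multiply by 2)
    variable = r2;            // store      (write back to the global)
    return r2;                // ret        (return the result)
}
```

Starting from the initial value 21, one call returns 42 and leaves 42 in the global, which matches what the C program computes.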
To learn more about LLVM IR syntax, please see this document [3].
The LLVM Pass framework is an important component of the LLVM infrastructure; it performs code transformations and optimizations at the LLVM IR level.
You may have already noticed in the pictures above that LLVM Passes are executed by the optimizer.
In fact, you can perform any transformation on given LLVM IR code via LLVM Passes, even converting one IR (e.g. bubble sort) into an entirely different one (e.g. quick sort), as long as you can implement the logic.
As a result, LLVM Passes are a very effective tool for analyzing program code.
Let’s start writing an LLVM Pass with a simple program analysis: counting the number of call-type instructions in a program's IR.
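Before touching the LLVM API, it can help to see the analysis itself in miniature. The sketch below is not an LLVM Pass: it is a line-based heuristic over textual .ll input, written only to show what "counting call instructions" means. The function name countCallsInText is invented for this illustration, and the heuristic assumes one instruction per line with an optional "%name =" prefix and an optional "tail" marker; a real pass works on the parsed IR instead.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Count `call` instructions in textual LLVM IR using a line-based
// heuristic. Hypothetical helper; it does not parse IR properly.
std::size_t countCallsInText(const std::string &irText) {
    std::size_t count = 0;
    std::istringstream lines(irText);
    std::string line;
    while (std::getline(lines, line)) {
        line = line.substr(0, line.find(';'));  // drop a trailing comment
        std::istringstream tokens(line);
        std::string tok;
        if (!(tokens >> tok)) continue;         // blank or comment-only line
        assert(!tok.empty());                   // extraction succeeded
        if (tok[0] == '%') {                    // skip a "%result =" prefix
            std::string eq;
            if (!(tokens >> eq >> tok) || eq != "=") continue;
        }
        if (tok == "tail" && !(tokens >> tok)) continue;  // "tail call ..."
        if (tok == "call") ++count;
    }
    return count;
}
```

The heuristic would miscount unusual input (for example a ';' inside a string constant), which is one reason to do this analysis as a pass on real IR objects.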
STEP-1: Create LLVM Pass Folder
$ # We assume that you have already installed LLVM 3.4 by following the steps above.
$ cd $YOUR-LOCAL-PATH$/llvm/lib/Transforms # change directory to the main folder that contains the LLVM Passes
$ ls # list files and folders in current path
CMakeLists.txt InstCombine IPO Makefile Scalar Vectorize
Hello Instrumentation LLVMBuild.txt ObjCARC Utils
$ # Apart from the files CMakeLists.txt, LLVMBuild.txt, and Makefile, each of the remaining folders holds LLVM Pass source code.
$ mkdir CallCount # create folder for our target LLVM Pass
Next, add the CallCount folder to the current CMakeLists.txt so that it is picked up when LLVM is built. The content of the modified CMakeLists.txt is shown below:
add_subdirectory(Utils)
add_subdirectory(Instrumentation)
add_subdirectory(InstCombine)
add_subdirectory(Scalar)
add_subdirectory(IPO)
add_subdirectory(Vectorize)
add_subdirectory(Hello)
add_subdirectory(ObjCARC)
add_subdirectory(CallCount)
STEP-2: Write LLVM Pass
$ cd CallCount # change directory to target folder
$ touch Hello.cpp # create C++ source file of target LLVM Pass
Hello.cpp defines the logic of our pass, which counts the number of call-type instructions in program IR.
The content of Hello.cpp is shown below:
#include "llvm/ADT/Statistic.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/InstIterator.h"
#include <iostream>
#include <map>
#include <list>
#include <vector>
#include <set>
using namespace llvm;
namespace {
  /****** analysis pass ********/
  struct CallCount : public ModulePass { // This pass is built on ModulePass.
    // Other LLVM pass classes include:
    // CallGraphSCCPass, FunctionPass, LoopPass, and RegionPass.
    static char ID;
    int call_count; // Member variable that records the number of call-type instructions seen so far.
    CallCount() : ModulePass(ID), call_count(0) {}
    virtual bool runOnModule(Module &M) { // Each program IR is loaded as a Module.
      // You can regard this as the main function of the pass.
      for (Module::iterator F = M.begin(), FE = M.end(); F != FE; ++F) { // Iterate over each function in this Module.
        for (Function::iterator BB = F->begin(), BE = F->end(); BB != BE; ++BB) { // Iterate over each basic block in the current function.
          runOnBasicBlock(BB, M.getContext()); // Visit each instruction in the current basic block.
        }
      }
      return false; // The IR is not modified.
    }
    virtual bool runOnBasicBlock(Function::iterator &BB, LLVMContext &context) { // Helper used above; the input is the current basic block.
      for (BasicBlock::iterator BI = BB->begin(), BE = BB->end(); BI != BE; ++BI) { // Iterate over each instruction in the current basic block.
        unsigned opcode = BI->getOpcode(); // Get the opcode of the current instruction.
        // The opcode is the unique number of each instruction type (call is 49 in LLVM 3.4).
        // More details can be found at $YOUR-LOCAL-PATH$/llvm/include/llvm/IR/Instruction.def
        if (opcode == Instruction::Call) { // Record the instruction if its type is "call".
          call_count++;
          outs() << call_count << '\n'; // Print the running count; the last number printed is the total.
        }
      }
      return true;
    }
  };
}
char CallCount::ID = 0;
static RegisterPass<CallCount> X("CallCount", "Count call type instructions in given program IR", false, false);
// "CallCount" is the unique flag used to select this pass when it is loaded by the opt command.
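The pass identifies a call by its opcode number, which comes from Instruction.def. A .def file of this kind is typically consumed several times with different macro definitions, so one list can produce both an enum and a lookup table. The sketch below imitates that idea in a single file with a macro list instead of an included file. Everything prefixed with Toy or TOY_ is invented for illustration; only the value 49 for call is taken from this tutorial, and the other numbers should be treated as illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-entry excerpt in the style of Instruction.def.
// Only Call = 49 comes from the tutorial; the rest is illustrative.
#define TOY_INSTRUCTIONS(HANDLE) \
    HANDLE(1, Ret)               \
    HANDLE(12, Mul)              \
    HANDLE(49, Call)

// First expansion: a named enum, so code can say ToyCall instead of 49.
enum ToyOpcode {
#define AS_ENUM(num, name) Toy##name = num,
    TOY_INSTRUCTIONS(AS_ENUM)
#undef AS_ENUM
};

static_assert(ToyCall == 49, "enum value comes straight from the list");

// Second expansion: map an opcode number back to its name.
std::string toyOpcodeName(unsigned opcode) {
    switch (opcode) {
#define AS_CASE(num, name) case num: return #name;
        TOY_INSTRUCTIONS(AS_CASE)
#undef AS_CASE
        default: return "<unknown>";
    }
}
```

Comparing against a named constant rather than a bare number keeps the check readable and avoids silently breaking if the numbering differs between LLVM versions.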
STEP-3: Prepare for LLVM Pass Compilation
Usually, the pass folder (i.e. CallCount here) should contain three files: the pass source file (Hello.cpp here), CMakeLists.txt, and Makefile.
So let’s create the CMakeLists.txt and Makefile.
$ touch CMakeLists.txt Makefile
We first edit CMakeLists.txt, whose content is shown below:
add_llvm_loadable_module(CallCount # The name of compiled LLVM Pass. So the output will be CallCount.so after the compilation.
Hello.cpp # The source code of LLVM Pass.
)
Then, we edit Makefile, whose content is shown below:
LEVEL = ../../..
# Also the name of the compiled LLVM Pass. This should be consistent with the name in CMakeLists.txt.
# The comment sits on its own line so that make does not keep trailing whitespace in the value.
LIBRARYNAME = CallCount
LOADABLE_MODULE = 1
USEDLIBS =
# If we don't need RTTI or EH, there's no reason to export anything
# from the hello plugin.
ifneq ($(REQUIRES_RTTI), 1)
ifneq ($(REQUIRES_EH), 1)
EXPORTED_SYMBOL_FILE = $(PROJ_SRC_DIR)/Hello.exports
endif
endif
include $(LEVEL)/Makefile.common
Right now, we are almost done and the LLVM Pass is ready for compilation.
STEP-4: Compile LLVM Pass
$ cd $YOUR-LOCAL-PATH$/llvm/build # change directory to where we build this LLVM project
$ make -j$(nproc) # build LLVM in parallel, one job per CPU core
Once the compilation is done, the target LLVM Pass CallCount.so can be found at $YOUR-LOCAL-PATH$/llvm/build/lib/.
To load the LLVM Pass for analyzing a program IR (e.g. pathfinder.ll), you can execute the following commands:
$ $YOUR-LOCAL-PATH$/llvm/build/bin/opt -load $YOUR-LOCAL-PATH$/llvm/build/lib/CallCount.so pathfinder.ll -CallCount -o output.ll
The pass prints a running count, so the last number printed tells you how many call-type instructions are in pathfinder.ll :)
The source folder of this LLVM Pass can be found at git repository [4].
clang: LLVM C compiler
$ clang hello-world.c -o hello-world # Compile C code to executable binary
$ clang -S -emit-llvm hello-world.c -o hello-world.ll # Compile C code to readable IR
clang++: LLVM C++ compiler
$ clang++ hello-world.cpp -o hello-world # Compile C++ code to executable binary
$ clang++ -S -emit-llvm hello-world.cpp -o hello-world.ll # Compile C++ code to readable IR
llvm-as: Assembler
$ llvm-as hello-world.ll -o hello-world.bc # Assemble readable IR into bitcode format
llvm-dis: Disassembler
$ llvm-dis hello-world.bc -o hello-world.ll # Disassemble bitcode format into readable IR
llvm-link: Linker
$ llvm-link -S hello.ll world.ll -o hello-world.ll # Link two IRs into a unified one
llc: Static compiler
$ llc hello-world.ll -o hello-world.s # Compile IR into assembly code for a specified architecture
lli: Directly execute IR
$ lli pathfinder.ll 1000 10 # Directly execute Rodinia-pathfinder IR with input "1000 10" using a just-in-time compiler
# This IR can be found in the Git repository [4].
opt: Optimizer (this is the standard optimizer in the LLVM middle end)
$ opt -load ./CallCount.so pathfinder.ll -CallCount -o output.ll
# Load an LLVM Pass for code transformation and optimization.
# CallCount.so is the LLVM Pass we want to load.
# -CallCount is the unique flag of this Pass registered in the current LLVM project.
# output.ll is in bitcode format, which can be disassembled into readable IR via llvm-dis.
[1] LLVM Documents: [Link]
[2] Blog: LLVM IR and Go: [Link]
[3] Mapping High Level Constructs to LLVM IR: [Link]
[4] Github Repo: LLFI-Quick-Start: [Link]