Understanding the C++ Compilation Process

Introduction

Every time you hit "build" in your IDE or run g++ in the terminal, a lot more is happening under the hood than just turning source code into an executable. The C++ compilation process is a multi-stage pipeline, and understanding each stage gives you better intuition for compiler errors, linker issues, and performance trade-offs. This post walks through the four main stages: preprocessing, compilation, assembly, and linking.

To keep things concrete, we'll use a small example split across three files:

hello.hpp

#pragma once
 
void sayHello();

hello.cpp

#include "hello.hpp"
#include <iostream>
 
void sayHello() {
    std::cout << "Hello, World!" << std::endl;
}

main.cpp

#include "hello.hpp"
 
int main() {
    sayHello();
    return 0;
}

Step 1: Preprocessing

Before any real compilation begins, the preprocessor runs through your source file and handles lines that start with #. This includes expanding macros, pulling in headers through #include, removing comments, and processing conditional blocks like #ifdef.
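To make those directives concrete, here is a small self-contained file that uses an object-like macro, a function-like macro, and a conditional block. The names GREETING, SQUARE, and VERBOSE_BUILD are made up for illustration. Running g++ -E on it shows the text the compiler actually receives.

```cpp
#include <string>

// Object-like macro: the preprocessor replaces GREETING with the literal text.
#define GREETING "Hello"

// Function-like macro: each use is expanded textually, so the parentheses
// around x and around the whole body matter.
// SQUARE(a + b) becomes ((a + b) * (a + b)).
#define SQUARE(x) ((x) * (x))

int squareOfSum(int a, int b) {
    return SQUARE(a + b);
}

// VERBOSE_BUILD is a hypothetical flag; pass -DVERBOSE_BUILD to g++ to flip it.
// Only one branch survives preprocessing; the compiler never sees the other.
std::string buildMode() {
#ifdef VERBOSE_BUILD
    return std::string(GREETING) + " (verbose)";
#else
    return std::string(GREETING) + " (quiet)";
#endif
}
```

Built without -DVERBOSE_BUILD, squareOfSum(2, 3) returns 25 and buildMode() returns "Hello (quiet)".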

You can run the preprocessing step manually:

g++ -E main.cpp -o main.i

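This produces a .i file containing the fully expanded translation unit. At this point, the file is still just text, but all included headers have been copied into place. (GCC's conventional suffix for preprocessed C++ is .ii, but g++ also accepts .i files as C++ input, so either works for this walkthrough.)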

In our case, the preprocessed output for main.cpp contains the declaration from hello.hpp (GCC also inserts # line-marker lines, omitted here):

void sayHello();
 
int main() {
    sayHello();
    return 0;
}

If you preprocess hello.cpp instead, the output is far larger, often tens of thousands of lines, because <iostream> transitively includes a large part of the standard library.

The important thing to understand is that nothing has been turned into machine code yet. The preprocessor is mainly preparing the source code for the compiler.
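Because the preprocessor simply pastes a header's text wherever it is included, a header pulled in twice would otherwise be compiled twice. The sketch below simulates that by writing the same hypothetical header body out twice, protected by a classic include guard, the portable standard-C++ alternative to the #pragma once used in hello.hpp. The guard name HELLO_HPP_GUARD and the function answer() are illustrative only.

```cpp
// First copy of the (hypothetical) header: HELLO_HPP_GUARD is not defined yet,
// so the body is kept and the guard macro gets defined.
#ifndef HELLO_HPP_GUARD
#define HELLO_HPP_GUARD
inline int answer() { return 42; }  // inline: safe to define in a header
#endif

// Second copy, as if another header included it again. The guard macro is
// already defined, so the preprocessor drops this block; without the guard,
// the compiler would report a redefinition of answer().
#ifndef HELLO_HPP_GUARD
#define HELLO_HPP_GUARD
inline int answer() { return 42; }
#endif
```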

Step 2: Compilation

Next, the compiler takes the preprocessed file and converts it into assembly code. This is where the real C++ language work happens: syntax checking, type checking, overload resolution, template instantiation, and optimization.
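Two of those jobs, overload resolution and template instantiation, are easy to see in a few lines. The function names describe and twice below are hypothetical; the point is that every choice here is settled while this one translation unit is being compiled.

```cpp
#include <string>

// Overload resolution: at each call site the compiler picks one of these
// based on the argument types.
std::string describe(int)    { return "int"; }
std::string describe(double) { return "double"; }

// Template instantiation: the compiler generates a separate twice<T>
// for each distinct T used in this translation unit.
template <typename T>
T twice(T value) {
    return value + value;
}
```

Here describe(1) resolves to the int overload, describe(1.5) to the double one, and calling twice with an int and with a std::string produces two separate instantiations.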

You can generate assembly manually with:

g++ -S main.i -o main.s

The result is a .s file containing human-readable assembly instructions for your target architecture. A simplified version of the x86-64 output at the default optimization level (-O0) might look something like this:

main:
    pushq   %rbp
    movq    %rsp, %rbp
    call    sayHello
    movl    $0, %eax
    popq    %rbp
    ret

Notice that the body of sayHello is not here. main.cpp only saw the declaration from the header, so the compiler can generate a call to sayHello, but the actual definition will be resolved later during linking.

The actual output depends on your compiler, operating system, CPU architecture, and optimization flags.
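In real output you usually won't see the plain name sayHello. Because C++ allows overloading, GCC and Clang on Linux (following the Itanium C++ ABI) encode the parameter types into the symbol name, so sayHello() is emitted as _Z8sayHellov. Tools like c++filt translate these back; the sketch below does the same programmatically using abi::__cxa_demangle from <cxxabi.h>, which is a GCC/Clang extension rather than standard C++ (MSVC does not provide it). The wrapper name demangle is our own.

```cpp
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang extension, not standard C++)
#include <string>

// Converts an Itanium-ABI mangled name back into a readable signature.
// Falls back to returning the input if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // On success, __cxa_demangle returns a malloc'd buffer and sets status to 0.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && readable != nullptr) ? readable : mangled;
    std::free(readable);  // free(nullptr) is a no-op, so this is safe on failure
    return result;
}
```

For example, demangle("_Z8sayHellov") yields "sayHello()", and demangle("_Z8describei") yields "describe(int)".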

This is also where flags like -O2 and -O3 matter. The compiler may inline functions, remove dead code, simplify expressions, and reorder instructions to make better use of the CPU.
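A quick way to watch the optimizer is to compile the same file at two levels, for example g++ -O0 -S opt.cpp -o O0.s and g++ -O2 -S opt.cpp -o O2.s (file names are placeholders), then compare the results. The function below is a hypothetical test case whose result is fully known at compile time.

```cpp
// A small helper that the optimizer will usually inline at -O2.
static int addOne(int x) { return x + 1; }

// At -O0, GCC emits this loop and the call to addOne as written. At -O2 it
// typically inlines addOne and folds the whole function into returning the
// constant 55, which you can check in the generated .s file.
int sumToTen() {
    int total = 0;
    for (int i = 0; i < 10; ++i) {
        total += addOne(i);
    }
    return total;
}
```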

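Each .cpp file is compiled independently as its own translation unit. That is why changing one .cpp file usually means recompiling only that file and relinking, rather than rebuilding the whole project. Changing a header is different: every .cpp file that includes it, directly or indirectly, has to be recompiled.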
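Since every translation unit is compiled in isolation, a name defined in one .cpp file reaches the others only through the linker. The sketch below shows how to keep a helper private to one file with an unnamed namespace, which gives it internal linkage. The names helper and publicAnswer are hypothetical.

```cpp
// Names in an unnamed namespace have internal linkage: they are not exported
// from this translation unit's object file, so another .cpp file could define
// its own helper() without clashing at link time.
namespace {
int helper() { return 41; }
}  // namespace

// This function has external linkage, so other translation units can call it
// once they see a declaration (for example, from a header).
int publicAnswer() { return helper() + 1; }
```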

Step 3: Assembly

The assembler takes the .s file and converts it into machine code. The result is an object file:

g++ -c main.s -o main.o

The object file contains binary instructions, but it is not a complete executable yet. It may still refer to symbols that are defined somewhere else.

In our example, main.o contains a call to sayHello, but not the actual definition of sayHello. That definition will come from hello.o later, when we compile hello.cpp and link the object files together.
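The split between declaration and definition is what makes that possible. In the hypothetical sketch below, doubledAnswer compiles against a bare declaration of lookupAnswer, exactly as main compiles against the declaration of sayHello; the definition is placed in the same file only so the snippet builds on its own.

```cpp
// Declaration: a signature with no body. This is all the compiler needs in
// order to emit a call; on its own it produces no code.
int lookupAnswer();

// Compiles against the declaration alone. If lookupAnswer were defined in
// another .cpp file, this object file would record an unresolved reference.
int doubledAnswer() { return 2 * lookupAnswer(); }

// Definition: supplies the body. In a real project it could live in a
// separate .cpp file, much like sayHello lives in hello.cpp.
int lookupAnswer() { return 21; }
```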

You can inspect object files with tools like nm or objdump:

nm main.o           # list defined (T) and undefined (U) symbols
nm -C main.o        # same, with C++ symbol names demangled
objdump -d main.o   # disassemble the machine code

This is especially handy when you're hunting down a stubborn linker error, because you can see exactly which symbols each object file defines and which ones it expects to find elsewhere.

Step 4: Linking

The linker is the final stage. It takes all object files and libraries, resolves symbol references, and produces a runnable executable.

For our example, we first compile both .cpp files into object files:

g++ -c main.cpp -o main.o
g++ -c hello.cpp -o hello.o

Then we link the object files into an executable:

g++ main.o hello.o -o myprogram

Now the program can run:

./myprogram
# Hello, World!

During linking, the unresolved call to sayHello in main.o is matched with the actual definition in hello.o.

If the linker cannot find a symbol, you get an error like this:

undefined reference to `sayHello()'

That usually means the function was declared but its definition was never linked into the final program, for example because hello.o was left off the link command.

If the same function is defined in multiple object files, you may get a multiple definition error instead. This often happens when non-inline function definitions are placed directly in headers and then included by multiple .cpp files.
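One fix is the pattern hello.hpp already follows: keep only the declaration in the header and the definition in a single .cpp file. When you do want a function body in a header, mark it inline. Picture the snippet below as the contents of a hypothetical header, greeting.hpp, included by several .cpp files.

```cpp
#include <string>

// Without `inline`, every .cpp file including this header would emit its own
// external definition of makeGreeting(), and the linker would report a
// multiple definition error. `inline` permits the identical definition to
// appear in every translation unit; in practice the linker keeps one copy.
inline std::string makeGreeting(const std::string& name) {
    return "Hello, " + name + "!";
}
```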

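Libraries are also handled at this stage. With a static library (a .a archive on Linux), the linker copies the object files it needs into the final binary. With a shared library (.so on Linux, .dll on Windows), the linker only records the dependency, and the dynamic loader resolves the symbols when the program is loaded or run. This difference matters when thinking about deployment, binary size, and dependency management.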

Summary

The next time a build fails, it helps to know where in the pipeline things went wrong.

Preprocessing expands macros, includes, and conditional compilation. Compilation turns C++ code into assembly and performs language checks and optimizations. Assembly produces machine-code object files. Linking combines those object files and libraries into a runnable binary.

Running each step manually with g++ is a great exercise if you've never done it before. Seeing the intermediate outputs makes compiler errors, linker errors, and rebuild behavior much easier to understand.