Version: 4.x

Profile-Guided Optimization (PGO)

Profile-Guided Optimization (PGO) is an advanced compilation technique that uses runtime performance data to guide compiler optimizations, resulting in more efficient executable code. The ZCC compiler supports PGO in bare-metal environments (without an operating system).

PGO quick start

The steps below use hello.c as the example program:

#include <stdio.h>

int main(void) {
    printf("Hello\n");
    return 0;
}

Step 1

Compile the source file with the -fprofile-instr-generate option. This directs the compiler to insert instrumentation code that collects runtime performance data.

zcc hello.c -fprofile-instr-generate -o a.out

Step 2

Run the generated program. In bare-metal environments, the profile data is written to standard output as formatted text. Redirect standard output to a file to save it.

./a.out > a.txt
Tip:
  • The profile data is wrapped between specific markers:
    ====LLVMProfilingData:begin====
    ...(profile data)...
    ====LLVMProfilingData:end====
  • You can execute the program multiple times (with the same or different parameters) and save each output separately (e.g., a1.txt, a2.txt) to gather comprehensive performance samples.
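
When the target's output is captured from a shared channel (a serial console, for example), the log may contain other program output around the profile data. The sketch below is one possible way to isolate the marked block on the host. It keeps both markers in the result, since this page does not say whether the conversion tool accepts the data without them. The names find_profile_block and struct profile_block are hypothetical and not part of ZCC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROF_BEGIN "====LLVMProfilingData:begin===="
#define PROF_END   "====LLVMProfilingData:end===="

/* Hypothetical result type: start is NULL when no complete block exists. */
struct profile_block {
    const char *start;  /* first character of the begin marker */
    size_t len;         /* bytes up to and including the end marker */
};

/*
 * Find the first complete marker-delimited block in a NUL-terminated log.
 * The returned span includes both markers.
 */
struct profile_block find_profile_block(const char *log)
{
    struct profile_block none = { NULL, 0 };
    assert(log != NULL);

    const char *begin = strstr(log, PROF_BEGIN);
    if (begin == NULL)
        return none;

    /* Search for the end marker only after the begin marker. */
    const char *end = strstr(begin + strlen(PROF_BEGIN), PROF_END);
    if (end == NULL)
        return none;

    struct profile_block found = {
        begin,
        (size_t)(end - begin) + strlen(PROF_END)
    };
    return found;
}
```

The returned span can then be written to a file such as a.txt, for instance with fwrite(block.start, 1, block.len, fp).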

Step 3

Use the llvm-profdata tool to convert the text-based profile data into binary format for further processing.

llvm-profdata translate a.txt --output=a.bin

If multiple data files exist, convert each one:

llvm-profdata translate a1.txt --output=a1.bin
llvm-profdata translate a2.txt --output=a2.bin

Step 4

Merge the binary profile data files into a single profile (a.profdata in the commands below) that the compiler will use to guide optimizations.

  • Merge specific files:

    llvm-profdata merge -output=a.profdata a1.bin a2.bin
  • Merge using wildcards:

    llvm-profdata merge -output=a.profdata a*.bin

Step 5

Recompile the source code using the -fprofile-instr-use option with the merged profile (a.profdata). The compiler will use the profiling data to perform targeted optimizations and generate an optimized executable.

zcc hello.c -fprofile-instr-use=a.profdata -o a.pgo.out

Step 6

Run the PGO-optimized program to observe the effect of the optimizations.

./a.pgo.out

PGO print function

ZCC provides the function void __zcc_baremetal_profile_dump(void) to print the PGO data. By default, the compiler inserts a call to __zcc_baremetal_profile_dump that runs when main returns. If your main function never returns (for example, because it ends in an infinite loop), call __zcc_baremetal_profile_dump manually from main at the point where you want the data printed.
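
As a rough illustration of the manual call, the sketch below runs a workload for a fixed number of iterations and then triggers the dump once, before the program would enter a loop that never returns. The helper run_then_dump, the stand-ins demo_step and demo_dump, and the iteration count are all hypothetical. The dump routine is passed as a function pointer only so that the sketch can be tried on a host without the ZCC runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Matches the signature described above: void f(void). */
typedef void (*profile_dump_fn)(void);

/*
 * Hypothetical helper: call `step` `iterations` times, then call `dump`
 * exactly once. Returns the number of steps executed.
 */
unsigned run_then_dump(void (*step)(void), unsigned iterations,
                       profile_dump_fn dump)
{
    assert(step != NULL && dump != NULL);

    unsigned done = 0;
    while (done < iterations) {
        step();
        done++;
    }
    dump();  /* emit the profile while the program is still running */
    return done;
}

/* Host-side stand-ins so the sketch can be exercised without a target. */
unsigned g_steps;
unsigned g_dumps;
void demo_step(void) { g_steps++; }
void demo_dump(void) { g_dumps++; }

/*
 * Target-side usage (assumes an instrumented ZCC build; poll_sensors is a
 * hypothetical workload):
 *
 *   void __zcc_baremetal_profile_dump(void);   // provided by ZCC
 *
 *   int main(void) {
 *       run_then_dump(poll_sensors, 1000, __zcc_baremetal_profile_dump);
 *       for (;;) poll_sensors();               // main never returns
 *   }
 */
```

Choosing the dump point after a representative amount of work matters, because only the execution counts collected before the call end up in the printed profile.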

Linker script

With PGO enabled, the compiler generates sections whose names start with __llvm_prf_, such as __llvm_prf_names, __llvm_prf_cnts, and __llvm_prf_data. For each of these sections, two symbols are defined: one marking the beginning of the section and one marking its end. For example, __start___llvm_prf_names and __stop___llvm_prf_names point to the beginning and end of __llvm_prf_names, respectively.

If your linker script places a __llvm_prf_ section inside another output section, the __start_ and __stop_ symbols for that section are no longer defined automatically. In this case, you must define both symbols explicitly and ensure that the section contents are aligned to 8 bytes. For example:

.data :
{
    *(.data .data.*)
    . = ALIGN(8);
    __start___llvm_prf_cnts = .;
    *(__llvm_prf_cnts)
    . = ALIGN(8);
    __stop___llvm_prf_cnts = .;

    . = ALIGN(8);
    __start___llvm_prf_names = .;
    *(__llvm_prf_names)
    . = ALIGN(8);
    __stop___llvm_prf_names = .;

    . = ALIGN(8);
    __start___llvm_prf_data = .;
    *(__llvm_prf_data)
    . = ALIGN(8);
    __stop___llvm_prf_data = .;
}
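
A hand-written linker script is easy to get subtly wrong, so it may help to sanity-check the symbols at startup. The sketch below is one possible check that a start/stop pair is ordered correctly and 8-byte aligned. The function name prf_range_is_valid is hypothetical and not part of ZCC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if [start, stop) is a well-formed section range: start does not
 * exceed stop and both bounds are 8-byte aligned. Otherwise return 0.
 * An empty range (start == stop) is accepted.
 */
int prf_range_is_valid(const void *start, const void *stop)
{
    uintptr_t lo = (uintptr_t)start;
    uintptr_t hi = (uintptr_t)stop;
    return lo <= hi && (lo % 8u) == 0 && (hi % 8u) == 0;
}

/*
 * Target-side usage (assumes the symbols defined in the linker script):
 *
 *   extern char __start___llvm_prf_cnts[], __stop___llvm_prf_cnts[];
 *   assert(prf_range_is_valid(__start___llvm_prf_cnts,
 *                             __stop___llvm_prf_cnts));
 */
```

The same check can be repeated for the __llvm_prf_names and __llvm_prf_data symbol pairs.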