ZCC 4.x User Manual
Terapines ZCC is a high-performance C/C++ compiler for RISC-V based on LLVM. It supports recent C and C++ standards, including C17, C11, C99, C++17, C++14 and C++11, and provides the following key features.
-
RVV auto-vectorization and other compiler optimizations.
-
Support for RISC-V ISAs, including standard extensions and vendor extensions from XuanTie, Nuclei and Andes.
Download and Installation
ZCC is a high-performance RISC-V toolchain that provides a consistent experience on Windows and Linux.
System requirements
You can review the system requirements to check if your computer configuration is supported.
-
Windows: Windows 10 (32-bit and 64-bit) and above
-
Linux:
-
Ubuntu 18, Ubuntu 20, Ubuntu 22.04 and Ubuntu 24.04
-
CentOS 6, CentOS 7 and CentOS 8
-
Fedora 42
-
openSUSE Leap 15.5
Setup using the Terapines Installer
The Terapines Installer is the recommended tool to set up Terapines products. For users without a graphical interface, please use the Command Line Interface (CLI) installer.
- Windows
- Linux (GUI)
- Linux (CLI)
-
Select Windows from the pull-down box on the ZCC download page to download Installer.exe.
-
Run the Installer with administrator privileges and select the product that you want to install.
-
The ZCC toolchain includes LibZCC by default. Additional libraries can be selected for installation. Installed libraries are added to the toolchain and apply globally.
-
LibDSP: It offers a collection of functions and tools specifically designed for digital signal processing (DSP).
-
LibNN: Specialized library for implementing and running neural network (NN) algorithms.
-
You can use Product Manager to manage several versions of the same product. Product Manager is installed in the C:\Program Files\Terapines directory. It can be opened directly from the Start menu if you checked the box for adding it to the system PATH. Community edition users do not have to sign in. Commercial edition users log in to their Terapines account from Product Manager, which then automatically activates the available license for the installed product.
-
Select Linux from the pull-down box on the ZCC download page to download the Installer executable.
-
Make the ZCC-Installer executable by all users:
chmod a+x ZCC-Installer
-
Run the executable with administrator privileges and select the product that you want to install.
sudo ./ZCC-Installer
-
The ZCC toolchain includes LibZCC by default. Additional libraries can be selected for installation. Installed libraries are added to the toolchain and apply globally.
-
LibDSP: It offers a collection of functions and tools specifically designed for digital signal processing (DSP).
-
LibNN: Specialized library for implementing and running neural network (NN) algorithms.
-
You can use Product Manager to manage several versions of the same product. Product Manager is installed in the /opt/Terapines/ directory. Community edition users do not have to sign in. Commercial edition users log in to their Terapines account from Product Manager, which then automatically activates the available license for the installed product.
-
Select Linux (CLI) from the pull-down box on the ZCC download page to download the Installer executable.
-
Make ZCC-Installer-CLI executable by all users:
chmod a+x ZCC-Installer-CLI
-
Run the executable with administrator privileges, select the product that you want to install and follow the step-by-step instructions.
sudo ./ZCC-Installer-CLI
-
The ZCC toolchain includes LibZCC by default. Additional libraries can be selected for installation.
-
LibDSP: It offers a collection of functions and tools specifically designed for digital signal processing (DSP).
-
LibNN: Specialized library for implementing and running neural network (NN) algorithms.
-
You can use Product Manager to manage several versions of the same product. Product Manager is installed in the /opt/Terapines/ directory. Community edition users do not have to sign in. Commercial edition users log in to their Terapines account from Product Manager, which then automatically activates the available license for the installed product.
Language standards
-
-x <language>
Treat subsequent input files as having type <language>. The supported values of <language> are C and C++.
-
-std=<standard>
Select the language standard to compile for. Supported values are listed in the table below. Support for each language standard follows the specifications and details of the upstream project. Please refer to C++ Support in Clang and C Support in Clang.
-
-ansi
Same as "-std=c89".
Standards | Versions |
---|---|
C Standard | c18; c17; c11; c99; c90; c89 |
C++ Standard | c++17; c++14; c++11; c++03; c++98 |
GNU C | gnu18; gnu17; gnu11; gnu89; gnu++17; gnu++14; gnu++11; gnu++98 |
ISO C | iso9899:2018; iso9899:2017; iso9899:2011; iso9899:1999; iso9899:199409; iso9899:1990 |
- C17 & C18: Since it was under development in 2017, and officially published in 2018, C17 is sometimes referred to as C18.
- C89 & C90: C89 was ratified by ISO/IEC as ISO/IEC 9899:1990 (C90) with only formatting changes. Therefore, the terms "C89" and "C90" refer to essentially the same language.
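One way to confirm which C standard a given -std= value selects is to inspect the standard `__STDC_VERSION__` macro. The following is a minimal sketch; the function names `c_standard_name`, `current_c_standard` and `compiled_as` are illustrative and not part of ZCC.

```c
#include <string.h>

/* Map a __STDC_VERSION__ value to the matching -std= name.
 * C89/C90 compilers do not define the macro, so pass 0 in that case.
 * Values newer than C17 are reported as "c17" in this sketch. */
const char *c_standard_name(long version)
{
    if (version >= 201710L) return "c17";
    if (version >= 201112L) return "c11";
    if (version >= 199901L) return "c99";
    if (version >= 199409L) return "iso9899:199409";
    return "c89";
}

/* Name of the standard the current translation unit is compiled for. */
const char *current_c_standard(void)
{
#ifdef __STDC_VERSION__
    return c_standard_name(__STDC_VERSION__);
#else
    return c_standard_name(0L);
#endif
}

/* Nonzero when the translation unit is compiled for the named standard. */
int compiled_as(const char *name)
{
    return strcmp(current_c_standard(), name) == 0;
}
```

Compiling this file with, for example, -std=c11 and calling `compiled_as("c11")` is expected to return nonzero, since C11 defines `__STDC_VERSION__` as 201112L.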
RISC-V target support
Use -print-supported-extensions to print the list of all extensions that are supported in ZCC.
Base ISAs
Currently, ZCC fully supports three base instruction sets: RV32I, RV32E and RV64I.
To specify the target triple:
- riscv32: RISC-V with XLEN=32 (i.e. RV32I or RV32E)
- riscv64: RISC-V with XLEN=64 (i.e. RV64I)
To select an E variant ISA (e.g. RV32E instead of RV32I), use the base architecture string (e.g. riscv32) with the extension e.
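Source code can check which base ISA it is being compiled for through the preprocessor macros defined by the RISC-V C API specification. The sketch below assumes ZCC defines `__riscv`, `__riscv_xlen` and `__riscv_e` (or the older `__riscv_32e`) as upstream Clang does; the function names are illustrative.

```c
/* XLEN of the compilation target, or 0 when not compiling for RISC-V. */
int target_xlen(void)
{
#ifdef __riscv_xlen
    return __riscv_xlen;
#else
    return 0;
#endif
}

/* Number of general-purpose registers: 16 for an E variant,
 * 32 for the I base ISAs, 0 when not compiling for RISC-V. */
int target_gpr_count(void)
{
#if defined(__riscv_e) || defined(__riscv_32e)
    return 16;
#elif defined(__riscv)
    return 32;
#else
    return 0;
#endif
}
```

Because every macro is tested with a preprocessor conditional, the file also compiles on a non-RISC-V host, where both functions return 0.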
Extensions
The table below lists the extensions for which ZCC ensures compatibility, including standard extensions as well as some experimental extensions from older versions.
The Zp052b extension is version 0.52 of the P extension, which is an older experimental version. ZCC has extracted it and designated it as a standard extension named Zp052b. When using this extension, there is no need to specify a version number; simply use zp052b.
zcc -march=rv32imaczp052b -c hello.c
zcc -march=rv32imafdc -c hello.c
Extension | Version | Feature | Description |
---|---|---|---|
I | 2.1 | i | Base Integer Instruction Set |
E | 2.0 | e | Implements RV32E (provides 16 rather than 32 GPRs) |
M | 2.0 | m | Integer Multiplication and Division |
A | 2.1 | a | Atomic Instructions |
F | 2.2 | f | Single-Precision Floating-Point |
D | 2.2 | d | Double-Precision Floating-Point |
C | 2.0 | c | Compressed Instructions |
B | 1.0 | b | the collection of the Zba, Zbb, Zbs extensions |
V | 1.0 | v | Vector Extension for Application Processors |
H | 1.0 | h | Hypervisor |
Zic64b | 1.0 | zic64b | Cache Block Size Is 64 Bytes |
Zicbom | 1.0 | zicbom | Cache-Block Management Instructions |
Zicbop | 1.0 | zicbop | Cache-Block Prefetch Instructions |
Zicboz | 1.0 | zicboz | Cache-Block Zero Instructions |
Ziccamoa | 1.0 | ziccamoa | Main Memory Supports All Atomics in A |
Ziccif | 1.0 | ziccif | Main Memory Supports Instruction Fetch with Atomicity Requirement |
Zicclsm | 1.0 | zicclsm | Main Memory Supports Misaligned Loads/Stores |
Ziccrse | 1.0 | ziccrse | Main Memory Supports Forward Progress on LR/SC Sequences |
Zicntr | 2.0 | zicntr | Base Counters and Timers |
Zicond | 1.0 | zicond | Integer Conditional Operations |
Zicsr | 2.0 | zicsr | Control and Status Register (CSR) Instructions |
Zifencei | 2.0 | zifencei | fence.i |
Zihintntl | 1.0 | zihintntl | Non-Temporal Locality Hints |
Zihintpause | 2.0 | zihintpause | Pause Hint |
Zihpm | 2.0 | zihpm | Hardware Performance Counters |
Zimop | 1.0 | zimop | May-Be-Operations |
Zmmul | 1.0 | zmmul | Integer Multiplication |
Za128rs | 1.0 | za128rs | Reservation Set Size of at Most 128 Bytes |
Za64rs | 1.0 | za64rs | Reservation Set Size of at Most 64 Bytes |
Zaamo | 1.0 | zaamo | Atomic Memory Operations |
Zabha | 1.0 | zabha | Byte and Halfword Atomic Memory Operations |
Zalrsc | 1.0 | zalrsc | Load-Reserved/Store-Conditional |
Zama16b | 1.0 | zama16b | Atomic 16-byte misaligned loads, stores and AMOs |
Zawrs | 1.0 | zawrs | Wait on Reservation Set |
Zfa | 1.0 | zfa | Additional Floating-Point |
Zfbfmin | 1.0 | zfbfmin | Scalar BF16 Converts |
Zfh | 1.0 | zfh | Half-Precision Floating-Point |
Zfhmin | 1.0 | zfhmin | Half-Precision Floating-Point Minimal |
Zfinx | 1.0 | zfinx | Float in Integer |
Zdinx | 1.0 | zdinx | Double in Integer |
Zca | 1.0 | zca | part of the C extension, excluding compressed floating point loads/stores |
Zcb | 1.0 | zcb | Compressed basic bit manipulation instructions |
Zcd | 1.0 | zcd | Compressed Double-Precision Floating-Point Instructions |
Zce | 1.0 | zce | Compressed extensions for microcontrollers |
Zcf | 1.0 | zcf | Compressed Single-Precision Floating-Point Instructions |
Zcmop | 1.0 | zcmop | Compressed May-Be-Operations |
Zcmp | 1.0 | zcmp | sequenced instructions for code-size reduction |
Zcmt | 1.0 | zcmt | table jump instructions for code-size reduction |
Zba | 1.0 | zba | Address Generation Instructions |
Zbb | 1.0 | zbb | Basic Bit-Manipulation |
Zbc | 1.0 | zbc | Carry-Less Multiplication |
Zbkb | 1.0 | zbkb | Bitmanip instructions for Cryptography |
Zbkc | 1.0 | zbkc | Carry-less multiply instructions for Cryptography |
Zbkx | 1.0 | zbkx | Crossbar permutation instructions |
Zbs | 1.0 | zbs | Single-Bit Instructions |
Zk | 1.0 | zk | Standard scalar cryptography extension |
Zkn | 1.0 | zkn | NIST Algorithm Suite |
Zknd | 1.0 | zknd | NIST Suite: AES Decryption |
Zkne | 1.0 | zkne | NIST Suite: AES Encryption |
Zknh | 1.0 | zknh | NIST Suite: Hash Function Instructions |
Zkr | 1.0 | zkr | Entropy Source Extension |
Zks | 1.0 | zks | ShangMi Algorithm Suite |
Zksed | 1.0 | zksed | ShangMi Suite: SM4 Block Cipher Instructions |
Zksh | 1.0 | zksh | ShangMi Suite: SM3 Hash Function Instructions |
Zkt | 1.0 | zkt | Data Independent Execution Latency |
Ztso | 1.0 | ztso | Memory Model - Total Store Order |
Zp052b | 0.52 | zp052b | Packed-SIMD Instructions |
Zp053b | 0.53 | zp053b | Packed-SIMD Instructions |
Zp054b | 0.54 | zp054b | Packed-SIMD Instructions |
Zp64054b | 0.54 | zp64054b | RV32 only 'P' Instructions |
Zp095b | 0.95 | zp095b | Packed-SIMD Instructions |
Zpn095b | 0.95 | zpn095b | Normal 'P' Instructions |
Zprvsfextra095b | 0.95 | zprvsfextra095b | RV64 only 'P' Instructions |
Zpsfoperand095b | 0.95 | zpsfoperand095b | Paired-register operand 'P' Instructions |
Zvbb | 1.0 | zvbb | Vector basic bit-manipulation instructions |
Zvbc | 1.0 | zvbc | Vector Carryless Multiplication |
Zve32f | 1.0 | zve32f | Vector Extensions for Embedded Processors with maximal 32 EEW and F extension |
Zve32x | 1.0 | zve32x | Vector Extensions for Embedded Processors with maximal 32 EEW |
Zve64d | 1.0 | zve64d | Vector Extensions for Embedded Processors with maximal 64 EEW, F and D extension |
Zve64f | 1.0 | zve64f | Vector Extensions for Embedded Processors with maximal 64 EEW and F extension |
Zve64x | 1.0 | zve64x | Vector Extensions for Embedded Processors with maximal 64 EEW |
Zvfbfmin | 1.0 | zvfbfmin | Vector BF16 Converts |
Zvfbfwma | 1.0 | zvfbfwma | Vector BF16 widening mul-add |
Zvfh | 1.0 | zvfh | Vector Half-Precision Floating-Point |
Zvfhmin | 1.0 | zvfhmin | Vector Half-Precision Floating-Point Minimal |
Zvkb | 1.0 | zvkb | Vector Bit-manipulation used in Cryptography |
Zvkg | 1.0 | zvkg | Vector GCM instructions for Cryptography |
Zvkn | 1.0 | zvkn | shorthand for 'Zvkned', 'Zvknhb', 'Zvkb', and 'Zvkt' |
Zvknc | 1.0 | zvknc | shorthand for 'Zvkn' and 'Zvbc' |
Zvkned | 1.0 | zvkned | Vector AES Encryption & Decryption (Single Round) |
Zvkng | 1.0 | zvkng | shorthand for 'Zvkn' and 'Zvkg' |
Zvknha | 1.0 | zvknha | Vector SHA-2 (SHA-256 only) |
Zvknhb | 1.0 | zvknhb | Vector SHA-2 (SHA-256 and SHA-512) |
Zvks | 1.0 | zvks | shorthand for 'Zvksed', 'Zvksh', 'Zvkb', and 'Zvkt' |
Zvksc | 1.0 | zvksc | shorthand for 'Zvks' and 'Zvbc' |
Zvksed | 1.0 | zvksed | SM4 Block Cipher Instructions |
Zvksg | 1.0 | zvksg | shorthand for 'Zvks' and 'Zvkg' |
Zvksh | 1.0 | zvksh | SM3 Hash Function Instructions |
Zvkt | 1.0 | zvkt | Vector Data-Independent Execution Latency |
Zvl1024b | 1.0 | zvl1024b | Zvl (Minimum Vector Length) 1024 |
Zvl128b | 1.0 | zvl128b | Zvl (Minimum Vector Length) 128 |
Zvl16384b | 1.0 | zvl16384b | Zvl (Minimum Vector Length) 16384 |
Zvl2048b | 1.0 | zvl2048b | Zvl (Minimum Vector Length) 2048 |
Zvl256b | 1.0 | zvl256b | Zvl (Minimum Vector Length) 256 |
Zvl32768b | 1.0 | zvl32768b | Zvl (Minimum Vector Length) 32768 |
Zvl32b | 1.0 | zvl32b | Zvl (Minimum Vector Length) 32 |
Zvl4096b | 1.0 | zvl4096b | Zvl (Minimum Vector Length) 4096 |
Zvl512b | 1.0 | zvl512b | Zvl (Minimum Vector Length) 512 |
Zvl64b | 1.0 | zvl64b | Zvl (Minimum Vector Length) 64 |
Zvl65536b | 1.0 | zvl65536b | Zvl (Minimum Vector Length) 65536 |
Zvl8192b | 1.0 | zvl8192b | Zvl (Minimum Vector Length) 8192 |
Zhinx | 1.0 | zhinx | Zhinx (Half Float in Integer) |
Zhinxmin | 1.0 | zhinxmin | Zhinxmin (Half Float in Integer Minimal) |
Shcounterenw | 1.0 | shcounterenw | Support writeable hcounteren enable bit for any hpmcounter that is not read-only zero |
Shgatpa | 1.0 | shgatpa | SvNNx4 mode supported for all modes supported by satp, as well as Bare |
Shtvala | 1.0 | shtvala | htval provides all needed values |
Shvsatpa | 1.0 | shvsatpa | vsatp supports all modes supported by satp |
Shvstvala | 1.0 | shvstvala | vstval provides all needed values |
Shvstvecd | 1.0 | shvstvecd | vstvec supports Direct mode |
Smaia | 1.0 | smaia | Advanced Interrupt Architecture Machine Level |
Smcdeleg | 1.0 | smcdeleg | Counter Delegation Machine Level |
Smcsrind | 1.0 | smcsrind | Indirect CSR Access Machine Level |
Smepmp | 1.0 | smepmp | Enhanced Physical Memory Protection |
Smstateen | 1.0 | smstateen | Machine-mode view of the state-enable extension |
Ssaia | 1.0 | ssaia | Advanced Interrupt Architecture Supervisor Level |
Ssccfg | 1.0 | ssccfg | Counter Configuration Supervisor Level |
Ssccptr | 1.0 | ssccptr | Main memory supports page table reads |
Sscofpmf | 1.0 | sscofpmf | Count Overflow and Mode-Based Filtering |
Sscounterenw | 1.0 | sscounterenw | Support writeable scounteren enable bit for any hpmcounter that is not read-only zero |
Sscsrind | 1.0 | sscsrind | Indirect CSR Access Supervisor Level |
Ssstateen | 1.0 | ssstateen | Supervisor-mode view of the state-enable extension |
Ssstrict | 1.0 | ssstrict | No non-conforming extensions are present |
Sstc | 1.0 | sstc | Supervisor-mode timer interrupts |
Sstvala | 1.0 | sstvala | stval provides all needed values |
Sstvecd | 1.0 | sstvecd | stvec supports Direct mode |
Ssu64xl | 1.0 | ssu64xl | UXLEN=64 supported |
Svade | 1.0 | svade | Raise exceptions on improper A/D bits |
Svadu | 1.0 | svadu | Hardware A/D updates |
Svbare | 1.0 | svbare | satp mode Bare supported |
Svinval | 1.0 | svinval | Fine-Grained Address-Translation Cache Invalidation |
Svnapot | 1.0 | svnapot | NAPOT Translation Contiguity |
Svpbmt | 1.0 | svpbmt | Page-Based Memory Types |
Experimental Extensions
Experimental extensions are expected either to transition to ratified status or to be dropped. The compatibility of these extensions between toolchain versions is not guaranteed. When using these extensions, version numbers must be added.
For example, when using the Zalasr extension, you need to add the version number, that is, use zalasr0p1 instead of zalasr.
zcc -march=rv32imaczalasr0p1 -c hello.c
zcc -march=rv32imaczicfiss1p0 -c hello.c
Extension | Version | Description |
---|---|---|
Zicfilp | 1.0 | Landing pad |
Zicfiss | 1.0 | Shadow stack |
Zacas | 1.0 | Atomic Compare-And-Swap Instructions |
Zalasr | 0.1 | Load-Acquire and Store-Release Instructions |
Smmpm | 1.0 | Machine-level Pointer Masking for M-mode |
Smnpm | 1.0 | Machine-level Pointer Masking for next lower privilege mode |
Ssnpm | 1.0 | Supervisor-level Pointer Masking for next lower privilege mode |
Sspm | 1.0 | Indicates Supervisor-mode Pointer Masking |
Ssqosid | 1.0 | Quality-of-Service (QoS) Identifiers |
Supm | 1.0 | Indicates User-mode Pointer Masking |
Vendor Extensions
Vendor extensions are extensions which are defined by a hardware vendor.
Extension | Version | Feature | Description |
---|---|---|---|
Xandes | 5.0 | xandes | AndeStar V5 Extension Specification |
XCValu | 1.0 | xcvalu | CORE-V ALU Operations |
XCVbi | 1.0 | xcvbi | CORE-V Immediate Branching |
XCVbitmanip | 1.0 | xcvbitmanip | CORE-V Bit Manipulation |
XCVelw | 1.0 | xcvelw | CORE-V Event Load Word |
XCVmac | 1.0 | xcvmac | CORE-V Multiply-Accumulate |
XCVmem | 1.0 | xcvmem | CORE-V Post-incrementing Load & Store |
XCVsimd | 1.0 | xcvsimd | CORE-V SIMD ALU |
Xgap8dsp | 4.0 | xgap8dsp | 'Xp' (GAP8 DSP extension) |
Xgap8m | 4.0 | xgap8m | 'Xm' (GAP8 Integer Multiplication) |
Xgap8v | 4.0 | xgap8v | 'Xv' (GAP8 Vector extension) |
XSfcease | 1.0 | xsfcease | SiFive sf.cease Instruction |
XSfvcp | 1.0 | xsfvcp | SiFive Custom Vector Coprocessor Interface Instructions |
XSfvfnrclipxfqf | 1.0 | xsfvfnrclipxfqf | SiFive FP32-to-int8 Ranged Clip Instructions |
XSfvfwmaccqqq | 1.0 | xsfvfwmaccqqq | SiFive Matrix Multiply Accumulate Instruction (4-by-4) |
XSfvqmaccdod | 1.0 | xsfvqmaccdod | SiFive Int8 Matrix Multiplication Instructions (2-by-8 and 8-by-2) |
XSfvqmaccqoq | 1.0 | xsfvqmaccqoq | SiFive Int8 Matrix Multiplication Instructions (4-by-8 and 8-by-4) |
XSiFivecdiscarddlone | 1.0 | xsifivecdiscarddlone | SiFive sf.cdiscard.d.l1 Instruction |
XSiFivecflushdlone | 1.0 | xsifivecflushdlone | SiFive sf.cflush.d.l1 Instruction |
XTHeadBa | 1.0 | xtheadba | T-Head address calculation instructions |
XTHeadBb | 1.0 | xtheadbb | T-Head basic bit-manipulation instructions |
XTHeadBs | 1.0 | xtheadbs | T-Head single-bit instructions |
XTHeadCmo | 1.0 | xtheadcmo | T-Head cache management instructions |
XTHeadCondMov | 1.0 | xtheadcondmov | T-Head conditional move instructions |
XTHeadFMemIdx | 1.0 | xtheadfmemidx | T-Head FP Indexed Memory Operations |
XTHeadMac | 1.0 | xtheadmac | T-Head Multiply-Accumulate Instructions |
XTHeadMemIdx | 1.0 | xtheadmemidx | T-Head Indexed Memory Operations |
XTHeadMemPair | 1.0 | xtheadmempair | T-Head two-GPR Memory Operations |
XTHeadSync | 1.0 | xtheadsync | T-Head multicore synchronization instructions |
XTHeadVdot | 1.0 | xtheadvdot | T-Head Vector Extensions for Dot |
XVentanaCondOps | 1.0 | xventanacondops | Ventana Conditional Ops |
Xwchc | 2.2 | xwchc | WCH/QingKe additional compressed opcodes |
Xxlcz | 1.0 | xxlcz | Nuclei Additional Xlcz Instruction for Codesize |
Xxldsp | 1.0 | xxldsp | Nuclei customized DSP instructions for both RV32 and RV64 |
Xxldspn1x | 1.0 | xxldspn1x | Nuclei customized DSP N1 instructions only for RV32 |
Xxldspn2x | 1.0 | xxldspn2x | Nuclei customized DSP N2 instructions only for RV32 |
Xxldspn3x | 1.0 | xxldspn3x | Nuclei customized DSP N3 instructions only for RV32 |
Xxlvqmacc | 1.0 | xxlvqmacc | Nuclei Int8 Matrix Multiplication Instructions (4-by-4 and 4-by-4) |
Profiles
Supported RISC-V profile names can be passed using -march instead of a standard ISA naming string. Currently supported profiles:
Supported Profiles | Experimental Profiles |
---|---|
rva20s64 | rva23s64 |
rva20u64 | rva23u64 |
rva22s64 | rvb23s64 |
rva22u64 | rvb23u64 |
rvi20u32 | rvm23u32 |
rvi20u64 | |
Note that you can also append additional extension names to be enabled, e.g. rva20u64_zicond will enable the zicond extension in addition to those in the rva20u64 profile.
Compilation Options
Since ZCC 4.x is based on LLVM 19.1.6, most of the LLVM compiler options are applicable to ZCC.
-target option
Use -target <architecture> to specify the target to build for. The arguments that can be used are listed below:
- riscv64-unknown-elf
- riscv32-unknown-elf
- riscv64-unknown-linux-gnu
-march option
Specify -march=<architecture> to generate code for a specific processor architecture.
The -march value follows the format -march=rv[32|64][i|e][extensions]. The order of the components is not strictly enforced, and the final link generates instructions according to the specified -march value. For example, when using -march=rv32imafc, ZCC will look for libraries built for rv32ifa and also generate instructions for the M extension.
Multilib
ZCC will select compatible zcc libraries to add to the application based on the arch/abi combination specified by the user. For example:
Specified arch/abi combination | Applied zcc library |
---|---|
-march=rv32imafc -mabi=ilp32f | rv32ifa/ilp32f |
-march=rv32imafc_zba_zbb_zbc_zbs_zp052b -mabi=ilp32f | rv32ifap0p95_zp052b_zp053b_zp054b/ilp32f |
For applications using Arch extensions that do not exist in the library, such as M and C extensions, their optimizations will be performed during the IR to assembly translation stage, so no optimizations will be missing. For certain Arch extensions, such as P and V extensions, since their optimizations are performed during C source code to IR translation, separate libraries must be created for them.
-mtune option
When -mtune= is specified, ZCC optimizes for the given target CPU. The arguments that can be used with -mtune= are listed below:
- THead
  - thead-c908-series
- Andes
  - andes-kavalan
  - andes-vicuna
  - andes-d25-series
  - andes-d45-series
- Tenstorrent
  - ascalon
- Nuclei
  - nuclei-100-series
  - nuclei-200-series
  - nuclei-300-series
  - nuclei-310-series
  - nuclei-600-series
  - nuclei-900-series
  - nuclei-1000-series
- Rocket
  - rocket
- Imagination
  - rtxm2200
- SiFive
  - sifive-7-series
- Syntacore
  - syntacore-scr1-series
Optimization options
-
-O0, -O1, -O2, -O3, -Os
Specify which optimization level to use:
-
O0: Means “no optimization”: this level compiles the fastest and generates the most debuggable code.
-
O1: Somewhere between -O0 and -O2.
-
O2: Moderate level of optimization which enables most optimizations.
-
O3: Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
-
Os: Like -O2 with extra optimizations to reduce code size.
Troubleshooting
C/C++ compilation options
-
The auto-vectorization in ZCC is highly aggressive. For the vast majority of innermost loops, auto-vectorization achieves performance on par with, or even exceeding, handwritten intrinsics. For nested (outer) loop vectorization, the results are also impressive. For AI kernels commonly used in practical scenarios, such as correlation and resizing, the performance of the code generated by ZCC's auto-vectorization is very close to that of handwritten intrinsics.
When enabling auto-vectorization with the options -march=rv32/64gcv -O3, you can activate aggressive optimizations for operations narrower than 32 (or 64) bits by adding the option -mllvm --no-integer-promotions. Note that when using this option, there must be no implicit type conversions in the source code, as they could result in undefined behavior. Examples of correct and incorrect usage are as follows:
-
Correct example
#include <stddef.h>
#include <stdint.h>

void foo(size_t count, int8_t* channel1, int8_t* channel2, int16_t* output) {
    for (size_t i = 0; i < count; i++) {
        // With --no-integer-promotions, only explicit type promotion from int8_t to int16_t occurs
        output[i] = (int16_t)channel1[i] * (int16_t)channel2[i];
    }
}
-
Error example
void foo(size_t count, int8_t* channel1, int8_t* channel2, int16_t* output) {
    for (size_t i = 0; i < count; i++) {
        // Implicit type promotion of the int8_t operands to int32_t occurs
        output[i] = channel1[i] * channel2[i];
    }
}
-
It is sometimes impossible for the compiler to determine which loops require what kind of vectorization, especially in the case of nested loop vectorization. In such cases, use #pragma directives to tell the compiler explicitly which loop levels to vectorize, how to configure the maximum register grouping, and which related optimizations to apply. See the example below for reference:
// Currently, multi-level loop vectorization only supports loops without dependencies; otherwise the user must guarantee that no dependencies arise when the loop at this level is vectorized.
// The option vectorize(assume_safety) is required. It tells the compiler that the memory accesses in the loop do not overlap (no pointer aliasing or partial aliasing). It also lets the compiler ignore unknown accesses such as a[idx[i]]. Because the compiler then assumes there are no memory dependencies, make sure the loop really has none.
// The option vectorize_width(16, scalable) optionally determines the RVV register grouping, based on the width of the widest element in the loop multiplied by 16. If it is omitted, the compiler automatically calculates an appropriate register grouping.
// The grouping is selected as follows:
// mf8 # LMUL=1/8 based on 8 bits
// mf4 # LMUL=1/4 based on 16 bits
// mf2 # LMUL=1/2 based on 32 bits
// m1 # LMUL=1 based on 64 bits
// m2 # LMUL=2 based on 128 bits
// m4 # LMUL=4 based on 256 bits
// m8 # LMUL=8 based on 512 bits
#pragma clang loop vectorize(assume_safety) vectorize_width(16, scalable)
for (uint16_t w = 0; w < width; w++) {
    .......
}
Please refer to the LLVM User Manual for specific usage examples.
-
To enable nested (outer) loop auto-vectorization, add the option -mllvm --enable-vplan-native-path.
Tip: This option is in the development stage and only needs to be added if the outer loop uses a pragma. It will be removed when auto-vectorization is fully optimized.
-
ZCC enables link-time optimization (LTO) by default, so the intermediate files generated with the -c option contain LLVM bitcode. If you need to analyze the assembly code of the intermediate results or disable link-time optimization, add the -fno-lto option.
-
More aggressive code size optimization options:
-
-mllvm --riscv-machine-outliner=true: With LTO (which is enabled by default in ZCC), this option requires appending -Wl,-mllvm,--riscv-machine-outliner=true. This optimization focuses on reducing code size, but it may decrease program performance.
-
-config small.cfg: This option makes ZCC link libraries optimized for code size.
-
-
By default, ZCC does not enable the fp-contract optimization. When the F and D extensions are available, you can enable this optimization manually with the -ffp-contract option, which allows additional fused multiply-add (FMADD-type) instructions to be generated.
-
The -munaligned-access option allows unaligned memory access instructions to be generated and aligns global variables to 1 byte.
-
ZCC does not generate RVV strided/indexed load instructions by default. The -mllvm --riscv-enable-gather option can be used to generate them.
-
ZCC RVV auto-vectorization uses a default register grouping of LMUL = 8. The -mllvm --riscv-v-register-bit-width-lmul option lets you specify the vector register grouping; the supported LMUL values are 1, 2, 4, and 8.
-
Data locality optimization
The -fdlo option enables data locality optimization and must be included in both the compile and link options. Please note that this optimization is still in the testing phase.
-
The default minimum trip count for RVV loop vectorization is 5. The -mllvm --rvv-vectorizer-min-trip-count option lets you specify the minimum trip count for loop vectorization. A loop whose trip count is smaller than this value is not vectorized.
-
Delayed loop unrolling optimization can be enabled using the -flate-loop-unroll option. This allows ZCC to use more efficient loop unrolling algorithms during both the compilation and linking processes.
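The implicit conversion that the --no-integer-promotions item in this list warns about can be observed with any standard-conforming compiler. The sketch below is illustrative only: `mul_widen` and `promoted_product_size` are hypothetical names, and the sizes described hold under the standard C integer promotion rules, that is, without --no-integer-promotions.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening multiply in which every conversion is written out, so the
 * source contains no implicit int8_t -> int promotion. */
void mul_widen(size_t count, const int8_t *a, const int8_t *b, int16_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[i] = (int16_t)((int16_t)a[i] * (int16_t)b[i]);
    }
}

/* Size in bytes of the type of `int8_t * int8_t`. Under the standard
 * rules both operands are promoted to int before the multiplication. */
size_t promoted_product_size(void)
{
    int8_t x = 1, y = 1;
    return sizeof(x * y);
}
```

Under the standard rules `promoted_product_size()` equals `sizeof(int)` rather than `sizeof(int8_t)`, which is the hidden widening that the error example relies on.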
Fortran compilation options
When using ZFC, add the following options to specify the target and the libraries, together with their paths in the local file system.
--target=riscv64-unknown-linux-gnu -L ./install-zcc_protected/riscv64-unknown-linux-gnu/lib -lFortran_main -lFortranRuntime -lpthread -lm
The Fortran compiler ZFC currently only supports Linux rv64imafdc. Other architectures will be supported in the future.
Link options incompatible with GCC
gcc -specs
GCC allows multiple --specs options to specify configuration files that override default settings. For newlib, spec files such as nano.specs, nosys.specs, and semihost.specs can be used.
In contrast, ZCC uses a single combined cfg file specified with the --config option. Since ZCC does not allow multiple --config options, the cfg files for newlib in ZCC are nano-nosys.cfg, nano.cfg, nosys.cfg, sim.cfg and semihost.cfg.
Multi target
ZCC is a multi-target, multi-arch, multi-ABI compiler. By default, ZCC generates code for rv32imafdc/ilp32d. To generate code for other targets, specify -march=<arch> and -mabi=<abi> together. The arch/abi combinations that ZCC supports for the RISC-V architecture are listed in Multilib.
When using ZCC to generate RV64 code, you must also specify --target=riscv64-unknown-elf; otherwise a link error occurs.
Code model
The ZCC compiler supports the medany and medlow options, which are equivalent, respectively, to the medium and small options in the GCC compiler for the RISC-V architecture.
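The code model in effect can be checked from source code through the macros that the RISC-V C API specification defines for this purpose. A minimal sketch, assuming ZCC defines `__riscv_cmodel_medlow` and `__riscv_cmodel_medany` as upstream Clang does; the function name `riscv_code_model` is illustrative.

```c
/* Name of the code model selected for this translation unit,
 * or "unknown" when neither macro is defined (for example on a
 * non-RISC-V host). */
const char *riscv_code_model(void)
{
#if defined(__riscv_cmodel_medlow)
    return "medlow";
#elif defined(__riscv_cmodel_medany)
    return "medany";
#else
    return "unknown";
#endif
}
```

A build system could use such a check to reject objects compiled with a code model that the linker script's memory layout does not support.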
Align arguments (RISC-V)
align target | GCC | ZCC |
---|---|---|
align function | -falign-functions=n:m:n2:m2 ;--align-functions=n:m:n2:m2 | -falign-functions=N ;-mllvm --align-all-functions=unit |
align all branch targets | -falign-labels=n:m:n2:m2 ;--align-labels=n:m:n2:m2 | -falign-labels=N(ignore) ;--align-labels=N(ignore) ;-mllvm --align-all-blocks=unit |
align loops | -falign-loops=n:m:n2:m2 ; --align-loops=n:m:n2:m2 | -falign-loops=N ;--align-loops=N(ignore) |
align branch target can only be reached by jumping | -falign-jumps=n:m:n2:m2 ;--align-jumps=n:m:n2:m2 | -falign-jumps=N(ignore) ; --align-jumps=N(ignore) ;-mllvm --align-all-nofallthru-blocks=unit |
-
ignore: ZCC will only consume this argument and do nothing.
-
N: Must be a power of 2 (e.g. 4 means align on 4-byte boundaries; -falign-functions=8 means that functions are aligned to an 8-byte boundary).
-
unit: Forces the alignment in log2 format (e.g. 4 means align on 16-byte boundaries; -mllvm --align-all-functions=8 means that functions are aligned to a 256-byte boundary).
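The difference between the two encodings above is simple arithmetic: N is the alignment in bytes, while unit is its base-2 logarithm. The helper names in this sketch are illustrative and not part of ZCC.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment in bytes requested by a log2-encoded "unit" value,
 * as used by -mllvm --align-all-functions=unit. */
uint64_t align_bytes_from_unit(unsigned unit)
{
    assert(unit < 64);            /* keep the shift well defined */
    return (uint64_t)1 << unit;
}

/* Nonzero when n is a valid N value (a power of two),
 * as required by -falign-functions=N. */
int is_valid_align_n(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

With these helpers, a unit of 4 corresponds to 16 bytes and a unit of 8 to 256 bytes, matching the examples in the list.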
linker script
-
The ZCC linker does not support the DEFINED macro in linker scripts. For linker scripts that use DEFINED, the following modification is currently required:
- __stack_size = DEFINED(__stack_size) ? __stack_size : 2K;
+ __stack_size = 2K;
-
For GNU ld's support of the linker script MEMORY command, refer to the sourceware docs. Currently, ZCC supports most MEMORY attributes, but does not support the "I" attribute. If a linker script uses the "I" attribute, modify it as follows:
MEMORY
{
- ilm (rxai!w) : ORIGIN = 0x80000000, LENGTH = 64K
- ram (wxa!ri) : ORIGIN = 0x90000000, LENGTH = 64K
+ ilm (rxa!w) : ORIGIN = 0x80000000, LENGTH = 64K
+ ram (wxa!r) : ORIGIN = 0x90000000, LENGTH = 64K
}
-
The ZCC linker does not support GNU ld's position-based cumulative operations in output sections. To advance the location counter by an offset from the current location in an output section, use an explicit increment. For example, replace . = __stack_size; with . += __stack_size;.
.stack ORIGIN(ram) + LENGTH(ram) - __stack_size :
{
PROVIDE( _heap_end = . );
- . = __stack_size;
+ . += __stack_size;
PROVIDE( _sp = . );
} >ram AT>ram
-
By default, ZCC places non-builtin sections from linker scripts into the .data segment. To place these sections in .bss instead, use the NOLOAD attribute. For example, in the gcc_demosoc_ilm.ld script from nuclei_sdk, you can modify the .stack section as shown below for compatibility with ZCC.
- .stack ORIGIN(ram) + LENGTH(ram) - __stack_size :
+ .stack ORIGIN(ram) + LENGTH(ram) - __stack_size (NOLOAD) :
{
PROVIDE( _heap_end = . );
- . = __stack_size;
+ . += __stack_size;
PROVIDE( _sp = . );
} >ram AT>ram
-
If you manually use __attribute__((section(".sec_abc"))) to place a specific initialization function pointer into a designated section, but the C/C++ source code does not directly reference the object containing the function pointer, the compiler may optimize away the object. To prevent this, add the used attribute manually, like so: __attribute__((section(".sec_abc"), used)). This forces the compiler to retain the unused object and prevents it from being optimized away.
-
Negative number representation in assembly code/inline assembly
The GNU assembler (as) sign-extends 32-bit or 64-bit numbers according to the target platform. For example, GNU as on RV32 recognizes 0xFFFFF800 as -2048, whereas on RV64 it reports an error for the code below. ZCC's assembler (ZCC as), regardless of whether it targets RV32 or RV64, treats 0xFFFFF800 as a positive number.
and a0, a0, 0xFFFFF800
Furthermore, the GNU Assembler user guide clearly indicates that when working with negative numbers in assembly code, the negative sign must be placed directly before the number, as demonstrated below:
and a0, a0, -0x800
-
lld does not support the ALIGN_WITH_INPUT attribute for output sections. Instead, you can use the ALIGN(x) attribute to specify alignment. For example:
- .data : ALIGN_WITH_INPUT
+ .data : ALIGN(8)
{
. = ALIGN(8);
...
}
-
Issue when using the -M and -Map parameters at the same time in the linker
-
In GCC, the last parameter takes effect. For example, in -Wl,-M,-Map, -Map takes effect, while in -Wl,-Map,-M, -M takes effect.
-
Clang always gives priority to the -M option when both parameters are used together. Currently, ZCC follows the same behavior as Clang.
-
-
In the libunwind library used by ZCC, symbols such as __eh_frame_start, __eh_frame_end, __eh_frame_hdr_start, and __eh_frame_hdr_end are referenced. When using a custom linker script, you need to include the following code to set the values of these symbols.
.eh_frame :
{
__eh_frame_start = .;
KEEP(*(.eh_frame))
__eh_frame_end = .;
}
.eh_frame_hdr :
{
KEEP(*(.eh_frame_hdr))
}
__eh_frame_hdr_start = SIZEOF(.eh_frame_hdr) > 0 ? ADDR(.eh_frame_hdr) : 0;
__eh_frame_hdr_end = SIZEOF(.eh_frame_hdr) > 0 ? . : 0;
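The `used` attribute described earlier in this list can be sketched in C as follows. The names `board_init`, `init_entry` and `run_init_entry` are hypothetical. The `.sec_abc` section name is taken from the text, and the `section` attribute is applied only on ELF targets so that the sketch also builds on other hosts.

```c
/* Hypothetical initialization hook and a counter to observe it. */
static int init_calls = 0;
static void board_init(void) { init_calls++; }

typedef void (*init_fn)(void);

#if defined(__ELF__)
/* `used` keeps the object even if nothing references it directly. */
#define INIT_ENTRY __attribute__((section(".sec_abc"), used))
#else
#define INIT_ENTRY __attribute__((used))
#endif

/* Function pointer placed in the designated section. */
static const init_fn init_entry INIT_ENTRY = board_init;

/* For demonstration only: call the hook through the stored pointer
 * and return how many times it has run. */
int run_init_entry(void)
{
    init_entry();
    return init_calls;
}
```

In real firmware the startup code would typically walk the section between linker-script symbols instead of calling `init_entry` by name, which is the situation in which the object has no direct reference and needs `used`.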
Behavior different from GCC
Uninitialized local variables
Using uninitialized local variables in C/C++ results in undefined behavior, because the value of an uninitialized local variable may be 0, a random memory value, or any other value. If the source code makes use of uninitialized local variables, different compilers may legitimately produce different results.
For example, in the code below, a is an uninitialized local variable. GCC initializes a to 0, while ZCC/Clang initializes a to 0xFFFFFFFF.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    unsigned int a, b;
    b = 1;
    a |= b; /* a is read before it is initialized */
    printf("a %u, b %u\n", a, b);
    return 0;
}
Additionally, note that a compiler's treatment of uninitialized local variables is not consistent. Therefore, you should not rely on compiler-specific behavior for variable initialization. For instance, in the example below, ZCC/Clang initializes a to 0.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    unsigned int a, b;
    b = 1;
    a &= b; /* a is read before it is initialized */
    printf("a %u, b %u\n", a, b);
    return 0;
}
To catch such problems, use the -Wuninitialized flag during compilation. It enables warnings for the use of uninitialized local variables, allowing the compiler to notify you about potential issues.
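The portable fix is to give every local variable an explicit initial value before its first read. A minimal sketch of the two examples above rewritten this way follows; the function names and the chosen initial values (0 for the OR case, all ones for the AND case) are illustrative assumptions about what the code intends.

```c
/* OR-accumulate into a value that starts with no bits set. */
unsigned int set_flags(unsigned int b)
{
    unsigned int a = 0;   /* explicit initial value */
    a |= b;
    return a;
}

/* AND-mask a value that starts with all bits set. */
unsigned int mask_flags(unsigned int b)
{
    unsigned int a = ~0u; /* explicit initial value */
    a &= b;
    return a;
}
```

With explicit initialization the result no longer depends on the compiler: both functions return their argument unchanged under GCC, Clang and ZCC alike.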
Get help
If you need help or have a question about any aspect of ZCC, feel free to discuss it on the 1nfinite developer forum. Our team is here to provide responses and enhance your user experience.
ZCC (Commercial)
An issue tracking system is available for ZCC (Commercial) users. Please report bugs on the ticket page of Terapines Support.