Performance analysis and comparison
ZStudio has a built-in performance analyzer and pipeline visualization tool, ZProf, which helps you diagnose issues from the function level down to the micro-architecture level. With ZProf, you can quickly identify bottlenecks and design optimizations.
Profiling
This topic describes how to use ZProf in ZStudio. For a demo of performance analysis, refer to AccInst.
Run profiling sessions
You must build the project successfully before launching a profiling session. You can run with the default configuration by selecting Profiling > Start profiling.
After starting profiling, wait until the message window prompts "The profiling has finished!" and avoid starting another run in the meantime. The profiling result opens in a new editor.
You can also change the profiling configuration by selecting Profiling > Configuration.
Profiling Configuration
- Choose Virtual Board: Choose a built-in virtual board or add your custom board.
  - Simulator Virtual Board: Select the built-in target virtual board from the drop-down menu.
  - Configuration File: Specify the directory of the custom target board configuration file.
- Display Statistic: Choose whether to print performance statistics (instructions, cycles, branch count, branch misses, and IPC) of the executable program in the console.
- Profiling Output Directory: If not specified, the default output directory is {project_name}.zprof. It can be found under the zprof-output node in the navigator.
- Simulator Arguments: Specify the arguments passed to the simulator. No arguments are passed by default.
- Profiling Override: Choose whether the latest profiling result may replace the original one when the same project is profiled again with an identical configuration. If not, each new result is saved with a timestamp appended.
- Program Arguments: Specify the arguments passed to the program. No arguments are passed by default.
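The Profiling Override option above can be sketched as follows. This is a minimal illustration of the replace-vs-timestamp behavior only; the function name, timestamp format, and directory naming are assumptions, not ZProf's actual implementation:

```python
from datetime import datetime

def output_dir(project, override, existing):
    """Pick the profiling output directory name (hypothetical sketch of
    the Profiling Override behavior, not ZProf's actual naming code)."""
    base = f"{project}.zprof"
    if override or base not in existing:
        return base  # replace the previous result in place
    # Keep earlier results by appending a timestamp to the new one.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{project}-{stamp}.zprof"

print(output_dir("demo", True, {"demo.zprof"}))   # demo.zprof
print(output_dir("demo", False, {"demo.zprof"}))  # e.g. demo-20250101-120000.zprof
```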
Profiling result
Profiling results cover important performance metrics used to track and investigate bottlenecks during project execution, including a Program Summary and a Function Summary.
The Program Summary gathers essential metrics during the program's execution, as listed below:
Performance metrics | Description |
---|---|
Total Cycle | The total number of clock cycles expended during program execution. |
Total Instruction | The total number of instructions executed during the program’s runtime. |
Instruction Per Cycle (IPC) | The average number of instructions executed per clock cycle. A higher IPC indicates better performance. |
Total Instruction Cache Misses | The number of times the processor cannot find the required instructions in the Level 1 instruction cache. |
Total Data Cache Misses | The instances where required data cannot be found in the Level 1 data cache. |
Total Branch Direction Prediction Misses | The number of times the processor fails to accurately predict the outcome of branch instructions in the program. |
Total Branch or Jump Prediction Misses | Indicates the number of times the processor inaccurately predicts the target address during the execution of branch or jump instructions. |
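These summary metrics are related by a simple identity: IPC is the total instruction count divided by the total cycle count. A minimal sketch, with illustrative values rather than real ZProf output:

```python
def ipc(total_instructions, total_cycles):
    """Instructions per cycle: higher is better."""
    return total_instructions / total_cycles

# Example: 1.2M instructions retired over 1.5M cycles.
print(round(ipc(1_200_000, 1_500_000), 2))  # 0.8
```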
The Function Summary lists all executed functions for the project being profiled. The function summary data can be sorted into ascending or descending order by clicking a column header. For example, by clicking the column header Self Cycle, the functions in this view are sorted according to their Self Cycle.
Using the Function Summary, you can identify the bottleneck function. Click a function to jump to its location in the source code.
For a detailed list of the columns in the Function Summary, refer to the table below.
Available columns | Description |
---|---|
Functions | Name of each executed function in the project |
Address | Memory address of the function |
Exec Times | Total number of times the function is called |
Self Cycle | Self Cycle Count: total number of cycles spent in the function itself, excluding its child functions |
Total Cycle | Total Cycle Count: cycles spent in the function plus cycles spent in its child functions |
Self Inst | Self Instruction Count: total number of instructions executed in the function itself, excluding its child functions |
Total Inst | Total Instruction Count: instructions executed in the function plus instructions executed in its child functions |
Cycle Percentage | The function's Self Cycle as a percentage of the program's total cycles |
Self Instruction Cache Misses | Instruction cache misses during the function's execution |
Total Instruction Cache Misses | Instruction cache misses including those from the function's child functions |
Self Data Cache Misses | Data cache misses during the function's execution |
Total Data Cache Misses | Data cache misses including those from the function's child functions |
Self Branch Direction Prediction Misses | Branch direction prediction misses during the function's execution |
Total Branch Direction Prediction Misses | Branch direction prediction misses including those from the function's child functions |
Self Branch Or Jump Prediction Misses | Jump or branch prediction misses during the function's execution |
Total Branch Or Jump Prediction Misses | Jump or branch prediction misses including those from the function's child functions |
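The self/total relationship and the Self Cycle sorting described above can be sketched as follows. The rows are hypothetical illustrative data, not ZProf output or its internal representation:

```python
# Hypothetical function-summary rows: (name, self_cycle, child_cycle).
rows = [
    ("main",    1_000, 9_000),
    ("compute", 7_500,     0),
    ("io_read", 1_500,     0),
]

# Total Cycle = Self Cycle plus cycles spent in child functions.
summary = [(name, s, s + c) for name, s, c in rows]

# Sorting by Self Cycle (descending) surfaces the bottleneck first,
# mirroring a click on the Self Cycle column header.
summary.sort(key=lambda row: row[1], reverse=True)
print(summary[0][0])  # compute
```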
Code coverage
ZStudio provides code coverage information that helps identify which parts of the code are frequently executed and consume a significant amount of the overall execution time.
Once you have jumped from the function summary or pipeline view to the source code, you can turn on line execution counters in the source editor by selecting Profiling > Code Coverage Analysis.
You need to build the project with the debug profile (-g) to get code coverage data.
You can also get a quick look at source profiling information in the editor. By clicking Enter Pipeline in the source code, you can navigate to the pipeline view.
To turn off the source profiling information, press Control+Shift+P to invoke the Command palette, then type and select Code Editor: Toggle Profiling Summary. To turn off the line execution counters, type and select Code Editor: Toggle Code Coverage in the Command palette.
Micro arch pipeline
The pipeline view visualizes performance and resource bottlenecks. It provides a pipeline view of instructions and functional units, and a resource view of the corresponding instruction usage.
You can open the pipeline view by:
- Selecting Profiling > Micro Arch Pipeline in the main menu.
- Clicking the button in the Function Summary.
- Clicking Enter Pipeline in the source code.
The pipeline view distinguishes the causes of stall bubbles with different colors, as listed in the table below.
Stall Bubbles | Description |
---|---|
ICache Miss | Instruction cache miss. |
DCache Miss | Data cache miss. |
Pipeline Flush | CPU stops the currently running instruction execution and clears the instructions due to branch misprediction or exception. |
Data Hazard | An instruction depends on the result of a previous instruction that has not yet completed. |
Structural Hazard | A hardware resource conflict (e.g., multiple instructions accessing the same resource simultaneously) stalls the pipeline, resulting in a bubble. |
Stalled by Previous | There is a stall in the previous pipeline stage. |
The Instruction Pipeline view focuses on the execution process of instructions and identifies potential performance issues during instruction execution. You can select a bubble type in the dropdown box, and use the forward and backward buttons to quickly navigate between bubbles of the same type. The search box can quickly locate a specific PC address and cycle.
In the Instruction Pipeline view, click the address or assembly code of an instruction to jump to its location in the source code.
The Resource Pipeline view focuses on describing the relationships between computer hardware resources and the flow of data. It provides a hardware-module level representation of resource utilization efficiency and data flow patterns.
You can customize bubble colors or choose whether to synchronize horizontal and vertical scrolling in the instruction pipeline under Preference > Pipeline Settings.
Profiling comparison
This topic describes how to compare profiling results in the Profiling Comparison view. You can compare results to see how changes you have made to the source code, micro-architecture, or build configuration affect performance. Or, if you run profiling on several virtual boards, you can compare their results to see which board performs better.
By default, the results from each profiling session you run are displayed in the side bar of the Profiling Comparison view. You can click the (Refresh) button to get the latest data. If a long list of results is available, you can filter the results to quickly find the one you are interested in.
To compare results:
- On the main menu, select View > Profiling Comparison to invoke the comparison tool.
- In the profiling comparison side bar, right-click one profiling result and set it as the base data.
- In the profiling comparison side bar, hold Shift or Ctrl to select multiple profiling results, then right-click one of them and select Compare, or use the button for a quick compare.
- You can also click the button to change the compared items and the data visualization in the window that opens.
Profiling Comparison Configuration
- Select the data visualization of the Program Performance Comparison: select at least one form, chart or table.
- Select the items to compare. The default items are Instruction Count, Cycle Count, and Instruction Per Cycle; you can change the selection as needed.
- Customize color indicators for the compared profiling data: you can specify the color you like for each data set. The profiling result set as the base data is highlighted in a fixed color (blue by default).

When you are happy with the configuration, select the profiling results again and click the (Compare) button to open the profiling comparison view in the editor.
You can change the base data from the drop-down box in the upper-right corner. The base data (highlighted in blue by default) applies to both the program performance comparison and the function summary comparison.
Program performance comparison
Program Performance provides important indicators used to track bottlenecks. To filter the compared items, change them in the Profiling Comparison Configuration.
The Program Performance chart includes three vertical bars grouped by metric unit. The horizontal axis shows the metrics and the vertical axis shows their values. You can keep the Legend button on to match each profiling result with its color.
The Program Performance table shows the change of each indicator relative to the base data. The color indicates the direction of change: green means a decrease and red means an increase compared to the base data.
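The change indicator can be sketched as a relative difference against the base data. This is a minimal illustration of the green/red convention described above; the function name and exact formula are assumptions, not ZProf's implementation:

```python
def change_vs_base(value, base):
    """Percent change relative to the base data, with the comparison
    table's color convention: green = decrease, red = increase."""
    pct = (value - base) / base * 100.0
    color = "green" if pct < 0 else "red" if pct > 0 else "none"
    return pct, color

# A cycle count that dropped from 1.5M to 1.35M is a 10% improvement.
print(change_vs_base(1_350_000, 1_500_000))  # (-10.0, 'green')
```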
Function summary comparison
The Function Summary Comparison displays a per-function metrics summary of the compared profiling results. The function summary data can be sorted in ascending or descending order according to the base data by clicking a column header.
You can filter the results to show only the functions of interest: enter the function name in the search box and press Enter.