Altera and ARM FPGA-Adaptive Performance Analysis in Product Development


Stefano Zammattio, Altera

The goal of development tools is not only to facilitate the debugging of complex problems, but also to make development more efficient. Sometimes this has more to do with the convenience and usefulness of standard product features than with the availability of power features.

One convenience feature available in most professional debuggers is the ability to display memory-mapped SoC peripheral registers as groups of registers with names, bitfields and descriptions equivalent to those found in the peripheral’s documentation.

When developing on FPGAs, implementation is more complicated. FPGA vendors normally provide a library of FPGA hardware such as encryption/decryption blocks, mathematical algorithm acceleration blocks and peripheral controllers. However, it is up to the hardware developer to decide how many of these blocks are synthesized on the FPGA and where they are located in the processor’s memory map, which means that the software debugger cannot provide peripheral register views for them out of the box. The software developer can write debugger peripheral descriptions manually, but manual editing is time-consuming and error-prone. The solution requires communication between the FPGA synthesis tools and the software debugger. The Altera Qsys system configuration tool generates peripheral register description files for complete FPGA designs, and ARM’s DS-5 Debugger can automatically import the files and display the FPGA IP registers as if they were part of the hard processor system.

Figure 1: Automatic generation of peripheral register views and import on DS-5 Debugger
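To make the idea of a peripheral register description concrete, the following is a minimal sketch of the kind of information such a description carries (register names, offsets, bitfields) and how a debugger-like tool might use it to decode a raw register value. It does not reproduce the file format generated by the tools; the `CTRL` register, its fields and the base address are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitField:
    name: str
    lsb: int            # position of the least significant bit
    width: int          # number of bits in the field
    description: str = ""


@dataclass(frozen=True)
class Register:
    name: str
    offset: int         # byte offset from the peripheral base address
    fields: tuple       # tuple of BitField entries


def absolute_address(base, register):
    """Address of a register once the block is placed in the memory map."""
    return base + register.offset


def decode(register, raw_value):
    """Split a raw register value into named bitfield values."""
    decoded = {}
    for field in register.fields:
        mask = (1 << field.width) - 1
        decoded[field.name] = (raw_value >> field.lsb) & mask
    return decoded


# Hypothetical control register of an FPGA IP block (illustration only).
CTRL = Register("CTRL", 0x00, (
    BitField("ENABLE", 0, 1, "Block enable"),
    BitField("MODE", 1, 2, "Operating mode"),
    BitField("IRQ_EN", 8, 1, "Interrupt enable"),
))

# Hypothetical base address chosen by the hardware developer.
BLOCK_BASE = 0xFF200000

if __name__ == "__main__":
    print(hex(absolute_address(BLOCK_BASE, CTRL)))
    print(decode(CTRL, 0x105))
```

Because the base address is decided at synthesis time, only the tool that places the block knows the final value of `BLOCK_BASE`, which is why generating the description automatically avoids manual errors.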

System-level performance analysis

Today, product developers emphasize debugging performance issues in an effort to gain more functionality out of the same hardware, or to reduce power consumption. Therefore, performance and power analysis tools have become a major area of focus for tools vendors.

One important reason to choose SoC devices (with integrated processor and FPGA fabric) is the ability to use FPGA hardware blocks to accelerate software. For example, FFT decoders or DES decryption algorithms in the FPGA fabric can be used to free up the processor, which can either perform another task or just sleep and save power. For these devices it is essential that tools provide visibility of the relative levels of utilization of the processors and FPGA IP blocks. The designer can then use the information to optimize the system.

Although instruction trace is used for optimizing software codecs and other performance-critical software, ARM applications processors running operating systems such as Linux and Android call for dedicated analysis tools such as the ARM DS-5 Streamline performance analyser. Streamline uses a Linux driver running on the target to sample information at regular intervals and every time there is a task switch. The information captured is provided by counters for events such as:

·   Operating system events such as processor load.

·   Processor events such as branch mis-predictions.

·   System events, whose counters enable the user to spot system-level bottlenecks.

·   Software annotations, used to report events of interest.


When this information is visualized together on a timeline, the interactions between software and hardware are made apparent to the developer.

On hybrid processor and FPGA devices, the Streamline analyser can be used to optimize hardware and software simultaneously. The only infrastructure required in the hardware is a set of memory-mapped registers that count the level of utilization of each IP block. Streamline can then be configured to access those new counters and display their values over time, correlated with CPU activity and other system-level counters.

Figure 2: Timeline view in ARM DS-5 Streamline
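As a rough illustration of how utilization counters can be turned into a timeline, the sketch below converts periodic readings of two free-running counters into a percentage per sampling interval. The counter pair (busy cycles and total cycles), the 32-bit width and the function names are assumptions made for this example; they are not taken from Streamline or from any particular IP block.

```python
COUNTER_BITS = 32  # assumed width of the hypothetical hardware counters


def counter_delta(previous, current, bits=COUNTER_BITS):
    """Increase between two readings of a free-running counter.

    The modulo makes the result correct across a single wrap-around.
    """
    return (current - previous) % (1 << bits)


def utilization_series(busy_samples, total_samples):
    """Percent utilization for each interval between consecutive samples.

    busy_samples:  readings of a counter that increments while the block is busy
    total_samples: readings of a counter that increments every clock cycle,
                   taken at the same instants as busy_samples
    """
    if len(busy_samples) != len(total_samples):
        raise ValueError("sample lists must have the same length")
    series = []
    for i in range(1, len(busy_samples)):
        busy = counter_delta(busy_samples[i - 1], busy_samples[i])
        total = counter_delta(total_samples[i - 1], total_samples[i])
        # An interval with no elapsed cycles is reported as idle.
        series.append(100.0 * busy / total if total else 0.0)
    return series


if __name__ == "__main__":
    busy = [0, 250, 1000]
    total = [0, 1000, 2000]
    print(utilization_series(busy, total))
```

Working with differences between samples, rather than absolute counter values, is what allows the per-interval figures to be lined up against CPU activity on a shared timeline.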

Users interested in power consumption can extend Streamline with an ARM Energy Probe in order to monitor and visualize voltage and current on a number of power rails on the target. On FPGA targets these power rails would normally be the ones used to power the CPU subsystem, FPGA core and FPGA I/O, but the probe could also monitor the main power supply of the whole product. Again, by visualizing how power consumption depends on software activity and system utilization, and by being able to benchmark the energy consumption, developers can optimize the system for power consumption and battery life.
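Benchmarking energy from voltage and current measurements amounts to integrating instantaneous power over time. The sketch below does this with the trapezoidal rule for one power rail. The sample values and function names are made up for illustration, and the code does not model how the Energy Probe or Streamline process their data.

```python
def energy_joules(timestamps_s, volts, amps):
    """Energy on one rail: trapezoidal integral of P(t) = V(t) * I(t)."""
    if not (len(timestamps_s) == len(volts) == len(amps)):
        raise ValueError("all sample lists must have the same length")
    power = [v * i for v, i in zip(volts, amps)]
    energy = 0.0
    for k in range(1, len(power)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        energy += 0.5 * (power[k] + power[k - 1]) * dt
    return energy


def average_power_watts(timestamps_s, volts, amps):
    """Mean power over the whole capture; 0.0 if no time has elapsed."""
    if len(timestamps_s) < 2:
        return 0.0
    duration = timestamps_s[-1] - timestamps_s[0]
    if duration <= 0:
        return 0.0
    return energy_joules(timestamps_s, volts, amps) / duration


if __name__ == "__main__":
    # Made-up samples for a single rail: time in seconds, volts, amperes.
    t = [0.0, 1.0, 2.0]
    v = [1.0, 1.0, 1.0]
    i = [0.5, 0.5, 1.0]
    print(energy_joules(t, v, i), average_power_watts(t, v, i))
```

Running the same calculation before and after a software or hardware change gives a simple energy benchmark for comparing the two configurations.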


The new SoC devices containing ARM applications processors and FPGA fabric open up possibilities for more efficient products. The innovation in the hardware has been matched by innovation in the on-chip debug hardware, FPGA tools and software debug and analysis tools, so that developing on these devices and making the most of their power features is as easy and efficient as software development on fixed ASIC devices.


