Performance analysis

cppreg is heavily based on C++ templates and introduces various levels of abstractions compared to a traditional C CMSIS implementation. Because of C++’s zero overhead capabilities (see this for example) and when using at least some level of compiler otpimizations all the cppreg interface code is optimized away. Understandably most would question the validity of such a claim, so this document demonstrates a classic example showing that a register interface written with cppreg produced a compiled code identical to a CMSIS-based one.

Test setup

For the test example, let’s use an imaginary Cortex-M0 based micro-controller with the intention of having the UART send out a small string and then toggle two LEDs (PIN1 and PIN3) after every transmission. Why imaginary? Because a real implementation will be longer than most screens. The example is written in C using CMSIS style, and then in C++ using cppreg. We then compare the assembly outputs of both using GCC ARM with links to GodBolt so that the examples can be fiddled with.

Example peripheral

# Imaginary super simple GPIO Peripheral
(GPIO_Base) GPIO Peripheral Base Address: 0xF0A03110
    # 8 bits wide (2 bit per pin)
    # 00 = Input, 01 = Output, 10 = Reserved, 11 = Reserved
    # Our LEDS are on PIN1 and PIN3
    GPIO Direction Register: GPIO_Base + 0x00

    # 8 bits wide (1 bit per pin)
    # 0 = Do not toggle, 1 = Toggle
    GPIO Toggle Register: GPIO_Base + 0x01

# Imaginary super simple UART peripheral
(UART_Base) UART Peripheral Base Address: 0xF0A03120
    # 8 bits wide, write bytes to here to insert into the TX FIFO
    UART TX FIFO Register: UART_Base + 0x00

    # 8 bits wide, Status Register
    # BIT 0 (Enable)  = Set to enable UART, Clear to disable.
    # BIT 1 .. 2      = Reserved0, read only
    # Bit 3 (Sending) = Set to send, stays set till TX FIFO empty.
    # BIT 4 .. 7      = Reserved1, read only
    UART Status Register: UART_Base + 0x01

CMSIS Style

This snippet is based on a CMSIS style code, which makes heavy use of preprocessor macros (the #define directives) and directly maps structures onto the memory. Notice how we have to do all the binary arithmetic ourselves which is extremely error prone and tedious. Obviously the binary arithmetic could itself be simplified using macros or functions, but that will not provide more safety or clarity (let alone compile time detection of ill-formed statements and enforcement of access policies).

#include <stdint.h>

// Structs for each peripheral.
#define __IO volatile
typedef struct {
    __IO uint8_t DIRECTION; // Base + 0x00
    __IO uint8_t TOGGLE;    // Base + 0x01
} GPIO_TypeDef;
typedef struct {
    __IO uint8_t TXFIFO;  // Base + 0x00
    __IO uint8_t STATUS;  // Base + 0x01
} UART_TypeDef;

// Memory addresses where peripherals are.
#define PERIPH_BASE ((uint32_t)0xF0A03110)
#define GPIOA_BASE (PERIPH_BASE + 0x0000)
#define GPIO ((GPIO_TypeDef *) GPIOA_BASE)
#define UART_Base (PERIPH_BASE + 0x0010)
#define UART ((UART_TypeDef *) UART_Base)

void Demo_CMSIS(){

    // Make only PIN1 and PIN3 to output with masking.
    const uint16_t DIRECTION_PIN_MASK = (0b11u << (1 * 2)) | (0b11u << (3 * 2));
    GPIO->DIRECTION = (GPIO->DIRECTION & ~DIRECTION_PIN_MASK) | (1u << (1 * 2)) | (1u << (3 * 2));

    // Enable the UART.
    const uint8_t UART_STATUS_ENABLE = 0x01u;
    UART->STATUS = UART->STATUS | UART_STATUS_ENABLE;

    // Loop over forever.
    while(true){
        // Put a string into the FIFO.
        UART->TXFIFO = 'H';
        UART->TXFIFO = 'i';

        // Start sending out TX FIFO contents.
        const uint8_t UART_STATUS_SENDING = 1u << 3;
        UART->STATUS = UART->STATUS | UART_STATUS_ENABLE | UART_STATUS_SENDING;

        // Wait till the UART is done.
        while ((UART->STATUS & UART_STATUS_SENDING) != 0u) {}

        // Toggle the GPIO.
        GPIO->TOGGLE = GPIO->TOGGLE | (1u << 0) | (1u << 3);
    };
    
};

cppreg style

This is written to mimic the CMSIS example as close as possible. cppreg handles the binary arithmetic for us as well as enforcing access policies and detection of ill-formed statements at compile time. While the cppreg version is somewhat more verbose for defining the registers, it is much easier and clearer to work with at the calling site. As a side note, one of the long-term goal of this project is (at some point) to provide scripts for automatic generation based on SVD data.

#include "cppreg-all.h"

struct GPIO {

    struct GPIO_Cluster : cppreg::RegisterPack<0xF0A03110, 4u> {};

    struct Direction : cppreg::PackedRegister<GPIO_Cluster, cppreg::RegBitSize::b8, 0> {
        using PIN0   = cppreg::Field<Direction, 2u, 0u, cppreg::read_write>;
        using PIN1   = cppreg::Field<Direction, 2u, 2u, cppreg::read_write>;
        using PIN2   = cppreg::Field<Direction, 2u, 4u, cppreg::read_write>;
        using PIN3   = cppreg::Field<Direction, 2u, 6u, cppreg::read_write>;
    };

    struct Toggle : cppreg::PackedRegister<GPIO_Cluster, cppreg::RegBitSize::b8, 8> {
        using PIN0 = cppreg::Field<Toggle, 1u, 0u, cppreg::write_only>;
        using PIN1 = cppreg::Field<Toggle, 1u, 1u, cppreg::write_only>;
        using PIN2 = cppreg::Field<Toggle, 1u, 2u, cppreg::write_only>;
        using PIN3 = cppreg::Field<Toggle, 1u, 3u, cppreg::write_only>;
        using Reserved = cppreg::Field<Toggle, 3u, 4u, cppreg::read_only>;
    };
    
};

struct UART {

    struct UART_Cluster : cppreg::RegisterPack<0xF0A03120, 2u> {};

    struct TXFIFO : cppreg::PackedRegister<UART_Cluster, cppreg::RegBitSize::b8, 0> {
        using DATA = cppreg::Field<TXFIFO, 8u, 0, cppreg::write_only>;
    };

    struct STATUS : cppreg::PackedRegister<UART_Cluster, cppreg::RegBitSize::b8, 8> {
        using Enable = cppreg::Field<STATUS, 1u, 0, cppreg::read_write>;
        using Reserved0 = cppreg::Field<STATUS, 2u, 1, cppreg::read_only>;
        using Sending = cppreg::Field<STATUS, 1u, 3, cppreg::read_write>;
        using Reserved1 = cppreg::Field<STATUS, 4u, 4, cppreg::read_only>;
    };
    
};

void Demo_CPPReg(void){

    // Make the pins be an output.
    GPIO::Direction::merge_write<GPIO::Direction::PIN1>(1)
        .with<GPIO::Direction::PIN3>(1).done();

    // Enable the UART.
    UART::STATUS::Enable::set();

    // Loop over forever.
    while(true){
        // Put a string into the FIFO.
        UART::TXFIFO::DATA::write<'H'>();
        UART::TXFIFO::DATA::write<'i'>();

        // Start sending out TX FIFO contents.
        UART::STATUS::merge_write<UART::STATUS::Enable, 1>()
            .with<UART::STATUS::Sending, 1>().done();

        // Wait till the UART is done.
        while(UART::STATUS::Sending::is_set()) {}

        // Toggle the GPIO.
        GPIO::Toggle::merge_write<GPIO::Toggle::PIN0, 1>()
            .with<GPIO::Toggle::PIN3, 1>().done();
    };
    
};

Assembly results and comparison

This is how GodBolt compares the CMSIS and cppreg versions. Looking at the assembly, it’s pretty much identical, with the only difference being when to save registers onto the stack (in this case having no performance penalty). For this example (as well as others which have been tested), there is no performance penalty in the cppreg implementation; here is a comparison of the assembly outputs (obtained with GCC ARM -mthumb -Os):

Assembly Comparison