RA Flexible Software Package Documentation  Release v5.9.0

 
Cortex-M85 Caches
Note
This overview should be considered supplementary information only. Consult the listed references (non-exhaustive) for detailed information on the CM85 caches, Renesas caches, and cache coherency. Cache coherency can be a difficult problem to understand and solve correctly.

Changes

FSP Version | Changes
5.6.0 | Add information about cache incoherency with memory attribute mismatch between TrustZone security states. Add information about CCR.xC TrustZone banked behavior. Add information about cache incoherency with predefined non-cacheable sections when crossing security states. Add new Arm KB reference.

Overview

When using any type of cache in a system, coherency must be considered. A cache is incoherent when it contains data that differs from the backing memory (e.g. SRAM, Flash, etc.) or from another cache. Coherency can be maintained through hardware and/or software support. For Cortex-M devices like the RA8, coherency can only be achieved through manual software management, and no automatic hardware coherency support exists.

In the default configuration for RA8 devices, FSP always enables the Code Flash Cache (FCACHE) and the CM85 Instruction Cache (I-Cache), and handles the coherency of these caches where it is required. FSP does not handle FCACHE or I-Cache coherency outside of FSP. The CM85 Data Cache (D-Cache) is disabled by default and can optionally be enabled in the BSP configuration settings. If the D-Cache is enabled, additional coherency concerns arise. FSP does not currently handle any D-Cache coherency; this is a work in progress. Drivers that require D-Cache coherency handling do not presently provide it.

If D-Cache is enabled, the most common coherency concern is when data is shared between the CPU and another bus master. The D-Cache and backing memory for the shared location (usually SRAM) can become incoherent. To properly manage coherency in this situation, do one of the following:

  • Place the shared data in a non-cacheable region defined by the MPU or hardware.

    Using a non-cacheable region means that no cache maintenance is required and no coherency issues arise, because the shared data is never cached. A non-cacheable region can be defined in the MPU to contain the shared data. The non-cacheable region MUST be aligned to 32 bytes and have a length that is a multiple of 32 bytes. This is required to meet MPU alignment and length requirements. FSP predefines the .nocache and .nocache_sdram uninitialized regions for this purpose, where data can be placed (see the BSP Usage Notes reference material). FSP configures and enables the MPU with these predefined regions at startup, if they have a non-zero size. Any address outside these regions will use the cacheability attributes defined by the default system address map. As an alternative, DTCM is always hardcoded as non-cacheable by the hardware and could contain the shared data. Using the MPU should be preferred, since DTCM can only be accessed through S-AHB by other bus masters and the access may contend with CPU access to DTCM.

  • Place the shared data in a cache line aligned and padded cacheable region, and use the CMSIS cache maintenance functions.

    Using a cacheable region requires that the CMSIS cache maintenance functions be used to solve the coherency issues. The data MUST be aligned to 32 bytes and have a length that is a multiple of 32 bytes. This is required to meet D-Cache line alignment and length requirements. If the data is not aligned to a D-Cache line and does not fill whole D-Cache lines exactly, the result is rare and difficult-to-diagnose bugs. Use the CMSIS cache maintenance functions as required to manage the shared data coherency. When the D-Cache is enabled, FSP configures and enables the MPU at startup. Any address outside the predefined MPU regions will use the cacheability attributes defined by the default system address map. Unlike non-cacheable shared data, cacheable shared data cannot be pooled into a single aligned and padded region. Each item of cacheable shared data must be aligned and padded to its own data cache lines.
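
As a minimal sketch of the second option, assuming a C11 toolchain, the following pads a hypothetical 100-byte payload up to whole D-Cache lines and aligns it to 32 bytes. The names DCACHE_LINE_SIZE, DCACHE_PAD, RX_PAYLOAD_BYTES, rx_buffer_t, and g_rx_buffer are illustrative only and are not part of FSP or CMSIS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CM85 D-Cache line size in bytes, as stated in this document. */
#define DCACHE_LINE_SIZE    32u

/* Round a byte count up to the next multiple of the D-Cache line size. */
#define DCACHE_PAD(n)       (((n) + (DCACHE_LINE_SIZE - 1u)) & ~(size_t) (DCACHE_LINE_SIZE - 1u))

/* Hypothetical payload size shared with a bus master. */
#define RX_PAYLOAD_BYTES    100u

/* The buffer starts on a cache line boundary and occupies whole cache lines,
 * so no unrelated data can share a cache line with it. */
typedef struct
{
    _Alignas(DCACHE_LINE_SIZE) uint8_t bytes[DCACHE_PAD(RX_PAYLOAD_BYTES)];
} rx_buffer_t;

static rx_buffer_t g_rx_buffer;

/* Fail the build if the layout rule is ever broken by a later change. */
_Static_assert((sizeof(rx_buffer_t) % DCACHE_LINE_SIZE) == 0u, "buffer must fill whole D-Cache lines");
_Static_assert(_Alignof(rx_buffer_t) == DCACHE_LINE_SIZE, "buffer must start on a D-Cache line");
```

Each cacheable shared object would need its own aligned and padded type like this one, since the objects cannot be pooled into one region.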

Less common coherency concerns include:

  • FACI erasing and programming
    • FCACHE, I-Cache, and D-Cache can become incoherent if Code Flash or Data Flash changes
  • Writing instructions into RAM using the CPU or a bus master
    • D-Cache and I-Cache can become incoherent if code in RAM changes
  • Memory attribute mismatch between the Secure and Non-secure state when TrustZone is used
    • FCACHE and D-Cache can become incoherent if the cacheability attributes of an address are mismatched between the Secure and Non-secure state
    • Shared addresses must be configured with identical cacheability attributes, either by ensuring that the Secure and Non-secure state use the default system address map for the shared address, or by configuring the same region in both the Secure and Non-secure MPU with the same attributes
    • I-Cache is usually not affected in this manner, because instruction fetch uses the banked MPU registers which correspond to the address security attribute
    • Consider that complex software designs may swap the current Secure or Non-secure MPU configuration at will
  • MPU configuration changes
    • FCACHE, I-Cache, and D-Cache can become incoherent if the cacheability attributes of an address change
  • SAU (Security Attribution Unit) configuration changes
    • FCACHE, I-Cache, and D-Cache can become incoherent if the security attributes of an address change
  • Changing power modes
    • Caches may need to be cleaned and invalidated before changing to a low power mode

These less common situations require careful consideration for cache maintenance. FSP handles some of these less common situations for FCACHE and I-Cache where it must, but it is not possible to cover all user behavior that may occur.

Other special cache concerns that are not specifically coherency related include:

  • Reading and writing from CSC, SDRAM, and Standby SRAM which use write buffers
    • If a write or read is intended to force a write buffer to flush, or to force a bus access to occur, the D-Cache may prevent that by servicing the read or write itself if the address is cacheable
    • Standby SRAM requires a different procedure than CSC or SDRAM to clear its write buffer
  • OSPI provides both a prefetch buffer for reading and a write buffer for writing on each of its channels
    • These buffers are optionally enabled individually, and may also be flushed individually
    • Cache interactions with these buffers during normal operation and during their flush procedures must be considered

CM85 Cache Features

The CM85 optionally implements an I-Cache and/or D-Cache, with several configurable properties. Renesas RA8 devices have an implementation as follows:

  • I-Cache and D-Cache are both implemented with 16 KiB size each
  • ECC is optionally enabled for the I-Cache and D-Cache with OFS1.INITECCEN
    • The OFS1_SEL register selects whether the Secure or Non-secure OFS1.INITECCEN option bit is used
    • The OFS1.INITECCEN option bit also enables ECC for the ITCM and DTCM
  • Automatic hardware cache invalidation is enabled at reset for I-Cache and D-Cache
    • This can be controlled with CACHEDBGCR.L1RSTDIS for debugging, but generally there is no use case
  • TrustZone is integrated, which means
    • CCR.xC is banked between Secure and Non-secure modes
      • Cache allocation can be controlled per the security of the address being fetched, read, or written by configuring the associated bit
      • This bit is safe from causing cache incoherency, since allocation policy (allocation hints) cannot cause incoherency and is not counted towards memory attribute mismatch
      • See the Memory Attributes and TrustZone section
    • There is a Secure and Non-secure MPU with eight regions available for each
      • Which MPU is used depends on whether an instruction fetch (uses MPU that matches security of address) or data access (uses MPU that matches current security state) is occurring
      • See the Memory Attributes and TrustZone section
    • D-Cache maintenance from the Non-secure state is promoted to clean type maintenance since both Secure and Non-secure data may be cached, which prevents Secure data from being modified by the Non-secure state

RA8 Cache Background Information

CM85 Caches

The I-Cache and D-Cache are implemented inside of the CM85 by Arm. The CM85 has a Harvard design, where instruction fetches and data reads/writes are performed on separate interfaces. The I-Cache can only perform lookup and allocation for instruction fetches. The D-Cache can only perform lookup and allocation for data reads and writes.

Whether lookup of an address occurs in a cache depends on:

  1. Cache lookups are enabled in the MSCR register.
  2. The memory attributes of the address, as defined by the System Address Map or the Arm MPU (S or NS): Memory Type (Normal), Inner Cacheability (Cacheable), and, for the D-Cache only, Shareability (Non-shareable).
  3. Any hardware cacheability caveats, where some addresses can never lookup in a cache, like ITCM and DTCM.

Whether allocation of an address occurs in a cache depends on:

  1. Cache allocations are enabled in the CCR register (S or NS).
  2. Cache lookups are allowed for that address.
  3. Inner Cacheability (Read and/or Write Allocate) as defined by the System Address Map or the Arm MPU (S or NS).

The MSCR register controls whether lookups may occur for a cache, while the CCR register (S or NS) controls whether allocation may occur for a cache.
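
The conditions above can be read as two predicates. The sketch below is a simplified model, not register-level code: it assumes all listed conditions must hold together, and every type and field name in it is hypothetical. Real hardware has additional cases that this model does not cover.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MEM_NORMAL, MEM_DEVICE } mem_type_t;

/* Attributes of one address, from the system address map or the MPU (S or NS). */
typedef struct
{
    mem_type_t type;
    bool       inner_cacheable;
    bool       non_shareable;
    bool       allocate_hint;       /* read and/or write allocate applies to the access */
    bool       hw_never_cacheable;  /* hardware caveat, e.g. ITCM or DTCM */
} mem_attr_t;

/* State of one cache (I-Cache or D-Cache). */
typedef struct
{
    bool mscr_lookup_enabled;  /* lookups enabled in MSCR */
    bool ccr_alloc_enabled;    /* allocations enabled in CCR (S or NS) */
    bool is_dcache;            /* shareability only matters for the D-Cache */
} cache_state_t;

static bool lookup_allowed(cache_state_t c, mem_attr_t a)
{
    if (!c.mscr_lookup_enabled || a.hw_never_cacheable)
    {
        return false;
    }

    if ((a.type != MEM_NORMAL) || !a.inner_cacheable)
    {
        return false;
    }

    if (c.is_dcache && !a.non_shareable)
    {
        return false;
    }

    return true;
}

static bool allocation_allowed(cache_state_t c, mem_attr_t a)
{
    /* Allocation additionally needs CCR enablement and an allocate hint. */
    return c.ccr_alloc_enabled && lookup_allowed(c, a) && a.allocate_hint;
}
```

In this model, clearing the CCR enable stops new allocations while lookups of already cached lines can still occur, which matches the split between MSCR and CCR described above.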

The system address map and Arm MPU (S or NS) define the memory type, shareability attributes, and cacheability attributes for an address. All three memory properties combine to define whether an address may lookup and allocate within one of the caches. The CM85 defines specific behaviors for some architecturally implementation defined or undefined behaviors regarding combinations of these three properties.

Both caches accept Inner cacheability attributes from the system address map and the Arm MPU (S or NS). The support of cacheability attributes varies depending on the cache and its configuration.

The I-Cache and D-Cache have different associativity, supported cache policies, supported memory attributes, and supported shareability attributes. Because of these variations, the behavior of I-Cache and D-Cache will be different even when accessing the same address.

Even when the Arm MPU (S or NS) is disabled, the default system address map will provide the memory type, shareability attributes, and cacheability attributes for an address. If the Arm MPU (S or NS) is enabled with no regions defined, it may be configured to use the default system address map as a background region.

Arm prescribes specific procedures for enabling and disabling the caches, and other cache maintenance operations that must be followed. There are I-Cache and D-Cache specific Arm architectural instructions that are used to perform these procedures. The Arm CMSIS library provides functions to perform these cache operations.

Renesas FCACHE

The FCACHE is implemented by Renesas and performs instruction prefetches, caches instruction fetches, and caches data reads from the CPU and other bus masters to Code Flash memory.

Cache maintenance for FCACHE is conducted through its peripheral registers.

In general, whether for the CM85 I-Cache or D-Cache, or Renesas FCACHE, cache maintenance is used to synchronize a given cache with the backing memory, and to synchronize caches with each other.

Cache Maintenance

The correct maintenance sequence must be followed to avoid caches reading stale data from each other or from backing memory.

Unlike the D-Cache, the I-Cache is a read-only interface which cannot be written with new instructions by the CPU. The only way for the CPU to see modified instructions while using I-Cache is through invalidation. Thus if instructions change, I-Cache maintenance is always required whether FCACHE maintenance is needed or not.

The D-Cache is a read and write interface. The CPU will write modified data into the D-Cache (if cacheable, and other properties are met), so any read back from the D-Cache will have the latest data. Thus if data changes, it may be necessary to perform D-Cache maintenance and possibly FCACHE maintenance depending on whether the data is shared between the CPU and another bus master or if the data is changed by FACI.

The word "shared" here does not mean "Shareability" as defined as an Arm memory attribute.

I-Cache and D-Cache cache lines are aligned to 32 bytes and are 32 bytes in length each. For D-Cache specifically, where a cache line may become dirty when write-back is used, cacheable shared data written by a bus master cannot be allowed to mix on a write-back cache line with data that is unrelated. For simplicity, follow the most conservative rule of aligning and padding cacheable shared data to meet D-Cache line requirements.
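
To make the alignment rule concrete, the following sketch computes how many 32-byte cache lines a buffer touches and whether it occupies those lines exclusively. The helper names are hypothetical and the addresses in the usage note are arbitrary examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE    32u  /* I-Cache and D-Cache line size in bytes */

/* Address of the cache line that contains addr. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t) (CACHE_LINE_SIZE - 1u);
}

/* Number of cache lines that the byte range [addr, addr + len) touches. */
static size_t lines_touched(uintptr_t addr, size_t len)
{
    if (len == 0u)
    {
        return 0u;
    }

    uintptr_t first = line_floor(addr);
    uintptr_t last  = line_floor(addr + len - 1u);

    return (size_t) ((last - first) / CACHE_LINE_SIZE) + 1u;
}

/* True only if the range starts on a line boundary and fills whole lines,
 * so that no unrelated data can share a line with it. */
static bool owns_whole_lines(uintptr_t addr, size_t len)
{
    return (len != 0u) && ((addr % CACHE_LINE_SIZE) == 0u) && ((len % CACHE_LINE_SIZE) == 0u);
}
```

For example, a 32-byte buffer at address 0x20000010 touches two lines and shares both with its neighbors, while the same buffer at 0x20000000 touches exactly one line and owns it.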

CM85 has an erratum with write-back when D-Cache is enabled. FSP v5.3.0 has added the recommended workaround of using MSCR.FORCEWT to force all D-Cache access to write-through, even if an access specifies write-back. Developers should write software as if write-back is being used for full compatibility with the D-Cache, which includes the above alignment and padding requirement.

This guide describes common maintenance scenarios including write-back and write-through. The write-through write policy does not obviate the need for using memory barriers on the CM85. The use of memory barriers is out-of-scope for this document.

FSP handles some of these maintenance scenarios during startup and in the Flash HP driver.

Typical Cache Maintenance Scenarios

Note
Whether full or address-based cache maintenance can or should be used depends on the capabilities of the target cache (FCACHE has no address-based maintenance capability) and the particular application scenario.

I-Cache, FCACHE Enabled (Default Configuration)

Instructions Change

  • Instructions Not Cacheable
    • No Maintenance Required (Not Cacheable)
  • Instructions Cacheable
    • FACI Code Flash Program or Erase
      1. Invalidate FCACHE
      2. Invalidate I-Cache
    • FACI Data Flash Program or Erase
      1. Invalidate I-Cache
    • In RAM or Other
      • Written by CPU
        1. Invalidate I-Cache
      • Written by Bus Master
        1. Invalidate I-Cache

Data Change

  • Data Not Cacheable
    • No Maintenance Required (Not Cacheable)
  • Data Cacheable
    • FACI Code Flash Program or Erase
      1. Invalidate FCACHE
    • FACI Data Flash Program or Erase
      • No Maintenance Required (D-Cache Disabled)
    • In RAM or Other
      • No Maintenance Required (D-Cache Disabled)

Instructions and Data Change

  • Instructions and Data Not Cacheable
    • No Maintenance Required (Not Cacheable)
  • Instructions and Data Cacheable
    • FACI Code Flash Program or Erase
      1. Invalidate FCACHE
      2. Invalidate I-Cache
    • FACI Data Flash Program or Erase
      1. Invalidate I-Cache
    • In RAM or Other
      • Written by CPU
        1. Invalidate I-Cache
      • Written by Bus Master
        1. Invalidate I-Cache

D-Cache, I-Cache, FCACHE Enabled

Data Change

  • Data Not Cacheable
    • No Maintenance Required (Not Cacheable)
  • Data Cacheable
    • FACI Code Flash Program or Erase
      1. Invalidate FCACHE
      2. Clean and Invalidate D-Cache
    • FACI Data Flash Program or Erase
      1. Clean and Invalidate D-Cache
    • In RAM or Other
      • Data Not Shared
        • No Maintenance Required (Not Shared)
      • Data Shared
        • Written by CPU
          • Write Back

            Area must be aligned and padded to D-Cache line requirements.

            1. Clean D-Cache, After CPU Write
          • Write Through
            • No Maintenance Required (Write Through)
        • Written by Bus Master
          • Write Back

            Area must be aligned and padded to D-Cache line requirements.

            • Buffer to be Written is Dirty (e.g. Stack or Heap Allocated Buffer May Be Dirty Already)
              1. Invalidate D-Cache, Before and After Bus Master Write
            • Buffer to be Written is Clean
              1. Invalidate D-Cache, After Bus Master Write
          • Write Through
            1. Invalidate D-Cache, After Bus Master Write
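
The shared-data cases above can be sketched with the CMSIS address-based functions. This is a sketch under assumptions: bsp_api.h is assumed to provide the CMSIS functions on an FSP project, transfer_fn_t stands in for whatever starts the bus master and waits for completion, and all other names are hypothetical. On a non-target build, stand-in stubs only log the call order so that the sequence can be checked. The receive path always invalidates before and after, which covers both the dirty and the clean buffer cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_ARCH_8_1M_MAIN__)
 #include "bsp_api.h"  /* Assumption: FSP header that provides the CMSIS cache functions. */
#else
/* Host stand-ins (not CMSIS): they only record the order of operations. */
static char   g_log[8];
static size_t g_log_len;

static void log_op(char op)
{
    if (g_log_len < sizeof(g_log))
    {
        g_log[g_log_len++] = op;
    }
}

static void SCB_CleanDCache_by_Addr(volatile void * addr, int32_t dsize)
{
    (void) addr; (void) dsize; log_op('C');
}

static void SCB_InvalidateDCache_by_Addr(volatile void * addr, int32_t dsize)
{
    (void) addr; (void) dsize; log_op('I');
}

/* Stand-in for a transfer that starts a bus master and waits for completion. */
static void host_fake_transfer(void * buf, size_t len)
{
    (void) buf; (void) len; log_op('T');
}
#endif

#define DCACHE_LINE    32u

/* Hypothetical hook: start a bus master transfer on buf, return when it is done. */
typedef void (* transfer_fn_t)(void * buf, size_t len);

/* Hypothetical shared buffer, aligned and padded to whole D-Cache lines. */
static _Alignas(32) uint8_t g_shared[64];

static int fills_whole_lines(const void * buf, size_t len)
{
    return (len != 0u) && (((uintptr_t) buf % DCACHE_LINE) == 0u) && ((len % DCACHE_LINE) == 0u);
}

/* CPU wrote the buffer and a bus master will read it: clean after the CPU write. */
static void send_from_cpu(void * buf, size_t len, transfer_fn_t run_transfer)
{
    assert(fills_whole_lines(buf, len));
    SCB_CleanDCache_by_Addr(buf, (int32_t) len);
    run_transfer(buf, len);
}

/* A bus master will write the buffer and the CPU will read it. */
static void receive_into_cpu(void * buf, size_t len, transfer_fn_t run_transfer)
{
    assert(fills_whole_lines(buf, len));
    SCB_InvalidateDCache_by_Addr(buf, (int32_t) len);  /* Discard lines that may already be dirty. */
    run_transfer(buf, len);
    SCB_InvalidateDCache_by_Addr(buf, (int32_t) len);  /* Discard lines cached during the transfer. */
}
```

The alignment assertion matters most on the receive path: invalidating a line that also holds unrelated dirty data would discard that data.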

Instructions Change or Instructions and Data Change

  • Instructions Not Cacheable or Instructions and Data Not Cacheable
    • No Maintenance Required (Not Cacheable)
  • Instructions Cacheable or Instructions and Data Cacheable
    • FACI Code Flash Program or Erase
      1. Invalidate FCACHE
      2. Clean and Invalidate D-Cache
      3. Invalidate I-Cache
    • FACI Data Flash Program or Erase
      1. Clean and Invalidate D-Cache
      2. Invalidate I-Cache
    • In RAM or Other
      • Written by CPU
        • Write Back

          Area must be aligned and padded to D-Cache line requirements if shared with a bus master.

          1. Clean D-Cache, After CPU Write
          2. Invalidate I-Cache
        • Write Through
          1. Invalidate I-Cache
      • Written by Bus Master
        • Write Back

          Area must be aligned and padded to D-Cache line requirements as we assume it is shared with CPU here.

          • Buffer to be Written is Dirty (e.g. Stack or Heap Allocated Buffer May Be Dirty Already)
            1. Invalidate D-Cache, Before and After Bus Master Write
            2. Invalidate I-Cache
          • Buffer to be Written is Clean
            1. Invalidate D-Cache, After Bus Master Write
            2. Invalidate I-Cache
        • Write Through
          1. Invalidate D-Cache, After Bus Master Write
          2. Invalidate I-Cache

Cache Functions and Macros

Renesas BSP Configuration Macros

Macro | Purpose | Notes
BSP_CFG_DCACHE_ENABLED | Defaults to zero (disabled). If defined and non-zero, the FSP startup code in system.c will configure several predefined non-cacheable sections in the MPU if they are of non-zero size, enable the MPU, and enable the D-Cache. | This is normally configured in e2 Studio under the BSP->Cache settings->Data cache properties panel for the project.
BSP_CFG_ROM_REG_OFS1_INITECCEN | Defaults to zero (disabled). Sets the value of OFS1.INITECCEN for BSP_CFG_ROM_REG_OFS1, which controls whether ECC is enabled for caches and TCM. | This is normally configured in e2 Studio under the BSP->OFS1 register settings->Tightly Coupled Memory (TCM)/Cache ECC properties panel for the project.

CMSIS 6 I-Cache Functions

Function | Purpose | Notes
SCB_EnableICache | If I-Cache allocations are not already enabled, invalidate the entire I-Cache then enable I-Cache allocations with CCR.IC. Will do nothing if I-Cache allocations are already enabled. | FSP automatically enables the I-Cache at startup by directly setting CCR.IC instead of using this function.
SCB_DisableICache | Disable I-Cache allocations with CCR.IC, then invalidate the entire I-Cache. |
SCB_InvalidateICache | Invalidate the entire I-Cache. This is safe to use at any time, because cache lines in the I-Cache can never be dirty. | Used after modifying instructions anywhere in memory (e.g. Flash, RAM). FSP calls this function after initializing the predefined RAM code section during startup, and when exiting Code Flash program or erase mode in the Flash HP driver.
SCB_InvalidateICache_by_Addr | Loop to invalidate the instructions in the I-Cache, starting at a particular address and extending for the specified length in bytes. This is safe to use at any time, because cache lines in the I-Cache can never be dirty. | Can be used to more efficiently invalidate instructions at specific addresses. Will invalidate in increments of cache lines (32 bytes).

These functions can be safely interrupted and do not need to be guarded by critical sections. However, depending on the structure of the application logic, guarding the functions may be necessary. This must be analyzed for an individual scenario. See the CMSIS 6 API reference material for further information.

CMSIS 6 D-Cache Functions

Function | Purpose | Notes
SCB_EnableDCache | If D-Cache allocations are not already enabled, loop to invalidate the entire D-Cache then enable D-Cache allocations with CCR.DC. Will do nothing if D-Cache allocations are already enabled. | FSP automatically calls this function at startup if BSP_CFG_DCACHE_ENABLED is defined and non-zero.
SCB_DisableDCache | Disable D-Cache allocations with CCR.DC, then loop to clean and invalidate the entire D-Cache. |
SCB_InvalidateDCache | Loop to invalidate the entire D-Cache. | This function should generally not be used, since no use case typically exists to invalidate the entire D-Cache.
SCB_CleanDCache | Loop to clean the entire D-Cache. |
SCB_CleanInvalidateDCache | Loop to clean and invalidate the entire D-Cache. |
SCB_InvalidateDCache_by_Addr | Loop to invalidate the data in the D-Cache, starting at a particular address and extending for the specified length in bytes. | Will invalidate in increments of cache lines (32 bytes).
SCB_CleanDCache_by_Addr | Loop to clean the data in the D-Cache, starting at a particular address and extending for the specified length in bytes. | Will clean in increments of cache lines (32 bytes).
SCB_CleanInvalidateDCache_by_Addr | Loop to clean and invalidate the data in the D-Cache, starting at a particular address and extending for the specified length in bytes. | Will clean and invalidate in increments of cache lines (32 bytes).

These functions can be safely interrupted and do not need to be guarded by critical sections. However, depending on the structure of the application logic, guarding the functions may be necessary. This must be analyzed for an individual scenario. See the CMSIS 6 API reference material for further information.

Cache Details

I-Cache Details

FSP I-Cache and FCACHE Behavior

Because of the simplicity of the I-Cache and FCACHE relative to the D-Cache and the critical instruction execution performance enhancement that they provide, FSP always enables the I-Cache and the FCACHE. This is not configurable.

FSP automatically enables the I-Cache at startup for CM85 by directly setting CCR.IC. This method is used instead of the CMSIS 6 function, so that the I-Cache, branch prediction, and the low-overhead branch (LOB) extension may simultaneously be enabled. The automatic hardware cache invalidation of the CM85 ensures that cache lookups, allocations, and cache maintenance are no-op until the invalidation is finished, so immediately enabling the I-Cache is safe to do.

FSP invalidates the I-Cache using the CMSIS 6 functions:

  • After initializing the predefined RAM code section during startup
  • After initializing the SAU for a Secure TZ application
  • When exiting Code Flash program or erase mode in the Flash HP driver

The FCACHE is a Renesas cache, not an Arm cache, and it is controlled through its separate peripheral registers.

User Required I-Cache Maintenance

If instructions have changed outside of the control of FSP, it is the user's responsibility to perform I-Cache maintenance. This applies to instructions stored in any cacheable location, including internal flash, internal RAM, external flash, external RAM, etc. It is recommended to use the CMSIS 6 functions to perform I-Cache maintenance.

Users must also consider the interactions that D-Cache has with instruction modifications. For example, if the modified instructions are written to a cacheable location (e.g. RAM) while D-Cache is enabled, those data writes may be cached. The D-Cache must then be cleaned so that all of the writes are written back and become visible to the I-Cache. D-Cache maintenance is also needed if cacheable instructions change in Code or Data Flash via FACI, since the D-Cache will cache instructions as data.

There is no hardware mechanism between the I-Cache and D-Cache in which they automatically share coherency, so coherency must be manually maintained by software as required.
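
As a sketch of the CPU-writes-code-to-RAM case, the following copies instructions into RAM, cleans the D-Cache for that range, and then invalidates the I-Cache for the same range. The function name load_code_to_ram is hypothetical, bsp_api.h is assumed to provide the CMSIS functions on an FSP project, and the non-target stubs only log the call order. The clean is included even though FSP forces write-through, following the guidance to write software as if write-back is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_ARCH_8_1M_MAIN__)
 #include "bsp_api.h"  /* Assumption: FSP header that provides the CMSIS cache functions. */
#else
/* Host stand-ins (not CMSIS): they only record the order of operations. */
static char   g_log[4];
static size_t g_log_len;

static void log_op(char op)
{
    if (g_log_len < sizeof(g_log))
    {
        g_log[g_log_len++] = op;
    }
}

static void SCB_CleanDCache_by_Addr(volatile void * addr, int32_t dsize)
{
    (void) addr; (void) dsize; log_op('C');
}

static void SCB_InvalidateICache_by_Addr(volatile void * isaddr, int32_t isize)
{
    (void) isaddr; (void) isize; log_op('I');
}
#endif

/* Copy len bytes of instructions to dst and make them visible to instruction fetch. */
static void load_code_to_ram(void * dst, const void * src, size_t len)
{
    assert((dst != NULL) && (src != NULL) && (len != 0u));

    memcpy(dst, src, len);                             /* CPU data writes, possibly held in the D-Cache */
    SCB_CleanDCache_by_Addr(dst, (int32_t) len);       /* Write the new bytes back to RAM */
    SCB_InvalidateICache_by_Addr(dst, (int32_t) len);  /* Drop stale instructions for the range */
}
```

The order matters: if the I-Cache were invalidated first, it could refill from RAM before the D-Cache had written the new instructions back.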

D-Cache Details

FSP D-Cache Behavior

The D-Cache is a cache with more complex interactions than the I-Cache. Thus, FSP leaves the D-Cache disabled by default on RA8 projects. It can be enabled in e2 Studio under the BSP->Cache settings->Data cache properties panel for the project.

Presently, FSP does not support any D-Cache functionality except:

  • Configuring the Arm MPU (S and NS) with two predefined no-cache sections if they are non-zero size
    • One in SRAM
    • One in SDRAM
  • Enabling the Arm MPU (S and NS) after configuration
  • Enabling D-Cache allocations (S or NS) with the CMSIS function

No FSP drivers are currently compatible with D-Cache enablement. This compatibility is a work in progress.

User Required D-Cache Maintenance

Presently, D-Cache usage is fully in the realm of user responsibility. The user must perform all D-Cache maintenance as required, or must store data accordingly in the predefined non-cacheable regions or otherwise.

Affected Bus Masters

Coherency must be considered for these bus masters:

  • CEU
  • DMAC
  • DRW
  • DTC
  • EDMAC
  • GLCDC
  • MIPI

Other Interactions that are not Bus Masters

Coherency must be considered for interactions with:

  • FACI
  • CSC
  • SDRAM
  • Standby SRAM
  • OSPI
  • Code in RAM (I-Cache and D-Cache are not automatically coherent)

FSP Predefined No-Cache Sections

Warning
The predefined non-cacheable sections are only understood as non-cacheable within their respective security state when TrustZone is used. This is due to the MPU configuration done by FSP and how Armv8-M chooses the MPU which matches the current security state for determining the memory attributes of a data access. Cache incoherency will occur if references to these sections are passed between security states. For example, passing a reference from the Non-secure application non-cacheable section to the Secure application will result in the Secure application treating the location as cacheable.

The .nocache and .nocache_sdram sections are predefined for GCC, LLVM, and IAR compilers. These same sections exist for AC6 as .bss.nocache and .bss.nocache_sdram because of special naming restrictions with AC6 and uninitialized sections. These sections are uninitialized for all compilers, despite AC6 requiring a prefix of .bss.

Anything placed within them will be non-cacheable. Instruction fetches and data reads or writes to these sections will never lookup or allocate in their respective caches.

The FSP startup code configures these sections as non-cacheable using the MPU during startup, if the D-Cache is enabled via the BSP configuration. Otherwise, they are not configured by FSP in the MPU if the D-Cache is disabled. The predefined sections are aligned to 32 bytes and are padded to a minimum of 32 bytes in length. This meets both MPU region alignment and length requirements, and cache line alignment and length requirements. The MPU and cache line alignment and length requirements protect against inadvertent mixing of cacheable and non-cacheable data.
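
A minimal sketch of placing data in the predefined section follows. It uses the GCC/LLVM section attribute directly, with the AC6 section name selected through __ARMCC_VERSION. IAR may need its own placement syntax, and FSP may offer its own placement macro that should be preferred where available. APP_NOCACHE, app_descriptor_t, and the variable names are hypothetical. On a non-target build the macro expands to nothing so that the code still compiles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_ARCH_8_1M_MAIN__)
 #if defined(__ARMCC_VERSION)
  #define APP_NOCACHE    __attribute__((section(".bss.nocache")))  /* AC6 section name */
 #else
  #define APP_NOCACHE    __attribute__((section(".nocache")))      /* GCC and LLVM */
 #endif
#else
 #define APP_NOCACHE                                               /* Host build: no special placement */
#endif

/* Hypothetical descriptor read by a bus master. */
typedef struct
{
    uint32_t src;
    uint32_t dst;
    uint32_t len;
} app_descriptor_t;

/* Items inside the non-cacheable section can be pooled together; the section
 * as a whole is what is aligned and padded to 32 bytes. */
static app_descriptor_t g_descriptor APP_NOCACHE;
static uint8_t          g_rx_data[100] APP_NOCACHE;

/* The section is uninitialized, so give its contents defined values at run time. */
static void app_nocache_init(void)
{
    memset(&g_descriptor, 0, sizeof(g_descriptor));
    memset(g_rx_data, 0, sizeof(g_rx_data));
}
```

Because these sections are only non-cacheable within their own security state, pointers to such variables should not be passed across the Secure and Non-secure boundary.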

Other Information

Cache Maintenance when MPU Configuration Changes

If the Secure and/or Non-secure MPU configuration is changed, and the cacheability of an address changes, cache maintenance is required to synchronize the caches with the new memory attributes. If this is not done, a newly non-cacheable address may be left in the cache, and behavior when accessing the address is considered undefined.

Cache Maintenance when SAU Configuration Changes

If the SAU configuration is changed, and the security attributes of an address change, cache maintenance is required to synchronize the caches with the new security attributes. If this is not done, cached data will be out of sync with the new security attributes, which may result in undefined behavior.

Memory Attributes and TrustZone

Warning
If data that is shared between the Secure and Non-secure application does not use the same memory attributes while being accessed because of a memory attribute mismatch, it will cause cache incoherency. See the Arm knowledgebase article ka001216 and the Armv8-M Architecture Reference Manual for information on MPU selection behavior and memory attribute mismatch.

Cache Maintenance and TrustZone

Warning
If cacheable shared data is improperly structured by the Secure application and is not aligned and padded to match D-Cache line requirements, a clean and invalidate by set/way of the D-Cache by the Non-secure application, or an automatic D-Cache eviction by the Non-secure application, will cause data to be destroyed for the Secure application. This consequence is additional to the bugs that the Secure application may trigger itself by improper data layout, D-Cache maintenance, and automatic D-Cache eviction.

The System Address Map and the MPU

The Armv8-M Architecture Reference Manual specifies a default system address map that defines the memory regions of the architecture and their various properties.

These properties include the memory type, shareability attributes, and cacheability attributes for the regions. When the MPU is disabled, this default system address map provides the system with default attributes for instruction fetches, data reads, and data writes to and from addresses. When the MPU is enabled, it can be used to override the default system address map entirely, or both may be used together by setting the MPU_CTRL.PRIVDEFENA bit. This bit allows instruction fetches or data reads and writes that do not correspond to a configured MPU region to hit the default system address map as a background region instead, so long as the access is Privileged.

FSP does not support Unprivileged execution, so it always assumes Privileged execution state. Allowing the default system address map as a background region is the method that FSP uses to provide the predefined no-cache sections, by configuring the MPU for the no-cache sections while allowing all other memory accesses to rely on the default system address map.

Configuring an MPU region involves specifying a 32-byte aligned start address and an inclusive ending address, and also specifying the various memory attributes of the region. The MPU region beginning address register masks downward to align to a 32-byte boundary, i.e. (address & ~0x1F). The MPU region ending address register ORs upward with 0x1F for the inclusive ending boundary, i.e. (address | 0x1F). Thus, the minimum size of an MPU region is 32 bytes and the size may only increase in 32-byte increments.
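
The address arithmetic described above can be sketched as follows. This only models the masking of the start and end addresses and does not write any MPU register. The type and function names are hypothetical, and the addresses in the usage note are arbitrary examples.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_GRANULE_MASK    0x1Fu  /* MPU regions have 32-byte granularity */

/* Effective inclusive span of an MPU region after masking. */
typedef struct
{
    uint32_t base;   /* first byte covered by the region */
    uint32_t limit;  /* last byte covered by the region (inclusive) */
} mpu_span_t;

/* Apply the masking described above to a requested start and inclusive end. */
static mpu_span_t mpu_effective_span(uint32_t requested_start, uint32_t requested_end)
{
    assert(requested_start <= requested_end);

    mpu_span_t span;
    span.base  = requested_start & ~MPU_GRANULE_MASK;  /* mask downward */
    span.limit = requested_end | MPU_GRANULE_MASK;     /* OR upward, inclusive */

    return span;
}

/* Size in bytes of the effective region; always a multiple of 32. */
static uint32_t mpu_span_size(mpu_span_t span)
{
    return span.limit - span.base + 1u;
}
```

For example, a request covering 0x22000010 through 0x22000044 becomes the region 0x22000000 through 0x2200005F, which is 96 bytes. Any unrelated data located in that widened span would also take on the region's attributes, which is why the region contents should be aligned and padded deliberately.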

See the Armv8-M Memory Model and Memory Protection User Guide in the references section for a high-level introduction, and the Armv8-M Architecture Reference Manual for details.

Speculative Instruction Fetching and Data Reads

The CM85 may, with no deliberate software instruction, speculatively fetch instructions or read data from any memory location. Upon doing so, the instruction fetch or data read may enter the respective cache. The purpose of this speculative behavior is to predict the next instructions or data to be fetched, read, or written, which increases performance if the prediction is correct. This may cause instructions or data to unexpectedly appear in cache, so speculation must be considered when solving for cache coherency.

Cache Eviction

At any time, the I-Cache or D-Cache may evict cache lines. For I-Cache, this means invalidation of the evicted line. For D-Cache, this means cleaning and invalidation of the evicted line, where cleaning occurs if the cache line is dirty. D-Cache eviction of dirty lines may cause data to be unexpectedly written out to backing memory when write-back is used, and this must be considered when solving for cache coherency. For D-Cache aligned and padded buffers derived from areas like the stack or heap, one or more of the associated cache lines may already be dirty and require cleaning and/or invalidation before being used by a bus master.
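
The hazard of evicting a dirty line can be illustrated with a toy model. This is not CM85 code: it models a single 32-byte line of SRAM and a single write-back cache line in plain C, with hypothetical names throughout, and exists only to show why a dirty line must be discarded before a bus master writes the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE    32u

/* Toy model: one line of backing SRAM and the one cache line that shadows it. */
typedef struct
{
    uint8_t sram[LINE_SIZE];
    uint8_t cached[LINE_SIZE];
    bool    valid;
    bool    dirty;
} toy_line_t;

/* Write-back, write-allocate CPU store: only the cache is updated. */
static void cpu_write(toy_line_t * t, uint8_t value)
{
    memset(t->cached, value, LINE_SIZE);
    t->valid = true;
    t->dirty = true;
}

/* Bus master write: goes straight to SRAM and bypasses the cache. */
static void bus_master_write(toy_line_t * t, uint8_t value)
{
    memset(t->sram, value, LINE_SIZE);
}

/* Hardware eviction: clean the line if it is dirty, then invalidate it. */
static void evict(toy_line_t * t)
{
    if (t->valid && t->dirty)
    {
        memcpy(t->sram, t->cached, LINE_SIZE);
    }

    t->valid = false;
    t->dirty = false;
}

/* Software invalidate: discard the line without writing it back. */
static void invalidate(toy_line_t * t)
{
    t->valid = false;
    t->dirty = false;
}

/* CPU load with read-allocate: refill from SRAM on a miss. */
static uint8_t cpu_read(toy_line_t * t)
{
    if (!t->valid)
    {
        memcpy(t->cached, t->sram, LINE_SIZE);
        t->valid = true;
        t->dirty = false;
    }

    return t->cached[0];
}

/* Returns the value the CPU reads after a bus master write of 0x55. */
static uint8_t run_scenario(bool invalidate_before)
{
    toy_line_t t;
    memset(&t, 0, sizeof(t));

    cpu_write(&t, 0xAAu);        /* Earlier use (e.g. stack or heap) leaves the line dirty. */
    if (invalidate_before)
    {
        invalidate(&t);
    }

    bus_master_write(&t, 0x55u);
    evict(&t);                   /* An eviction may happen at any time. */
    invalidate(&t);              /* Invalidate after the bus master write. */

    return cpu_read(&t);
}
```

Without the first invalidate, the eviction writes the stale 0xAA back over the bus master's data and the CPU reads 0xAA. With it, the eviction has nothing to write back and the CPU reads 0x55.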

Example of D-Cache Eviction and Speculative Read Dangers

The CM85 will provide a SRAM buffer to the DMAC, the DMAC will write to the buffer, and the CM85 will read from the written buffer. I-Cache, D-Cache, and FCACHE are enabled. SRAM exists in the same "SRAM" region defined by the default system address map in the Armv8-M Architecture Reference Manual. SRAM is Normal memory, write-back, write-allocate, read-allocate, and non-shareable by the default system address map attributes.

The correct way to solve coherency in this situation using the two recommended solution options is:

  1. The MPU is used to configure a non-cacheable region where the SRAM buffer is placed.
    • The MPU region is correctly aligned and padded to meet start and end address alignment requirements, and region attributes are correctly configured.
    • No additional effort is required.
    • FSP provides the predefined .nocache section that meets these requirements.
  2. The SRAM buffer is left to its default attributes, making it cacheable.
    • Regardless of whether write-back or write-through is used, the buffer start must be aligned to a cache line and the buffer length must be a multiple of the cache line size.
      • Unrelated data can never be mixed on cache lines if write-back is used. It is best to follow this strict guideline even when write-through is used.
    • The cache lines of the buffer may already be dirty, especially if the buffer is allocated in a stack or heap.
    • If write-back is used, cache maintenance is conducted in order.
      • Invalidate buffer.
      • CM85 provides buffer to DMAC and starts DMAC.
      • DMAC writes to buffer.
      • CM85 waits for DMAC to complete.
      • Invalidate buffer.
      • CM85 reads from buffer.
    • The first invalidation is to remove dirty lines, which may already exist from stack or heap allocation. The data does not need to be written back and can be discarded without a clean.
      • If this is not done, an eviction (effectively an automatic clean and invalidate by the hardware) will cause stale data to be written back to the buffer and destroy newly written DMAC data.
    • The second invalidation is to remove speculatively read cache lines, which may have been cached before the DMAC write completed.
      • If this is not done, the CM85 will read stale data from the buffer that has been prematurely cached.
    • Write-through may also use this sequence, although the first invalidate should be a no-op since no lines can be dirty and no stale data should be written back by an eviction.
    • The CMSIS cache maintenance functions should be used to perform cache maintenance, since they include necessary memory barriers.
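The ordering of the second solution can be sketched as follows. This is a host-compilable illustration under stated assumptions, not FSP driver code: rx_hooks_t, dma_receive, DCACHE_PAD, and the host_* stand-ins are hypothetical names invented here. On a target, the dcache_invalidate hook would typically wrap the CMSIS function SCB_InvalidateDCache_by_Addr, and the DMA hooks would call whatever transfer driver the application uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* CM85 D-Cache line size in bytes */

/* Round a byte count up to a whole number of cache lines. */
#define DCACHE_PAD(len) \
    (((size_t)(len) + DCACHE_LINE_SIZE - 1u) & ~(size_t)(DCACHE_LINE_SIZE - 1u))

/* Hypothetical hooks so the sequence can be exercised off-target. */
typedef struct
{
    void (*dcache_invalidate)(volatile void *addr, size_t len);
    void (*dma_start_rx)(volatile void *dst, size_t len);
    void (*dma_wait_done)(void);
} rx_hooks_t;

/* Returns 0 on success, -1 if the buffer is not cache line aligned and padded. */
static int dma_receive(const rx_hooks_t *hooks, volatile uint8_t *buf, size_t len)
{
    if ((len == 0u) || (((uintptr_t)buf % DCACHE_LINE_SIZE) != 0u) ||
        ((len % DCACHE_LINE_SIZE) != 0u))
    {
        return -1; /* unrelated data could share a cache line with the buffer */
    }

    hooks->dcache_invalidate(buf, len); /* discard lines that may already be dirty */
    hooks->dma_start_rx(buf, len);      /* hand the buffer to the bus master */
    hooks->dma_wait_done();             /* bus master writes the buffer */
    hooks->dcache_invalidate(buf, len); /* discard speculatively cached lines */

    return 0;                           /* the CPU may now read the buffer */
}

/* Host-side stand-ins that only record the call order. */
static char   trace[8];
static size_t trace_len;

static void trace_push(char c)
{
    assert(trace_len < sizeof trace);
    trace[trace_len++] = c;
}

static void host_invalidate(volatile void *addr, size_t len)
{
    (void)addr; (void)len;
    trace_push('I');
}

static void host_dma_start(volatile void *dst, size_t len)
{
    (void)dst; (void)len;
    trace_push('S');
}

static void host_dma_wait(void)
{
    trace_push('W');
}
```

A buffer for this sketch could be declared with 32-byte alignment and a length of DCACHE_PAD(n), so that it starts on a cache line and occupies whole cache lines only. The alignment check is deliberately strict: it rejects the buffer rather than rounding the range outward, because rounding outward would invalidate neighboring data.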

The second solution is shown in the D-Cache Enabled scenarios here where write-back is used and a bus master performs writing.

Cache ECC with FSP

By default, FSP disables ECC for the caches and TCM with OFS1.INITECCEN. For best performance, it is recommended to keep ECC for cache and TCM disabled. If enabling is desired, please consult the reference material to understand the consequences of enabling ECC for cache and TCM, which are too numerous to describe here. The automatic hardware cache invalidation performed by the CM85 is compatible with ECC.

MPU Cacheability Attributes

  • Cacheable or Non-Cacheable
  • Allocation Policies
    • Read Allocate
    • Write Allocate
  • Write Policy
    • Write Back or Write Through
  • Transient or Non-Transient

For D-Cache, the Shareable or Non-Shareable attribute also affects whether an address is Cacheable or Non-Cacheable. A Shareable address is forced to Non-Cacheable for D-Cache. I-Cache is not influenced by the Shareability properties and will always follow the MPU cacheability attributes. The Transient attribute is of limited utility and can mostly be ignored. Clean cache lines that are marked Transient are preferred for eviction before clean cache lines marked Non-Transient. Dirty cache lines, whether marked Transient or Non-Transient, are evicted with the same priority.

See the CM85 Technical Reference Manual reference material for further information.
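As a rough illustration of how the attributes listed above combine, the sketch below packs them into an 8-bit Normal memory attribute value in the layout used by the Armv8-M MPU attribute indirection registers, as we understand that encoding. The type and function names (cache_policy_t, mair_nibble, mair_attr) are assumptions for this sketch only, and the encoding should be verified against the Armv8-M Architecture Reference Manual before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the cacheability attributes listed above. */
typedef struct
{
    bool cacheable;
    bool write_back;     /* false selects write-through */
    bool read_allocate;
    bool write_allocate;
    bool transient;
} cache_policy_t;

/* 4-bit Normal memory field: [3] non-transient, [2] write-back,
 * [1] read allocate, [0] write allocate. 0b0100 means Non-cacheable. */
static uint8_t mair_nibble(cache_policy_t p)
{
    if (!p.cacheable)
    {
        return 0x4u;
    }

    /* A Transient policy with no allocation hint would alias the
     * Device (0b0000) or Non-cacheable (0b0100) encodings. */
    assert(!p.transient || p.read_allocate || p.write_allocate);

    return (uint8_t)(((!p.transient) << 3) | (p.write_back << 2) |
                     (p.read_allocate << 1) | p.write_allocate);
}

/* 8-bit attribute using the same policy for outer [7:4] and inner [3:0]. */
static uint8_t mair_attr(cache_policy_t p)
{
    uint8_t n = mair_nibble(p);

    return (uint8_t)((n << 4) | n);
}
```

Under this encoding, write-back with read and write allocation and Non-Transient yields 0xFF, and Non-Cacheable yields 0x44.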

Cache Behavior with CCR and MSCR

x = [I, D]

CCR.xC (S or NS)    MSCR.xCACTIVE    Behavior
1                   1                Allocate, Lookup
0                   1                No Allocate, Lookup (Reset Behavior)
X                   0                No Allocate, No Lookup

This behavior is applicable to Cortex-M55 and Cortex-M85. If you have previous experience with a Cortex-M7 device, this cache behavior is different, since the MSCR.xCACTIVE bits were introduced for CM55 and CM85. No current RA device offers a CM7 or CM55 core. The MSCR.xCACTIVE bits allow for cache power control and add a third cache behavioral state: lookups without allocation. With this state, cleaning the D-Cache after disabling it is less error-prone than on CM7, because writes that occur after the D-Cache is disabled are still looked up in the cache and therefore cannot make dirty cache lines stale before they are cleaned. The MSCR.xCACTIVE bits have a reset value of 1, so the caches are powered by default and lookups are possible. Until the automatic hardware cache invalidation that begins after reset has finished, lookups and allocations do not occur even if CCR.xC is set, and cache maintenance operations are no-ops. The MSCR.xCACTIVE bits should generally never be cleared to 0.
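The three rows of the table can be restated as a small decision function. This is an illustrative host-side sketch only; the enum and function names are assumptions and do not correspond to any CMSIS or FSP API, and the sketch does not model the period before the post-reset automatic invalidation completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three behavioral states in the table. */
typedef enum
{
    CACHE_NO_ALLOCATE_NO_LOOKUP, /* MSCR.xCACTIVE = 0, CCR.xC is don't-care */
    CACHE_NO_ALLOCATE_LOOKUP,    /* CCR.xC = 0, MSCR.xCACTIVE = 1 (reset) */
    CACHE_ALLOCATE_LOOKUP        /* CCR.xC = 1, MSCR.xCACTIVE = 1 */
} cache_behavior_t;

/* Map the two control bits for one cache (I or D) to its behavior. */
static cache_behavior_t cache_behavior(bool ccr_xc, bool mscr_xcactive)
{
    if (!mscr_xcactive)
    {
        return CACHE_NO_ALLOCATE_NO_LOOKUP;
    }

    return ccr_xc ? CACHE_ALLOCATE_LOOKUP : CACHE_NO_ALLOCATE_LOOKUP;
}

/* Self-check covering the reset state and the fully enabled state. */
static void cache_behavior_check(void)
{
    assert(cache_behavior(false, true) == CACHE_NO_ALLOCATE_LOOKUP);
    assert(cache_behavior(true, true) == CACHE_ALLOCATE_LOOKUP);
}
```

Note that clearing CCR.xC alone moves a cache from the first row of the table to the second, so existing lines remain visible to lookups until they are cleaned and invalidated.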

Cache Errata

Consult the latest Renesas Technical Updates (TU) and Arm Cortex-M85 Errata documents.

These are example errata to demonstrate the possibility of issues with cache usage at the time of this writing.

Cortex-M85 AT640 and Cortex-M85 with FPU AT641 Software Developer Errata Notice
Date of issue: April 16, 2024
Document version: 14.0
Document ID: SDEN-2236668

  • 2682779: After deactivating the instruction cache, self-modified code might not be executed correctly
    • Fault Type: Programmer Category C
    • Fault Status: Present in r0p0, r0p1, r0p2. Fixed in r1p0
  • 3175626: AXI hang due to dependency between read data channel and write response channel
    • Fault Type: Programmer Category B
    • Fault Status: Present in r0p0, r0p1, r0p2, r1p0. Fixed in r1p1
  • 3190818: Under limited circumstances, LDM to normal non-cacheable AXI location cannot complete
    • Fault Type: Programmer Category B
    • Fault Status: Present in r0p0, r0p1, r0p2, and r1p0. Fixed in r1p1

Currently available RA8D1, RA8M1, and RA8T1 devices use the r0p2 variant of the core, so they are affected by these errata.

Erratum 2682779 should not require a workaround, since under most circumstances the I-Cache is never powered off.

FSP added workarounds for 3175626 and 3190818 in v5.3.0.

References

Note
Cross-reference documents from multiple sources and consult with colleagues and other support channels for maximum confidence.

Renesas

Generally, consult these categories of documents for more recent and more detailed information than this overview provides.

  • RA Datasheets
  • RA Hardware User Manuals (HWM, HWUM, UM)
  • RA Application Notes (AN)
  • RA Knowledge Base Articles (KB)
  • RA Technical Updates (TU)
  • RA Example Projects

RA8D1

  1. RA8D1 Product Page
  2. RA8D1 Datasheet
  3. RA8D1 User's Manual: Hardware

RA8M1

  1. RA8M1 Product Page
  2. RA8M1 Datasheet
  3. RA8M1 User's Manual: Hardware

RA8T1

  1. RA8T1 Product Page
  2. RA8T1 Datasheet
  3. RA8T1 User's Manual: Hardware

BSP Usage Notes

  1. Limited D-Cache Support
  2. Non-Cacheable Buffer Placement Example

Arm

Note
Arm links appended with "latest" may not actually resolve to the most recent document, because of issues with the Arm documentation website. Always check that the document you are accessing is truly the most recent version using the version drop-down list box.

Armv8-M and Armv8.1-M Architectures

  1. Armv8-M Architecture Reference Manual
  2. Armv8-M Memory Model and Memory Protection User Guide
  3. Armv8-M Exception Model User Guide
  4. Which MPU configuration is used in a Trustzone enabled Armv8-M system (ka001216)

Cortex-M85

  1. Cortex-M85 Product Page
  2. Arm Cortex-M85 Processor Technical Reference Manual
  3. Arm Cortex-M85 Processor Devices Generic User Guide
  4. Cortex-M85 AT640 and Cortex-M85 with FPU AT641 Software Developer Errata Notice
  5. Arm Cortex-M85 Processor Software Optimization Guide

CMSIS 6

  1. CMSIS 6 GitHub Repository
  2. CMSIS 6 Documentation
  3. CMSIS 6 MPU API for Armv8-M
  4. CMSIS 6 Cache API