RA Flexible Software Package Documentation
Release v5.9.0
FSP Version | Changes |
---|---|
5.6.0 | Add information about cache incoherency with memory attribute mismatch between TrustZone security states. Add information about CCR.xC TrustZone banked behavior. Add information about cache incoherency with predefined non-cacheable sections when crossing security states. Add new Arm KB reference. |
When using any type of caches in a system, coherency must be considered. A cache may contain data that is different from the backing memory (e.g. SRAM, Flash, etc.), or contain data that is different from another cache, which would make the cache incoherent. Coherency can be maintained through hardware and/or software support. For Cortex-M devices like the RA8, coherency can only be achieved through manual software management, and no automatic hardware coherency support exists.
In the default configuration for RA8 devices, FSP always enables the Code Flash Cache (FCACHE) and CM85 Instruction Cache (I-Cache) and handles the coherency of these caches where it is required. FSP does not handle FCACHE or I-Cache coherency outside of FSP. FSP optionally allows the CM85 Data Cache (D-Cache) to be enabled in the BSP configuration settings, where it is disabled by default. If the D-Cache is enabled, additional coherency concerns will arise. FSP does not currently handle any D-Cache coherency; this is a work in progress. Drivers that will require D-Cache coherency support do not presently provide it.
If D-Cache is enabled, the most common coherency concern is when data is shared between the CPU and another bus master. The D-Cache and backing memory for the shared location (usually SRAM) can become incoherent. To properly manage coherency in this situation, do one of the following:
Place the shared data in a non-cacheable region defined by the MPU or hardware.
Using a non-cacheable region means there will be no cache maintenance required and no coherency issues because the shared data will not be cached. A non-cacheable region can be defined in the MPU to contain the shared data. The non-cacheable region MUST be aligned to 32 bytes and be a length multiple of 32 bytes. This is required to meet MPU alignment and length requirements. FSP predefines the .nocache and .nocache_sdram uninitialized regions for this purpose, where data can be placed (see the BSP Usage Notes reference material). FSP configures and enables the MPU with these predefined regions at startup, if they have a non-zero size. Any address outside these regions will use the cacheability attributes defined by the default system address map. As an alternative, DTCM is always hardcoded as non-cacheable by the hardware and could contain the shared data. Using the MPU should be preferred, since DTCM can only be accessed through S-AHB by other bus masters and the access may contend with CPU access to DTCM.
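As an illustration, a buffer shared with a bus master could be placed in the predefined .nocache section roughly as follows. This is a minimal sketch under assumptions: it uses the GCC/LLVM `section` attribute syntax, the `CACHE_SKETCH_ON_TARGET` switch is hypothetical and exists only so that the snippet also compiles off-target, and the buffer name and size are illustrative and not part of FSP. For AC6 the section name would be .bss.nocache instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch: define it when building for the RA8 target so
 * the buffer lands in the predefined section. Off-target the attribute is
 * dropped so that the sketch still compiles. */
#ifdef CACHE_SKETCH_ON_TARGET
#define SKETCH_NOCACHE __attribute__((section(".nocache")))
#else
#define SKETCH_NOCACHE
#endif

#define SKETCH_RX_LEN 64U /* illustrative size in bytes */

/* No initializer is given, because the predefined section is uninitialized. */
static uint8_t g_sketch_rx_buffer[SKETCH_RX_LEN] SKETCH_NOCACHE;

/* Returns the buffer that would be handed to a bus master such as the DMAC. */
static inline uint8_t *sketch_rx_buffer(void)
{
    return g_sketch_rx_buffer;
}
```

Because the section as a whole is aligned and padded, several shared variables can be pooled in it without aligning each one individually.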
Place the shared data in a cache line aligned and padded cacheable region, and use the CMSIS cache maintenance functions.
Using a cacheable region requires that the CMSIS cache maintenance functions be used to solve the coherency issues. The data MUST be aligned to 32 bytes and be a length multiple of 32 bytes. This is required to meet D-Cache line alignment and length requirements. If data is not aligned and fitted exactly to D-Cache lines, you will create rare and difficult bugs! Use the CMSIS cache maintenance functions as required to manage the shared data coherency. When the D-Cache is enabled, FSP configures and enables the MPU at startup. Any address outside the predefined MPU regions will use the cacheability attributes defined by the default system address map. Unlike non-cacheable shared data, cacheable shared data cannot be pooled into a single aligned and padded region. Each item of cacheable shared data must be aligned and padded to its own data cache lines.
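The alignment and padding rule for cacheable shared data can be sketched in C as below. The macro and function names (`SKETCH_LINE_PAD`, `sketch_is_line_exact`) and the 100 byte payload are hypothetical and only illustrate the arithmetic; the 32 byte line size is taken from this guide.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DCACHE_LINE 32U /* D-Cache line size in bytes */

/* Round a byte count up to a whole number of D-Cache lines. */
#define SKETCH_LINE_PAD(n) \
    ((((size_t) (n)) + (SKETCH_DCACHE_LINE - 1U)) & ~((size_t) SKETCH_DCACHE_LINE - 1U))

/* A 100 byte payload padded to 128 bytes and started on a line boundary,
 * so that no unrelated variable can share any of its cache lines. */
static _Alignas(32) uint8_t g_sketch_shared[SKETCH_LINE_PAD(100)];

/* Returns 1 when [addr, addr + len) covers whole cache lines exactly. */
static inline int sketch_is_line_exact(const void *addr, size_t len)
{
    return (((uintptr_t) addr % SKETCH_DCACHE_LINE) == 0U) &&
           (len != 0U) &&
           ((len % SKETCH_DCACHE_LINE) == 0U);
}
```

A check such as `sketch_is_line_exact` could be asserted before any buffer is passed to a by-address cache maintenance function.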
Less common coherency concerns include:
These less common situations require careful consideration for cache maintenance. FSP handles some of these less common situations for FCACHE and I-Cache where it must, but it is not possible to cover all user behavior that may occur.
Other special cache concerns that are not specifically coherency related include:
The CM85 optionally implements an I-Cache and/or D-Cache, with several configurable properties. Renesas RA8 devices have an implementation as follows:
The I-Cache and D-Cache are implemented inside of the CM85 by Arm. The CM85 has a Harvard design, where instruction fetches and data reads/writes are performed on separate interfaces. The I-Cache can only perform lookup and allocation for instruction fetches. The D-Cache can only perform lookup and allocation for data reads and writes.
Whether lookup of an address occurs in a cache depends on:
Whether allocation of an address occurs in a cache depends on:
The MSCR register controls whether lookups may occur for a cache, while the CCR register (S or NS) controls whether allocation may occur for a cache.
The system address map and Arm MPU (S or NS) define the memory type, shareability attributes, and cacheability attributes for an address. All three memory properties combine to define whether an address may lookup and allocate within one of the caches. The CM85 defines specific behaviors for some architecturally implementation defined or undefined behaviors regarding combinations of these three properties.
Both caches accept Inner cacheability attributes from the system address map and the Arm MPU (S or NS). The support of cacheability attributes varies depending on the cache and its configuration.
The I-Cache and D-Cache have different associativity, supported cache policies, supported memory attributes, and supported shareability attributes. Because of these variations, the behavior of I-Cache and D-Cache will be different even when accessing the same address.
Even when the Arm MPU (S or NS) is disabled, the default system address map will provide the memory type, shareability attributes, and cacheability attributes for an address. If the Arm MPU (S or NS) is enabled with no regions defined, it may be configured to use the default system address map as a background region.
Arm prescribes specific procedures for enabling and disabling the caches, and other cache maintenance operations that must be followed. There are I-Cache and D-Cache specific Arm architectural instructions that are used to perform these procedures. The Arm CMSIS library provides functions to perform these cache operations.
The FCACHE is implemented by Renesas and performs instruction prefetches, caches instruction fetches, and caches data reads from the CPU and other bus masters to Code Flash memory.
Cache maintenance for FCACHE is conducted through its peripheral registers.
In general, whether for the CM85 I-Cache or D-Cache, or Renesas FCACHE, cache maintenance is used to synchronize a given cache with the backing memory, and to synchronize caches with each other.
The correct maintenance sequence must be followed to avoid caches reading stale data from each other or from backing memory.
Unlike the D-Cache, the I-Cache is a read-only interface which cannot be written with new instructions by the CPU. The only way for the CPU to see modified instructions while using I-Cache is through invalidation. Thus if instructions change, I-Cache maintenance is always required whether FCACHE maintenance is needed or not.
The D-Cache is a read and write interface. The CPU will write modified data into the D-Cache (if cacheable, and other properties are met), so any read back from the D-Cache will have the latest data. Thus if data changes, it may be necessary to perform D-Cache maintenance and possibly FCACHE maintenance depending on whether the data is shared between the CPU and another bus master or if the data is changed by FACI.
The word "shared" here does not mean "Shareability" as defined as an Arm memory attribute.
I-Cache and D-Cache cache lines are aligned to 32 bytes and are 32 bytes in length each. For D-Cache specifically, where a cache line may become dirty when write-back is used, cacheable shared data written by a bus master cannot be allowed to mix on a write-back cache line with data that is unrelated. For simplicity, follow the most conservative rule of aligning and padding cacheable shared data to meet D-Cache line requirements.
CM85 has an erratum with write-back when D-Cache is enabled. FSP v5.3.0 added the recommended workaround of using MSCR.FORCEWT to force all D-Cache accesses to write-through, even if an access specifies write-back. Developers should write software as if write-back is being used for full compatibility with the D-Cache, which includes the above alignment and padding requirement.
This guide describes common maintenance scenarios including write-back and write-through. The write-through write policy does not obviate the need for using memory barriers on the CM85. The use of memory barriers is out-of-scope for this document.
FSP handles some of these maintenance scenarios during startup and in the Flash HP driver.
Write Back
Area must be aligned and padded to D-Cache line requirements.
Write Back
Area must be aligned and padded to D-Cache line requirements.
Write Back
Area must be aligned and padded to D-Cache line requirements if shared with a bus master.
Write Back
Area must be aligned and padded to D-Cache line requirements as we assume it is shared with CPU here.
Macro | Purpose | Notes |
---|---|---|
BSP_CFG_DCACHE_ENABLED | Defaults to zero (disabled). If defined and non-zero, the FSP startup code in system.c will configure several predefined non-cacheable sections in the MPU if they are of non-zero size, enable the MPU, and enable the D-Cache. | This is normally configured in e2 Studio under the BSP->Cache settings->Data cache properties panel for the project. |
BSP_CFG_ROM_REG_OFS1_INITECCEN | Defaults to zero (disabled). Sets the value of OFS1.INITECCEN for BSP_CFG_ROM_REG_OFS1 , which controls whether ECC is enabled for caches and TCM. | This is normally configured in e2 Studio under the BSP->OFS1 register settings->Tightly Coupled Memory (TCM)/Cache ECC properties panel for the project. |
Function | Purpose | Notes |
---|---|---|
SCB_EnableICache | If I-Cache allocations are not already enabled, invalidate the entire I-Cache then enable I-Cache allocations with CCR.IC . | Will do nothing if I-Cache allocations are already enabled. FSP automatically enables the I-Cache at startup by directly setting CCR.IC instead of using this function. |
SCB_DisableICache | Disable I-Cache allocations with CCR.IC , then invalidate the entire I-Cache. | |
SCB_InvalidateICache | Invalidate the entire I-Cache. | This is safe to use at any time, because cache lines in the I-Cache can never be dirty. Used after modifying instructions anywhere in memory (e.g. Flash, RAM). FSP calls this function after initializing the predefined RAM code section during startup, and when exiting Code Flash program or erase mode in the Flash HP driver. |
SCB_InvalidateICache_by_Addr | Loop to invalidate the instructions in the I-Cache, starting at a particular address and extending for the specified length in bytes. | This is safe to use at any time, because cache lines in the I-Cache can never be dirty. Can be used to more efficiently invalidate instructions at specific addresses. Will invalidate in increments of cache lines (32 bytes). |
These functions can be safely interrupted and do not need to be guarded by critical sections. However, depending on the structure of the application logic, guarding the functions may be necessary. This must be analyzed for an individual scenario. See the CMSIS 6 API reference material for further information.
Function | Purpose | Notes |
---|---|---|
SCB_EnableDCache | If D-Cache allocations are not already enabled, loop to invalidate the entire D-Cache then enable D-Cache allocations with CCR.DC . | Will do nothing if D-Cache allocations are already enabled. FSP automatically calls this function at startup if BSP_CFG_DCACHE_ENABLED is defined and non-zero. |
SCB_DisableDCache | Disable D-Cache allocations with CCR.DC , then loop to clean and invalidate the entire D-Cache. | |
SCB_InvalidateDCache | Loop to invalidate the entire D-Cache. | This function should generally not be used, since no use case typically exists to invalidate the entire D-Cache. |
SCB_CleanDCache | Loop to clean the entire D-Cache. | |
SCB_CleanInvalidateDCache | Loop to clean and invalidate the entire D-Cache. | |
SCB_InvalidateDCache_by_Addr | Loop to invalidate the data in the D-Cache, starting at a particular address and extending for the specified length in bytes. | Will invalidate in increments of cache lines (32 bytes). |
SCB_CleanDCache_by_Addr | Loop to clean the data in the D-Cache, starting at a particular address and extending for the specified length in bytes. | Will clean in increments of cache lines (32 bytes). |
SCB_CleanInvalidateDCache_by_Addr | Loop to clean and invalidate the data in the D-Cache, starting at a particular address and extending for the specified length in bytes. | Will clean and invalidate in increments of cache lines (32 bytes). |
These functions can be safely interrupted and do not need to be guarded by critical sections. However, depending on the structure of the application logic, guarding the functions may be necessary. This must be analyzed for an individual scenario. See the CMSIS 6 API reference material for further information.
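Because the by-address functions operate in increments of whole cache lines, a range that is not line aligned affects every line it touches. The sketch below only computes which lines a given range overlaps; it is not the CMSIS implementation, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DCACHE_LINE 32U /* D-Cache line size in bytes */

typedef struct
{
    uintptr_t first_line; /* address of the first cache line touched */
    uintptr_t end;        /* address just past the last cache line touched */
} sketch_line_span_t;

/* Compute the span of whole cache lines overlapped by [addr, addr + len). */
static inline sketch_line_span_t sketch_lines_touched(uintptr_t addr, size_t len)
{
    const uintptr_t    mask = (uintptr_t) SKETCH_DCACHE_LINE - 1U;
    sketch_line_span_t span;

    span.first_line = addr & ~mask;
    span.end        = (addr + len + mask) & ~mask;

    return span;
}
```

For example, an 8 byte range starting at 0x2200001C overlaps two lines (0x22000000 to 0x2200003F), so invalidating it would also discard any unrelated cached data in those 64 bytes. This is the reason for aligning and padding shared data.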
Because of the simplicity of the I-Cache and FCACHE relative to the D-Cache and the critical instruction execution performance enhancement that they provide, FSP always enables the I-Cache and the FCACHE. This is not configurable.
FSP automatically enables the I-Cache at startup for CM85 by directly setting CCR.IC. This method is used instead of the CMSIS 6 function, so that the I-Cache, branch prediction, and the low-overhead branch (LOB) extension may simultaneously be enabled. The automatic hardware cache invalidation of the CM85 ensures that cache lookups, allocations, and cache maintenance are no-op until the invalidation is finished, so immediately enabling the I-Cache is safe to do.
FSP invalidates the I-Cache using the CMSIS 6 functions:
The FCACHE is a Renesas cache, not an Arm cache, and it is controlled through its separate peripheral registers.
If instructions have changed outside of the control of FSP, it is user responsibility to perform I-Cache maintenance. This means instructions stored in any cacheable location, including internal flash, internal RAM, external flash, external RAM, etc. It is recommended to use the CMSIS 6 functions to perform I-Cache maintenance.
Users must also consider the interactions that D-Cache has with instruction modifications. For example, if the modified instructions are written to a cacheable location while D-Cache is enabled (e.g. RAM), those data writes may be cached. The D-Cache will need to be cleaned to guarantee that all data writes have been written back to guarantee their visibility to the I-Cache. D-Cache maintenance will also be needed if instructions change in Code or Data Flash via FACI and are cacheable, since D-Cache will cache instructions as data.
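The ordering described above (write the instructions, clean the D-Cache, then invalidate the I-Cache) can be sketched as follows. This is an illustration under assumptions rather than FSP code: `CACHE_SKETCH_ON_TARGET` is a hypothetical build switch, including `bsp_api.h` is assumed to make the CMSIS cache functions available on target, and the off-target stand-ins merely record the call order so that the sketch can be compiled and exercised on a host. Memory barriers are not discussed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef CACHE_SKETCH_ON_TARGET
#include "bsp_api.h" /* assumption: provides the CMSIS cache functions */
#else
/* Off-target stand-ins that only record the order of the calls. */
static char   g_sketch_log[8];
static size_t g_sketch_log_len;

static void sketch_log(char c)
{
    if (g_sketch_log_len < (sizeof(g_sketch_log) - 1U))
    {
        g_sketch_log[g_sketch_log_len++] = c;
    }
}

static void SCB_CleanDCache_by_Addr(volatile void *addr, int32_t dsize)
{
    (void) addr;
    (void) dsize;
    sketch_log('C');
}

static void SCB_InvalidateICache_by_Addr(volatile void *isaddr, int32_t isize)
{
    (void) isaddr;
    (void) isize;
    sketch_log('I');
}
#endif

/* Copy new instructions into a cacheable, line-aligned and padded RAM area,
 * then make them visible to instruction fetches. */
static void sketch_load_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);                           /* data writes, may stay in D-Cache */
    SCB_CleanDCache_by_Addr(dst, (int32_t) len);     /* write them back to memory */
    SCB_InvalidateICache_by_Addr(dst, (int32_t) len); /* drop stale instructions */
}
```

The clean has to come first: invalidating the I-Cache before the modified data has reached backing memory would allow the old instructions to be fetched and cached again.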
There is no hardware mechanism between the I-Cache and D-Cache in which they automatically share coherency, so coherency must be manually maintained by software as required.
The D-Cache is a cache with more complex interactions than the I-Cache. Thus, FSP leaves the D-Cache disabled by default on RA8 projects. It can be enabled in e2 Studio under the BSP->Cache settings->Data cache properties panel for the project.
Presently, FSP does not support any D-Cache functionality except:
No FSP drivers are currently compatible with D-Cache enablement. This compatibility is a work in progress.
Presently, D-Cache usage is fully in the realm of user responsibility. The user must perform all D-Cache maintenance as required, or must store data accordingly in the predefined non-cacheable regions or otherwise.
Coherency must be considered for these bus masters:
Coherency must be considered for interactions with:
The .nocache and .nocache_sdram sections are predefined for GCC, LLVM, and IAR compilers. These same sections exist for AC6 as .bss.nocache and .bss.nocache_sdram because of special naming restrictions with AC6 and uninitialized sections. These sections are uninitialized for all compilers, despite AC6 requiring a prefix of .bss.
Anything placed within them will be non-cacheable. Instruction fetches and data reads or writes to these sections will never lookup or allocate in their respective caches.
The FSP startup code configures these sections as non-cacheable using the MPU during startup, if the D-Cache is enabled via the BSP configuration. Otherwise, they are not configured by FSP in the MPU if the D-Cache is disabled. The predefined sections are aligned to 32 bytes and are padded to a minimum of 32 bytes in length. This meets both MPU region alignment and length requirements, and cache line alignment and length requirements. The MPU and cache line alignment and length requirements protect against inadvertent mixing of cacheable and non-cacheable data.
If the Secure and/or Non-secure MPU configuration is changed, and the cacheability of an address changes, cache maintenance is required to synchronize the caches with the new memory attributes. If this is not done, a newly non-cacheable address may be left in the cache, and behavior when accessing the address is considered undefined.
If the SAU configuration is changed, and the security attributes of an address change, cache maintenance is required to synchronize the caches with the new security attributes. If this is not done, cached data will be desynchronized with the new security attributes and may result in undefined behavior.
The Armv8-M Architecture Reference Manual specifies a default system address map that defines the memory regions of the architecture and their various properties.
These properties include the memory type, shareability attributes, and cacheability attributes for the regions. When the MPU is disabled, this default system address map provides the system with default attributes for instruction fetches, data reads, and data writes. When the MPU is enabled, it can be used to override the default system address map entirely, or both may be used together by setting the MPU_CTRL.PRIVDEFENA bit. This bit allows instruction fetches or data reads and writes that do not correspond to a configured MPU region to hit the default system address map as a background region instead, so long as the access is Privileged. FSP does not support Unprivileged execution, so it always assumes Privileged execution state.
Allowing the default system address map as a background region is the method that FSP uses to provide the predefined no-cache sections: the MPU is configured for the no-cache sections, while all other memory accesses rely on the default system address map.
Configuring an MPU region involves specifying a 32 byte aligned start address and an inclusive ending address, and also specifying the various memory attributes of the region. The MPU region beginning address register will mask downward to align to a 32 byte boundary, i.e. (address & ~0x1F). The MPU region ending address register will OR upward with 0x1F for the inclusive ending boundary, i.e. (address | 0x1F). Thus, the minimum size of an MPU region is 32 bytes and the size may only increase in 32 byte increments.
See the Armv8-M Memory Model and Memory Protection User Guide in the references section for a high-level introduction, and the Armv8-M Architecture Reference Manual for details.
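The address masking described for MPU regions can be worked through with a small C sketch. Only the address arithmetic is shown; the attribute and enable fields of the MPU registers are deliberately left out, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct
{
    uint32_t base;  /* effective first byte of the region */
    uint32_t limit; /* effective last byte of the region (inclusive) */
} sketch_mpu_range_t;

/* Apply the 32 byte granularity to a requested start and inclusive end. */
static inline sketch_mpu_range_t sketch_mpu_range(uint32_t start, uint32_t end_inclusive)
{
    sketch_mpu_range_t range;

    range.base  = start & ~0x1FU;        /* mask downward to a 32 byte boundary */
    range.limit = end_inclusive | 0x1FU; /* OR upward to the end of a 32 byte block */

    return range;
}

/* Size in bytes of the effective region. */
static inline uint32_t sketch_mpu_size(sketch_mpu_range_t range)
{
    return (range.limit - range.base) + 1U;
}
```

A request for 0x22000010 through 0x22000030 therefore yields an effective region of 0x22000000 through 0x2200003F (64 bytes), which is larger than requested. Data that must stay cacheable should not sit inside that widened range.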
The CM85 may, with no deliberate software instruction, speculatively fetch instructions or read data from any memory location. Upon doing so, the instruction fetch or data read may enter the respective cache. The purpose of this speculative behavior is to predict the next instructions or data to be fetched, read, or written, which increases performance if the prediction is correct. This may cause instructions or data to unexpectedly appear in cache, so speculation must be considered when solving for cache coherency.
At any time, the I-Cache or D-Cache may evict cache lines. For I-Cache, this means invalidation of the evicted line. For D-Cache, this means cleaning and invalidation of the evicted line, where cleaning occurs if the cache line is dirty. D-Cache eviction of dirty lines may cause data to be unexpectedly written out to backing memory when write-back is used, and this must be considered when solving for cache coherency. For D-Cache aligned and padded buffers derived from areas like the stack or heap, one or more of the associated cache lines may already be dirty and require cleaning and/or invalidation before being used by a bus master.
The CM85 will provide a SRAM buffer to the DMAC, the DMAC will write to the buffer, and the CM85 will read from the written buffer. I-Cache, D-Cache, and FCACHE are enabled. SRAM exists in the same "SRAM" region defined by the default system address map in the Armv8-M Architecture Reference Manual. SRAM is Normal memory, write-back, write-allocate, read-allocate, and non-shareable by the default system address map attributes.
The correct way to solve coherency in this situation using the two recommended solution options is:
.nocache section that meets these requirements.
The second solution is shown in the D-Cache Enabled scenarios, where write-back is used and a bus master performs writing.
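One plausible ordering for the cacheable-buffer solution in this DMAC example is sketched below; the scenario tables in this guide remain the authoritative sequence. Assumptions: `CACHE_SKETCH_ON_TARGET` is a hypothetical build switch, `bsp_api.h` is assumed to provide the CMSIS cache functions on target, the transfer is abstracted behind a hypothetical callback that starts the DMAC and returns once it has completed, and the CPU has no pending writes in the buffer that it still needs. Memory barriers are not shown. The off-target stand-ins only record call order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DCACHE_LINE 32U /* D-Cache line size in bytes */

#ifdef CACHE_SKETCH_ON_TARGET
#include "bsp_api.h" /* assumption: provides the CMSIS cache functions */
#else
/* Off-target stand-ins that only record the order of events. */
static char   g_sketch_log[8];
static size_t g_sketch_log_len;

static void sketch_log(char c)
{
    if (g_sketch_log_len < (sizeof(g_sketch_log) - 1U))
    {
        g_sketch_log[g_sketch_log_len++] = c;
    }
}

static void SCB_InvalidateDCache_by_Addr(volatile void *addr, int32_t dsize)
{
    (void) addr;
    (void) dsize;
    sketch_log('I');
}

/* Fake transfer standing in for "start the DMAC and wait for completion". */
static void sketch_fake_dma(uint8_t *dst, size_t len)
{
    memset(dst, 0xA5, len);
    sketch_log('D');
}
#endif

/* Hypothetical callback type: performs the transfer and returns when done. */
typedef void (*sketch_dma_fn_t)(uint8_t *dst, size_t len);

/* Receive into a cacheable buffer. Returns 0 on success, or -1 if the buffer
 * is not aligned and padded to whole D-Cache lines. */
static int sketch_dma_receive(uint8_t *buf, size_t len, sketch_dma_fn_t run_dma)
{
    if ((((uintptr_t) buf % SKETCH_DCACHE_LINE) != 0U) || ((len % SKETCH_DCACHE_LINE) != 0U))
    {
        return -1;
    }

    /* Before: discard any lines (possibly dirty) so a later eviction cannot
     * overwrite what the DMAC writes. Old buffer contents are not needed. */
    SCB_InvalidateDCache_by_Addr(buf, (int32_t) len);

    run_dma(buf, len);

    /* After: discard lines that speculation may have loaded meanwhile, so
     * the CPU reads the DMAC data from SRAM. */
    SCB_InvalidateDCache_by_Addr(buf, (int32_t) len);

    return 0;
}
```

The alignment check matters because invalidating a line that also holds unrelated dirty data would silently discard that data.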
By default, FSP disables ECC for the caches and TCM with OFS1.INITECCEN. For best performance, it is recommended to keep ECC for cache and TCM disabled. If enabling is desired, please consult the reference material to understand the consequences of enabling ECC for cache and TCM, which are too numerous to describe here. The automatic hardware cache invalidation performed by the CM85 is compatible with ECC.
For D-Cache, Shareable or Non-Shareable also affects whether an address is Cacheable or Non-Cacheable. A Shareable address is forced to Non-Cacheable for D-Cache. I-Cache is not influenced by the Shareability properties and will always follow the MPU cacheability attributes. The Transient attribute is of limited utility and can mostly be ignored. Clean cache lines that are marked Transient are preferred for eviction before clean cache lines marked Non-Transient. Dirty cache lines whether marked Transient or Non-Transient are evicted with the same priority.
See the CM85 Technical Reference Manual reference material for further information.
x = [I, D]
CCR.xC (S or NS) | MSCR.xCACTIVE | Behavior |
---|---|---|
1 | 1 | Allocate, Lookup |
0 | 1 | No Allocate, Lookup (Reset Behavior) |
X | 0 | No Allocate, No Lookup |
This behavior is applicable to Cortex-M55 and Cortex-M85. If you have previous experience with a Cortex-M7 device, this cache behavior is different, since the MSCR.xCACTIVE bits were introduced for CM55 and CM85. No CM7 or CM55 core is offered by any current RA devices. The MSCR.xCACTIVE bits allow for cache power control and add a third cache behavioral state: lookups without allocation. With this state, cleaning the D-Cache after disabling it becomes less error prone than on CM7, because writes that occur after the D-Cache is disabled still look up in the cache and therefore cannot make dirty cache lines stale before they are cleaned. The MSCR.xCACTIVE bits have a reset value of 1, so the caches are powered by default and lookups are possible. Until the automatic hardware cache invalidation that begins after reset has finished, lookups and allocations do not occur even if CCR.xC is set, and cache maintenance operations are no-ops. The MSCR.xCACTIVE bits should generally never be cleared to 0.
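The table above can also be read as a small decision function. The sketch below models only the table rows; it does not read the real registers, and the enum and function names are hypothetical.

```c
typedef enum
{
    SKETCH_NO_ALLOCATE_NO_LOOKUP, /* MSCR.xCACTIVE = 0, CCR.xC = don't care */
    SKETCH_NO_ALLOCATE_LOOKUP,    /* MSCR.xCACTIVE = 1, CCR.xC = 0 (reset) */
    SKETCH_ALLOCATE_LOOKUP        /* MSCR.xCACTIVE = 1, CCR.xC = 1 */
} sketch_cache_mode_t;

/* Map the two control bits to the cache behavior listed in the table. */
static inline sketch_cache_mode_t sketch_cache_mode(unsigned ccr_xc, unsigned mscr_xcactive)
{
    if (mscr_xcactive == 0U)
    {
        return SKETCH_NO_ALLOCATE_NO_LOOKUP;
    }

    return (ccr_xc != 0U) ? SKETCH_ALLOCATE_LOOKUP : SKETCH_NO_ALLOCATE_LOOKUP;
}
```

Note that this models the steady state only; it does not capture the period after reset during which the automatic hardware invalidation is still running.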
Consult the latest Renesas Technical Updates (TU) and Arm Cortex-M85 Errata documents.
These are example errata to demonstrate the possibility of issues with cache usage at the time of this writing.
Currently available RA8D1, RA8M1, and RA8T1 devices use the r0p2 variant of the core, so they are affected by these errata.
Erratum 2682779 should not require a workaround, since I-Cache will never be powered off in most circumstances.
FSP added workarounds for 3175626 and 3190818 in v5.3.0.
Generally, consult these categories of documents for the most recent and further information than this overview may provide.