Using On-Chip SRAM Memory

Blackfin processors support a hierarchical memory model in which performance and size vary with a memory's position in the hierarchy. Level 1 (L1) memories are coupled closely and efficiently to the core for best performance. Separate blocks of L1 memory can be accessed simultaneously through multiple bus systems. Instruction memory is separate from data memory but, unlike classical Harvard architectures, all L1 memory blocks are accessed through one unified addressing scheme. Portions of L1 memory can be configured to function as cache memory. Some Blackfin derivatives also feature on-chip Level 2 (L2) memories. Based on a von Neumann architecture, L2 memories are unified and can freely store both instructions and data. Although L2 memories still reside inside the CCLK clock domain, they take multiple CCLK cycles to access. The processors also support an external memory space that includes asynchronous memory space for static RAM devices and synchronous memory space for dynamic RAM such as SDRAM devices. For details on the memory architecture, see the section on cache. For memory size, population, and off-chip memory interfaces, refer to the specific Blackfin Processor Hardware Reference manual for your derivative.

Using the L1 memory blocks well is key to running the Blackfin effectively and efficiently. Simply turning the cache on uses only half of the available L1 SRAM in the system, and can lead to cache pollution or cache thrashing. To avoid these issues and make the best use of L1, you need to experiment with placing code and data in L1 (non-cached) memory and see how that affects your criteria for overall system performance.

In some cases, allocating and managing L1 SRAM data banks A and B separately can yield a 2x improvement in performance, since loads from both banks can occur simultaneously.

Using L1 in Kernel Space

GCC Attributes and ASM Sections

Portions of the kernel can be selectively placed into on-chip SRAM. This can be done on a function-by-function basis using the GCC function __attribute__. The same applies to data on a variable-by-variable basis.

Examples of this can be seen in existing C and assembly files.

file: arch/blackfin/mach-common/irqpanic.c

file: arch/blackfin/mach-common/entry.S

The Blackfin GCC compiler recognizes these attributes:

  • l1_text
  • l1_data (try data bank A first, then fall back to data bank B)
  • l1_data_A
  • l1_data_B
  • l2 (you must use the longcall attribute with this in the kernel)

You can use them like so:

int a __attribute__ ((l1_data));
int b __attribute__ ((l1_data_A));
int c __attribute__ ((l1_data_B));
int d __attribute__ ((l2));
void foo(int a) __attribute__ ((l1_text));
void moo(int a) __attribute__ ((l2, longcall));
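
To illustrate why the separate bank attributes matter, here is a minimal sketch that keeps coefficients in data bank A and samples in data bank B, so that each loop iteration reads one operand from each bank. The macro names (IN_L1_DATA_A, IN_L1_DATA_B, IN_L1_TEXT), the function names, and the tap count are hypothetical, and the `__bfin__` check is an assumption about the toolchain's predefined macros; on any other compiler the macros expand to nothing so the code still builds. Whether the compiler actually pairs the two loads depends on the optimizer, so treat this as a starting point for experiments rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrappers: use the Blackfin attributes only when building
 * with a Blackfin toolchain (assumed to define __bfin__). */
#if defined(__bfin__)
#define IN_L1_DATA_A __attribute__ ((l1_data_A))
#define IN_L1_DATA_B __attribute__ ((l1_data_B))
#define IN_L1_TEXT   __attribute__ ((l1_text))
#else
#define IN_L1_DATA_A
#define IN_L1_DATA_B
#define IN_L1_TEXT
#endif

#define NUM_TAPS 8

/* Coefficients in bank A, samples in bank B: one operand per bank. */
static int16_t coeffs[NUM_TAPS] IN_L1_DATA_A = { 1, 2, 3, 4, 5, 6, 7, 8 };
static int16_t samples[NUM_TAPS] IN_L1_DATA_B;

void load_samples(const int16_t *src, size_t n) IN_L1_TEXT;
int32_t weighted_sum(void) IN_L1_TEXT;

/* Copy n samples (at most NUM_TAPS) into the bank B buffer. */
void load_samples(const int16_t *src, size_t n)
{
	size_t i;

	assert(src != NULL && n <= NUM_TAPS);
	for (i = 0; i < n; i++)
		samples[i] = src[i];
}

/* Multiply-accumulate over both buffers; each iteration reads one
 * value from coeffs (bank A) and one from samples (bank B). */
int32_t weighted_sum(void)
{
	int32_t acc = 0;
	size_t i;

	for (i = 0; i < NUM_TAPS; i++)
		acc += (int32_t)coeffs[i] * samples[i];
	return acc;
}
```

Calling load_samples() with eight samples and then weighted_sum() gives the dot product of the two buffers. Comparing cycle counts with both arrays in the same bank against this split layout is one way to check whether the split pays off for your workload.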

It is also possible to use the section attribute directly, but this usage is NOT supported. It is documented here only for completeness.

int a __attribute__ ((section(""))) = 1;

Here various sections are used:

  • - Put data section (initialized data) into Data SRAM Bank A
  • - Put data section (initialized data) into Data SRAM Bank B
  • .l1.bss - Put bss section (uninitialized data) into Data SRAM Bank A
  • .l1.bss.B - Put bss section (uninitialized data) into Data SRAM Bank B
  • .l1.text - Put text section (code) into Instruction SRAM
  • .l2.text - Put text section (code) into L2 SRAM
  • - Put data section (initialized data) into L2 SRAM
  • .l2.bss - Put bss section (uninitialized data) into L2 SRAM

Dynamically Allocating

There is also an API for kernel code to dynamically allocate and free L1 SRAM. It can be used from code built into the kernel or from kernel modules.

The function prototypes in question are:

file: arch/blackfin/include/asm/bfin-global.h

and can easily be included in your source with:

#include <asm/blackfin.h>

These functions behave just like your normal malloc/free functions.

/* include prototypes */
#include <asm/blackfin.h>
/* allocate 50 items in L1 Data A */
uint16_t *points = l1_data_A_sram_alloc(sizeof(*points) * 50);
if (!points)
	goto fail; /* No room in L1 */
/* free the allocation */
l1_data_A_sram_free(points);
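
Because L1 is small, the allocation above can fail, and code often wants to fall back to ordinary memory instead of giving up. The following is a minimal sketch of that pattern, not the kernel's own implementation. The struct sample_buf wrapper and its helper names are hypothetical. The two l1_data_A_sram_* functions are defined here as host-side stubs, with assumed signatures and an invented 64-byte limit, purely so the sketch builds and runs outside the kernel. In kernel code you would remove the stubs, include <asm/blackfin.h>, and replace malloc/free with kmalloc/kfree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * ASSUMPTION: host-side stand-ins so the sketch builds outside the kernel.
 * In kernel code, delete both stubs and include <asm/blackfin.h>.
 * The 64-byte limit is invented to make "bank A is full" easy to trigger.
 */
#define FAKE_BANK_A_LIMIT 64

static void *l1_data_A_sram_alloc(size_t size)
{
	return size <= FAKE_BANK_A_LIMIT ? malloc(size) : NULL;
}

static int l1_data_A_sram_free(const void *addr)
{
	free((void *)addr);
	return 0;
}

/* Hypothetical wrapper that remembers which allocator owns the buffer. */
struct sample_buf {
	uint16_t *data;
	size_t count;
	int in_l1;	/* nonzero if data lives in L1 Data A */
};

/* Try L1 Data A first; fall back to ordinary memory if L1 is full. */
static int sample_buf_init(struct sample_buf *buf, size_t count)
{
	size_t bytes;

	assert(buf != NULL);
	bytes = sizeof(*buf->data) * count;

	buf->data = l1_data_A_sram_alloc(bytes);
	buf->in_l1 = (buf->data != NULL);
	if (!buf->data)
		buf->data = malloc(bytes);	/* kmalloc() in kernel code */
	if (!buf->data)
		return -1;
	buf->count = count;
	return 0;
}

/* Release the buffer with the allocator that created it. */
static void sample_buf_release(struct sample_buf *buf)
{
	assert(buf != NULL);
	if (buf->in_l1)
		l1_data_A_sram_free(buf->data);
	else
		free(buf->data);		/* kfree() in kernel code */
	buf->data = NULL;
	buf->count = 0;
	buf->in_l1 = 0;
}
```

Recording where the buffer came from matters because memory obtained from the L1 allocator must be returned to it, not to the general-purpose allocator. With the stub limit used here, a 16-element buffer lands in the fake L1 bank and a 50-element buffer falls back to ordinary memory.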

Kernel Modules in L1

Besides using the __attribute__ directive in source code, a kernel module can also be linked with the "--code-in-l1" and "--data-in-l1" link options to put its entire text and data segments into L1. These options set special flags in the object header, and the kernel module loader decides whether to put the text or data segment into L1 according to those flags.

A complete example is shown at how to load_driver_module_into_l1_memory.