Direct Memory Access (DMA)

The Direct Memory Access (DMA) engine in the Blackfin processor allows automated data transfers with minimal overhead for the core. DMA transfers can occur between any of the DMA-capable peripherals (such as the SPORT or PPI) and the external SDRAM.

DMA Systems

There are two aspects of the DMA subsystem to consider.

  • The generic Linux DMA framework
  • The extensions for the Blackfin processor

These two will be covered in different sections.

Linux DMA

The generic Linux DMA framework was shaped mainly by the needs of DMA on x86 systems, but it also provides a base set of DMA functions that apply to non-x86 systems.

Most of the x86 complexity is a result of dealing with limited hardware capabilities. With a Memory Management Unit and data caches, you have the problem of providing a memory area that is valid for both the DMA controller and the CPU, and of making sure that when data arrives in external memory, the CPU does not read stale data from its caches.

This sort of memory mapping is often referred to as being consistent (sometimes coherent). In some architectures, a coherent mapping implies some clever arrangement where the data is always valid even if it is being cached. It is possible, with a memory management unit, to provide two (or more) virtual addresses for the same physical memory area. While one virtual address may access the data cache and not the physical memory location, another virtual address may reference the same physical memory location but bypass the data cache.

Another complication in x86 systems is that some older pieces of hardware (typically devices that live on the ISA bus) are unable to access physical memory above 16 MB. This forces the Linux memory allocator to provide a special allocation flag (GFP_DMA) that restricts physical memory allocations to the lower 16 MB of memory.

Since this restriction was also passed on to hardware devices used in other architectures, the memory allocation schemes in these architectures had to have similar options.

The Microcontroller and DSP fields are often free of such burdens, but, since the aim of the kernel driver programmer is to be architecture independent, you have to know about these “features”.

OK, What's a Bus Address?

When the CPU (say with the MMU turned off) wants to access physical memory, it puts that address on its output pins. This is a Physical Address.

When a peripheral device wants to access the same physical memory (as in a DMA function) it may have to use a different address to get to the same physical location. This is a Bus Address.

So a Bus Address is the address used by a peripheral to access a certain Physical Address.
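The distinction can be sketched in a few lines of C. This is only an illustration: the type names, the function names and the fixed offset are hypothetical and do not come from any kernel header. The first function models a platform where the peripheral bus sees memory shifted by a constant offset; the second models a platform where the peripheral and the CPU use the same address.

```c
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef uint32_t ex_phys_addr_t;
typedef uint32_t ex_bus_addr_t;

/* Platform where the peripheral bus sees RAM shifted by a fixed offset. */
static ex_bus_addr_t ex_phys_to_bus_offset(ex_phys_addr_t phys,
					   uint32_t bus_offset)
{
	return phys + bus_offset;
}

/* Platform where the peripheral uses the same address as the CPU. */
static ex_bus_addr_t ex_phys_to_bus_identity(ex_phys_addr_t phys)
{
	return phys;
}
```

A portable driver never hard-codes either translation; it asks the DMA mapping layer for the bus address instead.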

Dynamic DMA Mapping

  • Document: linux-kernel/Documentation/DMA-API.txt, Linux Device Drivers (3rd edition), chapter 15.
  • API definition: linux-kernel/include/linux/dma-mapping.h, linux-kernel/arch/blackfin/include/asm/dma-mapping.h, linux-kernel/arch/blackfin/kernel/bfin_dma_5xx.c

DMA operations, in the end, come down to allocating a buffer and passing bus addresses to your device.

A DMA mapping is a combination of allocating a DMA buffer and generating an address for that buffer that is accessible by the device.

DMA mappings must also address the issue of cache coherency. Remember that modern processors keep copies of recently accessed memory areas in a fast, local cache; without this cache, reasonable performance is not possible. If your device changes an area of main memory, it is imperative that any processor caches covering that area be invalidated; otherwise the processor may work with an incorrect image of main memory, and data corruption results. Similarly, when your device uses DMA to read data from main memory, any changes to that memory residing in processor caches must be flushed out first.

On Blackfin there are no virtual addresses, and bus addresses and physical addresses share the same address space. DMA mapping is therefore implemented in a simple way: a block of SDRAM is reserved for DMA use and configured as uncacheable, and dma_alloc_coherent() allocates DMA buffers from this region.

In some cases, when the DMA buffer is not allocated as “coherent”, the device driver has to handle cache coherence itself. Before starting a DMA transfer from memory, be sure to flush the cache over the source address range to prevent cache coherency problems. Similarly, before starting a DMA transfer to memory, be sure to invalidate the cache over the destination address range for the same reason.

file: drivers/spi/spi_bfin5xx.c


If you are writing a portable device driver, be sure to use the generic DMA APIs (refer to the documents above for the full list):

  • void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp);
  • void dma_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle);
  • dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size, enum dma_data_direction dir)
  • dma_addr_t dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir)
  • int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir);

Blackfin DMA

The Blackfin processor offers a wide array of DMA capabilities.

  • 12 Different DMA channels
  • Memory-to-memory and I/O-to-memory channel transfers
  • Dual X and Y indexing Address counters
  • Simple Register DMA control
  • Optional Sophisticated Descriptor Based Control
  • 8, 16, or 32 bit data size
  • Interrupt on each DMA packet completion
  • Flexible DMA Priority

Flow Types and Descriptor

There are 5 different ways the DMA controller can be set up. These are called flow types:

  • FLOW_STOP - Stop after the current job
  • FLOW_AUTO - Autobuffer, Repeat the current transfer until stopped
  • FLOW_ARRAY - Use a sequential list of descriptors
  • FLOW_SMALL - Use a linked list of small descriptors
  • FLOW_LARGE - Use a linked list of large descriptors

The flow type can be defined in a CONFIG word in a descriptor so the modes can be mixed and the operation quite complex.

Descriptors are used to control the DMA channel and allow a complex stream of data packets to be assembled if required.

  • Array - Simple sequential array of descriptors in memory
  • Small Descriptor - The high address word does not change; only the low address word is stored in the memory array
  • Large Descriptor - Both the high and low address words are stored in the memory array

So in the array case all the descriptors follow each other in memory. The CURR_DESC_PTR register must be set up and the DMA enabled.
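As a rough software model of the array case, the sketch below walks descriptors that sit back to back in memory and appends each block to an output buffer. The struct is a deliberately simplified, hypothetical descriptor (source pointer, byte count, stop flag); it is not the hardware descriptor layout, and the function name is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified descriptor: NOT the hardware layout. */
struct ex_desc {
	const unsigned char *start;	/* source buffer for this block */
	size_t count;			/* bytes to transfer */
	int stop;			/* non-zero: stop after this block */
};

/*
 * Process descriptors stored sequentially in an array, copying each
 * block to 'out'. Stops at a descriptor marked 'stop', at the end of
 * the array, or when the next block would not fit in 'out'.
 * Returns the number of bytes written.
 */
static size_t ex_run_desc_array(const struct ex_desc *d, size_t ndesc,
				unsigned char *out, size_t out_size)
{
	size_t i, total = 0;

	for (i = 0; i < ndesc; i++) {
		if (d[i].count > out_size - total)
			break;
		memcpy(out + total, d[i].start, d[i].count);
		total += d[i].count;
		if (d[i].stop)
			break;
	}
	return total;
}
```

Because the descriptors are contiguous, no "next" pointer is needed: the controller simply advances to the following entry, which is what the loop index models here.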

The only difference between the Small and Large Descriptor modes is that in Small Descriptor mode all the descriptors must reside in the same 64K memory area defined by the high address word. In both cases the NEXT_DESC_PTR is set up to point to the first descriptor in the array.
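The 64K restriction follows directly from how the next descriptor address is formed. The helpers below are a minimal sketch with hypothetical names: in the small model only the low 16 bits are taken from the descriptor and the high 16 bits are inherited, while in the large model both halves come from the descriptor.

```c
#include <stdint.h>

/* Small model: high word inherited, low word fetched from the descriptor. */
static uint32_t ex_next_desc_small(uint32_t cur_desc_ptr, uint16_t next_low)
{
	return (cur_desc_ptr & 0xFFFF0000u) | next_low;
}

/* Large model: both words are fetched from the descriptor. */
static uint32_t ex_next_desc_large(uint16_t next_low, uint16_t next_high)
{
	return ((uint32_t)next_high << 16) | next_low;
}

/* Non-zero when two descriptors can be chained in the small model. */
static int ex_same_64k_area(uint32_t a, uint32_t b)
{
	return (a & 0xFFFF0000u) == (b & 0xFFFF0000u);
}
```

A driver that allocates small descriptors could use a check like ex_same_64k_area() to catch a list that accidentally straddles a 64K boundary.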

One other slight complexity in the descriptor business is the fact the DMA controller does not have to read ALL of the words in the descriptor array. The NDSIZE part of the CONFIG Register contains the number of elements to read into the DMA controller for this operation.
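To make the CONFIG word more concrete, here is a small sketch that packs a flow type, an NDSIZE value and a few control flags into a 16-bit value. The EX_ names are hypothetical, and the bit positions reflect my understanding of the BF53x DMAx_CONFIG layout (FLOW in bits 14:12, NDSIZE in bits 11:8); treat them as assumptions and check them against the hardware reference manual and the kernel headers before relying on them.

```c
#include <stdint.h>

/* Assumed DMAx_CONFIG layout; verify against the hardware reference. */
#define EX_DMAEN	0x0001u	/* channel enable */
#define EX_WNR		0x0002u	/* DMA writes to memory */
#define EX_WDSIZE_16	0x0004u	/* 16-bit transfers */
#define EX_DI_EN	0x0080u	/* interrupt on completion */
#define EX_NDSIZE_SHIFT	8	/* NDSIZE assumed in bits 11:8 */
#define EX_FLOW_SHIFT	12	/* FLOW assumed in bits 14:12 */

enum ex_flow {
	EX_FLOW_STOP = 0,
	EX_FLOW_AUTO = 1,
	EX_FLOW_ARRAY = 4,
	EX_FLOW_SMALL = 6,
	EX_FLOW_LARGE = 7
};

/* Pack flow type, descriptor size and flag bits into one config word. */
static uint16_t ex_make_config(enum ex_flow flow, unsigned ndsize,
			       uint16_t flags)
{
	return (uint16_t)(((unsigned)flow << EX_FLOW_SHIFT) |
			  ((ndsize & 0xFu) << EX_NDSIZE_SHIFT) | flags);
}

/* Extract the number of descriptor elements the controller will fetch. */
static unsigned ex_config_ndsize(uint16_t cfg)
{
	return (cfg >> EX_NDSIZE_SHIFT) & 0xFu;
}
```

Since each descriptor carries its own config word, a chain can, for example, run several large descriptors and finish with one whose flow field is EX_FLOW_STOP.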

Descriptor Memory Layout

Large/Small Descriptor:

file: arch/blackfin/include/asm/dma.h


Example of Array Descriptor:

file: drivers/mmc/host/bfin_sdh.c



2-D DMA can be roughly viewed as:

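/* Rough model: X_MODIFY steps within a row, Y_MODIFY steps to the next row */
for (y = 0; y < Y_COUNT; y++)
	for (x = 0; x < X_COUNT; x++) {
		transfer_one_element(addr);
		addr += (x < X_COUNT - 1) ? X_MODIFY : Y_MODIFY;
	}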
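The nested loop can be turned into a small software model that is easy to experiment with on a host machine. The function below is a sketch with a hypothetical name; it assumes byte-sized elements and assumes that Y_MODIFY is applied in place of X_MODIFY after the last element of each row, which is how I read the hardware reference.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Software model of 2-D address generation for byte elements.
 * x_modify is applied after every element except the last in a row;
 * y_modify is applied after the last element of each row.
 * Returns the number of elements gathered into 'dst'.
 */
static size_t ex_dma_2d_gather(const uint8_t *src, uint8_t *dst,
			       unsigned x_count, int x_modify,
			       unsigned y_count, int y_modify)
{
	ptrdiff_t pos = 0;	/* offset of the current element in src */
	size_t n = 0;
	unsigned x, y;

	for (y = 0; y < y_count; y++) {
		for (x = 0; x < x_count; x++) {
			dst[n++] = src[pos];
			pos += (x + 1 < x_count) ? x_modify : y_modify;
		}
	}
	return n;
}
```

For example, to pull a 2x2 window out of a 4-byte-wide frame, start at the window's first pixel and use x_count = 2, x_modify = 1, y_count = 2, y_modify = 3 (the row stride of 4, minus the single x step already taken within the row).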

In some video applications, 2-D DMA is more convenient to use than 1-D DMA:

file: drivers/video/bf537-lq035.c


Also, in some cases the buffer is too large and exceeds the 16-bit X_COUNT limit (64K). 2-D DMA can then be used instead:

file: drivers/char/bfin_sport.c
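One way to split an oversized transfer is to factor the element count into X_COUNT times Y_COUNT. The helper below is a sketch of that idea, not the code from the driver named above: the function name is hypothetical, it treats 0xFFFF as the largest usable register value, and it simply picks the largest power-of-two X_COUNT that divides the total.

```c
#include <stdint.h>

/*
 * Split 'count' elements into x_count * y_count so that both values
 * fit in a 16-bit register. Returns 0 on success, -1 if no
 * power-of-two split fits.
 */
static int ex_split_count(uint32_t count, uint16_t *x_count,
			  uint16_t *y_count)
{
	uint32_t x;

	if (count == 0)
		return -1;
	if (count <= 0xFFFFu) {		/* fits in a plain 1-D transfer */
		*x_count = (uint16_t)count;
		*y_count = 1;
		return 0;
	}
	for (x = 0x8000u; x > 1; x >>= 1) {
		if (count % x == 0 && count / x <= 0xFFFFu) {
			*x_count = (uint16_t)x;
			*y_count = (uint16_t)(count / x);
			return 0;
		}
	}
	return -1;
}
```

For a linear buffer, Y_MODIFY would then be set to the same value as X_MODIFY so that the rows follow each other without gaps. A count with no suitable power-of-two factor needs a different strategy, such as padding the buffer or chaining descriptors.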



Besides setting up memory DMA (MDMA) yourself, there are two other ways to use MDMA:

  • void *dma_memcpy(void *pdst, const void *psrc, size_t size) (refer to arch/blackfin/kernel/bfin_dma_5xx.c)
  • Blackfin DMA driver, see bfin-dma

DMA Pitfalls

When customizing your DMA driver, there are a few pitfalls to be aware of.

Invalidating Cache vs Flushing Cache

Flushing cache and invalidating cache sound similar, but they are very different animals. Flushing cache writes the current contents of the cache back to memory, whereas invalidating cache marks cache entries as invalid (thereby discarding the cached entries). Before starting a DMA transfer from memory, be sure to flush the cache over the source address range to prevent cache coherency problems. Similarly, before starting a DMA transfer to memory, be sure to invalidate the cache over the destination address range for the same reason. Otherwise very subtle bugs can be introduced to your driver.
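The difference is easy to see in a toy model. The sketch below simulates a single write-back cache line in front of a small "main memory"; everything in it (struct, names, line size) is hypothetical and exists only to show which copy of the data the CPU and a DMA device each see.

```c
#include <string.h>

#define EX_LINE 8

/* Toy single-line write-back cache in front of a small main memory. */
struct ex_cache {
	unsigned char mem[EX_LINE];	/* what a DMA device sees */
	unsigned char line[EX_LINE];	/* what the CPU sees when valid */
	int valid, dirty;
};

/* Load the line from memory if the cache does not hold it yet. */
static void ex_fill(struct ex_cache *c)
{
	if (!c->valid) {
		memcpy(c->line, c->mem, EX_LINE);
		c->valid = 1;
		c->dirty = 0;
	}
}

static void ex_cpu_write(struct ex_cache *c, int i, unsigned char v)
{
	ex_fill(c);
	c->line[i] = v;		/* only the cached copy changes */
	c->dirty = 1;
}

static unsigned char ex_cpu_read(struct ex_cache *c, int i)
{
	ex_fill(c);
	return c->line[i];
}

/* Flush: write dirty data back so a device reads current contents. */
static void ex_flush(struct ex_cache *c)
{
	if (c->valid && c->dirty) {
		memcpy(c->mem, c->line, EX_LINE);
		c->dirty = 0;
	}
}

/* Invalidate: drop the cached copy so the next read refetches memory. */
static void ex_invalidate(struct ex_cache *c)
{
	c->valid = 0;
	c->dirty = 0;
}
```

In this model, a CPU write stays invisible in mem[] until ex_flush() runs, and a device write to mem[] stays invisible to ex_cpu_read() until ex_invalidate() runs. Note that invalidating a dirty line throws the CPU's changes away, which is one source of the subtle bugs mentioned above.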

DMA Holes


There is a pipeline in the DMA transfer of approximately 10 data elements. So if you are polling any of the DMA current pointer registers to try to reduce latency, subtract an offset of approximately 10 elements from the pointer returned by the register. Otherwise, when you examine the data, you will find missing data where the DMA pointer seemingly incremented but no data was written. To see whether this problem affects you, write a known pattern to your DMA destination buffer right before enabling the transfer. After the transfer is complete, look through the buffer to see if the known pattern still resides in it. If it does, then you are hitting this problem. (See pg 5-57 of the BF537 HW reference manual for more info.)
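The known-pattern check and the pointer offset can be sketched as a few host-side helpers. The marker byte, the helper names and the fixed depth of 10 are assumptions made for this example; the depth is only the approximate figure quoted above.

```c
#include <stddef.h>
#include <string.h>

#define EX_FILL			0xA5	/* hypothetical marker byte */
#define EX_PIPELINE_ELEMS	10	/* approximate depth quoted above */

/* Fill the destination with a known pattern before enabling the DMA. */
static void ex_prefill(unsigned char *buf, size_t len)
{
	memset(buf, EX_FILL, len);
}

/* Count bytes that still hold the marker after the transfer. */
static size_t ex_count_holes(const unsigned char *buf, size_t len)
{
	size_t i, holes = 0;

	for (i = 0; i < len; i++)
		if (buf[i] == EX_FILL)
			holes++;
	return holes;
}

/* Conservative number of elements that are safe to read while polling. */
static size_t ex_safe_elems(size_t elems_reported)
{
	return elems_reported > EX_PIPELINE_ELEMS ?
	       elems_reported - EX_PIPELINE_ELEMS : 0;
}
```

Real data that happens to equal the marker is counted as a hole too, so choose a marker value that is unlikely in your data stream, or treat a small non-zero count with some suspicion.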


Please refer to: arch/blackfin/include/asm/dma.h, and arch/blackfin/kernel/bfin_dma_5xx.c.

  • int request_dma(unsigned int channel, const char *device_id)
  • void free_dma(unsigned int channel)
  • void enable_dma(int channel)
  • void disable_dma(int channel)

The DMA API has been extended to allow for this increased flexibility. These APIs are Blackfin-specific:

  • void set_dma_start_addr(unsigned int channel, unsigned long addr)
  • void set_dma_next_desc_addr(unsigned int channel, unsigned long addr)
  • void set_dma_x_count(unsigned int channel, unsigned short x_count)
  • void set_dma_x_modify(unsigned int channel, short x_modify)
  • void set_dma_y_count(unsigned int channel, unsigned short y_count)
  • void set_dma_y_modify(unsigned int channel, short y_modify)
  • void set_dma_config(unsigned int channel, unsigned short config)
  • unsigned short set_bfin_dma_config(char direction, char flow_mode, char intr_mode, char dma_mode, char width)
  • unsigned short get_dma_curr_irqstat(unsigned int channel)
  • unsigned short get_dma_curr_xcount(unsigned int channel)
  • unsigned short get_dma_curr_ycount(unsigned int channel)
  • void set_dma_sg(unsigned int channel, struct dmasg_t *sg, int nr_sg)
  • void dma_disable_irq(unsigned int channel)
  • void dma_enable_irq(unsigned int channel)
  • void clear_dma_irqstat(unsigned int channel)
  • int set_dma_callback(unsigned int channel, dma_interrupt_t callback, void *data)

Simple DMA Example

This is a simple DMA example taken from the adsp-spidac.c driver. It reads 8-bit data from the SPI device into mybuffer.

int mydmatest(void)
{
     // static: a 32 KB buffer is too large for the kernel stack
     static char mybuffer[1024 * 32];
     int mysize = 1024 * 32;
     int ret;
     // lets ask for the DMA channel
     ret = request_dma(CH_SPI, "BF533 SPI Test");
     if (ret < 0) {
        printk(" Unable to get DMA channel\n");
        return 1;
     }
     // turn off the DMA channel
     disable_dma(CH_SPI);
     // set up the dma config
     // WNR We are going to write to memory
     // RESTART throw away any old data in the fifo
     // Enable Interrupts
     set_dma_config(CH_SPI, (WNR | RESTART | DI_EN));
     // set address to drop data into
     set_dma_start_addr(CH_SPI, (unsigned long)mybuffer);
     // set the transfer size in bytes
     set_dma_x_count(CH_SPI, mysize);
     // set the X modify ( dont worry about Y for this one )
     set_dma_x_modify(CH_SPI, 1);
     // sync the cores up
     SSYNC();
     // off we go (the IRQ callback must be registered before this point)
     enable_dma(CH_SPI);
     return 0;
}

Since this example uses interrupts, the interrupt routine must be associated with the DMA channel.

    // set the IRQ callback
    set_dma_callback(CH_SPI, myirq, mydata); 

The IRQ routine could look like this. It simply clears the irq status.

static irqreturn_t myirq(int irq, void *data)
{
   unsigned short mystat;
   // read the status, then clear it to acknowledge the interrupt
   mystat = get_dma_curr_irqstat(CH_SPI);
   clear_dma_irqstat(CH_SPI);
   return IRQ_HANDLED;
}
