
Direct Memory Access (DMA)

The Direct Memory Access (DMA) engine in the Blackfin processor allows automated data transfers with minimal overhead for the core. DMA transfers can occur between any of the DMA-capable peripherals (such as the SPORT or PPI) and external SDRAM.

DMA Systems

There are two aspects of the DMA subsystem to consider: the generic Linux DMA layer and the Blackfin-specific DMA controller.

These two will be covered in the following sections.

Linux DMA

The generic Linux DMA layer is closely tied to the needs of DMA on x86 systems, but it also provides a base set of DMA functions that apply to non-x86 systems.

Most of the x86 complexity results from dealing with limited hardware capabilities. With a Memory Management Unit and data caches, you have the problem of providing a memory area that is valid for both the DMA engine and the CPU, as well as making sure that when data arrives in external memory, the CPU is not left looking at stale data in its cache.

This sort of memory mapping is often referred to as consistent (sometimes coherent). In some architectures, a coherent mapping implies some clever arrangement where the data is always valid even if it is being cached. With a memory management unit, it is possible to provide two (or more) virtual addresses for the same physical memory area: one virtual address may go through the data cache rather than to the physical memory location, while another virtual address references the same physical memory location but bypasses the data cache.

Another complication in x86 systems is that some older pieces of hardware are unable to address physical memory above 16MB (typically devices that live on the ISA bus). This forces the Linux memory allocator to offer a special allocation flag (GFP_DMA) that restricts physical memory allocations to the lowest 16MB of memory.

Since this restriction was also passed on to hardware devices used in other architectures, the memory allocation schemes in these architectures had to have similar options.

The microcontroller and DSP fields are often free of such burdens, but since the aim of a kernel driver programmer is to be architecture-independent, you have to know about these “features”.

OK, What's a Bus Address?

When the CPU (say with the MMU turned off) wants to access physical memory, it puts that address on its output pins. This is a Physical Address.

When a peripheral device wants to access the same physical memory (as in a DMA function) it may have to use a different address to get to the same physical location. This is a Bus Address.

So a Bus Address is the address used by a peripheral to access a certain Physical Address.

Dynamic DMA Mapping

DMA operations, in the end, come down to allocating a buffer and passing bus addresses to your device.

A DMA mapping is a combination of allocating a DMA buffer and generating an address for that buffer that is accessible by the device.

DMA mappings must also address the issue of cache coherency. Remember that modern processors keep copies of recently accessed memory areas in a fast, local cache; without this cache, reasonable performance is not possible. If your device changes an area of main memory, it is imperative that any processor caches covering that area be invalidated; otherwise the processor may work with an incorrect image of main memory, and data corruption results. Similarly, when your device uses DMA to read data from main memory, any changes to that memory residing in processor caches must be flushed out first.

On Blackfin there are no virtual addresses, and the bus address and physical address occupy the same memory space. DMA mapping is therefore implemented in a simple way: a block of SDRAM is reserved for DMA usage and configured as un-cacheable, and dma_alloc_coherent() allocates DMA buffers from this memory region.
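For example, a driver might obtain such a buffer as follows. This is a minimal sketch; mydev and MY_BUF_SIZE are made-up names:

#include <linux/dma-mapping.h>

#define MY_BUF_SIZE 4096              /* hypothetical buffer size */

static void *cpu_addr;                /* address the CPU uses */
static dma_addr_t dma_handle;         /* address the DMA engine uses */

static int my_alloc_dma_buffer(struct device *mydev)
{
	/* On Blackfin this comes out of the reserved, un-cacheable
	 * SDRAM region, so no flushing or invalidating is needed. */
	cpu_addr = dma_alloc_coherent(mydev, MY_BUF_SIZE, &dma_handle, GFP_KERNEL);
	if (!cpu_addr)
		return -ENOMEM;
	return 0;
}

The buffer is later released with dma_free_coherent(mydev, MY_BUF_SIZE, cpu_addr, dma_handle).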

In some cases, if the DMA buffer is not allocated as “coherent”, the device driver needs to handle cache coherence itself, as sketched below. Before starting a DMA transfer from memory, be sure to flush the cache over the source address range to prevent cache coherency problems. Similarly, before starting a DMA transfer to memory, be sure to invalidate the cache over the destination address range for the same reason.

file: drivers/spi/spi_bfin5xx.c

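The referenced listing is not reproduced here, but the idea looks roughly like this sketch (the helper name and the buf/len/to_device parameters are made up; the blackfin_dcache_* routines are declared in arch/blackfin/include/asm/cacheflush.h):

#include <asm/cacheflush.h>

/* Make a buffer safe for a DMA transfer.
 * to_device != 0: memory -> peripheral, so flush (write back) the cache.
 * to_device == 0: peripheral -> memory, so invalidate (discard) the cache. */
static void my_dma_cache_prepare(void *buf, unsigned long len, int to_device)
{
	unsigned long start = (unsigned long)buf;

	if (to_device)
		blackfin_dcache_flush_range(start, start + len);
	else
		blackfin_dcache_invalidate_range(start, start + len);
}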

If you are writing a portable device driver, be sure to use the generic DMA APIs (for the full list, refer to Documentation/DMA-API.txt in the kernel source).
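A minimal sketch of the streaming side of that API (mydev, buf, and len are placeholders); dma_map_single() performs whatever flushing or invalidating the running architecture requires and returns the bus address:

#include <linux/dma-mapping.h>

static int my_send(struct device *mydev, void *buf, size_t len)
{
	dma_addr_t bus;

	/* flush caches as needed, get the address the device should use */
	bus = dma_map_single(mydev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(mydev, bus))
		return -ENOMEM;

	/* ... program the DMA engine with 'bus' and wait for completion ... */

	dma_unmap_single(mydev, bus, len, DMA_TO_DEVICE);
	return 0;
}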

Blackfin DMA

The Blackfin processor offers a wide array of DMA capabilities.

Flow Types and Descriptors

There are five different ways the DMA controller can be set up. These are called flow types: stop, autobuffer, descriptor array, small descriptor list, and large descriptor list.

The flow type is defined in the CONFIG word of a descriptor, so the modes can be mixed and quite complex operations built up.

Descriptors are used to control the DMA channel and allow a complex stream of data packets to be assembled if required.

So in the array case, all the descriptors follow each other in memory. The CURR_DESC_PTR register must be set up and the DMA enabled.

The only difference between the Small and Large Descriptor List modes is that in Small mode all the descriptors must reside in the same 64K memory area, defined by the high word of the descriptor address. In both list modes, the NEXT_DESC_PTR is set up to point to the first descriptor in the list.

One other slight complexity in the descriptor business is the fact that the DMA controller does not have to read ALL of the words in a descriptor. The NDSIZE field of the CONFIG register contains the number of 16-bit descriptor elements to read into the DMA controller for this operation.

Descriptor Memory Layout

Large/Small Descriptor:

file: arch/blackfin/include/asm/dma.h

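The descriptor element layout in that header is essentially the following; this is reconstructed from memory, so verify it against the header itself:

/* One DMA descriptor element. The large model carries all fields;
 * NDSIZE in the config word tells the controller how many of them
 * to actually fetch. */
struct dmasg {
	void *next_desc_addr;         /* NEXT_DESC_PTR */
	unsigned long start_addr;     /* START_ADDR    */
	unsigned short cfg;           /* CONFIG        */
	unsigned short x_count;       /* X_COUNT       */
	short x_modify;               /* X_MODIFY      */
	unsigned short y_count;       /* Y_COUNT       */
	short y_modify;               /* Y_MODIFY      */
} __attribute__((packed));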

Example of Array Descriptor:

file: drivers/mmc/host/bfin_sdh.c

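Since that listing cannot be shown, here is a sketch of an array-mode setup. The descriptor struct, channel, and buffers are made up, and the DMAFLOW_ARRAY/NDSIZE_5 config bits are modeled on their use in drivers/mmc/host/bfin_sdh.c; treat the details as assumptions:

#include <asm/dma.h>

/* Array-mode descriptor: no next-descriptor pointer; the controller
 * reads descriptors one after another starting at CURR_DESC_PTR. */
struct my_array_desc {
	unsigned long start_addr;
	unsigned short cfg;
	unsigned short x_count;
	short x_modify;
} __attribute__((packed));

static struct my_array_desc my_descs[2];

static void my_start_array_dma(unsigned int ch, void *buf0, void *buf1,
			       unsigned short count)
{
	int i;

	my_descs[0].start_addr = (unsigned long)buf0;
	my_descs[1].start_addr = (unsigned long)buf1;

	for (i = 0; i < 2; i++) {
		/* NDSIZE_5: fetch five 16-bit descriptor elements each time */
		my_descs[i].cfg = DMAFLOW_ARRAY | NDSIZE_5 | WNR | WDSIZE_8 | DMAEN;
		my_descs[i].x_count = count;
		my_descs[i].x_modify = 1;            /* 8-bit elements */
	}
	/* last descriptor: stop the channel and raise an interrupt */
	my_descs[1].cfg = (my_descs[1].cfg & ~DMAFLOW_ARRAY) | DI_EN;

	set_dma_curr_desc_addr(ch, (unsigned long *)my_descs);
	enable_dma(ch);
}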

2-D DMA

2-D DMA can be roughly viewed as:

/* Rough model: one data element is transferred at DMAx_CURR_ADDR
 * on each pass through the inner loop. */
for (y = 0; y < Y_COUNT; y++)
{
	for (x = 0; x < X_COUNT - 1; x++)
		DMAx_CURR_ADDR += X_MODIFY;	/* after each element within a row */
	DMAx_CURR_ADDR += Y_MODIFY;	/* instead of X_MODIFY at the end of each row */
}

In some video applications, 2-D DMA is more convenient to use than 1-D DMA:

file: drivers/video/bf537-lq035.c

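A sketch of what such a setup might look like, receiving a frame where each stored line is padded out to a fixed stride (the channel and the geometry are made up):

#include <asm/dma.h>

#define FRAME_W     320               /* pixels per line  */
#define FRAME_H     240               /* lines per frame  */
#define LINE_STRIDE 1024              /* bytes between stored lines */

static void my_setup_2d_dma(unsigned int ch, void *frame)
{
	set_dma_start_addr(ch, (unsigned long)frame);
	set_dma_x_count(ch, FRAME_W);
	set_dma_x_modify(ch, 2);                       /* 16-bit pixels */
	set_dma_y_count(ch, FRAME_H);
	/* applied instead of X_MODIFY at the end of each line: jump
	 * from the last pixel of one line to the first of the next */
	set_dma_y_modify(ch, LINE_STRIDE - (FRAME_W - 1) * 2);
	set_dma_config(ch, WNR | WDSIZE_16 | DMA2D | DI_EN);
	enable_dma(ch);
}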

Also, in some cases, if the buffer is too large and exceeds the 16-bit X_COUNT limit (64K elements), 2-D DMA can be used:

file: drivers/char/bfin_sport.c

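For example, a 256K-byte linear buffer can be described as 8 rows of 32768 bytes; both counts then fit their 16-bit registers, and with X_MODIFY == Y_MODIFY == 1 the transfer is still linear in memory (sizes and channel are hypothetical):

#define CHUNK 32768
#define TOTAL (256 * 1024)

static void my_setup_large_transfer(unsigned int ch)
{
	set_dma_x_count(ch, CHUNK);
	set_dma_x_modify(ch, 1);
	set_dma_y_count(ch, TOTAL / CHUNK);
	set_dma_y_modify(ch, 1);
	/* remember to set DMA2D in the config word */
}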

MDMA

Besides setting up MDMA yourself, there exist two other ways to use MDMA:
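For example, the Blackfin kernel provides dma_memcpy(), declared in arch/blackfin/include/asm/dma.h, which performs a copy using the MDMA channels. A trivial sketch of its use (dst, src, and len are placeholders):

#include <asm/dma.h>

/* copy a block with the memory-to-memory DMA channels
 * instead of the core */
static void my_fast_copy(void *dst, const void *src, size_t len)
{
	dma_memcpy(dst, src, len);
}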

DMA Pitfalls

When customizing your DMA driver, there are a few pitfalls to be aware of.

Invalidating Cache vs Flushing Cache

Flushing cache and invalidating cache sound similar, but they are very different animals. Flushing cache writes the current contents of the cache back to memory, whereas invalidating cache marks cache entries as invalid (thereby discarding the cached entries). Before starting a DMA transfer from memory, be sure to flush the cache over the source address range to prevent cache coherency problems. Similarly, before starting a DMA transfer to memory, be sure to invalidate the cache over the destination address range for the same reason. Otherwise very subtle bugs can be introduced to your driver.

DMA Holes

Do not work too close to DMAx_CURR_ADDR, DMAx_CURR_DESC_PTR, or DMAx_CURR_X_COUNT/DMAx_CURR_Y_COUNT.

There is a pipeline in the DMA transfer of approximately 10 data elements. So if you are polling any of the DMA current pointer registers to try and reduce latency, subtract an offset of approximately 10 elements from the pointer value returned by the register. Otherwise, when you go to examine the data, you will find missing data where the DMA pointer seemingly incremented but no data was written. To see if this problem is occurring, write a known pattern to your DMA destination buffer right before enabling the transfer. After the transfer is complete, look through the buffer to see if your known pattern still resides in the buffer. If it does, then you are hitting this problem. (See page 5-57 of the BF537 Hardware Reference manual for more info.)
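A sketch of that safety margin when polling the current address (the margin of 10 elements is taken from the text above; the element size and helper name are made up):

#include <asm/dma.h>

#define ELEM_SIZE 2                   /* bytes per element (hypothetical) */
#define DMA_PIPE  10                  /* approximate pipeline depth, in elements */

/* only data below the returned address is guaranteed to be in memory */
static unsigned long my_safe_dma_addr(unsigned int ch)
{
	return get_dma_curr_addr(ch) - DMA_PIPE * ELEM_SIZE;
}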

DMA API

Please refer to: arch/blackfin/include/asm/dma.h and arch/blackfin/kernel/bfin_dma_5xx.c.

The DMA API has been extended to allow for the flexibility of the Blackfin DMA controller. These APIs are Blackfin-specific and include, among others, request_dma(), free_dma(), set_dma_config(), set_dma_start_addr(), set_dma_x_count(), set_dma_x_modify(), set_dma_y_count(), set_dma_y_modify(), set_dma_callback(), enable_dma(), and disable_dma().

Simple DMA Example

This is a simple DMA example taken from the adsp-spidac.c driver. It reads 8-bit data from the SPI device into mybuffer.

int mydmatest(void)
{
     /* static: 32K is far too big for the kernel stack */
     static char mybuffer[1024 * 32];
     int mysize = 1024 * 32;
     int ret;

     // lets ask for the DMA channel
     ret = request_dma(CH_SPI, "BF533 SPI Test");
     if (ret < 0) {
        printk(" Unable to get DMA channel\n");
        return 1;
     }

    // turn off the DMA channel
    disable_dma(CH_SPI);

    // set up the dma config:
    // WNR - we are going to write to memory
    // RESTART - throw away any old data in the FIFO
    // DI_EN - raise an interrupt when the transfer completes
    set_dma_config(CH_SPI, (WNR | RESTART | DI_EN));

    // set the address to drop data into
    set_dma_start_addr(CH_SPI, (unsigned long)mybuffer);

    // set the transfer size in bytes
    set_dma_x_count(CH_SPI, mysize);

    // set the X modify (don't worry about Y for this one)
    set_dma_x_modify(CH_SPI, 1);

    // make sure the MMR writes above have completed
    __builtin_bfin_ssync();

    // off we go
    enable_dma(CH_SPI);

    return 0;
}

Since this example uses interrupts, an interrupt routine must be associated with the DMA channel; register it after requesting the channel and before enabling the DMA.

    // set the IRQ callback
    set_dma_callback(CH_SPI, myirq, mydata); 

The IRQ routine could look like this. It clears the IRQ status and wakes up any waiting task.

static irqreturn_t myirq(int irq, void *data)
{
   unsigned short mystat;

   // read, then clear, the channel's interrupt status
   mystat = get_dma_curr_irqstat(CH_SPI);
   clear_dma_irqstat(CH_SPI);

   // mywaiting_task is a wait queue declared elsewhere in the driver
   wake_up_interruptible(&mywaiting_task);
   return IRQ_HANDLED;
}