Direct Memory Access (DMA)

The Direct Memory Access (DMA) engine in the Blackfin processor allows automated data transfers with minimal overhead for the core. DMA transfers can occur between any of the DMA-capable peripherals (such as the SPORT (synchronous serial port) or PPI (Parallel Peripheral Interface)) and the external SDRAM (system memory).

DMA Systems

There are two aspects of the DMA subsystem to consider:

- the generic Linux DMA framework
- the extensions for the Blackfin processor

These are covered in separate sections below.

Linux DMA

The generic framework is closely tied to the needs of DMA on x86 systems, but it also provides a base set of DMA functions that apply to non-x86 systems. Most of the x86 complexity results from dealing with limited hardware capabilities.

With a Memory Management Unit (MMU) and data caches, there is the problem of providing a memory area that is valid for both the DMA controller and the CPU, as well as making sure that when data arrives in external memory the CPU is not looking at stale data in its cache. This sort of memory mapping is usually referred to as consistent (sometimes coherent). In some architectures a coherent mapping implies a clever arrangement where the data is always valid even while it is being cached: with an MMU it is possible to provide two (or more) virtual addresses for the same physical memory area, so one virtual address may go through the data cache while another references the same physical location but bypasses the data cache.

Another complication on x86 systems is that some older hardware is unable to access physical memory above 16 MB (typically devices that live on the ISA bus). This forces the Linux memory allocator to provide a special allocation option that restricts physical memory allocations to the lower 16 MB (GFP_DMA). Since this restriction was also passed on to hardware devices used in other architectures, the memory allocation schemes in those architectures had to provide similar options. The microcontroller and DSP (Digital Signal Processor) fields are often free of such burdens, but since the aim of a kernel driver programmer is to be architecture independent, you have to know about these "features".

OK, what's a bus address?

When the CPU (say, with the MMU turned off) wants to access physical memory, it puts that address on its output pins; this is a physical address. When a peripheral device wants to access the same physical memory (as in a DMA transfer), it may have to use a different address to reach the same physical location; this is a bus address. So a bus address is the address used by a peripheral to access a given physical address.
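As an illustration (not code from any particular driver), here is a minimal sketch of that distinction using the generic dma_map_single() call described in the next section. The device pointer, the buffer, and the placeholder comment for the register write are assumptions of this sketch: the CPU keeps using the kernel pointer, while the dma_addr_t returned by the mapping is the bus address that gets programmed into the peripheral.

#include <linux/dma-mapping.h>
#include <linux/string.h>
#include <linux/errno.h>

/* hypothetical transmit setup for some DMA-capable device */
static int start_tx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t bus_addr;	/* the address as seen by the peripheral */

	memset(buf, 0, len);	/* CPU fills the buffer through its own address */

	/* map the buffer for the device: this takes care of cache coherency
	 * and returns the bus address the DMA engine must use */
	bus_addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, bus_addr))
		return -ENOMEM;

	/* ... program bus_addr and len into the device's DMA registers here ... */

	/* ... wait for the transfer to complete, then: */
	dma_unmap_single(dev, bus_addr, len, DMA_TO_DEVICE);

	return 0;
}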
Dynamic DMA Mapping

Documents: linux-kernel/Documentation/DMA-API.txt and Linux Device Drivers (3rd edition), chapter 15.
API definitions: linux-kernel/include/linux/dma-mapping.h, linux-kernel/arch/blackfin/include/asm/dma-mapping.h, linux-kernel/arch/blackfin/kernel/bfin_dma_5xx.c

DMA operations, in the end, come down to allocating a buffer and passing bus addresses to your device. A DMA mapping is a combination of allocating a DMA buffer and generating an address for that buffer that is accessible by the device.

DMA mappings must also address the issue of cache coherency. Remember that modern processors keep copies of recently accessed memory areas in a fast, local cache; without this cache, reasonable performance is not possible. If your device changes an area of main memory, it is imperative that any processor cache lines covering that area be invalidated; otherwise the processor may work with a stale image of main memory, and data corruption results. Similarly, when your device uses DMA to read data from main memory, any changes to that memory residing in processor caches must be flushed out first.

On Blackfin there are no virtual addresses, and bus addresses and physical addresses share the same memory space. DMA mapping is therefore implemented in a simple way: a block of SDRAM is reserved for DMA usage and configured as uncacheable, and dma_alloc_coherent() allocates DMA buffers from this region. If a DMA buffer is not allocated as "coherent", the device driver has to handle cache coherency itself: before starting a DMA transfer from memory, flush the cache over the source address range to prevent cache coherency problems; similarly, before starting a DMA transfer to memory, invalidate the cache over the destination address range for the same reason.

If you are writing a portable device driver, be sure to use the generic DMA API (for the full list, refer to the documents above):

void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp);
void dma_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle);
dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size, enum dma_data_direction dir);
dma_addr_t dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir);
int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir);
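For example, a coherent buffer can be set up as in the following sketch; the names my_dev, BUF_SIZE, cpu_buf, and the two helper functions are made up for this illustration, and the calls are the generic ones listed above. The virtual pointer is used by the CPU, while dma_handle is the bus address handed to the DMA engine; no explicit flushing or invalidating is needed for this buffer.

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/errno.h>

#define BUF_SIZE 4096		/* arbitrary size for this example */

static void *cpu_buf;		/* what the CPU uses */
static dma_addr_t dma_handle;	/* what the peripheral uses */

static int my_alloc_dma_buffer(struct device *my_dev)
{
	cpu_buf = dma_alloc_coherent(my_dev, BUF_SIZE, &dma_handle, GFP_KERNEL);
	if (!cpu_buf)
		return -ENOMEM;
	/* program dma_handle into the device, read/write the data through cpu_buf */
	return 0;
}

static void my_free_dma_buffer(struct device *my_dev)
{
	dma_free_coherent(my_dev, BUF_SIZE, cpu_buf, dma_handle);
}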
Blackfin DMA

The Blackfin processor offers a wide array of DMA capabilities:

- 12 different DMA channels
- memory-to-memory and I/O-to-memory transfers
- dual X and Y indexing address counters
- simple register-based DMA control
- optional, more sophisticated descriptor-based control
- 8, 16, or 32 bit data size
- interrupt on each DMA packet completion
- flexible DMA priority

Flow Types and Descriptors

There are five different ways the DMA controller can be set up. These are called flow types:

- FLOW_STOP - stop after the current job
- FLOW_AUTO - autobuffer: repeat the current transfer until stopped
- FLOW_ARRAY - use a sequential array of descriptors
- FLOW_SMALL - use a linked list of small descriptors
- FLOW_LARGE - use a linked list of large descriptors

The flow type is defined in the CONFIG word of a descriptor, so the modes can be mixed and the operation can become quite complex. Descriptors are used to control the DMA channel and allow a complex stream of data packets to be assembled if required.

- Array: a simple sequential array of descriptors in memory.
- Small descriptor: the high address word does not change; only the low address word is held in the memory array.
- Large descriptor: both high and low address words are held in the memory array.

So in the array case all the descriptors follow each other in memory; the CURR_DESC_PTR register must be set up and the DMA enabled. The only difference between the small and large descriptor modes is the restriction that, with small descriptors, all descriptors must reside in the same 64K memory area defined by the high address word. In these two cases the NEXT_DESC_PTR is set up to point to the first descriptor in the list.

One other slight complexity in the descriptor business is that the DMA controller does not have to read all of the words in a descriptor. The NDSIZE field of the CONFIG register contains the number of descriptor elements to read into the DMA controller for this operation.

Descriptor Memory Layout

Large/Small descriptor: (layout figure)
Example of an array descriptor: (layout figure)

2-D DMA

2-D DMA can be roughly viewed as the following address stepping:

for (y = 0; y < Y_COUNT; y++) {
	for (x = 0; x < X_COUNT - 1; x++)
		DMAx_CURR_ADDR += X_MODIFY;	/* between elements within a row */
	DMAx_CURR_ADDR += Y_MODIFY;		/* after the last element of each row */
}

In some video applications, 2-D DMA is more convenient to use than 1-D DMA. Also, if a buffer is too large and exceeds the 16-bit X_COUNT limit (64K elements), 2-D DMA can be used to split the transfer into rows, as in the sketch below.
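As a hedged sketch of that second case (splitting a large linear buffer), the following programs a hypothetical 96 KB receive on the PPI channel as 24 rows of 4096 bytes. The buffer, sizes, and start_2d_rx() are made up; CH_PPI, DMA2D, and WDSIZE_8 are defines from the Blackfin processor headers and are assumed here, while WNR and DI_EN also appear in the simple example later; the set_dma_* helpers are the ones listed in the DMA API section below.

#include <asm/dma.h>
#include <asm/blackfin.h>

#define FRAME_SIZE (24 * 4096)	/* 96 KB: too large for a single 16-bit X_COUNT */

static unsigned char frame[FRAME_SIZE];	/* assumed to be DMA-safe (uncached or invalidated) */

static void start_2d_rx(void)
{
	/* assumes request_dma(CH_PPI, ...) succeeded and the PPI itself
	 * is configured elsewhere */
	disable_dma(CH_PPI);

	/* 24 rows of 4096 bytes cover the buffer contiguously; X_MODIFY is
	 * added between elements in a row and Y_MODIFY after the last element
	 * of a row, so both are 1 for a linear byte buffer */
	set_dma_start_addr(CH_PPI, (unsigned long)frame);
	set_dma_x_count(CH_PPI, 4096);
	set_dma_x_modify(CH_PPI, 1);
	set_dma_y_count(CH_PPI, 24);
	set_dma_y_modify(CH_PPI, 1);

	/* write to memory (WNR), 8-bit elements, 2-D mode, interrupt when done */
	set_dma_config(CH_PPI, WNR | WDSIZE_8 | DMA2D | DI_EN);

	enable_dma(CH_PPI);
}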
MDMA

Besides setting up MDMA (Memory DMA) registers yourself, there are two other ways to use MDMA:

- void *dma_memcpy(void *pdst, const void *psrc, size_t size) (refer to arch/blackfin/kernel/bfin_dma_5xx.c)
- the Blackfin DMA driver, see bfin-dma

DMA Pitfalls

When customizing your DMA driver, there are a few pitfalls to be aware of.

Invalidating Cache vs Flushing Cache

Flushing the cache and invalidating the cache sound similar, but they are very different animals. Flushing the cache writes the current contents of the cache back to memory, whereas invalidating the cache marks cache entries as invalid (thereby discarding the cached contents). Before starting a DMA transfer from memory, be sure to flush the cache over the source address range to prevent cache coherency problems. Similarly, before starting a DMA transfer to memory, be sure to invalidate the cache over the destination address range for the same reason. Otherwise, very subtle bugs can be introduced into your driver.

DMA Holes

Do not work too close to DMAx_CURR_ADDR, DMAx_CURR_DESC_PTR, or DMAx_CURR_X_COUNT/DMAx_CURR_Y_COUNT. There is a pipeline in the DMA transfer of approximately 10 data elements, so if you are polling any of the DMA current pointer registers to try to reduce latency, subtract an offset of approximately 10 elements from the value returned by the register. Otherwise, when you examine the data, you will find missing data where the DMA pointer has seemingly incremented but no data was actually written yet. To see whether this problem is occurring for you, write a known pattern to your DMA destination buffer right before enabling the transfer. After the transfer is complete, look through the buffer to see if your known pattern still resides there; if it does, this problem is affecting you. (See page 5-57 of the BF537 Hardware Reference manual for more information.)
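A minimal sketch of that known-pattern check follows; the marker value and the helper names are arbitrary choices for this illustration, and the destination buffer is assumed to be uncached or handled as described under the cache pitfall above.

#include <linux/kernel.h>
#include <linux/string.h>

#define DMA_HOLE_PATTERN 0xAA	/* arbitrary marker byte */

/* right before enable_dma(): stamp the destination with the marker */
static void stamp_dma_buffer(unsigned char *buf, size_t len)
{
	memset(buf, DMA_HOLE_PATTERN, len);
}

/* after the transfer completes: any byte still holding the marker is
 * suspicious; real data that happens to equal the marker gives false
 * positives, so this is only a rough debugging aid */
static void check_dma_holes(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] == DMA_HOLE_PATTERN)
			printk(KERN_DEBUG "possible DMA hole at offset %zu\n", i);
}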
DMA API

Please refer to arch/blackfin/include/asm/dma.h and arch/blackfin/kernel/bfin_dma_5xx.c.

int request_dma(unsigned int channel, const char *device_id)
void free_dma(unsigned int channel)
void enable_dma(int channel)
void disable_dma(int channel)

The DMA API has been extended to expose the increased flexibility of the Blackfin DMA controller. The following functions are Blackfin specific:

void set_dma_start_addr(unsigned int channel, unsigned long addr)
void set_dma_next_desc_addr(unsigned int channel, unsigned long addr)
void set_dma_x_count(unsigned int channel, unsigned short x_count)
void set_dma_x_modify(unsigned int channel, short x_modify)
void set_dma_y_count(unsigned int channel, unsigned short y_count)
void set_dma_y_modify(unsigned int channel, short y_modify)
void set_dma_config(unsigned int channel, unsigned short config)
unsigned short set_bfin_dma_config(char direction, char flow_mode, char intr_mode, char dma_mode, char width)
unsigned short get_dma_curr_irqstat(unsigned int channel)
unsigned short get_dma_curr_xcount(unsigned int channel)
unsigned short get_dma_curr_ycount(unsigned int channel)
void set_dma_sg(unsigned int channel, struct dmasg_t *sg, int nr_sg)
void dma_disable_irq(unsigned int channel)
void dma_enable_irq(unsigned int channel)
void clear_dma_irqstat(unsigned int channel)
int set_dma_callback(unsigned int channel, dma_interrupt_t callback, void *data)

Simple DMA Example

This is a simple DMA example taken from the adsp-spidac.c driver. It reads 8-bit data from the SPI (Serial Peripheral Interface) device into mybuffer.

int mydmatest(void)
{
	char mybuffer[1024 * 32];	// in a real driver, use a DMA-safe buffer rather than the stack
	int mysize = 1024 * 32;
	int ret;

	// let's ask for the DMA channel
	ret = request_dma(CH_SPI, "BF533 SPI Test");
	if (ret < 0) {
		printk("Unable to get DMA channel\n");
		return 1;
	}

	// turn off the DMA channel
	disable_dma(CH_SPI);

	// set up the DMA config:
	//   WNR     - we are going to write to memory
	//   RESTART - throw away any old data in the FIFO
	//   DI_EN   - enable an interrupt when the transfer completes
	set_dma_config(CH_SPI, (WNR | RESTART | DI_EN));

	// set the address to drop data into
	set_dma_start_addr(CH_SPI, (unsigned long)&mybuffer);

	// set the transfer size in bytes
	set_dma_x_count(CH_SPI, mysize);

	// set the X modify (don't worry about Y for this one)
	set_dma_x_modify(CH_SPI, 1);

	// make sure all the register writes have completed
	__builtin_bfin_ssync();

	// off we go
	enable_dma(CH_SPI);

	return 0;
}

Since this example uses interrupts, the interrupt routine must be associated with the DMA channel (do this before enabling the channel):

// set the IRQ callback
set_dma_callback(CH_SPI, myirq, mydata);

The IRQ (Interrupt Request) routine could look like this. It simply reads and clears the interrupt status and wakes up the waiting task:

static irqreturn_t myirq(int irq, void *data)
{
	unsigned short mystat;

	mystat = get_dma_curr_irqstat(CH_SPI);
	clear_dma_irqstat(CH_SPI);

	wake_up_interruptible(&mywaiting_task);

	return IRQ_HANDLED;
}
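The example stops short of waiting for the data and releasing the channel. A minimal sketch of how the remaining steps might look, assuming the driver adds a dma_done flag that myirq() sets just before its wake_up_interruptible() call (the flag and the function name are made up for this illustration, and in a real driver the wait queue would be declared before myirq() uses it); disable_dma() and free_dma() are from the DMA API above.

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/errno.h>
#include <asm/dma.h>

static DECLARE_WAIT_QUEUE_HEAD(mywaiting_task);
static volatile int dma_done;	/* assumed to be set to 1 in myirq() before wake_up_interruptible() */

static int mydma_wait_and_release(void)
{
	/* sleep until the DMA completion interrupt wakes us up */
	if (wait_event_interruptible(mywaiting_task, dma_done))
		return -ERESTARTSYS;

	/* the received data can now be consumed */

	/* shut the channel down and give it back */
	disable_dma(CH_SPI);
	free_dma(CH_SPI);

	return 0;
}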