
Using In-line Assembly, Calling External Assembly from C

In-line assembly refers to placing assembly code within C code. Assembly can be combined with C in two ways: through the asm construct, or by calling assembly code subroutines defined in a separate file. The asm construct method is discussed first. Its general format is given below:

asm ( assembler template [ : output operands [ : input operands [ : list of clobbered registers ] ] ] );


  asm ("nop;\n"
       : /* output operands */
       : /* input operands */
       : /* list of clobbered registers */ );
assembler template
This is the template from which the assembler code is generated. The assembler instruction(s) must be enclosed in double quotes (" "). Each instruction must be followed by a delimiter, either a newline (\n) or a semicolon. Operands are referred to by number: %0 for the first operand, %1 for the second, and so on, counting the output operands first and then the input operands.
output operands
This is a list of C expressions that are written by the instructions in the assembler template; each must be an lvalue. Each output operand has the form "constraint" (C expression), and output operands are separated by commas.
input operands
This is a list of C expressions that are read by the instructions in the assembler template. Each input operand has the form "constraint" (C expression), and input operands are separated by commas.
list of clobbered registers
Any registers modified by the assembly instructions, other than those used for the output operands, must be listed here. This tells the compiler not to keep values in these registers across the asm statement and not to assign them to operands. Each register in the list is enclosed in double quotes (" ") and separated from the other registers in the list by a comma.

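The keyword __asm__ may be used in place of asm if there is a naming conflict, or when compiling in a strict ISO C mode such as -ansi or -std=c99, where asm is not recognized as a keyword.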

A summary of the most common constraints for input and output operands is given below:

  • m, memory operand - This constraint specifies that the variable's memory location should be used when referencing the operand. The value is not copied to a register; instead, all read and write operations are performed directly on the specified memory location. This frees up registers for other purposes.
  • r, register operand - This constraint specifies that the operand should be placed in a general purpose register. This option should be used only when necessary or when it significantly speeds up execution.
  • [0,1,2,…], matching - This constraint is used when the same variable is to be used for both input and output. The number of the operand to be matched is used as the constraint (e.g. "0" means the same as operand 0, the first operand; "1" means the same as operand 1, the second operand; etc.).
  • =, write-only - This modifier specifies that the operand is write-only: any previous value is discarded and replaced by the new value. Every output operand constraint must begin with = (or with + for an operand that is both read and written).
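The = modifier and the matching constraint can be tried on any GCC target with an empty assembler template, because the compiler still performs the operand placement the constraints request even though no instruction is emitted. A minimal sketch (copy_via_match is a hypothetical name): the "=r" output and the "0" input are tied to the same register, so the value of x simply ends up in y.

```c
/* Hypothetical helper demonstrating "=" (write-only) and "0" (matching).
 * The template is empty, so no instruction is emitted. The "0" constraint
 * places x in the same register as output operand 0 (y), so after the
 * statement y holds the value of x. Builds with GCC or Clang on any target. */
int copy_via_match(int x)
{
    int y;
    __asm__("" : "=r"(y) : "0"(x));
    return y;
}
```

Because nothing in the template depends on the instruction set, this is also a convenient way to experiment with constraints on a host PC before moving to a Blackfin toolchain.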

The GCC manual maintains an exhaustive list of machine-specific constraints on its "Constraints for Particular Machines" page, which has a separate section for the Blackfin family.

Another consideration is whether the assembly code should be declared volatile. If the code has side effects that the compiler cannot see through its outputs, or must not be deleted or moved out of a loop by the optimizer, add the volatile qualifier after the asm keyword (i.e. asm volatile (…) ). Note that volatile does not pin the code to an exact position; the compiler may still move it relative to other code.
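A common use of volatile is a compiler barrier. The sketch below (ready_flag, compiler_barrier and publish_and_read are hypothetical names) uses an empty template with a "memory" clobber: volatile keeps the optimizer from deleting the statement even though it has no outputs, and the "memory" clobber tells the compiler that memory may have changed, so values cached in registers must be reloaded afterwards. It emits no instructions, so it constrains only the compiler, not the order in which the hardware performs memory accesses.

```c
/* Hypothetical flag, e.g. one that an interrupt handler might also touch. */
static int ready_flag;

/* Empty template, no operands, "memory" clobber. The __asm__ and
 * __volatile__ spellings also compile in strict ISO C modes. */
static void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

int publish_and_read(int value)
{
    ready_flag = value;
    compiler_barrier();   /* the store must be emitted before this point */
    return ready_flag;    /* must be reloaded from memory after the barrier */
}
```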

Some examples of in-line assembly are given below:

int x = 10;
int y;
asm("%0 = %1;" : "=r"(y) : "m"(x));

This example simply makes the assignment y = x. The variable y is a write-only, register-constrained output operand, and x is a memory-constrained input operand. Since y is first in the operand list, it is referenced in the assembler template as %0; %1 corresponds to x. Neglecting compiler optimization, the generated assembly code loads the value of x directly from its memory location into the general purpose register chosen for y, and the value of this register is then stored to y. A list of clobbered registers is not required, because the registers used are selected by the compiler rather than named in the assembler template.

int x = 10;
int y;
asm("%0 = %1;" : "=r"(y) : "r"(x));

This example is the same as the one above except that x is now a register-constrained input operand. Neglecting compiler optimization, the generated assembly code loads the value of x into a general purpose register, copies that register into a second general purpose register chosen for y, and then stores this second register to y. Again, a list of clobbered registers is not required, because the registers used are selected by the compiler rather than named in the assembler template.

int x = 0x00FF;
asm("%0 = %1 << 4;" : "=r"(x) : "0"(x));

This is an example of the matching constraint. Here x is a register-constrained output operand as well as an input operand. The "0" constraint specifies that the input operand x must occupy the same location as operand 0 (x in the output operand list). The assembler code shifts x left by 4 bits, so the value 0x00FF becomes 0x0FF0.
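When the same source must also build for a host PC (for unit tests, say), target-specific assembly can be guarded by a predefined macro with a plain C fallback. A minimal sketch, assuming that bfin-elf-gcc predefines __bfin__ (this can be checked with bfin-elf-gcc -dM -E - < /dev/null). It also assumes the Blackfin "d" (data register) constraint is preferable to "r" here, since the shift-by-4 form appears to be valid only on the data registers R0-R7; confirm both against the Blackfin constraint list. shift_left4 is a hypothetical name.

```c
/* Shift x left by 4 bits. On Blackfin, use in-line assembly; elsewhere
 * (e.g. a host test build), use plain C with the same result.
 * Assumption: bfin-elf-gcc predefines __bfin__. */
int shift_left4(int x)
{
#if defined(__bfin__)
    /* "d": Blackfin data register (assumed); "0": input shares operand 0 */
    __asm__("%0 = %1 << 4;" : "=d"(x) : "0"(x));
#else
    x = x << 4;   /* portable fallback */
#endif
    return x;
}
```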

int x = 10;
int y;
asm("R0 = %1; R1 = R0; %0 = R1;" : "=r"(y) : "m"(x) : "R0", "R1");

This example demonstrates the need for the clobbered register list. Neglecting compiler optimization, this assembly code loads the value of x into R0, copies R0 to R1, and finally copies R1 into the register allocated for y, which is then stored to y. Because the assembler template itself changes R0 and R1, and the compiler cannot see this, they must be included in the clobbered register list.

Many more features are available for use with the asm construct. For more information on this topic, see the GCC-Inline-Assembly-HOWTO.

The other method of combining assembly instructions with C code is to call assembly subroutines defined in a separate file. An example is given below.

To call a simple assembly subroutine which adds two numbers together and returns the result, C code similar to the following could be used:

#include <stdio.h>

extern int addition(int, int);

int main(void)
{
    int result;
    result = addition(1, 2);
    printf("Result = %i\n", result);
    return 0;
}

The extern keyword tells the compiler that the addition subroutine is defined in another file. This file would contain assembly instructions similar to the following:

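.global _addition;
_addition:
    R0 = R0 + R1;     /* result = first argument + second argument */
    RTS;              /* return to the caller (address held in RETS) */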

.global is an assembler directive that makes the symbol _addition visible to the linker, so that it can be linked to the externally declared function addition in the C code. Assembler directives always begin with a period.

The underscore in front of the function name addition is necessary because the compiler prefixes an underscore to C symbol names when it generates the object file; this is a simple form of name mangling.
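To check which prefix a particular compiler applies, GCC's predefined macro __USER_LABEL_PREFIX__ can be turned into a string. A minimal sketch (asm_symbol_name is a hypothetical helper): it builds the assembly-level name of a C symbol, which for bfin-elf-gcc should come out as _addition, and on many other targets, such as x86-64 Linux, simply as addition.

```c
#include <stdio.h>

/* Two-step stringification so that the macro is expanded before # is applied. */
#define STR2(x) #x
#define STR(x)  STR2(x)

/* Writes prefix + c_name into buf, where the prefix is the one the compiler
 * adds to C symbols (__USER_LABEL_PREFIX__ may expand to nothing).
 * Returns the length of the full name, as snprintf does. */
int asm_symbol_name(char *buf, size_t len, const char *c_name)
{
    return snprintf(buf, len, "%s%s", STR(__USER_LABEL_PREFIX__), c_name);
}
```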

The way in which data is passed into and returned from the subroutine depends on the data types and the number of arguments. In this case the first argument is passed in register R0, the second argument in register R1, and the result is returned in register R0. To observe how data will be passed into your function, view the assembly code the compiler generates by invoking it with the -S switch as follows:

bash$ bfin-elf-gcc -S <C code source file(s)>

This will generate the assembly code for your program and write it to a file with the *.s extension. Examination of this file will reveal how parameters will be passed into your function.

When including assembly code from a separate file, you must ensure that any registers the caller expects to be preserved (R4-R7, P3-P5, FP, SP and RETS, as described in the runtime model section below) are saved before your code alters them and restored before it returns. Otherwise, unwanted side effects may be introduced into your program. Also, when compiling your program you must pass the name of the assembly code file to the compiler along with any other files your program depends on. For example, a program with the C source file myprog.c and the assembly code file myprog.s may be compiled with the following command:

bash$ bfin-elf-gcc -o myprog myprog.c myprog.s

Some more complex examples of assembly subroutines will now be given.

Consider a function that adds five numbers together, declared in C as:

extern int addition(int,int,int,int,int);
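One way to examine the calling convention for this function is to compile a dummy C version of it with bfin-elf-gcc -S (without optimization, so the call is not inlined) and read the resulting .s file. A sketch; call_addition is a hypothetical caller included only so that the .s file also contains a call site.

```c
/* Dummy C version of the five-argument addition, compiled with
 *   bfin-elf-gcc -S dummy.c
 * to inspect the generated dummy.s. Its result can also serve as a
 * reference when testing the assembly version. */
int addition(int a, int b, int c, int d, int e)
{
    return a + b + c + d + e;
}

/* Hypothetical caller: in dummy.s this should show the first three
 * arguments placed in R0-R2 and the last two stored on the stack. */
int call_addition(void)
{
    return addition(1, 2, 3, 4, 5);
}
```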

Examination of the assembly code generated when calling such a function shows that the first three parameters are stored in registers R0, R1, and R2. The last two parameters are pushed onto the stack. The assembly code used to add these five numbers together might look something like the following:

.global _addition;
_addition:
    LINK 0;           /* push RETS and the caller's FP, then FP = SP */
    [--SP] = R7;      /* R7 must be preserved for the caller */
    R3 = [FP + 20];   /* fourth argument */
    R7 = [FP + 24];   /* fifth argument */
    R0 = R0 + R1;
    R0 = R0 + R2;
    R0 = R0 + R3;
    R0 = R0 + R7;     /* result is returned in R0 */
    R7 = [SP++];      /* restore R7 */
    UNLINK;           /* restore FP and RETS, release the frame */
    RTS;

First, because the routine reads arguments from the stack, LINK 0 is executed to set up a stack frame: it pushes RETS and the caller's FP, then points FP at the new top of the stack. Next, because the routine modifies R7, which is one of the registers a called function must preserve, R7 is saved by pushing it onto the stack (R3 may be used freely). The last two arguments are then read by moving up from the frame pointer. An offset of 20 bytes skips the saved FP (4 bytes), the saved RETS (4 bytes) and the 12-byte argument area that the caller reserves for the three register arguments, reaching the fourth argument; another 4 bytes past this gives the fifth argument. These values are loaded into R3 and R7 and added to the arguments already in R0-R2, and the result is returned in R0. R7 is then restored from the stack, the stack frame is released with UNLINK, and RTS returns from the subroutine. The positions of the frame pointer and the stack pointer and the contents of the stack are shown below, from higher to lower addresses, each entry representing 4 bytes:

  FP + 24              fifth argument
  FP + 20              fourth argument
  FP + 8 to FP + 16    argument area reserved for R0-R2
  FP + 4               saved RETS
  FP                   saved FP of the caller
  FP - 4               saved R7 (SP points here)

Now consider a function that swaps the values of two variables. The parameters are passed by address instead of by value. The C definition of the function is as follows:

void swap(int *v1, int *v2)
{
    int tmp;
    tmp = *v1;
    *v1 = *v2;
    *v2 = tmp;
}

The assembly version is as follows:

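.global _swap;
_swap:
    P2 = R0;          /* P2 = v1 */
    P1 = R1;          /* P1 = v2 */
    R1 = [P2];        /* R1 = *v1 */
    R0 = [P1];        /* R0 = *v2 */
    [P2] = R0;        /* *v1 = old *v2 */
    [P1] = R1;        /* *v2 = old *v1 */
    RTS;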
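A small C check can confirm that the routine behaves like the C definition. A sketch, assuming that bfin-elf-gcc predefines __bfin__: on the target, swap() is declared and taken from the assembly file, while a host build compiles the C definition instead. swap_check is a hypothetical helper.

```c
#if defined(__bfin__)
/* On the target, swap() is provided by the assembly file. */
void swap(int *v1, int *v2);
#else
/* Host build: the C definition of swap(). */
void swap(int *v1, int *v2)
{
    int tmp = *v1;
    *v1 = *v2;
    *v2 = tmp;
}
#endif

/* Hypothetical check: returns 1 if swap() exchanged both values, else 0. */
int swap_check(int a, int b)
{
    int x = a, y = b;
    swap(&x, &y);
    return x == b && y == a;
}
```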

bfin-elf-gcc Parameter Passing Convention/Runtime Model

First, test your code using the -S option of gcc. This lets you see example code generated by the compiler.

If you want to write an assembler function, start with a dummy C function and inspect the assembly output.

Once the API has been confirmed with a dummy function, move on to the real assembly code.

A called function may clobber all registers except RETS, R4-R7, P3-P5, FP and SP. It may also modify the first 12 bytes of the caller's stack frame, which are reserved as an argument area for the first three arguments (these are passed in R0-R2 but may be stored on the stack by the called function).

A called function may assume that all L registers are zero when entered, and must ensure that they remain zero when it returns.

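.global _stacktest;
_stacktest:
    [--SP] = (R7:6);  /* save R6 and R7, which the caller expects preserved */
    LINK 0;
    R7 = R0;
    R6 = R1 + R2;
    R0 = R1 + R7;     /* Blackfin adds take two source operands */
    R0 = R0 + R6;
    UNLINK;
    (R7:6) = [SP++];  /* restore R6 and R7 */
    RTS;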
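Since the routine reads its inputs from R0-R2, it behaves like a function of three int arguments returning int; that signature is an assumption, as the page does not give one. A C reference computing the same value, a + 2b + c, is handy for checking the assembly version's results or for producing comparable compiler output with -S.

```c
/* C reference for the assumed signature
 *   int stacktest(int a, int b, int c);
 * with a, b, c arriving in R0, R1, R2. It mirrors the assembly:
 * R7 = a, R6 = b + c, result = b + R7 + R6, i.e. a + 2*b + c. */
int stacktest_ref(int a, int b, int c)
{
    int r7 = a;
    int r6 = b + c;
    return b + r7 + r6;
}
```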