

Basic Exception description

The Linux system runs on exceptions; without them it would act like a brick. Interrupts and exceptions are the way the system responds to real world events, including the timer tick. If you are interested in interrupts (responding to real world events), please see the interrupts page.

Exceptions are synchronous to the instruction stream. In other words, a particular instruction causes an exception when it attempts to finish execution. No instructions after the offending instruction are executed before the exception handler takes effect.

Many of the exceptions are memory related. For example, an exception is given when a misaligned access is attempted, or when a cacheability protection lookaside buffer (CPLB) miss or protection violation occurs. Exceptions are also given when illegal instructions or illegal combinations of registers are executed.

An instruction that causes an error exception is not committed before the exception event is taken. (For example, the write that causes a protection violation does not occur.)

Linux use of Blackfin Exceptions

Various types of exceptions are managed by the Linux kernel, all listed in:

file: arch/blackfin/mach-common/entry.S

In that file you can see:

  • that the instruction EXCPT 0 is how Linux syscalls are done.
  • that the instructions EXCPT 5 to EXCPT 15 are available for you to use. You can think of these as a fast path software interrupt which can be executed from userspace (the RAISE n instruction is supervisor only).
  • that all possible error conditions are already handled by the Linux kernel, and you should not touch them if you want anything to actually work properly :) There is a little bit of voodoo in the entry.S file, and it is very easy to break things in very subtle ways that may go unnoticed for months.
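
The split of EXCPT numbers described above can be written down as a small lookup. This is only a sketch in portable C: the enum and function names are hypothetical (they do not exist in the kernel), and the role of EXCPT 1 to 4 is an assumption, since this page only says that 0 is the syscall entry and that 5 to 15 are free.

```c
#include <stdio.h>

/* Roles for the 4-bit EXCPT immediate (0..15), as described on this page. */
enum excpt_role {
    EXCPT_ROLE_INVALID,         /* outside the 0..15 range */
    EXCPT_ROLE_SYSCALL,         /* EXCPT 0: Linux system call entry */
    EXCPT_ROLE_KERNEL_RESERVED, /* EXCPT 1..4: ASSUMPTION, not listed as available */
    EXCPT_ROLE_USER_AVAILABLE   /* EXCPT 5..15: available for custom handlers */
};

/* Classify an EXCPT number according to the list above. */
enum excpt_role excpt_classify(int n)
{
    if (n < 0 || n > 15)
        return EXCPT_ROLE_INVALID;
    if (n == 0)
        return EXCPT_ROLE_SYSCALL;
    if (n < 5)
        return EXCPT_ROLE_KERNEL_RESERVED;
    return EXCPT_ROLE_USER_AVAILABLE;
}

/* Human-readable name for a role, e.g. for log messages. */
const char *excpt_role_name(enum excpt_role role)
{
    switch (role) {
    case EXCPT_ROLE_SYSCALL:         return "syscall";
    case EXCPT_ROLE_KERNEL_RESERVED: return "kernel reserved (assumed)";
    case EXCPT_ROLE_USER_AVAILABLE:  return "available";
    default:                         return "invalid";
    }
}

/* Print the role of one EXCPT number. */
void excpt_print_role(int n)
{
    printf("EXCPT %d: %s\n", n, excpt_role_name(excpt_classify(n)));
}
```

A driver could call excpt_classify() on a requested number and refuse anything that is not EXCPT_ROLE_USER_AVAILABLE before trying to install a handler.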

Custom Exception handlers

To install a custom exception handler, there are two helper functions:

file: arch/blackfin/kernel/traps.c

These allow you to install any kernel function as an exception handler. While this is a nice fast way to get to supervisor mode, there are many restrictions.

  • exceptions have their own stack, so you can not pass things to an exception handler with the standard ABI (all exception handlers should be written in assembly).
  • only five registers are pushed on to the exception stack by default: ASTAT, R7, R6, P5 and P4. Modify anything else, and things will eventually crash. If you use anything else, you must save and restore it yourself (all exception handlers should be written in assembly).
  • you are limited to a small stack: 4 kbytes.
  • you can not cause another exception while operating in exception space. This means things like data or instruction CPLB misses or protection violations will cause a double fault (an unrecoverable event, see below). Exception handlers can not access any memory which is not locked into the CPLB tables (only kernel memory), and if the exception handler is installed as a kernel module, it must be in L1 memory.
  • since these are running in exception space, they can not be debugged with kgdb. It requires a JTAG debugger to single step through this code.
  • you must return from the exception handler with "JUMP.L _ret_from_exception;" to make sure errors are checked and the stack is restored properly (write your entry/exit function in assembly).



Double Faults

If any exception occurs in an event handler that is already servicing an Exception, NMI, Reset, or Emulation event, this will trigger what is known as a double fault condition. This is a nonrecoverable state in the Blackfin hardware.

To help debug issues like this, when the Linux kernel causes a double fault, it will immediately reset, and during the next kernel boot, will print out some information about the faulting events which caused the double fault condition in the previous kernel (so you can do something about it).

There is NO way to connect a debugger (kgdb or JTAG) to help debug a double fault condition - once it occurs, it is too late to do anything except reset.

The problem with double faults is that the default operation of the hardware is basically to skip the instruction. The details are:

  • The double excepting instruction is not committed. All writebacks from the instruction are prevented.
  • The generated exception is not taken.
  • The EXCAUSE field in the SEQSTAT register is updated with an unrecoverable event code.
  • The address of the offending instruction is saved in the RETX register (where the return address for the first exception is being processed). Note if the processor were executing, for example, the NMI handler, the RETN register would not have been updated; the excepting instruction address is always stored in RETX.
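
Checking for the EXCAUSE update described above comes down to masking a saved SEQSTAT value. The following is a minimal C sketch, assuming that EXCAUSE occupies the low six bits of SEQSTAT and that the unrecoverable event code is 0x25; confirm both values against the programming reference for your part. The function names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: EXCAUSE is the low six bits of SEQSTAT. */
#define SEQSTAT_EXCAUSE_MASK  0x3Fu
/* ASSUMPTION: EXCAUSE code reported for an unrecoverable event. */
#define EXCAUSE_UNRECOVERABLE 0x25u

/* Extract the EXCAUSE field from a saved SEQSTAT value. */
uint32_t seqstat_excause(uint32_t seqstat)
{
    return seqstat & SEQSTAT_EXCAUSE_MASK;
}

/* True if the saved SEQSTAT value reports an unrecoverable event. */
bool seqstat_is_double_fault(uint32_t seqstat)
{
    return seqstat_excause(seqstat) == EXCAUSE_UNRECOVERABLE;
}
```

An exception handler that wants to notice a double fault would have to run a check like this on SEQSTAT before trusting RETX.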

So, unless you are specifically checking for it (and there is a kernel option to do that: CONFIG_DEBUG_DOUBLEFAULT), the double faulting instruction will be skipped (causing unknown havoc in your logic), and the exception return address (RETX) gets clobbered with the address of the instruction that caused the double fault. If your first exception routine is lucky enough to finish, having skipped an unknown number of instructions, it will return to a faulting instruction.
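
To make the RETX clobbering concrete, here is a toy software model of the sequence in plain C. It is not kernel code and does not touch hardware: the struct, the function names, the instruction length argument and the 0x25 code are all assumptions made for illustration.

```c
#include <stdint.h>

#define MODEL_EXCAUSE_MASK          0x3Fu /* ASSUMPTION: EXCAUSE is SEQSTAT[5:0] */
#define MODEL_EXCAUSE_UNRECOVERABLE 0x25u /* ASSUMPTION: unrecoverable event code */

/* Only the state needed to show the effect. */
struct model_cpu {
    uint32_t pc;      /* address of the instruction being executed */
    uint32_t retx;    /* exception return address register */
    uint32_t seqstat; /* sequencer status, EXCAUSE in the low bits */
};

/* First exception: save the return address in RETX and enter the handler. */
void model_take_exception(struct model_cpu *cpu, uint32_t excause,
                          uint32_t handler)
{
    cpu->retx = cpu->pc;
    cpu->seqstat = (cpu->seqstat & ~MODEL_EXCAUSE_MASK)
                 | (excause & MODEL_EXCAUSE_MASK);
    cpu->pc = handler;
}

/* The instruction at cpu->pc faults while the handler is already running. */
void model_double_fault(struct model_cpu *cpu, uint32_t insn_len)
{
    cpu->retx = cpu->pc; /* original return address is lost */
    cpu->seqstat = (cpu->seqstat & ~MODEL_EXCAUSE_MASK)
                 | MODEL_EXCAUSE_UNRECOVERABLE;
    cpu->pc += insn_len; /* not committed, exception not taken: skipped */
}

/* Return from exception: execution resumes at whatever RETX now holds. */
void model_return_from_exception(struct model_cpu *cpu)
{
    cpu->pc = cpu->retx;
}
```

In this model, an exception taken at 0x1000 with the handler at 0x2000, followed by a double fault on the first handler instruction, ends with the return landing on 0x2000 (the faulting handler instruction) instead of 0x1000.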

This can present itself in a variety of ways: from infinite loops (where eventually the watchdog may go off), to random kernel panics, to double fault messages (if you are lucky).

Since that would just suck, and since it is an unrecoverable error, there is an option in the hardware that the kernel takes advantage of to cause a RESET when a double fault occurs (CONFIG_DEBUG_DOUBLEFAULT_RESET).

Kernel message


Kernel hacking  --->
  [*] Debug Double Faults

When this option is turned on, you have two choices:

  ( ) Print
  ( ) Reset

Reset causes a hard RESET, and Print causes a kernel panic, so both eventually have the same result: reload the image and boot again. The difference is that with the Print version, userspace maps which may have caused the issue are still in memory, so you can get more symbolic information about what is going on. When a RESET occurs, the information is printed by the recovered kernel, so you may know the physical address, but not what belongs there.

Print Example

RESET Example