
The 2.6 Kernel Scheduler

The 2.6 kernel introduces some scheduling concepts that are new to Linux. In a quest to improve system performance and responsiveness, several kernel subsystems have been reworked and improved.

Scheduler Basics

The system's tasks are organized into a number of queues, ordered by priority and goodness. Each queue is assigned a single bit in a 32 bit word. The first set bit in the word identifies the first queue to run, and the first task on that queue is the next task to run.

Since there are more than 32 possible queues, a high order word is assigned to monitor 32 low order words. A bit set in the high order word means that one or more queues in the low order word have tasks ready to run.

The task selection process then boils down to two bit searches.

   high order word - index of low order words
   low order word  - index of task queues by priority

 * find the first non-zero bit in the high order word
     this gives the index of the low order word assigned to the queues
     index 0 for queues 0 to 31
     index 1 for queues 32 to 63, etc.
 * find the first non-zero bit in that low order word
     this gives the index of the run queue with the next active task
 * run the first task in the selected queue's active list
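The two-step lookup can be sketched in ordinary C. This is an illustrative model only: the kernel does the same job with sched_find_first_bit() and architecture-optimized bit scans, and all names below are invented for the sketch.

```c
#include <assert.h>
#include <string.h>

#define NUM_QUEUES 140                      /* 100 real time + 40 normal  */
#define WORDS ((NUM_QUEUES + 31) / 32)

/* Two-level bitmap: a summary word plus one word per group of 32 queues. */
struct queue_bitmap {
    unsigned int summary;                   /* bit i set => words[i] != 0 */
    unsigned int words[WORDS];
};

/* Index of the lowest set bit; word must be non-zero. */
static int first_bit(unsigned int word)
{
    int i = 0;
    while (!(word & 1u)) {
        word >>= 1;
        i++;
    }
    return i;
}

/* Lowest-numbered non-empty queue, or -1 when everything is empty. */
static int find_first_queue(const struct queue_bitmap *b)
{
    if (!b->summary)
        return -1;
    int w = first_bit(b->summary);          /* which low order word       */
    return w * 32 + first_bit(b->words[w]); /* which queue in that word   */
}
```

Marking a queue ready means setting its bit in the right low order word and the corresponding bit in the summary word; the search then costs two bit scans no matter how many tasks exist.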

Time Slices

A task is given a time slice. Once the time slice is used up, the task will not be rescheduled until all other tasks in the run list have used their time slices.

Each run queue has an active list and an expired list. When a task has used up its CPU time slice it is moved from the active to the expired list.

When there are no more active tasks the active and expired lists are swapped.
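The swap itself is only a pointer exchange. A minimal sketch, with names modelled on (but not copied from) the 2.6 runqueue:

```c
#include <assert.h>

/* Simplified runqueue: two priority arrays and two role pointers. */
struct prio_array { unsigned int nr_active; };

struct runqueue {
    struct prio_array *active;
    struct prio_array *expired;
    struct prio_array arrays[2];
};

/* When the active array has drained, the expired array becomes active. */
static void swap_arrays(struct runqueue *rq)
{
    if (rq->active->nr_active == 0) {
        struct prio_array *tmp = rq->active;
        rq->active  = rq->expired;
        rq->expired = tmp;
    }
}
```

Because no task lists are copied, the swap is O(1) regardless of how many tasks have expired.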

  Min     timeslice   the larger of 1 jiffy or 5 ms
  Max     timeslice   800 ms
  Default timeslice   100 ms

Goodness and Static Priority

Linux has a concept of goodness associated with tasks at user priority 0.

If a task does not use up its time slice, it must have chosen to sleep, waiting to be woken by I/O activity or by another task. The goodness value gives preference to such tasks, which appear to execute quickly and then yield. This is done by defining a static priority and an adjustment range around it: the static priority is in the range 0..39, with an adjustment of -5 to +5. (NOTE: the adjustment never takes the value beyond the 0..39 limits.)

Keep in mind that this priority is the kernel's own representation of a task's absolute priority. The user nice value is mapped onto this kernel representation in order to adjust it.
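The mapping can be illustrated with a couple of helpers. These are a sketch: nice_to_kernel_prio() mirrors the kernel's NICE_TO_PRIO() macro, while adjusted_static_prio() only illustrates the clamping described above, not the kernel's actual interactivity formula.

```c
#include <assert.h>

#define MAX_RT_PRIO 100   /* kernel levels 0..99 are reserved for real time */

/* Map a user nice value (-20..+19) onto the 0..39 static priority band,
 * and onto the kernel's absolute scale (modelled on NICE_TO_PRIO()). */
static int nice_to_static(int nice)      { return nice + 20; }
static int nice_to_kernel_prio(int nice) { return MAX_RT_PRIO + nice_to_static(nice); }

/* Apply the -5..+5 adjustment, clamped so the result stays in 0..39.
 * Illustrative only: the real bonus depends on observed sleep behavior. */
static int adjusted_static_prio(int static_prio, int bonus)
{
    int p = static_prio + bonus;
    if (p < 0)  p = 0;
    if (p > 39) p = 39;
    return p;
}
```

So a default task (nice 0) sits at static priority 20, kernel priority 120, and its effective priority floats at most five levels either way.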

Real Time Tasks

Privileged users can assign a task a real time priority greater than 0.

These tasks are not tuned by the system; their relative priorities are not changed by the scheduler.

The number of Real Time priorities is set at compile time. The default range is 1..99. In addition the kernel has a small number of even higher priority task options for its own use.

Real time tasks can also be defined as round robin (SCHED_RR) or FIFO (SCHED_FIFO). A round robin task is assigned a timeslice; when it has used the timeslice up, it is rescheduled to allow other tasks at the same (or higher) priority to execute.

A FIFO task will execute until its completion unless preempted by a higher priority task.
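From user space, these policies are requested through the standard POSIX scheduling calls. A small sketch (the helper names are invented; sched_setscheduler(), sched_get_priority_min() and sched_get_priority_max() are the real interfaces):

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Print the kernel's valid real-time priority range for a policy. */
static void show_rt_range(int policy, const char *name)
{
    printf("%s: %d..%d\n", name,
           sched_get_priority_min(policy),
           sched_get_priority_max(policy));
}

/* Request a real-time policy for the calling task.  This needs
 * privilege, so for ordinary users it returns -1 with errno EPERM. */
static int request_rt(int policy, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(0, policy, &sp);
}
```

For example, request_rt(SCHED_RR, 10) asks for round robin at real time priority 10; with SCHED_FIFO the task would instead run until it blocks, exits, or is preempted by a higher priority task.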

Kernel Preemption

In the old days, when a Unix task was interrupted while executing in kernel mode, the system would resume that task's execution until the task attempted to return to user space. Only at that point was the scheduler revisited, possibly to schedule another task.

This meant that once you got into kernel space, as a result of a system call, the CPU was yours until your system call was finished.

The preemptible kernel patch changed this behavior and allows the kernel to reschedule the system at almost any return from exception. Preemption can be disabled where needed, but the concept remains that it is on by default.

Preemption means that device drivers which presumed atomic behavior could be broken. On multiprocessor systems, a preemption event can even move you to a different CPU. In most cases this is not a problem; if in doubt, temporarily disable preemption in critical code sections.

The O(1) scheduler

Originally coded by Ingo Molnar in 2002, the O(1) scheduler adds a layer of common sense to the kernel scheduling policy.

One of the keys to the new scheduler is the use of many queues to manage task scheduling: one queue is provided for each priority level, including the dynamic priorities defined for the non real time priority 0.

The scheduler is coded in uClinux-dist/linux-2.6.x/kernel/sched.c and snippets are shown here for understanding.

The priority array structure is defined near the top of kernel/sched.c.
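From memory of the 2.6 sched.c source, the prio_array definition looks essentially like this (a stand-in list_head is included so the sketch compiles on its own; verify the exact definition against your tree):

```c
#include <assert.h>

/* Stand-in for the kernel's <linux/list.h> doubly linked list head. */
struct list_head { struct list_head *next, *prev; };

#define MAX_PRIO    140     /* 100 real time levels + 40 normal levels */
#define BITMAP_SIZE ((((MAX_PRIO + 1 + 7) / 8) + sizeof(long) - 1) / sizeof(long))

struct prio_array {
    unsigned int nr_active;             /* total runnable tasks in array   */
    unsigned long bitmap[BITMAP_SIZE];  /* bit p set => queue[p] non-empty */
    struct list_head queue[MAX_PRIO];   /* one task list per priority      */
};
```

One such array serves as the active list and one as the expired list in each runqueue.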

These structures are used in the actual schedule function.

        /* first set bit in the bitmap = highest priority non-empty queue */
        idx = sched_find_first_bit(array->bitmap);
        queue = array->queue + idx;
        /* the task at the head of that queue runs next */
        next = list_entry(queue->next, task_t, run_list);

The dequeue_task() and enqueue_task() functions, also in kernel/sched.c, remove tasks from and add tasks to a given priority array.
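From memory of the 2.6 source, enqueue_task() and dequeue_task() are essentially as below; the list and bit helpers here are simplified stand-ins for the kernel's <linux/list.h> and __set_bit()/__clear_bit():

```c
#include <assert.h>
#include <string.h>

#define MAX_PRIO 140
#define BITS_PER_LONG   (8 * sizeof(long))
#define BITMAP_WORDS    ((MAX_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Minimal doubly linked list, mirroring <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int  list_empty(const struct list_head *h) { return h->next == h; }
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}
static void list_del(struct list_head *n)
{
    n->prev->next = n->next; n->next->prev = n->prev;
}
static void set_bit(int nr, unsigned long *map)
{ map[nr / BITS_PER_LONG] |=  1UL << (nr % BITS_PER_LONG); }
static void clear_bit(int nr, unsigned long *map)
{ map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG)); }

struct prio_array {
    unsigned int nr_active;
    unsigned long bitmap[BITMAP_WORDS];
    struct list_head queue[MAX_PRIO];
};

/* Simplified task: the kernel's task_t carries far more state. */
struct task { int prio; struct list_head run_list; struct prio_array *array; };

/* Modelled on 2.6 enqueue_task(): queue at the tail, mark the level busy. */
static void enqueue_task(struct task *p, struct prio_array *array)
{
    list_add_tail(&p->run_list, &array->queue[p->prio]);
    set_bit(p->prio, array->bitmap);
    array->nr_active++;
    p->array = array;
}

/* Modelled on 2.6 dequeue_task(): unlink, clear the bit if the level drains. */
static void dequeue_task(struct task *p, struct prio_array *array)
{
    array->nr_active--;
    list_del(&p->run_list);
    if (list_empty(&array->queue[p->prio]))
        clear_bit(p->prio, array->bitmap);
}
```

Note that the per-priority bit is only cleared when the last task leaves that level, which is what keeps the bitmap search in the schedule function honest.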

Scheduler Data Structures

The list of active processes can be examined with kgdb, or with gdb via JTAG: look at the active and expired list members of the runqueues structure.