The Linux Kernel API

Data Types

Doubly Linked Lists

Basic C Library Functions

When writing drivers, you cannot in general use routines from the C library. Some of these functions have been found generally useful and are listed below. Their behaviour may vary slightly from that defined by ANSI; these deviations are noted in the text.

String Conversions

String Manipulation

Bit Operations

Basic Kernel Library Functions

The Linux kernel provides more basic utility functions.

Bitmap Operations

Command-line Parsing

CRC Functions

Memory Management in Linux

The Slab Cache

User Space Memory Access

More Memory Management Functions

Kernel IPC facilities

IPC utilities

FIFO Buffer

kfifo interface

relay interface support

Relay interface support is designed to provide an efficient mechanism for tools and facilities to relay large amounts of data from kernel space to user space.

relay interface

Module Support

Module Loading

Inter Module support

Refer to the file kernel/module.c for more information.

Hardware Interfaces

Interrupt Handling

DMA Channels

Resources Management

MTRR Handling

PCI Support Library

PCI Hotplug Support Library

MCA Architecture

Firmware Interfaces

DMI Interfaces

EDD Interfaces

Security Framework

  • security_init - initializes the security framework

Synopsis:

int security_init ( void )

Arguments:

  • void - no arguments

Description:

This should be called early in the kernel initialization sequence.

  • security_module_enable - Load given security module on boot ?

Synopsis:

int security_module_enable ( struct security_operations * ops )

Arguments:

  • ops - a pointer to the struct security_operations that is to be checked.

Description:

Each LSM must pass this method before registering its own operations to avoid security registration races. This method may also be used to check if your LSM is currently loaded during kernel initialization.

Return true if:

  • the passed LSM is the one chosen by the user at boot time,
  • or the user didn't specify a specific LSM and we're the first to ask for registration permission,
  • or the passed LSM is currently loaded.

Otherwise, return false.
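The return conditions can be modelled in user space. The following is a minimal sketch, not the kernel implementation: `lsm_model_enable`, `chosen_lsm`, and `LSM_NAME_MAX` are hypothetical names, the boot-time choice is represented by a plain string buffer, and "currently loaded" is assumed to mean the same thing as "already chosen".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LSM_NAME_MAX 16   /* hypothetical limit on the length of an LSM name */

/* Name chosen at boot time; an empty string means the user chose nothing. */
static char chosen_lsm[LSM_NAME_MAX + 1];

/*
 * Model of the check: if nothing has been chosen yet, the first caller
 * claims the slot. Afterwards only an LSM with the chosen name is accepted.
 */
static bool lsm_model_enable(const char *name)
{
	assert(name != NULL);

	if (chosen_lsm[0] == '\0') {
		strncpy(chosen_lsm, name, LSM_NAME_MAX);
		chosen_lsm[LSM_NAME_MAX] = '\0';
		return true;
	}
	return strncmp(name, chosen_lsm, LSM_NAME_MAX) == 0;
}
```

In this model a second LSM asking for permission after the first one has claimed the slot is refused, which is the race the real check is meant to prevent.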

  • register_security - registers a security framework with the kernel

Synopsis:

int register_security ( struct security_operations * ops )

Arguments:

  • ops - a pointer to the struct security_operations that is to be registered

Description:

This function allows a security module to register itself with the kernel security subsystem. Some rudimentary checking is done on the ops value passed to this function. You'll need to check first if your LSM is allowed to register its ops by calling security_module_enable(ops).

If there is already a security module registered with the kernel, an error will be returned. Otherwise 0 is returned on success.

  • securityfs_create_file - create a file in the securityfs filesystem

Synopsis:

struct dentry * securityfs_create_file ( const char * name )

Arguments:

  • name - a pointer to a string containing the name of the file to create.
  • mode - the permission that the file should have
  • parent - a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the securityfs filesystem.
  • data - a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open call.
  • fops - a pointer to a struct file_operations that should be used for this file.

Description:

This is the basic file-creation function for securityfs. It allows a wide range of flexibility in creating a file or a directory (to create a directory, the securityfs_create_dir function is recommended instead).

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).

If securityfs is not enabled in the kernel, the value -ENODEV is returned.
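Returning an error "via ERR_PTR" means the negative errno value is encoded in the pointer itself. The sketch below re-creates that convention in user space so it can be compiled outside the kernel. `ERR_PTR`, `IS_ERR`, and `PTR_ERR` mirror the kernel helpers of the same names, while `struct dentry_model` and `model_create_file` are hypothetical stand-ins and not part of securityfs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095   /* error pointers occupy the top MAX_ERRNO addresses */

/* Encode a negative errno value in a pointer. */
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }

/* Recover the errno value from an error pointer. */
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }

/* True when ptr is an encoded error rather than a valid address. */
static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct dentry. */
struct dentry_model { const char *name; };

/* Hypothetical creator: fails with -ENODEV when the filesystem is disabled. */
static struct dentry_model *model_create_file(const char *name, int fs_enabled)
{
	static struct dentry_model d;

	if (!fs_enabled)
		return ERR_PTR(-ENODEV);
	d.name = name;
	return &d;
}

/* Caller side: test with IS_ERR first, then decode with PTR_ERR. */
void err_ptr_demo(void)
{
	struct dentry_model *d = model_create_file("policy", 0);

	assert(IS_ERR(d));
	assert(PTR_ERR(d) == -ENODEV);

	d = model_create_file("policy", 1);
	assert(!IS_ERR(d));
}
```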

  • securityfs_create_dir - create a directory in the securityfs filesystem

Synopsis:

struct dentry * securityfs_create_dir ( const char * name )

Arguments:

  • name - a pointer to a string containing the name of the directory to create.
  • parent - a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the securityfs filesystem.

Description:

This function creates a directory in securityfs with the given name.

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, NULL will be returned.

If securityfs is not enabled in the kernel, the value -ENODEV is returned. It is not wise to check for this value; rather, check for NULL or !NULL so as to eliminate the need for #ifdef in the calling code.

  • securityfs_remove - removes a file or directory from the securityfs filesystem

Synopsis:

void securityfs_remove ( struct dentry * dentry )

Arguments:

  • dentry - a pointer to the dentry of the file or directory to be removed.

Description:

This function removes a file or directory in securityfs that was previously created with a call to another securityfs function (such as securityfs_create_file or variants thereof).

This function is required to be called in order for the file to be removed. No automatic cleanup of files will happen when a module is removed; you are responsible here.

Audit Interfaces

  • audit_log_start - obtain an audit buffer

Synopsis:

struct audit_buffer * audit_log_start ( struct audit_context * ctx )

Arguments:

  • ctx - audit_context (may be NULL)
  • gfp_mask - type of allocation
  • type - audit message type

Description:

Returns audit_buffer pointer on success or NULL on error.

Obtain an audit buffer. This routine does locking to obtain the audit buffer, but then no locking is required for calls to audit_log_*format. If the task (ctx) is a task that is currently in a syscall, then the syscall is marked as auditable and an audit record will be written at syscall exit. If there is no associated task, then task context (ctx) should be NULL.

  • audit_log_format - format a message into the audit buffer.

Synopsis:

void audit_log_format ( struct audit_buffer * ab )

Arguments:

  • ab - audit_buffer
  • fmt - format string
  • ... - optional parameters matching the fmt string

Description:

All the work is done in audit_log_vformat.

  • audit_log_end - end one audit record

Synopsis:

void audit_log_end ( struct audit_buffer * ab )

Arguments:

  • ab - the audit_buffer

Description:

The netlink_* functions cannot be called inside an irq context, so the audit buffer is placed on a queue and a tasklet is scheduled to remove it from the queue outside the irq context. May be called in any context.

  • audit_log - Log an audit record

Synopsis:

void audit_log ( struct audit_context * ctx )

Arguments:

  • ctx - audit context
  • gfp_mask - type of allocation
  • type - audit message type
  • fmt - format string to use
  • ... - variable parameters matching the format string

Description:

This is a convenience function that calls audit_log_start, audit_log_vformat, and audit_log_end. It may be called in any context.
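The start, format, end sequence and the convenience wrapper around it can be sketched in user space. This is an illustration under stated assumptions, not the audit implementation: every name beginning with `model_`, `struct log_buf`, and the single static buffer are hypothetical, and "delivering" a record simply copies it into `last_record`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct audit_buffer. */
struct log_buf {
	char   text[128];
	size_t len;
	int    in_use;
};

static struct log_buf the_buf;   /* a single buffer keeps the sketch short */
static char last_record[128];    /* where the "end" step delivers the record */

/* Obtain the buffer; returns NULL if it is already taken. */
static struct log_buf *model_log_start(void)
{
	if (the_buf.in_use)
		return NULL;
	the_buf.in_use = 1;
	the_buf.len = 0;
	the_buf.text[0] = '\0';
	return &the_buf;
}

/* Append formatted text; the va_list variant does all the work. */
static void model_log_vformat(struct log_buf *ab, const char *fmt, va_list ap)
{
	size_t room = sizeof(ab->text) - ab->len - 1;
	int n = vsnprintf(ab->text + ab->len, room + 1, fmt, ap);

	if (n > 0)
		ab->len += (size_t)n < room ? (size_t)n : room;
}

static void model_log_format(struct log_buf *ab, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	model_log_vformat(ab, fmt, ap);
	va_end(ap);
}

/* Deliver the finished record and release the buffer. */
static void model_log_end(struct log_buf *ab)
{
	assert(ab != NULL && ab->in_use);
	strcpy(last_record, ab->text);
	ab->in_use = 0;
}

/* Convenience wrapper: start, vformat, and end in one call. */
void model_log(const char *fmt, ...)
{
	struct log_buf *ab = model_log_start();
	va_list ap;

	if (!ab)
		return;
	va_start(ap, fmt);
	model_log_vformat(ab, fmt, ap);
	va_end(ap);
	model_log_end(ab);
}
```

Calling the three steps separately is useful when a record is built from several format calls; the wrapper covers the common case of a single formatted message.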

  • audit_alloc - allocate an audit context block for a task

Synopsis:

int audit_alloc ( struct task_struct * tsk )

Arguments:

  • tsk - task

Description:

Filter on the task information and allocate a per-task audit context if necessary. Doing so turns on system call auditing for the specified task. This is called from copy_process, so no lock is needed.

  • audit_free - free a per-task audit context

Synopsis:

void audit_free ( struct task_struct * tsk )

Arguments:

  • tsk - task whose audit context block to free

Description:

Called from copy_process and do_exit

  • audit_syscall_entry - fill in an audit record at syscall entry

Synopsis:

void audit_syscall_entry ( int arch )

Arguments:

  • arch - architecture type
  • major - major syscall type (function)
  • a1 - additional syscall register 1
  • a2 - additional syscall register 2
  • a3 - additional syscall register 3
  • a4 - additional syscall register 4

Description:

Fill in audit context at syscall entry. This only happens if the audit context was created when the task was created and the state or filters demand the audit context be built. If the state from the per-task filter or from the per-syscall filter is AUDIT_RECORD_CONTEXT, then the record will be written at syscall exit time (otherwise, it will only be written if another part of the kernel requests that it be written).

  • audit_syscall_exit - deallocate audit context after a system call

Synopsis:

void audit_syscall_exit ( int valid )

Arguments:

  • valid - success/failure flag
  • return_code - syscall return value

Description:

Tear down after system call. If the audit context has been marked as auditable (either because of the AUDIT_RECORD_CONTEXT state from filtering, or because some other part of the kernel wrote an audit message), then write out the syscall information. In all cases, free the names stored from getname.

  • __audit_getname - add a name to the list

Synopsis:

void __audit_getname ( const char * name )

Arguments:

  • name - name to add

Description:

Add a name to the list of audit names for this context. Called from fs/namei.c:getname.

  • __audit_inode - store the inode and device from a lookup

Synopsis:

void __audit_inode ( const char * name )

Arguments:

  • name - name being audited
  • dentry - dentry being audited

Description:

Called from fs/namei.c:path_lookup.

  • auditsc_get_stamp - get local copies of audit_context values

Synopsis:

int auditsc_get_stamp ( struct audit_context * ctx )

Arguments:

  • ctx - audit_context for the task
  • t - timespec to store time recorded in the audit_context
  • serial - serial value that is recorded in the audit_context

Description:

Also sets the context as auditable.

  • audit_set_loginuid - set a task's audit_context loginuid

Synopsis:

int audit_set_loginuid ( struct task_struct * task )

Arguments:

  • task - task whose audit context is being modified
  • loginuid - loginuid value

Description:

Returns 0.

Called (set) from fs/proc/base.c::proc_loginuid_write.

  • __audit_mq_open - record audit data for a POSIX MQ open

Synopsis:

void __audit_mq_open ( int oflag )

Arguments:

  • oflag - open flag
  • mode - mode bits
  • attr - queue attributes

  • __audit_mq_sendrecv - record audit data for a POSIX MQ timed send/receive

Synopsis:

void __audit_mq_sendrecv ( mqd_t mqdes )

Arguments:

  • mqdes - MQ descriptor
  • msg_len - Message length
  • msg_prio - Message priority
  • abs_timeout - Message timeout in absolute time

  • __audit_mq_notify - record audit data for a POSIX MQ notify

Synopsis:

void __audit_mq_notify ( mqd_t mqdes )

Arguments:

  • mqdes - MQ descriptor
  • notification - Notification event

  • __audit_mq_getsetattr - record audit data for a POSIX MQ get/set attribute

Synopsis:

void __audit_mq_getsetattr ( mqd_t mqdes )

Arguments:

  • mqdes - MQ descriptor
  • mqstat - MQ flags

  • __audit_ipc_obj - record audit data for ipc object

Synopsis:

void __audit_ipc_obj ( struct kern_ipc_perm * ipcp )

Arguments:

  • ipcp - ipc permissions

  • __audit_ipc_set_perm - record audit data for new ipc permissions

Synopsis:

void __audit_ipc_set_perm ( unsigned long qbytes )

Arguments:

  • qbytes - msgq bytes
  • uid - msgq user id
  • gid - msgq group id
  • mode - msgq mode (permissions)

Description:

Called only after audit_ipc_obj.

  • audit_socketcall - record audit data for sys_socketcall

Synopsis:

void audit_socketcall ( int nargs )

Arguments:

  • nargs - number of args
  • args - args array

  • __audit_fd_pair - record audit data for pipe and socketpair

Synopsis:

void __audit_fd_pair ( int fd1 )

Arguments:

  • fd1 - the first file descriptor
  • fd2 - the second file descriptor

  • audit_sockaddr - record audit data for sys_bind, sys_connect, sys_sendto

Synopsis:

int audit_sockaddr ( int len )

Arguments:

  • len - data length in user space
  • a - data address in kernel space

Description:

Returns 0 on success or when the context is NULL, or < 0 on error.

  • __audit_signal_info - record signal info for shutting down audit subsystem

Synopsis:

int __audit_signal_info ( int sig )

Arguments:

  • sig - signal value
  • t - task being signaled

Description:

If the audit subsystem is being terminated, record the task (pid) and uid that is doing that.

  • __audit_log_bprm_fcaps - store information about a loading bprm and relevant fcaps

Synopsis:

int __audit_log_bprm_fcaps ( struct linux_binprm * bprm )

Arguments:

  • bprm - pointer to the bprm being processed
  • new - the proposed new credentials
  • old - the old credentials

Description:

Simply check if the process already has the caps given by the file and, if not, store the privilege escalation info for later auditing at the end of the syscall.

  • __audit_log_capset - store information about the arguments to the capset syscall

Synopsis:

void __audit_log_capset ( pid_t pid )

Arguments:

  • pid - target pid of the capset call
  • new - the new credentials
  • old - the old (current) credentials

Description:

Record the arguments userspace sent to sys_capset for later printing by the audit system if applicable.

  • audit_core_dumps - record information about processes that end abnormally

Synopsis:

void audit_core_dumps ( long signr )

Arguments:

  • signr - signal value

Description:

If a process ends with a core dump, something fishy is going on and we should record the event for investigation.

  • audit_receive_filter - apply all rules to the specified message type

Synopsis:

int audit_receive_filter ( int type )

Arguments:

  • type - audit message type
  • pid - target pid for netlink audit messages
  • uid - target uid for netlink audit messages
  • seq - netlink audit message sequence (serial) number
  • data - payload data
  • datasz - size of payload data
  • loginuid - loginuid of sender
  • sessionid - sessionid for netlink audit message
  • sid - SE Linux Security ID of sender

Accounting Framework

  • sys_acct - enable/disable process accounting

Synopsis:

long sys_acct ( const char __user * name )

Arguments:

  • name - file name for accounting records or NULL to shut down accounting

Description:

Returns 0 for success or negative errno values for failure.

sys_acct is the only system call needed to implement process accounting. It takes the name of the file where accounting records should be written. If the filename is NULL, accounting will be shut down.

  • acct_auto_close_mnt - turn off a filesystem's accounting if it is on

Synopsis:

void acct_auto_close_mnt ( struct vfsmount * m )

Arguments:

  • m - vfsmount being shut down

Description:

If the accounting is turned on for a file in the subtree pointed to by m, turn accounting off. Done when m is about to die.

  • acct_auto_close - turn off a filesystem's accounting if it is on

Synopsis:

void acct_auto_close ( struct super_block * sb )

Arguments:

  • sb - super block for the filesystem

Description:

If the accounting is turned on for a file in the filesystem pointed to by sb, turn accounting off.

  • acct_init_pacct - initialize a new pacct_struct

Synopsis:

void acct_init_pacct ( struct pacct_struct * pacct )

Arguments:

  • pacct - per-process accounting info struct to initialize

  • acct_collect - collect accounting information into pacct_struct

Synopsis:

void acct_collect ( long exitcode )

Arguments:

  • exitcode - task exit code
  • group_dead - not 0, if this thread is the last one in the process.

  • acct_process - now just a wrapper around acct_process_in_ns, which in turn is a wrapper around do_acct_process.

Synopsis:

void acct_process ( void )

Arguments:

  • void - no arguments

Description:

Handles process accounting for an exiting task.

Block Devices

  • blk_get_backing_dev_info - get the address of a queue's backing_dev_info

Synopsis:

struct backing_dev_info * blk_get_backing_dev_info ( struct block_device * bdev )

Arguments:

  • bdev - device

Description:

Locates the passed device's request queue and returns the address of its backing_dev_info

Will return NULL if the request queue cannot be located.

  • blk_plug_device_unlocked - plug a device without queue lock held

Synopsis:

void blk_plug_device_unlocked ( struct request_queue * q )

Arguments:

  • q - The struct request_queue to plug

Description:

Like blk_plug_device(), but grabs the queue lock and disables interrupts.

  • generic_unplug_device - fire a request queue

Synopsis:

void generic_unplug_device ( struct request_queue * q )

Arguments:

  • q - The struct request_queue in question

Description:

Linux uses plugging to build bigger request queues before letting the device have at them. If a queue is plugged, the I/O scheduler is still adding and merging requests on the queue. Once the queue gets unplugged, the request_fn defined for the queue is invoked and transfers are started.

  • blk_start_queue - restart a previously stopped queue

Synopsis:

void blk_start_queue ( struct request_queue * q )

Arguments:

  • q - The struct request_queue in question

Description:

blk_start_queue will clear the stop flag on the queue, and call the request_fn for the queue if it was in a stopped state when entered. Also see blk_stop_queue. Queue lock must be held.

  • blk_stop_queue - stop a queue

Synopsis:

void blk_stop_queue ( struct request_queue * q )

Arguments:

  • q - The struct request_queue in question

Description:

The Linux block layer assumes that a block driver will consume all entries on the request queue when the request_fn strategy is called. Often this will not happen, because of hardware limitations (queue depth settings). If a device driver gets a 'queue full' response, or if it simply chooses not to queue more I/O at one point, it can call this function to prevent the request_fn from being called until the driver has signalled it's ready to go again. This happens by calling blk_start_queue to restart queue operations. Queue lock must be held.
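The stop/start handshake described above can be illustrated with a small user-space model. It is a sketch under stated assumptions rather than block-layer code: `struct queue_model`, `HW_DEPTH`, and every `model_` function are hypothetical, and locking is left out entirely even though the real functions require the queue lock.

```c
#include <assert.h>
#include <stdbool.h>

#define HW_DEPTH 2   /* hypothetical hardware queue depth */

/* Hypothetical stand-in for struct request_queue. */
struct queue_model {
	int  pending;    /* requests still waiting on the queue */
	int  in_hw;      /* requests handed to the hardware */
	bool stopped;    /* the stop flag */
};

/* Model of blk_stop_queue: set the flag so request_fn is no longer called. */
static void model_stop_queue(struct queue_model *q)
{
	q->stopped = true;
}

/* The driver's request_fn: issue requests until the hardware is full. */
static void model_request_fn(struct queue_model *q)
{
	while (q->pending > 0) {
		if (q->in_hw == HW_DEPTH) {   /* 'queue full': back off */
			model_stop_queue(q);
			return;
		}
		q->pending--;
		q->in_hw++;
	}
}

/* The block layer only invokes request_fn while the queue is not stopped. */
static void model_run_queue(struct queue_model *q)
{
	if (!q->stopped)
		model_request_fn(q);
}

/* Model of blk_start_queue: clear the flag, then run request_fn if stopped. */
static void model_start_queue(struct queue_model *q)
{
	bool was_stopped = q->stopped;

	q->stopped = false;
	if (was_stopped)
		model_request_fn(q);
}

/* Completion handler: a hardware slot is free again, so restart the queue. */
static void model_complete_one(struct queue_model *q)
{
	assert(q->in_hw > 0);
	q->in_hw--;
	model_start_queue(q);
}
```

With five pending requests and a depth of two, the first run issues two requests and stops the queue; each completion then restarts it and lets one more request through.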

  • blk_sync_queue - cancel any pending callbacks on a queue

Synopsis:

void blk_sync_queue ( struct request_queue * q )

Arguments:

  • q - the queue

Description:

The block layer may perform asynchronous callback activity on a queue, such as calling the unplug function after a timeout. A block device may call blk_sync_queue to ensure that any such activity is cancelled, thus allowing it to release resources that the callbacks might use. The caller must already have made sure that its ->make_request_fn will not re-add plugging prior to calling this function.

  • __blk_run_queue - run a single device queue

Synopsis:

void __blk_run_queue ( struct request_queue * q )

Arguments:

  • q - The queue to run

Description:

See blk_run_queue. This variant must be called with the queue lock held and interrupts disabled.

  • blk_run_queue - run a single device queue

Synopsis:

void blk_run_queue ( struct request_queue * q )

Arguments:

  • q - The queue to run

Description:

Invoke request handling on this queue, if it has pending work to do. May be used to restart queueing when a request has completed.

  • blk_init_queue - prepare a request queue for use with a block device

Synopsis:

struct request_queue * blk_init_queue ( request_fn_proc * rfn )

Arguments:

  • rfn - The function to be called to process requests that have been placed on the queue.
  • lock - Request queue spin lock

Description:

If a block device wishes to use the standard request handling procedures, which sort requests and coalesce adjacent requests, then it must call blk_init_queue. The function rfn will be called when there are requests on the queue that need to be processed. If the device supports plugging, then rfn may not be called immediately when requests are available on the queue, but may be called at some time later instead. Plugged queues are generally unplugged when a buffer belonging to one of the requests on the queue is needed, or due to memory pressure.

rfn is not required, or even expected, to remove all requests off the queue, but only as many as it can handle at a time. If it does leave requests on the queue, it is responsible for arranging that the requests get dealt with eventually.

The queue spin lock must be held while manipulating the requests on the request queue; this lock will be taken also from interrupt context, so irq disabling is needed for it.

Function returns a pointer to the initialized request queue, or NULL if it didn't succeed.

Note:

blk_init_queue must be paired with a blk_cleanup_queue call when the block device is deactivated (such as at module unload).

  • blk_make_request - given a bio, allocate a corresponding struct request.

Synopsis:

struct request * blk_make_request ( struct request_queue * q )

Arguments:

  • q - target request queue
  • bio - The bio describing the memory mappings that will be submitted for IO. It may be a chained-bio properly constructed by block/bio layer.
  • gfp_mask - gfp flags to be used for memory allocation

Description:

blk_make_request is the parallel of generic_make_request for BLOCK_PC type commands, where the struct request needs to be further initialized by the caller. It is passed a struct bio, which describes the memory info of the I/O transfer.

The caller of blk_make_request must make sure that bi_io_vec is set to describe the memory buffers, and that bio_data_dir will return the needed direction of the request (and that all bios in the passed bio chain are set up accordingly).

If called under non-sleepable conditions, mapped bio buffers must not need bouncing; ensure this by calling the appropriate masked or flagged allocator, suitable for the target device. Otherwise the call to blk_queue_bounce will BUG.

WARNING:

When allocating/cloning a bio-chain, careful consideration should be given to how you allocate bios. In particular, you cannot use __GFP_WAIT for anything but the first bio in the chain. Otherwise you risk waiting for IO completion of a bio that hasn't been submitted yet, thus resulting in a deadlock. Alternatively bios should be allocated using bio_kmalloc instead of bio_alloc, as that avoids the mempool deadlock. If possible a big IO should be split into smaller parts when allocation fails. Partial allocation should not be an error, or you risk a live-lock.

  • blk_requeue_request - put a request back on queue

Synopsis:

void blk_requeue_request ( struct request_queue * q )

Arguments:

  • q - request queue where request should be inserted
  • rq - request to be inserted

Description:

Drivers often keep queueing requests until the hardware cannot accept more; when that happens, the request must be put back on the queue. Must be called with queue lock held.

  • blk_insert_request - insert a special request into a request queue

Synopsis:

void blk_insert_request ( struct request_queue * q )

Arguments:

  • q - request queue where request should be inserted
  • rq - request to be inserted
  • at_head - insert request at head or tail of queue
  • data - private data

Description:

Many block devices need to execute commands asynchronously, so they don't block the whole kernel from preemption during request execution. This is normally accomplished by inserting artificial requests tagged as REQ_TYPE_SPECIAL into the corresponding request queue, and letting them be scheduled for actual execution by the request queue.

We have the option of inserting at the head or the tail of the queue. Typically we use the tail for new ioctls and so forth. We use the head of the queue for things like a QUEUE_FULL message from a device, or a host that is unable to accept a particular command.

  • part_round_stats - Round off the performance stats on a struct disk_stats.

Synopsis:

void part_round_stats ( int cpu )

Arguments:

  • cpu - cpu number for stats access
  • part - target partition

Description:

The average IO queue length and utilisation statistics are maintained by observing the current state of the queue length and the amount of time it has been in this state for.

Normally, that accounting is done on IO completion, but that can result in more than a second's worth of IO being accounted for within any one second, leading to >100% utilisation. To deal with that, we call this function to do a round-off before returning the results when reading /proc/diskstats. This accounts immediately for all queue usage up to the current jiffies and restarts the counters again.
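The round-off can be pictured as integrating the number of in-flight requests over time. The sketch below is a user-space model and not the kernel code: `struct part_model` and both functions are hypothetical, time is a plain tick counter standing in for jiffies, and per-CPU counters are ignored.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-partition statistics. */
struct part_model {
	unsigned long stamp;           /* tick of the last update */
	unsigned long in_flight;       /* requests currently outstanding */
	unsigned long io_ticks;        /* ticks during which the device was busy */
	unsigned long time_in_queue;   /* in_flight integrated over time */
};

/* Account for all queue usage up to `now`, then restart counting from there. */
static void model_round_stats(struct part_model *p, unsigned long now)
{
	unsigned long elapsed = now - p->stamp;

	if (elapsed == 0)
		return;
	if (p->in_flight) {
		p->time_in_queue += p->in_flight * elapsed;
		p->io_ticks += elapsed;
	}
	p->stamp = now;
}

/* Utilisation in percent over the window that started at tick `since`. */
static unsigned long model_utilisation(const struct part_model *p,
				       unsigned long since)
{
	unsigned long window = p->stamp - since;

	assert(window > 0);
	return 100 * p->io_ticks / window;
}
```

Because the counters are brought up to the current tick before they are read, busy time can never exceed the length of the window, so the reported utilisation stays at or below 100%.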

  • submit_bio - submit a bio to the block device layer for I/O

Synopsis:

void submit_bio ( int rw )

Arguments:

  • rw - whether to READ or WRITE, or maybe to READA (read ahead)
  • bio - The struct bio which describes the I/O

Description:

submit_bio is very similar in purpose to generic_make_request, and uses that function to do most of the work. Both are fairly rough interfaces; bio must be presetup and ready for I/O.

  • blk_rq_check_limits - Helper function to check a request for the queue limit

Synopsis:

int blk_rq_check_limits ( struct request_queue * q )

Arguments:

  • q - the queue
  • rq - the request being checked

Description:

rq may have been made based on weaker limitations of upper-level queues in request stacking drivers, and it may violate the limitation of q. Since the block layer and the underlying device driver trust rq after it is inserted to q, it should be checked against q before the insertion using this generic function.

This function should also be useful for request stacking drivers in some of the cases below, so it is exported. Request stacking drivers like request-based dm may change the queue limits while requests are in the queue (e.g. dm's table swapping). Such request stacking drivers should check those requests against the new queue limits again when they dispatch those requests, although such checks are also done against the old queue limits when submitting requests.

  • blk_insert_cloned_request - Helper for stacking drivers to submit a request

Synopsis:

int blk_insert_cloned_request ( struct request_queue * q )

Arguments:

  • q - the queue to submit the request
  • rq - the request being queued

  • blk_rq_err_bytes - determine number of bytes till the next failure boundary

Synopsis:

unsigned int blk_rq_err_bytes ( const struct request * rq )

Arguments:

  • rq - request to examine

Description:

A request could be a merge of I/Os which require different failure handling. This function determines the number of bytes which can be failed from the beginning of the request without crossing into an area which needs to be retried further.

Return:

The number of bytes to fail.

Context:

queue_lock must be held.

  • blk_peek_request - peek at the top of a request queue

Synopsis:

struct request * blk_peek_request ( struct request_queue * q )

Arguments:

  • q - request queue to peek at

Description:

Return the request at the top of q. The returned request should be started using blk_start_request before LLD starts processing it.

Return:

Pointer to the request at the top of q if available. Null otherwise.

Context:

queue_lock must be held.

  • blk_start_request - start request processing on the driver

Synopsis:

void blk_start_request ( struct request * req )

Arguments:

  • req - request to dequeue

Description:

Dequeue req and start timeout timer on it. This hands off the request to the driver.

Block internal functions which don't want to start timer should call blk_dequeue_request.

Context:

queue_lock must be held.

  • blk_fetch_request - fetch a request from a request queue

Synopsis:

struct request * blk_fetch_request ( struct request_queue * q )

Arguments:

  • q - request queue to fetch a request from

Description:

Return the request at the top of q. The request is started on return and LLD can start processing it immediately.

Return:

Pointer to the request at the top of q if available. Null otherwise.

Context:

queue_lock must be held.
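The relationship between peeking, starting, and fetching a request can be shown with a small user-space FIFO. This is a sketch, not the block layer: all types and `model_` functions are hypothetical, the timeout timer is reduced to a flag, and the start function takes the queue explicitly because the model request does not carry a pointer to its queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 4

/* Hypothetical stand-in for struct request. */
struct req_model {
	int  id;
	bool timer_armed;   /* set once the request has been started */
};

/* Hypothetical FIFO standing in for a request queue. */
struct rq_queue_model {
	struct req_model *slot[QUEUE_SLOTS];
	size_t head, count;
};

static void model_enqueue(struct rq_queue_model *q, struct req_model *rq)
{
	assert(q->count < QUEUE_SLOTS);
	q->slot[(q->head + q->count) % QUEUE_SLOTS] = rq;
	q->count++;
}

/* Look at the request at the top of the queue without removing it. */
static struct req_model *model_peek_request(struct rq_queue_model *q)
{
	return q->count ? q->slot[q->head] : NULL;
}

/* Dequeue the top request and arm its timeout timer. */
static void model_start_request(struct rq_queue_model *q, struct req_model *rq)
{
	assert(q->count > 0 && q->slot[q->head] == rq);
	q->head = (q->head + 1) % QUEUE_SLOTS;
	q->count--;
	rq->timer_armed = true;
}

/* Fetch: peek, then start the request if one was found. */
static struct req_model *model_fetch_request(struct rq_queue_model *q)
{
	struct req_model *rq = model_peek_request(q);

	if (rq)
		model_start_request(q, rq);
	return rq;
}
```

A driver that needs to inspect a request before committing to it would peek first and start it later; one that always processes the top request can fetch it in a single step.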

  • blk_update_request - Special helper function for request stacking drivers

Synopsis:

bool blk_update_request ( struct request * req )

Arguments:

  • req - the request being processed
  • error - 0 for success, < 0 for error
  • nr_bytes - number of bytes to complete req

Description:

Ends I/O on a number of bytes attached to req, but doesn't complete the request structure even if req doesn't have leftover. If req has leftover, sets it up for the next range of segments.

This special helper function is only for request stacking drivers (e.g. request-based dm) so that they can handle partial completion. Actual device drivers should use blk_end_request instead.

Passing the result of blk_rq_bytes as nr_bytes guarantees false return from this function.

Return:

  • false - this request doesn't have any more data
  • true - this request has more data

  • blk_end_request - Helper function for drivers to complete the request.

Synopsis:

bool blk_end_request ( struct request * rq )

Arguments:

  • rq - the request being processed
  • error - 0 for success, < 0 for error
  • nr_bytes - number of bytes to complete

Description:

Ends I/O on a number of bytes attached to rq. If rq has leftover, sets it up for the next range of segments.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request
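The return convention lends itself to a completion loop: keep completing bytes while the function reports that buffers are still pending. The following user-space sketch models only that convention. `struct xfer_model` and the `model_` functions are hypothetical, the error argument is omitted, and no real I/O takes place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request: only the byte count matters. */
struct xfer_model {
	unsigned int bytes_left;
	bool         finished;
};

/*
 * Complete up to nr_bytes of the request. Returns true while data is
 * still pending and false once the request is done.
 */
static bool model_end_request(struct xfer_model *rq, unsigned int nr_bytes)
{
	assert(!rq->finished);

	if (nr_bytes >= rq->bytes_left) {
		rq->bytes_left = 0;
		rq->finished = true;
		return false;
	}
	rq->bytes_left -= nr_bytes;
	return true;
}

/* Driver-side loop: complete in fixed chunks and count the calls needed. */
static unsigned int model_complete_in_chunks(struct xfer_model *rq,
					     unsigned int chunk)
{
	unsigned int calls = 1;

	assert(chunk > 0);
	while (model_end_request(rq, chunk))
		calls++;
	return calls;
}
```

Note the inverted sense of the result: false means the request is finished and must not be touched again, while true means the caller still owns a partially completed request.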

  • blk_end_request_all - Helper function for drivers to finish the request.

Synopsis:

void blk_end_request_all ( struct request * rq )

Arguments:

  • rq - the request to finish
  • error - 0 for success, < 0 for error

Description:

Completely finish rq.

  • blk_end_request_cur - Helper function to finish the current request chunk.

Synopsis:

bool blk_end_request_cur ( struct request * rq )

Arguments:

  • rq - the request to finish the current chunk for
  • error - 0 for success, < 0 for error

Description:

Complete the current consecutively mapped chunk from rq.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • blk_end_request_err - Finish a request till the next failure boundary.

Synopsis:

bool blk_end_request_err ( struct request * rq )

Arguments:

  • rq - the request to finish till the next failure boundary for
  • error - must be negative errno

Description:

Complete rq till the next failure boundary.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • __blk_end_request - Helper function for drivers to complete the request.

Synopsis:

bool __blk_end_request ( struct request * rq )

Arguments:

  • rq - the request being processed
  • error - 0 for success, < 0 for error
  • nr_bytes - number of bytes to complete

Description:

Must be called with queue lock held unlike blk_end_request.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • __blk_end_request_all - Helper function for drivers to finish the request.

Synopsis:

void __blk_end_request_all ( struct request * rq )

Arguments:

  • rq - the request to finish
  • error - 0 for success, < 0 for error

Description:

Completely finish rq. Must be called with queue lock held.

  • __blk_end_request_cur - Helper function to finish the current request chunk.

Synopsis:

bool __blk_end_request_cur ( struct request * rq )

Arguments:

  • rq - the request to finish the current chunk for
  • error - 0 for success, < 0 for error

Description:

Complete the current consecutively mapped chunk from rq. Must be called with queue lock held.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • __blk_end_request_err - Finish a request till the next failure boundary.

Synopsis:

bool __blk_end_request_err ( struct request * rq )

Arguments:

  • rq - the request to finish till the next failure boundary for
  • error - must be negative errno

Description:

Complete rq till the next failure boundary. Must be called with queue lock held.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • blk_lld_busy - Check if underlying low-level drivers of a device are busy

Synopsis:

int blk_lld_busy ( struct request_queue * q )

Arguments:

  • q - the queue of the device being checked

Description:

Check if underlying low-level drivers of a device are busy. If the drivers want to export their busy state, they must first register their own export function using blk_queue_lld_busy.

This function is used only by request stacking drivers, to stop dispatching requests to underlying devices while those devices are busy. This allows more I/O merging on the queue of the request stacking driver and prevents I/O throughput regression under bursty I/O load.

Return:

  • 0 - Not busy (The request stacking driver should dispatch request)
  • 1 - Busy (The request stacking driver should stop dispatching request)
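
The export-and-query pattern described above can be sketched in userspace as follows. All names here (struct model_queue, model_queue_lld_busy, model_lld_busy, example_busy) are hypothetical stand-ins rather than kernel symbols, and the sketch assumes that a queue with no registered function reports "not busy".

```c
#include <assert.h>
#include <stddef.h>

struct model_queue;
typedef int (*model_lld_busy_fn)(struct model_queue *q);

/* Hypothetical stand-in for struct request_queue. */
struct model_queue {
    model_lld_busy_fn lld_busy_fn; /* NULL until the driver exports one */
    int in_flight;                 /* requests currently on the device */
    int depth;                     /* how many the device can accept */
};

/* Models the registration step: the driver exports its busy state. */
static void model_queue_lld_busy(struct model_queue *q, model_lld_busy_fn fn)
{
    assert(q != NULL);
    q->lld_busy_fn = fn;
}

/* Models the query: 0 = not busy, 1 = busy. */
static int model_lld_busy(struct model_queue *q)
{
    assert(q != NULL);
    if (q->lld_busy_fn)
        return q->lld_busy_fn(q);
    return 0; /* assumption: nothing exported means "not busy" */
}

/* Example driver callback: busy once the device is full. */
static int example_busy(struct model_queue *q)
{
    return q->in_flight >= q->depth ? 1 : 0;
}
```

A request stacking driver would call the query before each dispatch and hold requests back while it returns 1, which gives them more time to merge.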

  • blk_rq_unprep_clone - Helper function to free all bios in a cloned request

Synopsis:

void blk_rq_unprep_clone ( struct request * rq )

Arguments:

  • rq - the clone request to be cleaned up

Description:

Free all bios in rq for a cloned request.

  • blk_rq_prep_clone - Helper function to setup clone request

Synopsis:

int blk_rq_prep_clone ( struct request * rq )

Arguments:

  • rq - the request to be setup
  • rq_src - original request to be cloned
  • bs - bio_set that bios for clone are allocated from
  • gfp_mask - memory allocation mask for bio
  • bio_ctr - setup function to be called for each clone bio. Returns 0 for success, non 0 for failure.
  • data - private data to be passed to bio_ctr

Description:

Clones bios in rq_src to rq, and copies attributes of rq_src to rq. The actual data parts of rq_src (e.g. ->cmd, ->buffer, ->sense) are not copied, and copying such parts is the caller's responsibility. Also, pages which the original bios are pointing to are not copied; the cloned bios just point to the same pages. So cloned bios must be completed before original bios, which means the caller must complete rq before rq_src.

  • __generic_make_request - hand a buffer to its device driver for I/O

Synopsis:

void __generic_make_request ( struct bio * bio )

Arguments:

  • bio - The bio describing the location in memory and on the device.

Description:

generic_make_request is used to make I/O requests of block devices. It is passed a struct bio, which describes the I/O that needs to be done.

generic_make_request does not return any status. The success/failure status of the request, along with notification of completion, is delivered asynchronously through the bio->bi_end_io function described (one day) elsewhere.

The caller of generic_make_request must make sure that bi_io_vec are set to describe the memory buffer, and that bi_dev and bi_sector are set to describe the device address, and the bi_end_io and optionally bi_private are set to describe how completion notification should be signaled.

generic_make_request and the drivers it calls may use bi_next if this bio happens to be merged with someone else, and may change bi_dev and bi_sector for remaps as it sees fit. So the values of these fields should NOT be depended on after the call to generic_make_request.

  • blk_end_bidi_request - Complete a bidi request

Synopsis:

bool blk_end_bidi_request ( struct request * rq )

Arguments:

  • rq - the request to complete
  • error - 0 for success, < 0 for error
  • nr_bytes - number of bytes to complete rq
  • bidi_bytes - number of bytes to complete rq->next_rq

Description:

Ends I/O on a number of bytes attached to rq and rq->next_rq. Drivers that support bidi can safely call this function for any type of request, bidi or uni. In the latter case bidi_bytes is simply ignored.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • __blk_end_bidi_request - Complete a bidi request with queue lock held

Synopsis:

bool __blk_end_bidi_request ( struct request * rq )

Arguments:

  • rq - the request to complete
  • error - 0 for success, < 0 for error
  • nr_bytes - number of bytes to complete rq
  • bidi_bytes - number of bytes to complete rq->next_rq

Description:

Identical to blk_end_bidi_request except that queue lock is assumed to be locked on entry and remains so on return.

Return:

  • false - we are done with this request
  • true - still buffers pending for this request

  • blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage

Synopsis:

int blk_rq_map_user ( struct request_queue * q )

Arguments:

  • q - request queue where request should be inserted
  • rq - request structure to fill
  • map_data - pointer to the rq_map_data holding pages (if necessary)
  • ubuf - the user buffer
  • len - length of user data
  • gfp_mask - memory allocation flags

Description:

Data will be mapped directly for zero copy I/O, if possible. Otherwise a kernel bounce buffer is used.

A matching blk_rq_unmap_user must be issued at the end of I/O, while still in process context.

Note:

The mapped bio may need to be bounced through blk_queue_bounce before being submitted to the device, as pages mapped may be out of reach. It's the caller's responsibility to make sure this happens. The original bio must be passed back to blk_rq_unmap_user for proper unmapping.

  • blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage

Synopsis:

int blk_rq_map_user_iov ( struct request_queue * q )

Arguments:

  • q - request queue where request should be inserted
  • rq - request to map data to
  • map_data - pointer to the rq_map_data holding pages (if necessary)
  • iov - pointer to the iovec
  • iov_count - number of elements in the iovec
  • len - I/O byte count
  • gfp_mask - memory allocation flags

Description:

Data will be mapped directly for zero copy I/O, if possible. Otherwise a kernel bounce buffer is used.

A matching blk_rq_unmap_user must be issued at the end of I/O, while still in process context.

Note:

The mapped bio may need to be bounced through blk_queue_bounce before being submitted to the device, as pages mapped may be out of reach. It's the caller's responsibility to make sure this happens. The original bio must be passed back to blk_rq_unmap_user for proper unmapping.

  • blk_rq_unmap_user - unmap a request with user data

Synopsis:

int blk_rq_unmap_user ( struct bio * bio )

Arguments:

  • bio - start of bio list

Description:

Unmap a request previously mapped by blk_rq_map_user. The caller must supply the original rq->bio from the blk_rq_map_user return, since the I/O completion may have changed rq->bio.

  • blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage

Synopsis:

int blk_rq_map_kern ( struct request_queue * q )

Arguments:

  • q - request queue where request should be inserted
  • rq - request to fill
  • kbuf - the kernel buffer
  • len - length of the kernel buffer, in bytes
  • gfp_mask - memory allocation flags

Description:

Data will be mapped directly if possible. Otherwise a bounce buffer is used. Can be called multiple times to append multiple buffers.

  • blk_release_queue - release a struct request_queue when it is no longer needed

Synopsis:

void blk_release_queue ( struct kobject * kobj )

Arguments:

  • kobj - the kobject belonging to the request queue to be released

Description:

blk_cleanup_queue is the pair to blk_init_queue or blk_queue_make_request. It should be called when a request queue is being released; typically when a block device is being de-registered. Currently, its primary task is to free all the struct request structures that were allocated to the queue and the queue itself.

Caveat:

Hopefully the low level driver will have finished any outstanding requests first…

  • blk_queue_prep_rq - set a prepare_request function for queue

Synopsis:

void blk_queue_prep_rq ( struct request_queue * q )

Arguments:

  • q - queue
  • pfn - prepare_request function

Description:

It's possible for a queue to register a prepare_request callback which is invoked before the request is handed to the request_fn. The goal of the function is to prepare a request for I/O; for instance, it can be used to build a CDB from the request data.

  • blk_queue_merge_bvec - set a merge_bvec function for queue

Synopsis:

void blk_queue_merge_bvec ( struct request_queue * q )

Arguments:

  • q - queue
  • mbfn - merge_bvec_fn

Description:

Usually queues have static limitations on the max sectors or segments that we can put in a request. Stacking drivers may have some settings that are dynamic, and thus we have to query the queue whether it is OK to add a new bio_vec to a bio at a given offset or not. If the block device has such limitations, it needs to register a merge_bvec_fn to control the size of bios sent to it. Note that a block device *must* allow a single page to be added to an empty bio. The block device driver may want to use the bio_split function to deal with these bios. By default no merge_bvec_fn is defined for a queue, and only the fixed limits are honored.

  • blk_set_default_limits - reset limits to default values

Synopsis:

void blk_set_default_limits ( struct queue_limits * lim )

Arguments:

  • lim - the queue_limits structure to reset

Description:

Returns a queue_limit struct to its default state. Can be used by stacking drivers like DM that stage table swaps and reuse an existing device queue.

  • blk_queue_make_request - define an alternate make_request function for a device

Synopsis:

void blk_queue_make_request ( struct request_queue * q )

Arguments:

  • q - the request queue for the device to be affected
  • mfn - the alternate make_request function

Description:

The normal way for struct bios to be passed to a device driver is for them to be collected into requests on a request queue, and then to allow the device driver to select requests off that queue when it is ready. This works well for many block devices. However some block devices (typically virtual devices such as md or lvm) do not benefit from the processing on the request queue, and are served best by having the requests passed directly to them. This can be achieved by providing a function to blk_queue_make_request.

Caveat:

The driver that does this *must* be able to deal appropriately with buffers in high memory. This can be accomplished by either calling __bio_kmap_atomic to get a temporary kernel mapping, or by calling blk_queue_bounce to create a buffer in normal memory.

  • blk_queue_bounce_limit - set bounce buffer limit for queue

Synopsis:

void blk_queue_bounce_limit ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • dma_mask - the maximum address the device can handle

Description:

Different hardware can have different requirements as to what pages it can do I/O directly to. A low level driver can call blk_queue_bounce_limit to have lower memory pages allocated as bounce buffers for doing I/O to pages residing above dma_mask.

  • blk_queue_max_sectors - set max sectors for a request for this queue

Synopsis:

void blk_queue_max_sectors ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • max_sectors - max sectors in the usual 512b unit

Description:

Enables a low level driver to set an upper limit on the size of received requests.
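
Because max_sectors is expressed in 512-byte units, a driver that knows its hardware limit in bytes has to convert it first. The helper below is a minimal userspace sketch of that arithmetic; bytes_to_max_sectors and MODEL_SECTOR_SHIFT are hypothetical names, not kernel symbols, and the result is rounded down so the limit is never exceeded.

```c
#include <assert.h>

#define MODEL_SECTOR_SHIFT 9 /* 2^9 = 512 bytes per sector unit */

/* Convert a byte limit into a sector count, rounding down. */
static unsigned int bytes_to_max_sectors(unsigned int max_bytes)
{
    /* A limit below one sector would yield 0, which is not useful. */
    assert(max_bytes >= (1u << MODEL_SECTOR_SHIFT));
    return max_bytes >> MODEL_SECTOR_SHIFT;
}
```

For example, a controller limited to 64 KiB per request corresponds to 128 sectors, which is the value a driver would then hand over as max_sectors.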

  • blk_queue_max_discard_sectors - set max sectors for a single discard

Synopsis:

void blk_queue_max_discard_sectors ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • max_discard_sectors - maximum number of sectors to discard

  • blk_queue_max_phys_segments - set max phys segments for a request for this queue

Synopsis:

void blk_queue_max_phys_segments ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • max_segments - max number of segments

Description:

Enables a low level driver to set an upper limit on the number of physical data segments in a request. This would be the largest sized scatter list the driver could handle.

  • blk_queue_max_hw_segments - set max hw segments for a request for this queue

Synopsis:

void blk_queue_max_hw_segments ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • max_segments - max number of segments

Description:

Enables a low level driver to set an upper limit on the number of hw data segments in a request. This would be the largest number of address/length pairs the host adapter can actually give at once to the device.

  • blk_queue_max_segment_size - set max segment size for blk_rq_map_sg

Synopsis:

void blk_queue_max_segment_size ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • max_size - max size of segment in bytes

Description:

Enables a low level driver to set an upper limit on the size of a coalesced segment.

  • blk_queue_logical_block_size - set logical block size for the queue

Synopsis:

void blk_queue_logical_block_size ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • size - the logical block size, in bytes

Description:

This should be set to the lowest possible block size that the storage device can address. The default of 512 covers most hardware.

  • blk_queue_physical_block_size - set physical block size for the queue

Synopsis:

void blk_queue_physical_block_size ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • size - the physical block size, in bytes

Description:

This should be set to the lowest possible sector size that the hardware can operate on without reverting to read-modify-write operations.

  • blk_queue_alignment_offset - set physical block alignment offset

Synopsis:

void blk_queue_alignment_offset ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • offset - alignment offset in bytes

Description:

Some devices are naturally misaligned to compensate for things like the legacy DOS partition table 63-sector offset. Low-level drivers should call this function for devices whose first sector is not naturally aligned.
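
The effect of the legacy 63-sector offset is easy to see with a little arithmetic. The sketch below is a userspace illustration only: misalignment_bytes is a hypothetical helper that reports how far the start of a sector lies past the preceding physical block boundary. Which value a particular driver should pass as offset depends on the hardware and is not covered here.

```c
#include <assert.h>

/*
 * Byte distance from the preceding physical block boundary
 * to the start of the given logical sector. 0 means aligned.
 */
static unsigned int misalignment_bytes(unsigned long long sector,
                                       unsigned int logical_block_size,
                                       unsigned int physical_block_size)
{
    assert(physical_block_size != 0);
    return (unsigned int)((sector * logical_block_size) % physical_block_size);
}
```

With 512-byte logical sectors and 4096-byte physical blocks, sector 63 starts 3584 bytes past a physical block boundary, while sector 64 starts exactly on one.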

  • blk_limits_io_min - set minimum request size for a device

Synopsis:

void blk_limits_io_min ( struct queue_limits * limits )

Arguments:

  • limits - the queue limits
  • min - smallest I/O size in bytes

Description:

Some devices have an internal block size bigger than the reported hardware sector size. This function can be used to signal the smallest I/O the device can perform without incurring a performance penalty.

  • blk_queue_io_min - set minimum request size for the queue

Synopsis:

void blk_queue_io_min ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • min - smallest I/O size in bytes

Description:

Storage devices may report a granularity or preferred minimum I/O size which is the smallest request the device can perform without incurring a performance penalty. For disk drives this is often the physical block size. For RAID arrays it is often the stripe chunk size. A properly aligned multiple of minimum_io_size is the preferred request size for workloads where a high number of I/O operations is desired.

  • blk_limits_io_opt - set optimal request size for a device

Synopsis:

void blk_limits_io_opt ( struct queue_limits * limits )

Arguments:

  • limits - the queue limits
  • opt - optimal request size in bytes

Description:

Storage devices may report an optimal I/O size, which is the device's preferred unit for sustained I/O. This is rarely reported for disk drives. For RAID arrays it is usually the stripe width or the internal track size. A properly aligned multiple of optimal_io_size is the preferred request size for workloads where sustained throughput is desired.

  • blk_queue_io_opt - set optimal request size for the queue

Synopsis:

void blk_queue_io_opt ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • opt - optimal request size in bytes

Description:

Storage devices may report an optimal I/O size, which is the device's preferred unit for sustained I/O. This is rarely reported for disk drives. For RAID arrays it is usually the stripe width or the internal track size. A properly aligned multiple of optimal_io_size is the preferred request size for workloads where sustained throughput is desired.
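
What "a properly aligned multiple" means for a single request can be sketched as a simple check. This is a userspace illustration under the assumption that both the starting offset and the length are measured in bytes; is_preferred_request is a hypothetical helper, and io_size stands for either the minimum or the optimal I/O size.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when the request starts on an io_size boundary and its
 * length is a non-zero whole multiple of io_size.
 */
static bool is_preferred_request(unsigned long long offset,
                                 unsigned int len,
                                 unsigned int io_size)
{
    assert(io_size != 0);
    return len != 0 && offset % io_size == 0 && len % io_size == 0;
}
```

For a RAID array reporting a 64 KiB optimal size, a 128 KiB request at offset 0 passes the check, while the same request shifted by 4 KiB does not.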

  • blk_queue_stack_limits - inherit underlying queue limits for stacked drivers

Synopsis:

void blk_queue_stack_limits ( struct request_queue * t )

Arguments:

  • t - the stacking driver (top)
  • b - the underlying device (bottom)

  • blk_stack_limits - adjust queue_limits for stacked devices

Synopsis:

int blk_stack_limits ( struct queue_limits * t )

Arguments:

  • t - the stacking driver limits (top)
  • b - the underlying queue limits (bottom)
  • offset - offset to beginning of data within component device

Description:

Merges two queue_limit structs. Returns 0 if alignment didn't change. Returns -1 if adding the bottom device caused misalignment.

  • bdev_stack_limits - adjust queue limits for stacked drivers

Synopsis:

int bdev_stack_limits ( struct queue_limits * t )

Arguments:

  • t - the stacking driver limits (top device)
  • bdev - the component block_device (bottom)
  • start - first data sector within component device

Description:

Merges queue limits for a top device and a block_device. Returns 0 if alignment didn't change. Returns -1 if adding the bottom device caused misalignment.

  • disk_stack_limits - adjust queue limits for stacked drivers

Synopsis:

void disk_stack_limits ( struct gendisk * disk )

Arguments:

  • disk - MD/DM gendisk (top)
  • bdev - the underlying block device (bottom)
  • offset - offset to beginning of data within component device

Description:

Merges the limits for two queues. Returns 0 if alignment didn't change. Returns -1 if adding the bottom device caused misalignment.

  • blk_queue_dma_pad - set pad mask

Synopsis:

void blk_queue_dma_pad ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • mask - pad mask

Description:

Set dma pad mask.

Appending pad buffer to a request modifies the last entry of a scatter list such that it includes the pad buffer.

  • blk_queue_update_dma_pad - update pad mask

Synopsis:

void blk_queue_update_dma_pad ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • mask - pad mask

Description:

Update dma pad mask.

Appending pad buffer to a request modifies the last entry of a scatter list such that it includes the pad buffer.

  • blk_queue_dma_drain - Set up a drain buffer for excess dma.

Synopsis:

int blk_queue_dma_drain ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • dma_drain_needed - fn which returns non-zero if drain is necessary
  • buf - physically contiguous buffer
  • size - size of the buffer in bytes

Description:

Some devices have excess DMA problems and can't simply discard (or zero fill) the unwanted piece of the transfer. They have to have a real area of memory to transfer it into. The use case for this is ATAPI devices in DMA mode. If the packet command causes a transfer bigger than the transfer size some HBAs will lock up if there aren't DMA elements to contain the excess transfer. What this API does is adjust the queue so that the buf is always appended silently to the scatterlist.

Note:

This routine adjusts max_hw_segments to make room for appending the drain buffer. If you call blk_queue_max_hw_segments or blk_queue_max_phys_segments after calling this routine, you must set the limit to one fewer than your device can support; otherwise there won't be room for the drain buffer.

  • blk_queue_segment_boundary - set boundary rules for segment merging

Synopsis:

void blk_queue_segment_boundary ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • mask - the memory boundary mask

  • blk_queue_dma_alignment - set dma length and memory alignment

Synopsis:

void blk_queue_dma_alignment ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • mask - alignment mask

Description:

Set required memory and length alignment for direct DMA transactions. This is used when building direct I/O requests for the queue.

  • blk_queue_update_dma_alignment - update dma length and memory alignment

Synopsis:

void blk_queue_update_dma_alignment ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • mask - alignment mask

Description:

Update required memory and length alignment for direct DMA transactions. If the requested alignment is larger than the current alignment, then the current queue alignment is updated to the new value, otherwise it is left alone. The design of this is to allow multiple objects (driver, device, transport etc) to set their respective alignments without having them interfere.
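
The "only ever grow" rule can be sketched in a few lines. This is a userspace model with hypothetical names (struct model_queue, model_update_dma_alignment); it shows only the comparison described above and leaves out any sanity checks the kernel may perform on the mask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue's alignment state. */
struct model_queue {
    unsigned int dma_alignment; /* current alignment mask */
};

/* Raise the alignment mask if the new one is stricter; never lower it. */
static void model_update_dma_alignment(struct model_queue *q, unsigned int mask)
{
    assert(q != NULL);
    if (mask > q->dma_alignment)
        q->dma_alignment = mask;
}
```

Because the mask can only grow, a driver, a device and a transport can each call the update in any order and the queue ends up with the strictest of their requirements.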

  • blk_execute_rq_nowait - insert a request into queue for execution

Synopsis:

void blk_execute_rq_nowait ( struct request_queue * q )

Arguments:

  • q - queue to insert the request in
  • bd_disk - matching gendisk
  • rq - request to insert
  • at_head - insert request at head or tail of queue
  • done - I/O completion handler

Description:

Insert a fully prepared request at the back of the I/O scheduler queue for execution. Don't wait for completion.

  • blk_execute_rq - insert a request into queue for execution

Synopsis:

int blk_execute_rq ( struct request_queue * q )

Arguments:

  • q - queue to insert the request in
  • bd_disk - matching gendisk
  • rq - request to insert
  • at_head - insert request at head or tail of queue

Description:

Insert a fully prepared request at the back of the I/O scheduler queue for execution and wait for completion.

  • blk_queue_ordered - does this queue support ordered writes

Synopsis:

int blk_queue_ordered ( struct request_queue * q )

Arguments:

  • q - the request queue
  • ordered - one of QUEUE_ORDERED_*
  • prepare_flush_fn - rq setup helper for cache flush ordered writes

Description:

For journalled file systems, doing ordered writes on a commit block instead of explicitly doing wait_on_buffer (which is bad for performance) can be a big win. Block drivers supporting this feature should call this function and indicate so.

  • blkdev_issue_flush - queue a flush

Synopsis:

int blkdev_issue_flush ( struct block_device * bdev )

Arguments:

  • bdev - blockdev to issue flush for
  • error_sector - error sector

Description:

Issue a flush for the block device in question. Caller can supply room for storing the error offset in case of a flush error, if they wish to.

  • blkdev_issue_discard - queue a discard

Synopsis:

int blkdev_issue_discard ( struct block_device * bdev )

Arguments:

  • bdev - blockdev to issue discard for
  • sector - start sector
  • nr_sects - number of sectors to discard
  • gfp_mask - memory allocation flags (for bio_alloc)
  • flags - DISCARD_FL_* flags to control behaviour

Description:

Issue a discard request for the sectors in question.

  • blk_queue_find_tag - find a request by its tag and queue

Synopsis:

struct request * blk_queue_find_tag ( struct request_queue * q )

Arguments:

  • q - The request queue for the device
  • tag - The tag of the request

Notes:

Should be used when a device returns a tag and you want to match it with a request.

No locks need to be held.

  • blk_free_tags - release a given set of tag maintenance info

Synopsis:

void blk_free_tags ( struct blk_queue_tag * bqt )

Arguments:

  • bqt - the tag map to free

Description:

For an externally managed bqt, frees the map. Callers of this function must guarantee that they have released all the queues that might have been using this tag map.

  • blk_queue_free_tags - release tag maintenance info

Synopsis:

void blk_queue_free_tags ( struct request_queue * q )

Arguments:

  • q - the request queue for the device

Notes:

This is used to disable tagged queuing to a device while leaving the queue functional.

  • blk_init_tags - initialize the tag info for an external tag map

Synopsis:

struct blk_queue_tag * blk_init_tags ( int depth )

Arguments:

  • depth - the maximum queue depth supported

  • blk_queue_init_tags - initialize the queue tag info

Synopsis:

int blk_queue_init_tags ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • depth - the maximum queue depth supported
  • tags - the tag to use

Description:

Queue lock must be held here if the function is called to resize an existing map.

  • blk_queue_resize_tags - change the queueing depth

Synopsis:

int blk_queue_resize_tags ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • new_depth - the new max command queueing depth

Notes:

Must be called with the queue lock held.

  • blk_queue_end_tag - end tag operations for a request

Synopsis:

void blk_queue_end_tag ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • rq - the request that has completed

Description:

Typically called when end_that_request_first returns 0, meaning all transfers have been done for a request. It's important to call this function before end_that_request_last, as that will put the request back on the free list thus corrupting the internal tag list.

Notes:

The queue lock must be held.

  • blk_queue_start_tag - find a free tag and assign it

Synopsis:

int blk_queue_start_tag ( struct request_queue * q )

Arguments:

  • q - the request queue for the device
  • rq - the block request that needs tagging

Description:

This can either be used as a stand-alone helper, or possibly be assigned as the queue prep_rq_fn (in which case struct request automagically gets a tag assigned). Note that this function assumes that any type of request can be queued! If this is not true for your device, you must check the request type before calling this function. The request will also be removed from the request queue, so it's the driver's responsibility to re-add it if it should need to be restarted for some reason.

Notes:

The queue lock must be held.
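
The bookkeeping behind "find a free tag and assign it" can be pictured with a small bitmap model. This is a userspace sketch under simplifying assumptions: struct model_tag_map, model_start_tag and model_end_tag are hypothetical names, the map holds at most 32 tags, no locking is modeled, and the return convention (the tag itself, or -1 when none is free) is chosen for the sketch and is not that of blk_queue_start_tag.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEPTH 32

/* Hypothetical tag map: bit n set means tag n is in use. */
struct model_tag_map {
    unsigned int busy;
    int depth; /* number of usable tags, at most MODEL_MAX_DEPTH */
};

/* Find the lowest free tag and mark it busy; -1 if all are taken. */
static int model_start_tag(struct model_tag_map *map)
{
    assert(map != NULL && map->depth > 0 && map->depth <= MODEL_MAX_DEPTH);

    for (int tag = 0; tag < map->depth; tag++) {
        if (!(map->busy & (1u << tag))) {
            map->busy |= 1u << tag;
            return tag;
        }
    }
    return -1;
}

/* Release a tag once its request has completed. */
static void model_end_tag(struct model_tag_map *map, int tag)
{
    assert(map != NULL && tag >= 0 && tag < map->depth);
    map->busy &= ~(1u << tag);
}
```

When the device later reports a completed tag, the driver looks up the matching request and releases the tag so it can be reused.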

  • blk_queue_invalidate_tags - invalidate all pending tags

Synopsis:

void blk_queue_invalidate_tags ( struct request_queue * q )

Arguments:

  • q - the request queue for the device

Description:

Hardware conditions may dictate a need to stop all pending requests. In this case, we will safely clear the block side of the tag queue and re-add all requests to the request queue in the right order.

Notes:

The queue lock must be held.

  • __blk_free_tags - release a given set of tag maintenance info

Synopsis:

int __blk_free_tags ( struct blk_queue_tag * bqt )

Arguments:

  • bqt - the tag map to free

Description:

Tries to free the specified bqt. Returns true if it was actually freed and false if there are still references using it.

  • __blk_queue_free_tags - release tag maintenance info

Synopsis:

void __blk_queue_free_tags ( struct request_queue * q )

Arguments:

  • q - the request queue for the device

Notes:

blk_cleanup_queue will take care of calling this function, if tagging has been used. So there's no need to call this directly.

  • blk_rq_count_integrity_sg - Count number of integrity scatterlist elements

Synopsis:

int blk_rq_count_integrity_sg ( struct request * rq )

Arguments:

  • rq - request with integrity metadata attached

Description:

Returns the number of elements required in a scatterlist corresponding to the integrity metadata in a request.

  • blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist

Synopsis:

int blk_rq_map_integrity_sg ( struct request * rq )

Arguments:

  • rq - request with integrity metadata attached
  • sglist - target scatterlist

Description:

Map the integrity vectors in request into a scatterlist. The scatterlist must be big enough to hold all elements, i.e. sized using blk_rq_count_integrity_sg.

  • blk_integrity_compare - Compare integrity profile of two disks

Synopsis:

int blk_integrity_compare ( struct gendisk * gd1 )

Arguments:

  • gd1 - Disk to compare
  • gd2 - Disk to compare

Description:

Meta-devices like DM and MD need to verify that all sub-devices use the same integrity format before advertising to upper layers that they can send/receive integrity metadata. This function can be used to check whether two gendisk devices have compatible integrity formats.

  • blk_integrity_register - Register a gendisk as being integrity-capable

Synopsis:

int blk_integrity_register ( struct gendisk * disk )

Arguments:

  • disk - struct gendisk pointer to make integrity-aware
  • template - optional integrity profile to register

Description:

When a device needs to advertise itself as being able to send/receive integrity metadata it must use this function to register the capability with the block layer. The template is a blk_integrity struct with values appropriate for the underlying hardware. If template is NULL the new profile is allocated but not filled out. See Documentation/block/data-integrity.txt.

  • blk_integrity_unregister - Remove block integrity profile

Synopsis:

void blk_integrity_unregister ( struct gendisk * disk )

Arguments:

  • disk - disk whose integrity profile to deallocate

Description:

This function frees all memory used by the block integrity profile. To be called at device teardown.

  • blk_trace_ioctl - handle the ioctls associated with tracing

Synopsis:

int blk_trace_ioctl ( struct block_device * bdev )

Arguments:

  • bdev - the block device
  • cmd - the ioctl cmd
  • arg - the argument data, if any

  • blk_trace_shutdown - stop and cleanup trace structures

Synopsis:

void blk_trace_shutdown ( struct request_queue * q )

Arguments:

  • q - the request queue associated with the device

  • blk_add_trace_rq - Add a trace for a request oriented action

Synopsis:

void blk_add_trace_rq ( struct request_queue * q )

Arguments:

  • q - queue the io is for
  • rq - the source request
  • what - the action

Description:

Records an action against a request. Will log the bio offset + size.

  • blk_add_trace_bio - Add a trace for a bio oriented action

Synopsis:

void blk_add_trace_bio ( struct request_queue * q )

Arguments:

  • q - queue the io is for
  • bio - the source bio
  • what - the action

Description:

Records an action against a bio. Will log the bio offset + size.

  • blk_add_trace_remap - Add a trace for a remap operation

Synopsis:

void blk_add_trace_remap ( struct request_queue * q )

Arguments:

  • q - queue the io is for
  • bio - the source bio
  • dev - target device
  • from - source sector

Description:

Device mapper or RAID targets sometimes need to split a bio because it spans a stripe (or similar). Add a trace for that action.

  • blk_add_trace_rq_remap - Add a trace for a request-remap operation

Synopsis:

void blk_add_trace_rq_remap ( struct request_queue * q )

Arguments:

  • q - queue the io is for
  • rq - the source request
  • dev - target device
  • from - source sector

Description:

Device mapper remaps requests to other devices. Add a trace for that action.

  • blk_mangle_minor - scatter minor numbers apart

Synopsis:

int blk_mangle_minor ( int minor )

Arguments:

  • minor - minor number to mangle

Description:

Scatter consecutively allocated minor numbers apart if MANGLE_DEVT is enabled. Mangling twice gives the original value.

RETURNS:

Mangled value.

CONTEXT:

Don't care.
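
The property that mangling twice gives the original value can be illustrated with a small userspace model. This is only a sketch: it assumes minors are 20 bits wide and uses bit reversal as the scatter function, and the names model_mangle_minor and MODEL_MINORBITS are made up for the example rather than taken from the kernel.

```c
#include <assert.h>

/* Assumption: minor numbers occupy 20 bits. */
#define MODEL_MINORBITS 20

/*
 * Userspace model of an involutive scatter: reverse the low
 * MODEL_MINORBITS bits of the minor number. Reversing twice restores
 * the input, and consecutive inputs end up far apart.
 */
static int model_mangle_minor(int minor)
{
	unsigned int in = (unsigned int)minor;
	unsigned int out = 0;
	int i;

	assert(minor >= 0 && minor < (1 << MODEL_MINORBITS));

	for (i = 0; i < MODEL_MINORBITS; i++) {
		if (in & (1u << i))
			out |= 1u << (MODEL_MINORBITS - 1 - i);
	}
	return (int)out;
}
```

In this model, minors 1 and 2 map to values that differ in their top bits, so neighbouring allocations no longer look sequential, while applying the function a second time undoes the scatter.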

  • blk_alloc_devt - allocate a dev_t for a partition

Synopsis:

int blk_alloc_devt ( struct hd_struct * part )

Arguments:

  • part - partition to allocate dev_t for
  • devt - out parameter for resulting dev_t

Description:

Allocate a dev_t for block device.

RETURNS:

0 on success, allocated dev_t is returned in *devt. -errno on failure.

CONTEXT:

Might sleep.

  • blk_free_devt - free a dev_t

Synopsis:

void blk_free_devt ( dev_t devt )

Arguments:

  • devt - dev_t to free

Description:

Free devt which was allocated using blk_alloc_devt.

CONTEXT:

Might sleep.

  • get_gendisk - get partitioning information for a given device

Synopsis:

struct gendisk * get_gendisk ( dev_t devt, int * partno )

Arguments:

  • devt - device to get partitioning information for
  • partno - returned partition index

Description:

This function gets the structure containing partitioning information for the given device devt.

  • disk_replace_part_tbl - replace disk→part_tbl in RCU-safe way

Synopsis:

void disk_replace_part_tbl ( struct gendisk * disk )

Arguments:

  • disk - disk to replace part_tbl for
  • new_ptbl - new part_tbl to install

Description:

Replace disk→part_tbl with new_ptbl in an RCU-safe way. The original ptbl is freed using an RCU callback.

LOCKING:

Matching bd_mutex locked.

  • disk_expand_part_tbl - expand disk→part_tbl

Synopsis:

int disk_expand_part_tbl ( struct gendisk * disk )

Arguments:

  • disk - disk to expand part_tbl for
  • partno - expand such that this partno can fit in

Description:

Expand disk→part_tbl such that partno can fit in. disk→part_tbl uses RCU to allow unlocked dereferencing for stats and other stuff.

LOCKING:

Matching bd_mutex locked, might sleep.

RETURNS:

0 on success, -errno on failure.

  • disk_get_part - get partition

Synopsis:

struct hd_struct * disk_get_part ( struct gendisk * disk, int partno )

Arguments:

  • disk - disk to look up the partition on
  • partno - partition number

Description:

Look up partition partno on disk. If found, increment its reference count and return it.

CONTEXT:

Don't care.

RETURNS:

Pointer to the found partition on success, NULL if not found.

  • disk_part_iter_init - initialize partition iterator

Synopsis:

void disk_part_iter_init ( struct disk_part_iter * piter )

Arguments:

  • piter - iterator to initialize
  • disk - disk to iterate over
  • flags - DISK_PITER_* flags

Description:

Initialize piter so that it iterates over partitions of disk.

CONTEXT:

Don't care.

  • disk_part_iter_next - proceed iterator to the next partition and return it

Synopsis:

struct hd_struct * disk_part_iter_next ( struct disk_part_iter * piter )

Arguments:

  • piter - iterator of interest

Description:

Proceed piter to the next partition and return it.

CONTEXT:

Don't care.

  • disk_part_iter_exit - finish up partition iteration

Synopsis:

void disk_part_iter_exit ( struct disk_part_iter * piter )

Arguments:

  • piter - iter of interest

Description:

Called when iteration is over. Cleans up piter.

CONTEXT:

Don't care.
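
The three iterator calls are meant to be used together: initialize, fetch partitions until none are left, then clean up. The following userspace model sketches that pattern. All model_ names and the fixed-size table are invented for illustration, flags are omitted, and the model assumes that the iterator signals exhaustion by returning NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; the real structures live in the kernel. */
struct model_part { int partno; };

struct model_disk {
	struct model_part *tbl[8];   /* NULL marks an unused slot */
	int len;                     /* number of slots in use */
};

struct model_piter {
	struct model_disk *disk;
	int idx;
};

static void model_piter_init(struct model_piter *piter, struct model_disk *disk)
{
	assert(piter && disk);
	piter->disk = disk;
	piter->idx = 0;
}

/* Advance to the next populated slot; NULL means the iteration is done. */
static struct model_part *model_piter_next(struct model_piter *piter)
{
	while (piter->idx < piter->disk->len) {
		struct model_part *part = piter->disk->tbl[piter->idx++];

		if (part)
			return part;
	}
	return NULL;
}

static void model_piter_exit(struct model_piter *piter)
{
	piter->disk = NULL;   /* the model holds no references to drop */
}

/* The init / next-until-NULL / exit pattern, counting partitions. */
static int model_count_parts(struct model_disk *disk)
{
	struct model_piter piter;
	int n = 0;

	model_piter_init(&piter, disk);
	while (model_piter_next(&piter))
		n++;
	model_piter_exit(&piter);
	return n;
}
```

Keeping the exit call unconditional, even when the loop body returns early in real code, mirrors the requirement that it is called when iteration is over.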

  • disk_map_sector_rcu - map sector to partition

Synopsis:

struct hd_struct * disk_map_sector_rcu ( struct gendisk * disk )

Arguments:

  • disk - gendisk of interest
  • sector - sector to map

Description:

Find out which partition sector maps to on disk. This is primarily used for stats accounting.

CONTEXT:

RCU read locked. The returned partition pointer is valid only while preemption is disabled.

RETURNS:

The matching partition on success; part0 is returned if no partition matches.
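
The lookup rule, including the fallback to part0, can be modelled in userspace as a range search. This sketch ignores RCU and any caching the kernel may do; struct model_extent and model_map_sector are hypothetical names, and index 0 stands in for part0.

```c
#include <assert.h>

struct model_extent {
	unsigned long long start;     /* first sector of the partition */
	unsigned long long nr_sects;  /* length in sectors */
};

/*
 * Return the index of the partition containing `sector`. Index 0 stands
 * for part0 (the whole disk) and is the fallback when nothing matches.
 */
static int model_map_sector(const struct model_extent *tbl, int len,
			    unsigned long long sector)
{
	int i;

	assert(len >= 1);
	for (i = 1; i < len; i++) {
		if (sector >= tbl[i].start &&
		    sector < tbl[i].start + tbl[i].nr_sects)
			return i;
	}
	return 0;
}
```

Because the fallback is part0 rather than NULL, a caller doing stats accounting always has a partition to charge the I/O to.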

  • register_blkdev - register a new block device

Synopsis:

int register_blkdev ( unsigned int major, const char * name )

Arguments:

  • major - the requested major device number [1..255]. If major=0, try to allocate any unused major number.
  • name - the name of the new block device as a zero terminated string

Description:

The name must be unique within the system.

The return value depends on the major input parameter:

  • if a major device number was requested in range [1..255], the function returns zero on success, or a negative error code
  • if any unused major number was requested with major=0, the return value is the allocated major number in range [1..255], or a negative error code otherwise
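
Because a successful return means different things in the two cases, callers often normalize it right away. The helper below is a minimal sketch of that idea; blkdev_major_in_use is a hypothetical name, and the function only interprets a return value rather than calling into the kernel.

```c
#include <assert.h>

/*
 * Hypothetical helper: fold the two return conventions into one.
 * `requested` is the major passed to register_blkdev and `ret` is the
 * value it returned. Returns the major now owned by the caller, or the
 * negative error code unchanged.
 */
static int blkdev_major_in_use(unsigned int requested, int ret)
{
	if (ret < 0)
		return ret;             /* registration failed */
	if (requested == 0)
		return ret;             /* dynamic: ret is the allocated major */
	return (int)requested;          /* fixed: ret is 0, major is as requested */
}
```

A driver would typically store the result and pass the same major back when it unregisters.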

  • add_disk - add partitioning information to kernel list

Synopsis:

void add_disk ( struct gendisk * disk )

Arguments:

  • disk - per-device partitioning information

Description:

This function registers the partitioning information in disk with the kernel.

FIXME:

error handling

  • bdget_disk - do bdget by gendisk and partition number

Synopsis:

struct block_device * bdget_disk ( struct gendisk * disk, int partno )

Arguments:

  • disk - gendisk of interest
  • partno - partition number

Description:

Find partition partno from disk, do bdget on it.

CONTEXT:

Don't care.

RETURNS:

Resulting block_device on success, NULL on failure.

Char devices

  • register_chrdev_region - register a range of device numbers

Synopsis:

int register_chrdev_region ( dev_t from, unsigned count, const char * name )

Arguments:

  • from - the first in the desired range of device numbers; must include the major number.
  • count - the number of consecutive device numbers required
  • name - the name of the device or driver.

Description:

Return value is zero on success, a negative error code on failure.
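
The requirement that from must include the major number follows from how a device number packs major and minor into one value. The userspace model below assumes the in-kernel layout of a 20-bit minor below the major. The MODEL_ prefixed names are invented so they cannot clash with system headers, and model_range_last is a hypothetical helper showing which numbers a range of count covers.

```c
#include <assert.h>

/* Userspace model of device-number packing; not the kernel's own macros. */
typedef unsigned int model_dev_t;

#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1u << MODEL_MINORBITS) - 1)

#define MODEL_MKDEV(ma, mi) ((model_dev_t)(((ma) << MODEL_MINORBITS) | (mi)))
#define MODEL_MAJOR(dev)    ((unsigned int)((dev) >> MODEL_MINORBITS))
#define MODEL_MINOR(dev)    ((unsigned int)((dev) & MODEL_MINORMASK))

/* Last device number of a range of `count` numbers starting at `from`. */
static model_dev_t model_range_last(model_dev_t from, unsigned int count)
{
	assert(count > 0);
	return from + count - 1;
}
```

In this model, a range of four numbers starting at major 250, minor 0 ends at major 250, minor 3, so a bare minor number passed as from would describe a range on major 0 instead.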

  • alloc_chrdev_region - register a range of char device numbers

Synopsis:

int alloc_chrdev_region ( dev_t * dev, unsigned baseminor, unsigned count, const char * name )

Arguments:

  • dev - output parameter for first assigned number
  • baseminor - first of the requested range of minor numbers
  • count - the number of minor numbers required
  • name - the name of the associated device or driver

Description:

Allocates a range of char device numbers. The major number will be chosen dynamically, and returned (along with the first minor number) in dev. Returns zero or a negative error code.

  • __register_chrdev - create and register a cdev occupying a range of minors

Synopsis:

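int __register_chrdev ( unsigned int major, unsigned int baseminor, unsigned int count, const char * name, const struct file_operations * fops )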

Arguments:

  • major - major device number or 0 for dynamic allocation
  • baseminor - first of the requested range of minor numbers
  • count - the number of minor numbers required
  • name - name of this range of devices
  • fops - file operations associated with these devices

Description:

If major == 0 this function will dynamically allocate a major and return its number.

If major > 0 this function will attempt to reserve a device with the given major number and will return zero on success.

Returns a negative errno on failure.

The name of this device has nothing to do with the name of the device in /dev. It only helps to keep track of the different owners of devices. If your module has only one type of device, it is fine to use, for example, the name of the module here.

  • unregister_chrdev_region - return a range of device numbers

Synopsis:

void unregister_chrdev_region ( dev_t from, unsigned count )

Arguments:

  • from - the first in the range of numbers to unregister
  • count - the number of device numbers to unregister

Description:

This function will unregister a range of count device numbers, starting with from. The caller should normally be the one who allocated those numbers in the first place…

  • __unregister_chrdev - unregister and destroy a cdev

Synopsis:

void __unregister_chrdev ( unsigned int major, unsigned int baseminor, unsigned int count, const char * name )

Arguments:

  • major - major device number
  • baseminor - first of the range of minor numbers
  • count - the number of minor numbers this cdev is occupying
  • name - name of this range of devices

Description:

Unregister and destroy the cdev occupying the region described by major, baseminor and count. This function undoes what __register_chrdev did.

  • cdev_add - add a char device to the system

Synopsis:

int cdev_add ( struct cdev * p, dev_t dev, unsigned count )

Arguments:

  • p - the cdev structure for the device
  • dev - the first device number for which this device is responsible
  • count - the number of consecutive minor numbers corresponding to this device

Description:

cdev_add adds the device represented by p to the system, making it live immediately. A negative error code is returned on failure.

  • cdev_del - remove a cdev from the system

Synopsis:

void cdev_del ( struct cdev * p )

Arguments:

  • p - the cdev structure to be removed

Description:

cdev_del removes p from the system, possibly freeing the structure itself.

  • cdev_alloc - allocate a cdev structure

Synopsis:

struct cdev * cdev_alloc ( void )

Arguments:

  • void - no arguments

Description:

Allocates and returns a cdev structure, or NULL on failure.

  • cdev_init - initialize a cdev structure

Synopsis:

void cdev_init ( struct cdev * cdev, const struct file_operations * fops )

Arguments:

  • cdev - the structure to initialize
  • fops - the file_operations for this device

Description:

Initializes cdev, remembering fops, making it ready to add to the system with cdev_add.

Miscellaneous Devices

  • misc_register - register a miscellaneous device

Synopsis:

int misc_register ( struct miscdevice * misc )

Arguments:

  • misc - device structure

Description:

Register a miscellaneous device with the kernel. If the minor number is set to MISC_DYNAMIC_MINOR a minor number is assigned and placed in the minor field of the structure. For other cases the minor number requested is used.

The structure passed is linked into the kernel and may not be destroyed until it has been unregistered.

A zero is returned on success and a negative errno code for failure.

  • misc_deregister - unregister a miscellaneous device

Synopsis:

int misc_deregister ( struct miscdevice * misc )

Arguments:

  • misc - device to unregister

Description:

Unregister a miscellaneous device that was previously successfully registered with misc_register. Success is indicated by a zero return, a negative errno code indicates an error.

Clock Framework

The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) platforms to support power management and various devices which may need custom clock rates. Note that these “clocks” don't relate to timekeeping or real time clocks (RTCs), each of which has a separate framework. These struct clk instances may be used to manage, for example, a 96 MHz signal that is used to shift bits into and out of peripherals or busses, or otherwise trigger synchronous state machine transitions in system hardware.

Power management is supported by explicit software clock gating: unused clocks are disabled, so the system doesn't waste power changing the state of transistors that aren't in active use. On some systems this may be backed by hardware clock gating, where clocks are gated without being disabled in software. Sections of chips that are powered but not clocked may be able to retain their last state. This low power state is often called a retention mode. This mode still incurs leakage currents, especially with finer circuit geometries, but for CMOS circuits power is mostly used by clocked state changes.

Power-aware drivers only enable their clocks when the device they manage is in active use. Also, system sleep states often differ according to which clock domains are active: while a “standby” state may allow wakeup from several active domains, a “mem” (suspend-to-RAM) state may require a more wholesale shutdown of clocks derived from higher speed PLLs and oscillators, limiting the number of possible wakeup event sources. A driver's suspend method may need to be aware of system-specific clock constraints on the target sleep state.

Some platforms support programmable clock generators. These can be used by external chips of various kinds, such as other CPUs, multimedia codecs, and devices with strict requirements for interface clocking.

  • clk_get - lookup and obtain a reference to a clock producer.

Synopsis:

struct clk * clk_get ( struct device * dev, const char * id )

Arguments:

  • dev - device for clock consumer
  • id - clock consumer ID

Description:

Returns a struct clk corresponding to the clock producer, or a valid IS_ERR condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. In other words, two callers may pass identical id strings, yet clk_get may return different clock producers depending on dev.

Drivers must assume that the clock source is not enabled.

clk_get should not be called from within interrupt context.
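
An IS_ERR condition means that the returned pointer itself encodes a negative errno, so callers test the pointer instead of comparing it with NULL. The userspace model below sketches that convention. The model_ names are invented for the example, and the 4095 limit is an assumption about the size of the reserved error range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model of the error-pointer convention. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the top MODEL_MAX_ERRNO addresses. */
static int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}
```

Note that NULL is not an error pointer in this scheme, which is why a plain NULL check on such a return value would miss failures.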

  • clk_enable - inform the system when the clock source should be running.

Synopsis:

int clk_enable ( struct clk * clk )

Arguments:

  • clk - clock source

Description:

If the clock cannot be enabled or disabled, this should return success.

Returns success (0) or negative errno.

  • clk_disable - inform the system when the clock source is no longer required.

Synopsis:

void clk_disable ( struct clk * clk )

Arguments:

  • clk - clock source

Description:

Inform the system that a clock source is no longer required by a driver and may be shut down.

Implementation detail:

If the clock source is shared between multiple drivers, clk_enable calls must be balanced by the same number of clk_disable calls for the clock source to be disabled.
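
The balancing rule amounts to a use count on the clock. The following userspace model is a sketch of that behaviour only: struct model_clk and its fields are invented, since the real struct clk is opaque to drivers and platform code decides how gating is implemented.

```c
#include <assert.h>

/* Userspace model of a shared clock. */
struct model_clk {
	int enable_count;   /* outstanding enable calls */
	int running;        /* 1 while the modelled gate is open */
};

static int model_clk_enable(struct model_clk *clk)
{
	if (clk->enable_count++ == 0)
		clk->running = 1;        /* first user opens the gate */
	return 0;
}

static void model_clk_disable(struct model_clk *clk)
{
	assert(clk->enable_count > 0);   /* an unbalanced disable is a bug */
	if (--clk->enable_count == 0)
		clk->running = 0;        /* last user closes the gate */
}
```

With two drivers sharing the clock, the first disable leaves it running and only the second one stops it.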

  • clk_get_rate - obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.

Synopsis:

unsigned long clk_get_rate ( struct clk * clk )

Arguments:

  • clk - clock source

  • clk_put - “free” the clock source

Synopsis:

void clk_put ( struct clk * clk )

Arguments:

  • clk - clock source

Note:

Drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.

clk_put should not be called from within interrupt context.

  • clk_round_rate - adjust a rate to the exact rate a clock can provide

Synopsis:

long clk_round_rate ( struct clk * clk, unsigned long rate )

Arguments:

  • clk - clock source
  • rate - desired clock rate in Hz

Description:

Returns rounded clock rate in Hz, or negative errno.
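
What "the exact rate a clock can provide" means depends on the hardware behind the clock. As one illustration, the sketch below models a hypothetical clock built from a fixed parent rate and an integer divider, and rounds down to the nearest achievable rate. The divider model, the rounding direction, the choice of -EINVAL, and the name model_round_rate are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical clock: a fixed parent rate feeding an integer divider
 * of 1..max_div. Returns the highest achievable rate that does not
 * exceed `rate`, or -EINVAL if even the largest divider is too fast.
 */
static long model_round_rate(unsigned long parent, unsigned int max_div,
			     unsigned long rate)
{
	unsigned int div;

	assert(parent > 0 && max_div > 0);
	for (div = 1; div <= max_div; div++) {
		if (parent / div <= rate)
			return (long)(parent / div);
	}
	return -EINVAL;
}
```

With a 96 MHz parent and dividers up to 16, a request for 50 MHz rounds to 48 MHz in this model. A driver can make such a query first and then pass the rounded value to clk_set_rate.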

  • clk_set_rate - set the clock rate for a clock source

Synopsis:

int clk_set_rate ( struct clk * clk, unsigned long rate )

Arguments:

  • clk - clock source
  • rate - desired clock rate in Hz

Description:

Returns success (0) or negative errno.

  • clk_set_parent - set the parent clock source for this clock

Synopsis:

int clk_set_parent ( struct clk * clk, struct clk * parent )

Arguments:

  • clk - clock source
  • parent - parent clock source

Description:

Returns success (0) or negative errno.

  • clk_get_parent - get the parent clock source for this clock

Synopsis:

struct clk * clk_get_parent ( struct clk * clk )

Arguments:

  • clk - clock source

Description:

Returns the struct clk corresponding to the parent clock source, or a valid IS_ERR condition containing errno.

  • clk_get_sys - get a clock based upon the device name

Synopsis:

struct clk * clk_get_sys ( const char * dev_id, const char * con_id )

Arguments:

  • dev_id - device name
  • con_id - connection ID

Description:

Returns a struct clk corresponding to the clock producer, or a valid IS_ERR condition containing errno. The implementation uses dev_id and con_id to determine the clock consumer, and thereby the clock producer. In contrast to clk_get, this function takes the device name instead of the device itself for identification.

Drivers must assume that the clock source is not enabled.

clk_get_sys should not be called from within interrupt context.

  • clk_add_alias - add a new clock alias

Synopsis:

int clk_add_alias ( const char * alias )

Arguments:

  • alias - name for clock alias
  • alias_dev_name - device name
  • id - platform specific clock name
  • dev - device

Description:

Allows using generic clock names for drivers by adding a new alias. Assumes clkdev; see clkdev.h for more information.

About This Book

Authors

Legal Notice

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

For more details see the file COPYING in the source distribution of Linux.