The following article offers an accessible yet thorough discussion of kernel threads.
*******************************************************************************
Kernel Thread
Gearheads, written by Sreekrishnan Venkateswaran, Thursday, 15 September 2005
Threads are programming abstractions used in concurrent processing. A kernel thread is a way to implement background tasks inside the kernel. A background task can be busy handling asynchronous events or can be asleep, waiting for an event to occur. Kernel threads are similar to user processes, except that they live in kernel space and have access to kernel functions and data structures. Like user processes, kernel threads appear to monopolize the processor because of preemptive scheduling.
In this month’s “Gearheads,” let’s discuss kernel threads and develop an example that also demonstrates related concepts such as process states, wait queues, and user-mode helpers.
Built-in Kernel Threads
To see the kernel threads (also called kernel processes) running on your system, run the command ps -ef. You should see something similar to Figure One.
FIGURE ONE: A typical list of Linux kernel threads
$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 22:36 ?        00:00:00 init [3]
root         2     1  0 22:36 ?        00:00:00 [ksoftirqd/0]
root         3     1  0 22:36 ?        00:00:00 [events/0]
root        38     3  0 22:36 ?        00:00:00 [pdflush]
root        39     3  0 22:36 ?        00:00:00 [pdflush]
root        29     1  0 22:36 ?        00:00:00 [khubd]
root       695     1  0 22:36 ?        00:00:00 [kjournald]
…
root      3914     1  0 22:37 ?        00:00:00 [nfsd]
root      3915     1  0 22:37 ?        00:00:00 [nfsd]
…
root      4015  3364  0 22:55 tty3     00:00:00 -bash
root      4066  4015  0 22:59 tty3     00:00:00 ps -ef
The output of ps -ef is a list of user and kernel processes running on your system. Kernel process names are surrounded by square brackets ([]).
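The bracket convention follows a rule that ps documents: when a process’s argument list is unavailable, ps prints its name in brackets instead. Kernel threads have no user address space, so their /proc/<pid>/cmdline is always empty. The user-space C sketch below applies the same test. It assumes Linux with /proc mounted (/proc/<pid>/comm needs 2.6.33 or later), and the function names are illustrative. Zombies also have an empty cmdline, so treat this as a heuristic rather than a proof.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if /proc/<pid>/cmdline is empty (how ps spots kernel threads),
 * 0 if it has contents, -1 if it cannot be read. */
static int looks_like_kernel_thread(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/cmdline", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int c = fgetc(f);            /* one byte is enough to tell */
    fclose(f);
    return c == EOF ? 1 : 0;
}

/* Walk /proc and print "[name]" for every process with an empty cmdline.
 * Returns how many were printed, or -1 if /proc cannot be opened. */
static int list_bracketed_processes(void)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;
    int count = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                        /* not a PID directory */
        pid_t pid = (pid_t)atoi(de->d_name);
        if (looks_like_kernel_thread(pid) != 1)
            continue;                        /* has args, or vanished */
        char path[64], comm[64] = "";
        snprintf(path, sizeof path, "/proc/%d/comm", (int)pid);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(comm, sizeof comm, f))
                comm[strcspn(comm, "\n")] = '\0';
            fclose(f);
        }
        printf("%5d [%s]\n", (int)pid, comm);
        count++;
    }
    closedir(proc);
    return count;
}
```

Inside a container, the PID namespace usually hides the host’s kernel threads, so the list may well be empty there.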
The [ksoftirqd/0] kernel thread helps implement soft IRQs. Soft IRQs are raised by interrupt handlers to request “bottom half” processing of the portions of interrupt handling whose execution can be deferred. The idea is to minimize the code inside interrupt handlers, which reduces interrupt-off times in the system and thus lowers latencies. ksoftirqd ensures that a high load of soft IRQs neither starves the soft IRQs nor overwhelms the system. (On Symmetric Multi-Processing (SMP) machines, where multiple thread instances can run on different processors in parallel, one instance of ksoftirqd is created per processor to improve throughput. On SMP machines, these kernel threads are named ksoftirqd/n, where n is the processor number.)
The events/n threads (where n is the processor number) help implement work queues, which are another way of deferring work in the kernel. If a part of the kernel wants to defer execution of work, it can either create its own work queue or make use of the default events/n worker thread.
The pdflush kernel thread flushes dirty pages from the page cache. The page cache buffers accesses to the disk. To improve performance, actual writes to the disk are delayed until the pdflush daemon writes out dirtied data. This happens when available free memory dips below a threshold or when a page has remained dirty for a sufficiently long time. In 2.4 kernels, these two tasks were performed by separate kernel threads, bdflush and kupdated, respectively.
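The age-based trigger is tunable from user space. /proc/sys/vm/dirty_expire_centisecs sets how old dirty data must be before it becomes eligible for writeback, and /proc/sys/vm/dirty_writeback_centisecs sets how often the flushing threads wake up to look for it; both are in hundredths of a second. The small C sketch below reads them. It is a user-space illustration, not part of pdflush, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Read one integer from a /proc/sys file. Returns -1 if the file cannot be
 * opened or parsed (the tunables read here are never negative). */
static long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long value;
    int ok = fscanf(f, "%ld", &value) == 1;
    fclose(f);
    return ok ? value : -1;
}

/* Print the writeback age limit and wakeup interval, converted to seconds.
 * A wakeup interval of 0 means periodic writeback is disabled. */
static void print_writeback_tunables(void)
{
    long expire = read_sysctl_long("/proc/sys/vm/dirty_expire_centisecs");
    long interval = read_sysctl_long("/proc/sys/vm/dirty_writeback_centisecs");
    if (expire >= 0)
        printf("dirty data is eligible for writeback after %ld.%02ld s\n",
               expire / 100, expire % 100);
    if (interval >= 0)
        printf("flushing threads wake up every %ld.%02ld s\n",
               interval / 100, interval % 100);
}
```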
You may have noticed that there are two instances of pdflush in the ps output. A new instance is created if the kernel senses that existing instances are becoming intolerably busy servicing disk queues. Launching new instances of pdflush improves throughput, especially if your system has multiple disks and many of them are busy.
The khubd thread, part of the Linux USB core, monitors the machine’s USB hubs and configures USB devices when they are hot-plugged into the system. kjournald is the generic kernel journaling thread, which is used by file systems like ext3. The Linux Network File System (NFS) server is implemented using a set of kernel threads named nfsd.
Creating a Kernel Thread
To illustrate kernel threads, let’s implement a simple example. Assume that you’d like the kernel to asynchronously invoke a user-mode program to send you a page or an email alert whenever it senses that the health of certain kernel data structures is unsatisfactory -- for instance, free space in network receive buffers has dipped below a low watermark.
This is a candidate for a kernel thread because:
*It’s a background task, since it has to wait for asynchronous events.
*It needs access to kernel data structures, since the actual detection of events must be done by other parts of the kernel.
*It has to invoke a user-mode helper program, which is a time-consuming operation.
The kernel thread relinquishes the processor till it gets woken up by parts of the kernel that are responsible for monitoring the data structures of interest. It then invokes the user-mode helper program and passes on the appropriate identity code to the program’s environment. The user-mode program is registered with the kernel via the /proc file system.
Listing One creates the kernel thread.
Listing One: Creating a Linux kernel thread
ret = kernel_thread (mykthread, NULL,
                     CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
The thread can be created in an appropriate place, for example, in init/main.c. The flags specify the resources to be shared between the parent and child threads: CLONE_FS shares file system information such as the root and current working directories, CLONE_FILES shares open files, and CLONE_SIGHAND shares signal handlers. SIGCHLD is the signal sent to the parent when the child exits.
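The same flag combination can be tried from user space with the Linux clone(2) system call. The sketch below is an illustration, not kernel code, and its function and variable names are hypothetical. It adds CLONE_VM, because Linux rejects CLONE_SIGHAND without it (kernel_thread() adds CLONE_VM internally), and it assumes a downward-growing stack, as on x86. The child opens /dev/null and changes directory to /; because of CLONE_FILES and CLONE_FS, the parent sees both effects.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Written by the child; visible to the parent because of CLONE_VM. */
static volatile int child_fd = -1;

/* Child body: both effects land in resources shared with the parent. */
static int child_fn(void *arg)
{
    (void)arg;
    child_fd = open("/dev/null", O_RDONLY); /* shared via CLONE_FILES */
    return chdir("/") == 0 ? 0 : 1;         /* shared via CLONE_FS */
}

/* Returns 0 if, after the child exits, the parent still has the child's
 * descriptor open and its working directory is "/", -1 otherwise. */
static int demo_shared_resources(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    /* Pass the top of the buffer: the stack grows down on x86 */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    int status;
    pid_t w = waitpid(pid, &status, 0);   /* SIGCHLD exit signal: plain wait */
    free(stack);
    if (w < 0)
        return -1;

    char cwd[2];
    int fd_ok = child_fd >= 0 && fcntl(child_fd, F_GETFD) != -1;
    int cwd_ok = getcwd(cwd, sizeof cwd) != NULL && strcmp(cwd, "/") == 0;
    if (child_fd >= 0)
        close(child_fd);
    return (fd_ok && cwd_ok) ? 0 : -1;
}
```

Call it after changing to some other directory (for example /tmp) to see the child’s chdir() move the parent too.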
Listing Two is the actual kernel thread. daemonize() detaches the calling thread from the user-space resources it inherited from its parent, such as its memory map and open files, while reparent_to_init() changes the parent of the calling thread to the init task.
Each Linux thread has a single parent. If a thread exits and its parent never waits for it, the thread lingers as a zombie process and wastes resources. Re-parenting the thread to the init task, which reaps its children, avoids this. In the 2.6 kernel, the daemonize() function itself internally invokes reparent_to_init().
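User space shows the same re-parenting at work: when a process exits, the kernel hands its children to init (or, on 3.4 and later kernels, to a process marked as a child subreaper), which then reaps them. The sketch below, with illustrative names and assuming a Linux/POSIX system, creates a grandchild, lets its parent exit, and reports the grandchild’s parent PID before and after.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* parent -> middle -> grandchild. The middle process exits; the grandchild
 * reports its original and new parent PIDs through a pipe.
 * Returns 0 on success, -1 on failure. */
static int observe_reparenting(pid_t *old_parent, pid_t *new_parent)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {
        pid_t me = getpid();
        pid_t gc = fork();
        if (gc < 0)
            _exit(1);
        if (gc == 0) {
            /* Grandchild: poll until its original parent is gone */
            while (getppid() == me)
                usleep(1000);
            pid_t vals[2] = { me, getppid() };
            if (write(fds[1], vals, sizeof vals) != (ssize_t)sizeof vals)
                _exit(1);
            _exit(0);
        }
        _exit(0);                 /* middle exits, orphaning the grandchild */
    }
    close(fds[1]);
    waitpid(middle, NULL, 0);     /* reap middle so it is not left a zombie */
    pid_t vals[2];
    ssize_t n = read(fds[0], vals, sizeof vals);
    close(fds[0]);
    if (n != (ssize_t)sizeof vals)
        return -1;
    *old_parent = vals[0];
    *new_parent = vals[1];
    return 0;
}
```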
Since daemonize() blocks all signals by default, you have to call allow_signal() to enable delivery if your thread desires to handle a particular signal. There are no signal handlers inside the kernel, so use signal_pending() to check for signals and perform the appropriate action. For debugging purposes, the code in Listing Two requests delivery of SIGKILL and dies if it’s received.
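This polling style has a user-space analog: block a signal so that it stays pending, then test the pending set instead of installing a handler. In the sketch below (illustrative names; SIGUSR1 stands in for SIGKILL, which user space cannot block), sigprocmask() plays the role of the signal blocking done by daemonize(), and sigpending() plays the role of signal_pending().

```c
#define _GNU_SOURCE
#include <signal.h>

/* Block SIGUSR1 so it is held pending rather than delivered. */
static int block_usr1(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Analog of signal_pending(current): 1 if SIGUSR1 is pending,
 * 0 if not, -1 on error. */
static int usr1_pending(void)
{
    sigset_t pending;
    if (sigpending(&pending) < 0)
        return -1;
    return sigismember(&pending, SIGUSR1);
}

/* Remove the pending SIGUSR1 without running any handler.
 * Returns the signal number, or -1 on error. */
static int take_usr1(void)
{
    sigset_t set;
    int sig;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return sigwait(&set, &sig) == 0 ? sig : -1;
}
```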
Listing Two: Implementing the Kernel Thread
static DECLARE_WAIT_QUEUE_HEAD (myevent_waitqueue);
rwlock_t myevent_lock = RW_LOCK_UNLOCKED; /* DEFINE_RWLOCK() in newer kernels */
unsigned int myevent_id;                  /* 0 means "no event pending" */

static int mykthread (void *unused)
{
        unsigned int event_id = 0;
        unsigned long flags;
        DECLARE_WAITQUEUE (wait, current);

        /* The stuff required to become a kernel thread
         * without attached user resources */
        daemonize ("mykthread");
        reparent_to_init (); /* Only in 2.4 kernels; 2.6 daemonize() does it */

        /* Request delivery of SIGKILL */
        allow_signal (SIGKILL);

        /* The thread will sleep on this wait queue till it is
         * woken up by parts of the kernel in charge of sensing
         * the health of data structures of interest */
        add_wait_queue (&myevent_waitqueue, &wait);
        for (;;) {
                /* Relinquish the processor till the event occurs. Checking
                 * myevent_id after setting the state ensures that an event
                 * posted while we were busy is not missed. */
                set_current_state (TASK_INTERRUPTIBLE);
                if (!myevent_id)
                        schedule ();
                __set_current_state (TASK_RUNNING);

                /* Die if I receive SIGKILL */
                if (signal_pending (current)) break;

                /* Control gets here when the thread is woken up. Interrupts
                 * are disabled because Listing Three may run in interrupt
                 * context; the write lock lets us reset myevent_id. */
                write_lock_irqsave (&myevent_lock, flags);
                event_id = myevent_id;  /* Zero on a spurious wakeup */
                myevent_id = 0;         /* Consume the event */
                write_unlock_irqrestore (&myevent_lock, flags);

                /* Invoke the registered user-mode helper and
                 * pass the identity code in its environment */
                if (event_id)
                        run_umode_handler (event_id); /* See Listing Five */
        }
        remove_wait_queue (&myevent_waitqueue, &wait);
        return 0;
}
If you compile this as part of the kernel, you can see the newly created thread, mykthread, in the ps output, as shown in Figure Two.
FIGURE TWO: The new thread, mykthread, is a child of init
$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:56 ?        00:00:00 init [3]
root         2     1  0 22:36 ?        00:00:00 [ksoftirqd/0]
…
root       111     1  0 21:56 ?        00:00:00 [mykthread]
…
Before delving further into the thread implementation, let’s look at a code snippet that detects the event and awakens mykthread. Refer to Listing Three.
Listing Three: Waking up the kernel thread
/* Executed by parts of the kernel that own the
 * data structures whose health you want to monitor */
/* ... */
if (my_datastructure_is_troubled ()) { /* Hypothetical health check */
        unsigned long flags;

        /* irqsave variant: this code may run in process or interrupt context */
        write_lock_irqsave (&myevent_lock, flags);
        /* Fill in the identity of the data structure */
        myevent_id = datastructure_id;
        write_unlock_irqrestore (&myevent_lock, flags);

        /* Wake up mykthread */
        wake_up_interruptible (&myevent_waitqueue);
}
/* ... */
The kernel accomplishes useful work using a combination of process contexts and interrupt contexts. Process contexts aren’t tied to any interrupt context and vice versa. Listing Two executes in a process context, while Listing Three can run from both process and interrupt contexts.
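Process and interrupt contexts communicate via kernel data structures. In the example, myevent_id and myevent_waitqueue are used for this communication. myevent_id contains the identity of the data structure that’s in trouble. Access to myevent_id is serialized using a reader-writer spin lock, myevent_lock. Because Listing Three may run in interrupt context, code running in process context must disable local interrupts while holding the lock (for example, with write_lock_irqsave()); otherwise an interrupt arriving on a processor that already holds the lock would spin forever.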
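The hand-off between Listings Two and Three follows a familiar producer/consumer pattern. The user-space sketch below is an analogy, not kernel code, and all of its names are hypothetical: a pthread mutex stands in for the reader-writer spin lock myevent_lock, a condition variable for myevent_waitqueue, and a stop flag for SIGKILL. The consumer re-checks the shared event ID after every wakeup, which guards against spurious wakeups as Listing Two does, and clears the ID once the event is consumed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins: event_lock ~ myevent_lock, event_cond ~ myevent_waitqueue,
 * event_id ~ myevent_id, stop_requested ~ a pending SIGKILL. */
static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t event_cond = PTHREAD_COND_INITIALIZER;
static unsigned int event_id;        /* 0 means "no event pending" */
static bool stop_requested;

/* Recorded by the consumer in place of run_umode_handler();
 * read them only after pthread_join(). */
static unsigned int last_handled_id;
static int handled_count;

/* Consumer: the role played by mykthread. */
static void *monitor_thread(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&event_lock);
    for (;;) {
        /* Re-check after every wakeup; spurious wakeups just loop */
        while (event_id == 0 && !stop_requested)
            pthread_cond_wait(&event_cond, &event_lock);
        if (event_id == 0)
            break;                  /* stop requested, nothing pending */
        unsigned int id = event_id;
        event_id = 0;               /* consume the event */
        pthread_mutex_unlock(&event_lock);

        last_handled_id = id;       /* "handle" it outside the lock */
        handled_count++;

        pthread_mutex_lock(&event_lock);
    }
    pthread_mutex_unlock(&event_lock);
    return NULL;
}

/* Producer: the role played by Listing Three. */
static void report_event(unsigned int id)
{
    pthread_mutex_lock(&event_lock);
    event_id = id;
    pthread_cond_signal(&event_cond);
    pthread_mutex_unlock(&event_lock);
}

/* Ask the consumer to exit once any pending event has been handled. */
static void request_stop(void)
{
    pthread_mutex_lock(&event_lock);
    stop_requested = true;
    pthread_cond_signal(&event_cond);
    pthread_mutex_unlock(&event_lock);
}
```

Because there is a single event slot, two events reported before the consumer runs are coalesced and only the later ID is seen; the kernel example has the same property.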
(Kernel threads are preemptible only if CONFIG_PREEMPT is turned on at compile time. If CONFIG_PREEMPT is off, or if you are running a 2.4 kernel without the preemption patch, a thread that loops without ever sleeping will freeze the system. For example, if you comment out schedule() in Listing Two and disable CONFIG_PREEMPT in your kernel configuration, your system will lock up.)
Process States and Wait Queues
Let’s take a closer look at the code snippet that puts mykthread to sleep while waiting for events. The snippet is shown in Listing Four.
LISTING FOUR: How to put a thread to sleep
add_wait_queue (&myevent_waitqueue, &wait);
for (;;) {
        /* .. */
        set_current_state (TASK_INTERRUPTIBLE);
        schedule ();
        /* Point A */
        /* .. */
}
set_current_state (TASK_RUNNING);
remove_wait_queue (&myevent_waitqueue, &wait);
Wait queues hold threads that need to wait for an event or a system resource. A thread in a wait queue sleeps until it’s woken by another thread or an interrupt handler that’s responsible for detecting the event. Queuing and de-queuing are done using the add_wait_queue() and remove_wait_queue() functions, while waking up queued tasks is accomplished via the wake_up_interruptible() routine.
In the above code snippet, set_current_state() is used to set the run state of the kernel thread. A kernel thread (or a normal process) can be in one of the following states: running, interruptible, uninterruptible, zombie, stopped, traced, or dead. These states are defined in include/linux/sched.h.
*A process in the running state (TASK_RUNNING) is in the scheduler run queue and is a candidate for CPU time according to the scheduling algorithm.
*A task in the interruptible state (TASK_INTERRUPTIBLE) is waiting for an event to occur and isn’t in the scheduler run queue. When the task gets woken up or if a signal is delivered to it, it re-enters the run queue.
*The uninterruptible state (TASK_UNINTERRUPTIBLE) is similar to the interruptible state except that receipt of a signal won’t put the task back into the run queue.
*A task in the zombie state (EXIT_ZOMBIE) has terminated, but its parent has not yet waited for (reaped) it.
*A stopped task (TASK_STOPPED) has stopped execution due to receipt of certain signals.
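You can watch these states from user space: the third field of /proc/<pid>/stat holds a one-letter code, which ps also displays in its STAT column. The sketch below, with illustrative names and assuming Linux with /proc, maps those letters to the state names above and shows a child sitting in the zombie state until its parent calls waitpid().

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One-letter state of a process from /proc/<pid>/stat, or '\0' on error. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '\0';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Format is "pid (comm) S ...": comm may contain spaces or ')',
     * so the state is found after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* Letter-to-name mapping; older 2.6 kernels also showed traced tasks as 'T'. */
static const char *state_name(char letter)
{
    switch (letter) {
    case 'R': return "TASK_RUNNING";
    case 'S': return "TASK_INTERRUPTIBLE";
    case 'D': return "TASK_UNINTERRUPTIBLE";
    case 'Z': return "EXIT_ZOMBIE";
    case 'T': return "TASK_STOPPED";
    case 't': return "TASK_TRACED";
    case 'X': return "EXIT_DEAD";
    default:  return "unknown";
    }
}

/* Fork a child that exits at once. Returns 1 if it was seen as a zombie
 * before being reaped, 0 if not, -1 if fork() failed. */
static int observe_zombie(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);
    int seen = 0;
    for (int i = 0; i < 1000 && !seen; i++) {   /* poll for about 1 s */
        if (proc_state(child) == 'Z')
            seen = 1;
        else
            usleep(1000);
    }
    waitpid(child, NULL, 0);   /* reaping removes the zombie */
    return seen;
}
```

A process reading its own /proc/self/stat is, by definition, running, so proc_state(getpid()) reports 'R'.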
mykthread sleeps on a wait queue (myevent_waitqueue) and changes its state to TASK_INTERRUPTIBLE, signaling that it desires to opt out of the scheduler run queue. The call to schedule() asks the scheduler to choose and run a new task from its run queue.
When another part of the kernel awakens mykthread using wake_up_interruptible() as shown in Listing Three, the thread is put back into the scheduler run queue. The process state also gets changed to TASK_RUNNING, so there’s no race condition even if the wake-up occurs between the time the task state is set to TASK_INTERRUPTIBLE and the time schedule() is called: schedule() then won’t take the thread off the run queue. The thread also gets back into the run queue if a SIGKILL signal is delivered to it. When the scheduler subsequently picks mykthread from the run queue, execution resumes at Point A.
User-Mode Helpers
The kernel supports a mechanism for invoking user-mode programs to help perform certain functions. For example, if module auto-loading is enabled, the kernel dynamically loads necessary modules on demand using a user-mode module loader. The default loader is /sbin/modprobe, but you can change it by registering your own loader in /proc/sys/kernel/modprobe. Similarly, the kernel notifies user space about hot-plug events by invoking the program registered in /proc/sys/kernel/hotplug, which is by default /sbin/hotplug.
Listing Five contains the function used by mykthread to notify user space about detected events. The user-mode program to invoke can be registered via the sysctl interface in the /proc file system. To do this, make sure that CONFIG_SYSCTL is enabled in your kernel configuration and add an entry to the kern_table array in kernel/sysctl.c:
{KERN_MYEVENT_HANDLER, "myevent_handler",
&myevent_handler, 256,
0644, NULL, &proc_dostring,
&sysctl_string}
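Here, KERN_MYEVENT_HANDLER is a new identifier that you add to the KERN_* enumeration in include/linux/sysctl.h, and myevent_handler is a 256-byte character array defined in kernel/sysctl.c. The entry creates /proc/sys/kernel/myevent_handler in the /proc file system. To register your user-mode helper, do the following: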
$ echo /path/to/helper > \
/proc/sys/kernel/myevent_handler
This makes /path/to/helper execute when the function in Listing Five runs.
Listing Five: Invoking User Mode Helpers
/* Called from Listing Two */
static void run_umode_handler (int event_id)
{
        int i = 0;
        char *argv[2], *envp[4], *buffer = NULL;
        int value;

        /* If no user-mode handler is registered, return
         * (before allocating anything, so nothing leaks) */
        if (!myevent_handler[0]) return;
        argv[i++] = myevent_handler; /* Defined earlier in kernel/sysctl.c */
        argv[i] = NULL;

        /* Fill in the id corresponding to the data structure in trouble */
        if (!(buffer = kmalloc (32, GFP_KERNEL))) return;
        snprintf (buffer, 32, "TROUBLED_DS=%d", event_id);

        /* Prepare the environment for /path/to/helper */
        i = 0;
        envp[i++] = "HOME=/";
        envp[i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
        envp[i++] = buffer;
        envp[i] = NULL;

        /* Execute the user-mode program, /path/to/helper;
         * the last argument (0) means don't wait for it to finish */
        value = call_usermodehelper (argv[0], argv, envp, 0);

        /* Check return values */
        …
        kfree (buffer);
}
The identity of the troubled kernel data structure is passed as an environment variable (TROUBLED_DS) to the user-mode helper. The helper can be a simple script like the following that sends you an email alert containing the information that it gleaned from its environment:
#!/bin/bash
echo Kernel datastructure $TROUBLED_DS \
is in trouble | mail -s Alert root
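To experiment with the helper side without rebuilding the kernel, you can mimic what Listing Five asks call_usermodehelper() to do: run a program with a minimal argv and an environment containing TROUBLED_DS. The user-space sketch below uses posix_spawn() and waits for the helper to exit, which resembles call_usermodehelper() with its wait flag set rather than the non-waiting call in Listing Five. The function names are hypothetical, and the environment mirrors Listing Five.

```c
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Build an environment like Listing Five's. buf receives the TROUBLED_DS
 * entry; envp needs room for four pointers. Returns -1 if buf is too small. */
static int build_helper_env(char *buf, size_t len, int event_id, char *envp[4])
{
    int n = snprintf(buf, len, "TROUBLED_DS=%d", event_id);
    if (n < 0 || (size_t)n >= len)
        return -1;                          /* would have been truncated */
    envp[0] = "HOME=/";
    envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
    envp[2] = buf;
    envp[3] = NULL;
    return 0;
}

/* Run path with argv and envp, then wait for it. Returns the helper's exit
 * status, or -1 if it could not be started or did not exit normally. */
static int spawn_with_env(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid;
    int status;
    if (posix_spawn(&pid, path, NULL, NULL, argv, envp) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, spawning /bin/sh with argv {"/bin/sh", "-c", "test \"$TROUBLED_DS\" = 7", NULL} and the environment built for event 7 should yield exit status 0, confirming that the variable reached the helper.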
call_usermodehelper() has to be called from process context, and the helper program it launches runs with root capabilities. In 2.6 kernels, it’s implemented using a work queue.
Looking at the Sources
In the 2.6 source tree, the ksoftirqd, pdflush, and khubd kernel threads live in kernel/softirq.c, mm/pdflush.c, and drivers/usb/core/hub.c, respectively.
The daemonize() function can be found in kernel/exit.c in the 2.6 sources and in kernel/sched.c in the 2.4 sources. For the implementation of invoking user-mode helpers, look at kernel/kmod.c.
Sreekrishnan Venkateswaran has been working for IBM India since 1996. His recent Linux projects include putting Linux onto a wristwatch, a PDA, and a pacemaker programmer. You can reach Krishnan at krishhna@gmail.com.
*************************************************************************************