Wait queues have many uses inside the kernel, especially in interrupt handling, process synchronization, and timing. They can be used to block a process and later wake it up. Built on a queue as the underlying data structure and tightly coupled with the process scheduler, wait queues implement asynchronous event notification in the kernel and synchronize access to system resources. A wait queue implements conditional waiting on an event: a process that wants to wait for a particular event puts itself on the appropriate wait queue and gives up control. A wait queue therefore represents a set of sleeping processes, which the kernel wakes up when some condition becomes true. A wait queue is implemented as a circular linked list whose elements include pointers to process descriptors. Like the list_head structure, the wait queue is a fundamental data structure in the Linux kernel, tightly integrated with process scheduling; device drivers routinely use wait queues to block and wake processes. It is therefore well worth analyzing their internal implementation.
1) Every wait queue has a wait queue head, which is a data structure of type wait_queue_head_t:
struct __wait_queue_head {
    spinlock_t lock;            /* the wait queue is modified from interrupt handlers and
                                   other kernel paths, so the list must be protected against
                                   concurrent access; this spinlock serializes all updates */
    struct list_head task_list; /* circular doubly linked list holding the waiting
                                   processes, i.e. the wait queue entries */
};
typedef struct __wait_queue_head wait_queue_head_t;

To use a wait queue you first need to define a wait queue head. This can be done statically with the DECLARE_WAIT_QUEUE_HEAD macro, which defines a wait_queue_head_t and initializes both the lock and the list. Dynamic initialization is just as simple: initialize the lock and the list yourself.
#define DECLARE_WAIT_QUEUE_HEAD(name) \
    wait_queue_head_t name = __WAIT_QUEUE_HEAD_INITIALIZER(name)

#define __WAIT_QUEUE_HEAD_INITIALIZER(name) {                \
    .lock      = __SPIN_LOCK_UNLOCKED(name.lock),            \
    .task_list = { &(name).task_list, &(name).task_list } }
wait_queue_head_t wait_que;

To initialize it dynamically:
init_waitqueue_head( &wait_que);
#define init_waitqueue_head(q)                      \
    do {                                            \
        static struct lock_class_key __key;        \
        __init_waitqueue_head((q), #q, &__key);    \
    } while (0)

Note that this macro wraps its two statements in a do-while(0): it first defines a static variable __key and then calls __init_waitqueue_head. Why do-while(0)? Consider using the macro in an if/else:
if (condition)
    init_waitqueue_head(q);
else
    do_something_else();

If the do-while were removed, this would no longer compile: after macro expansion, the semicolon the caller writes after init_waitqueue_head(q) terminates the if statement, leaving the else without a matching if. You might object that writing init_waitqueue_head(q) without the trailing semicolon avoids the error. It does, but beginners will almost inevitably add one, and a lone statement with no trailing ; looks out of place in the middle of ordinary code anyway.
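To make the pitfall concrete, here is roughly what the expansion would look like if the macro body were a bare block instead of do { ... } while (0). This snippet is a deliberately broken illustration and will not compile:

if (condition)
    { static struct lock_class_key __key; __init_waitqueue_head((q), "q", &__key); };
    /* ^ the caller's ';' ends the if statement here */
else                            /* error: 'else' without a previous 'if' */
    do_something_else();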
void __init_waitqueue_head(wait_queue_head_t *q, const char *name,
                           struct lock_class_key *key)
{
    spin_lock_init(&q->lock);
    lockdep_set_class_and_name(&q->lock, key, name);
    INIT_LIST_HEAD(&q->task_list);
}

This function first initializes the spinlock with the spinlock initializer. As the definition of the wait queue head showed, the task_list field is a list_head, so INIT_LIST_HEAD is used to initialize the list. Both steps should be familiar. Alternatively, the DECLARE_WAIT_QUEUE_HEAD macro shown above performs the definition and the initialization in a single step.
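A minimal sketch of the two styles side by side; the names my_static_wq, my_dev, and my_dev_init are illustrative, not from the original text:

#include <linux/wait.h>
#include <linux/slab.h>

/* static: definition and initialization in one step */
static DECLARE_WAIT_QUEUE_HEAD(my_static_wq);

struct my_dev {
    wait_queue_head_t wq;   /* embedded head, initialized dynamically */
};

static struct my_dev *my_dev_init(void)
{
    struct my_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);
    if (dev)
        init_waitqueue_head(&dev->wq); /* unlocked spinlock + empty list */
    return dev;
}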
2) A wait queue holds the processes that were suspended because they could not obtain a resource during a device operation; each waiting process is represented by a wait queue entry.
A wait queue entry is described by the following structure:
typedef struct __wait_queue wait_queue_t;

struct __wait_queue {
    unsigned int flags;         /* whether the waiting process is exclusive: 0 means
                                   non-exclusive, 1 means exclusive; prepare_to_wait()
                                   manipulates this field */
#define WQ_FLAG_EXCLUSIVE 0x01  /* marks the process as exclusive; used where
                                   prepare_to_wait() modifies flags */
    void *private;              /* a void pointer can hold whatever structure pointer you
                                   need; it is normally a task_struct pointer, i.e. it
                                   points at a process, usually the current task */
    wait_queue_func_t func;     /* function pointer to this entry's wake-up function,
                                   i.e. the routine that wakes the blocked task; it
                                   determines how the wake-up is performed */
    struct list_head task_list; /* list of blocked tasks */
};

The flags field indicates whether the waiting process is exclusive or non-exclusive: 0 means non-exclusive, WQ_FLAG_EXCLUSIVE (0x01) means exclusive. The difference between a wait queue entry (wait_queue_t) and a wait queue head (wait_queue_head_t) is that the entries are the members linked into the head: the elements chained onto the head's task_list field are of type wait_queue_t.
To recap the ways of defining and initializing wait queue heads and entries (a wait queue entry is defined and initialized with the DECLARE_WAITQUEUE macro, shown in (3) below):
(1) Define a head, then initialize it:
wait_queue_head_t my_queue;
init_waitqueue_head(&my_queue);
(2) Define and initialize a head directly. The init_waitqueue_head() function initializes the spinlock to unlocked and the wait queue to an empty circular doubly linked list.
DECLARE_WAIT_QUEUE_HEAD(my_queue);
This defines and initializes in one step, equivalent to (1).
(3) Define a wait queue entry:
DECLARE_WAITQUEUE(name,tsk);
Note that this defines a variable name of type wait_queue_t and sets its private field to tsk.
#define __WAITQUEUE_INITIALIZER(name, tsk) { \
.private = tsk, \
.func = default_wake_function, \
.task_list = { NULL, NULL } }
#define DECLARE_WAITQUEUE(name, tsk) \
wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)
name is the name of the wait queue entry being defined; tsk is a task_struct pointer that points at the process this entry belongs to.
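A small usage sketch showing what the macro gives you; my_wait_example is an illustrative name:

void my_wait_example(void)
{
    /* creates a wait_queue_t named 'wait' with:
     *   .private   = current (the calling process)
     *   .func      = default_wake_function
     *   .task_list = { NULL, NULL }  (not yet linked anywhere) */
    DECLARE_WAITQUEUE(wait, current);
    /* ... the entry is then linked in with add_wait_queue(), below ... */
}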
3) Adding entries to / removing entries from a wait queue
add_wait_queue adds the entry wait to the wait queue list whose head is q:
void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
wait->flags &= ~WQ_FLAG_EXCLUSIVE;
spin_lock_irqsave(&q->lock, flags);
__add_wait_queue(q, wait);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(add_wait_queue);
Notice that the WQ_FLAG_EXCLUSIVE bit of flags is cleared, so this function adds the process to the wait queue as a non-exclusive waiter. The actual insertion is performed under a spinlock taken with interrupts disabled and the flags saved (spin_lock_irqsave), so only one context at a time can modify the queue.
The insertion itself links the current process's entry wait right after the head q; concretely, new->task_list is added to the doubly linked list headed by head->task_list. (An exclusive process can instead be added with the add_wait_queue_exclusive function, shown below.) Since list_add inserts at the front, add_wait_queue places non-exclusive processes at the front of the wait queue.
static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
{
list_add(&new->task_list, &head->task_list);
}
add_wait_queue_exclusive, by contrast, adds an exclusive process at the tail of the wait queue.
void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
wait->flags |= WQ_FLAG_EXCLUSIVE;
spin_lock_irqsave(&q->lock, flags);
__add_wait_queue_tail(q, wait);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(add_wait_queue_exclusive);
remove_wait_queue removes the entry wait from the wait queue headed by q; its source is as follows:
void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
spin_lock_irqsave(&q->lock, flags);
__remove_wait_queue(q, wait);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(remove_wait_queue);
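Putting the add and remove calls together, the classic open-coded blocking pattern looks roughly like the sketch below. The condition variable event_ready is illustrative; essentially the same shape appears verbatim in the MTD driver example at the end of this article:

static DECLARE_WAIT_QUEUE_HEAD(my_queue);
static int event_ready;   /* illustrative condition, set by the waker */

static void wait_for_event(void)
{
    DECLARE_WAITQUEUE(wait, current);

    add_wait_queue(&my_queue, &wait);           /* 1. queue ourselves */
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);  /* 2. mark ourselves sleeping */
        if (event_ready)                        /*    re-check to avoid a lost wakeup */
            break;
        schedule();                             /* 3. actually sleep */
    }
    set_current_state(TASK_RUNNING);
    remove_wait_queue(&my_queue, &wait);        /* 4. dequeue ourselves */
}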
4) Sleeping on a wait queue
How is a process actually blocked? Roughly: set the current process's state to a sleeping state, then add it to a wait queue. The kernel provides a family of interfaces for this. (Note that this part of the code has changed in recent kernels: the sleep_on family analyzed below is inherently racy and has since been removed in favor of the wait_event family; it is worth studying here because it makes the mechanism easy to follow.)
void __sched interruptible_sleep_on(wait_queue_head_t *q)
{
sleep_on_common(q, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
EXPORT_SYMBOL(interruptible_sleep_on);
long __sched
interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
{
return sleep_on_common(q, TASK_INTERRUPTIBLE, timeout);
}
EXPORT_SYMBOL(interruptible_sleep_on_timeout);
void __sched sleep_on(wait_queue_head_t *q)
{
sleep_on_common(q, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
EXPORT_SYMBOL(sleep_on);
long __sched sleep_on_timeout(wait_queue_head_t *q, long timeout)
{
return sleep_on_common(q, TASK_UNINTERRUPTIBLE, timeout);
}
EXPORT_SYMBOL(sleep_on_timeout);
As the source shows, all of these functions call sleep_on_common internally, obtaining their different behaviors by passing different arguments. The common function's three parameters answer three questions: which wait queue is the process joining? Which sleep state should it enter (TASK_UNINTERRUPTIBLE or TASK_INTERRUPTIBLE)? And how long should it sleep?
static long __sched
sleep_on_common(wait_queue_head_t *q, int state, long timeout)
{
unsigned long flags;
wait_queue_t wait;
init_waitqueue_entry(&wait, current);
__set_current_state(state);
spin_lock_irqsave(&q->lock, flags);
__add_wait_queue(q, &wait);
spin_unlock(&q->lock);
timeout = schedule_timeout(timeout);
spin_lock_irq(&q->lock);
__remove_wait_queue(q, &wait);
spin_unlock_irqrestore(&q->lock, flags);
return timeout;
}
This function first defines a wait queue entry on the stack and initializes it with init_waitqueue_entry. As the initialization source below shows, the entry's private field points at the current process, while the wake-up function pointer func points at the kernel's default wake-up function, default_wake_function.
static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
{
q->flags = 0;
q->private = p;
q->func = default_wake_function;
}
int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
void *key)
{
return try_to_wake_up(curr->private, mode, wake_flags);
}
EXPORT_SYMBOL(default_wake_function);
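Because func is just a function pointer, nothing forces an entry to use default_wake_function: a caller can install its own callback and have it invoked when the queue is woken (this is, for example, how poll/epoll hook into wait queues). A hedged sketch under that assumption; my_wake_fn, my_prepare, and the printk message are illustrative, not from the kernel source:

static int my_wake_fn(wait_queue_t *curr, unsigned mode, int wake_flags,
                      void *key)
{
    /* run custom logic first, then fall back to the default behavior */
    printk(KERN_DEBUG "waking task %p\n", curr->private);
    return default_wake_function(curr, mode, wake_flags, key);
}

static void my_prepare(wait_queue_head_t *q)
{
    wait_queue_t wait;                    /* normally on the sleeper's stack */

    init_waitqueue_entry(&wait, current);
    wait.func = my_wake_fn;               /* override the default callback */
    add_wait_queue(q, &wait);
    /* ... set the task state and schedule() as usual; a real sleeper must
     * remove_wait_queue() before this stack frame goes away ... */
}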
After initialization, __set_current_state sets the current process's state to state. Then, under the protection of the spinlock, the entry for the current process is inserted into the wait queue. Crucially, schedule_timeout not only programs the sleep duration (in jiffies) but also calls schedule to reschedule. Once schedule has been called, the process is genuinely asleep, and the code that follows runs only after it has been woken. When the process is woken (i.e. the resource becomes available), it removes its own entry wait from the wait queue.
The queue manipulations above are all performed under the spinlock with interrupts disabled; note, however, that the lock is released around schedule_timeout, since otherwise the process could never actually sleep. Looking back at the four sleep interfaces at the start of this section, their differences should now be clear.
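For comparison, modern kernel code expresses the same idea with the wait_event family, which bundles the queueing, the state change, the condition re-check, and the removal into one macro and thereby avoids the lost-wakeup race that doomed sleep_on. A minimal sketch; event_ready and the function names are illustrative:

static DECLARE_WAIT_QUEUE_HEAD(my_queue);
static int event_ready;

static int my_read_side(void)
{
    /* sleeps until event_ready is true; returns nonzero if
     * interrupted by a signal */
    if (wait_event_interruptible(my_queue, event_ready))
        return -ERESTARTSYS;
    return 0;
}

static void my_write_side(void)
{
    event_ready = 1;
    wake_up_interruptible(&my_queue);   /* see the next section */
}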
5) Wake-up functions
A wake-up function wakes the processes whose entries sit on the wait queue headed by x. Mirroring the sleep functions, the kernel provides a family of functions for waking blocked processes.
#define wake_up(x)                      __wake_up(x, TASK_NORMAL, 1, NULL)
        /* wakes processes on queue x in state TASK_UNINTERRUPTIBLE,
           TASK_INTERRUPTIBLE, or TASK_KILLABLE */
#define wake_up_nr(x, nr)               __wake_up(x, TASK_NORMAL, nr, NULL)
#define wake_up_all(x)                  __wake_up(x, TASK_NORMAL, 0, NULL)
#define wake_up_locked(x)               __wake_up_locked((x), TASK_NORMAL, 1)
#define wake_up_all_locked(x)           __wake_up_locked((x), TASK_NORMAL, 0)
#define wake_up_interruptible(x)        __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
        /* wakes only processes on queue x in state TASK_INTERRUPTIBLE */
#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_interruptible_all(x)    __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)
#define wake_up_interruptible_sync(x)   __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)
As the code shows, all of these wake-up macros end up calling __wake_up. Its four parameters are: a pointer to the queue head, the type of processes to wake, the number of exclusive processes to wake, and an extra void pointer passed through to the wake-up function.
/**
* __wake_up - wake up threads blocked on a waitqueue.
* @q: the waitqueue
* @mode: which threads
* @nr_exclusive: how many wake-one or wake-many threads to wake up
* @key: is directly passed to the wakeup function
*
* It may be assumed that this function implies a write memory barrier before
* changing the task state if and only if any tasks are woken up.
*/
void __wake_up(wait_queue_head_t *q, unsigned int mode,
int nr_exclusive, void *key)
{
unsigned long flags;
spin_lock_irqsave(&q->lock, flags);
__wake_up_common(q, mode, nr_exclusive, 0, key);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(__wake_up);
__wake_up in turn implements the different wake-up behaviors by passing different arguments to __wake_up_common:
/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int wake_flags, void *key)
{
    wait_queue_t *curr, *next;

    list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
        unsigned flags = curr->flags;

        if (curr->func(curr, mode, wake_flags, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}
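This loop is where the front/tail placement from section 3 pays off: non-exclusive waiters sit at the front and are all woken, while exclusive waiters sit at the tail and the !--nr_exclusive test stops the scan after nr_exclusive of them have been woken, avoiding a thundering herd. An annotated illustration of the effect (the task labels are made up for the example):

/* Suppose five tasks are queued on q, in list order:
 *
 *   [N1] [N2] [N3] [E1] [E2]
 *    non-exclusive --^    ^-- exclusive (queued with
 *                             add_wait_queue_exclusive)
 *
 * wake_up(&q)       -> nr_exclusive == 1: wakes N1, N2, N3, and E1,
 *                      then stops; E2 keeps sleeping.
 * wake_up_all(&q)   -> nr_exclusive == 0: the counter never hits
 *                      zero, so all five tasks are woken.
 * wake_up_nr(&q, 2) -> wakes N1, N2, N3, E1, and E2.
 */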
As a concrete example of wait queue usage in a real driver, consider drivers/mtd/chips/cfi_cmdset_0002.c. Each CFI flash chip embeds a wait queue head in its struct flchip:

struct flchip {
    unsigned long start;      /* Offset within the map */
    // unsigned long len;     /* We omit len for now, because when we group them
                                 together we insist that they're all of the same size,
                                 and the chip size is held in the next level up. If we
                                 get more versatile later, it'll make it a damn sight
                                 harder to find which chip we want from a given offset,
                                 and we'll want to add the per-chip length field back in. */
    int ref_point_counter;
    flstate_t state;
    flstate_t oldstate;

    unsigned int write_suspended:1;
    unsigned int erase_suspended:1;
    unsigned long in_progress_block_addr;

    struct mutex mutex;
    wait_queue_head_t wq;     /* Wait on here when we're waiting for the chip
                                 to be ready */
    int word_write_time;
    int buffer_write_time;
    int erase_time;

    int word_write_time_max;
    int buffer_write_time_max;
    int erase_time_max;

    void *priv;
};
At setup time the driver initializes each chip's wait queue head:

for (i = 0; i < cfi->numchips; i++) {
    cfi->chips[i].word_write_time   = 1 << cfi->cfiq->WordWriteTimeoutTyp;
    cfi->chips[i].buffer_write_time = 1 << cfi->cfiq->BufWriteTimeoutTyp;
    cfi->chips[i].erase_time        = 1 << cfi->cfiq->BlockEraseTimeoutTyp;
    cfi->chips[i].ref_point_counter = 0;
    init_waitqueue_head(&(cfi->chips[i].wq)); /* initialize each chip's wq */
}
The write path sleeps on that queue whenever someone has suspended the write in progress:

static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
                                    unsigned long adr, const u_char *buf,
                                    int len)
{
    struct cfi_private *cfi = map->fldrv_priv;
    unsigned long timeo = jiffies + HZ;
    /* see comments in do_write_oneword() regarding uWriteTimeo. */
    unsigned long uWriteTimeout = (HZ / 1000) + 1;
    int ret = -EIO;
    unsigned long cmd_adr;
    int z, words;
    map_word datum;

    adr += chip->start;
    cmd_adr = adr;

    mutex_lock(&chip->mutex);
    ret = get_chip(map, chip, adr, FL_WRITING);
    if (ret) {
        mutex_unlock(&chip->mutex);
        return ret;
    }

    datum = map_word_load(map, buf);

    pr_debug("MTD %s(): WRITE 0x%.8lx(0x%.8lx)\n",
             __func__, adr, datum.x[0]);

    XIP_INVAL_CACHED_RANGE(map, adr, len);
    ENABLE_VPP(map);
    xip_disable(map, chip, cmd_adr);

    cfi_send_gen_cmd(0xAA, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL);
    cfi_send_gen_cmd(0x55, cfi->addr_unlock2, chip->start, map, cfi, cfi->device_type, NULL);

    /* Write Buffer Load */
    map_write(map, CMD(0x25), cmd_adr);

    chip->state = FL_WRITING_TO_BUFFER;

    /* Write length of data to come */
    words = len / map_bankwidth(map);
    map_write(map, CMD(words - 1), cmd_adr);
    /* Write data */
    z = 0;
    while (z < words * map_bankwidth(map)) {
        datum = map_word_load(map, buf);
        map_write(map, datum, adr + z);

        z += map_bankwidth(map);
        buf += map_bankwidth(map);
    }
    z -= map_bankwidth(map);

    adr += z;

    /* Write Buffer Program Confirm: GO GO GO */
    map_write(map, CMD(0x29), cmd_adr);
    chip->state = FL_WRITING;

    INVALIDATE_CACHE_UDELAY(map, chip, adr, map_bankwidth(map),
                            chip->word_write_time);

    timeo = jiffies + uWriteTimeout;

    for (;;) {
        if (chip->state != FL_WRITING) {
            /* Someone's suspended the write. Sleep */
            DECLARE_WAITQUEUE(wait, current);        /* define a wait queue entry for the current process */

            set_current_state(TASK_UNINTERRUPTIBLE); /* set the current process's state to TASK_UNINTERRUPTIBLE */
            add_wait_queue(&chip->wq, &wait);        /* join the chip's wait queue */
            mutex_unlock(&chip->mutex);
            schedule();                              /* go to sleep */
            remove_wait_queue(&chip->wq, &wait);
            timeo = jiffies + (HZ / 2); /* FIXME */
            mutex_lock(&chip->mutex);
            continue;
        }

        if (time_after(jiffies, timeo) && !chip_ready(map, adr))
            break;

        if (chip_ready(map, adr)) {
            xip_enable(map, chip, adr);
            goto op_done;
        }

        /* Latency issues. Drop the lock, wait a while and retry */
        UDELAY(map, chip, adr, 1);
    }

    /*
     * Recovery from write-buffer programming failures requires
     * the write-to-buffer-reset sequence. Since the last part
     * of the sequence also works as a normal reset, we can run
     * the same commands regardless of why we are here.
     * See e.g.
     * http://www.spansion.com/Support/Application%20Notes/MirrorBit_Write_Buffer_Prog_Page_Buffer_Read_AN.pdf
     */
    cfi_send_gen_cmd(0xAA, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL);
    cfi_send_gen_cmd(0x55, cfi->addr_unlock2, chip->start, map, cfi, cfi->device_type, NULL);
    cfi_send_gen_cmd(0xF0, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL);
    xip_enable(map, chip, adr);
    /* FIXME - should have reset delay before continuing */

    printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);

    ret = -EIO;
op_done:
    chip->state = FL_READY;
    DISABLE_VPP(map);
    put_chip(map, chip, adr);
    mutex_unlock(&chip->mutex);

    return ret;
}
And the resume path wakes whoever is sleeping on the queue once the chip is ready again:

static void cfi_amdstd_resume(struct mtd_info *mtd)
{
    struct map_info *map = mtd->priv;
    struct cfi_private *cfi = map->fldrv_priv;
    int i;
    struct flchip *chip;

    for (i = 0; i < cfi->numchips; i++) {
        chip = &cfi->chips[i];

        mutex_lock(&chip->mutex);

        if (chip->state == FL_PM_SUSPENDED) {
            chip->state = FL_READY;
            map_write(map, CMD(0xF0), chip->start);
            wake_up(&chip->wq); /* the wake-up */
        } else
            printk(KERN_ERR "Argh. Chip not in PM_SUSPENDED state upon resume()\n");

        mutex_unlock(&chip->mutex);
    }
}
Original: http://blog.csdn.net/jackyard/article/details/25098181