Chapter 5. Lab 3: Process Management
5.1 Background Knowledge for Lab 3
Readers who have completed lab1 and lab2 should not find the notion of a "process" in the PKE labs unfamiliar: the process structure (struct process) has in fact existed since lab1_1. In the earlier labs, however, its most important members were trapframe and kstack, which respectively record the process context before the system enters S-mode and serve as the operating-system stack after entering S-mode. In lab3 we move to a multitasking environment and complete the PKE exercises on process creation, switching processes in and out, and process scheduling.
5.1.1 Encapsulating Processes in a Multitasking Environment
The biggest difference between lab3 and the previous two labs is that, in the three basic exercises of lab3, the PKE operating system must support the execution of multiple processes. To support a multitasking environment, PKE defines a "process pool" (see kernel/process.c):
34 process procs[NPROC];
This process pool is simply an array of NPROC (=32, see kernel/process.h) process structures.
Next, PKE extends the process structure (see kernel/process.h):
53 // points to a page that contains mapped_regions
54 mapped_region *mapped_info;
55 // next free mapped region in mapped_info
56 int total_mapped_region;
57
58 // process id
59 uint64 pid;
60 // process status
61 int status;
62 // parent process
63 struct process *parent;
64 // next queue element
65 struct process *queue_next;
66
67 // accounting
68 int tick_count;
- The first two members, mapped_info and total_mapped_region, track the segments (code segment, stack segment, etc.) of the process's virtual address space. They play an important role when a process is created (fork), which is exactly the topic of lab3_1 (a sketch of the mapped_region entries that mapped_info points to is given after this list). PKE divides the segments a process may own into the following types:
29 enum segment_type {
30 CODE_SEGMENT, // ELF segment
31 DATA_SEGMENT, // ELF segment
32 STACK_SEGMENT, // runtime segment
33 CONTEXT_SEGMENT, // trapframe segment
34 SYSTEM_SEGMENT, // system segment
35 };
Here CODE_SEGMENT means the segment is a code segment loaded from the executable ELF file, DATA_SEGMENT is a data segment loaded from the ELF file, STACK_SEGMENT is the process's own stack segment, CONTEXT_SEGMENT is the segment holding the trapframe that saves the process context, and SYSTEM_SEGMENT is a system segment of the process, such as the mapped trap-handling code.
- pid is the process ID and is unique;
- status records the state of the process. In lab3 the PKE operating system defines the following process states:
20 enum proc_status {
21 FREE, // unused state
22 READY, // ready state
23 RUNNING, // currently running
24 BLOCKED, // waiting for something
25 ZOMBIE, // terminated but not reclaimed yet
26 };
FREE is the free state, meaning the process structure is available; READY is the ready state, meaning all the resources the process needs are prepared and it can be scheduled for execution; RUNNING means the process is currently running; BLOCKED means the process is blocked, waiting for something; ZOMBIE means the process is in the "zombie" state, and its resources can be released and reclaimed.
- parent records the process's parent process;
- queue_next links the process into the various queues (for example, the ready queue);
- tick_count is used for accounting, i.e., it records how many timer events the process has experienced while running; it will be used when implementing round-robin scheduling in lab3_3.
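The mapped_region entries that mapped_info points to are not listed above. The following is a minimal sketch of what one entry holds, inferred from how the fields va, npages and seg_type are used in alloc_process() and elf_load() below; the actual definition in kernel/process.h may differ in layout or contain additional members:

```c
// Sketch (not the authoritative definition): one tracked virtual-memory region of a process.
typedef struct mapped_region_t {
  uint64 va;        // starting virtual address of the region
  uint32 npages;    // number of pages the region occupies
  uint32 seg_type;  // one of enum segment_type (CODE_SEGMENT, DATA_SEGMENT, ...)
} mapped_region;
```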
5.1.2 Process Startup and Termination
In the PKE labs, creating a process starts with a call to the alloc_process() function in kernel/process.c:
88 process* alloc_process() {
89 // locate the first usable process structure
90 int i;
91
92 for( i=0; i<NPROC; i++ )
93 if( procs[i].status == FREE ) break;
94
95 if( i>=NPROC ){
96 panic( "cannot find any free process structure.\n" );
97 return 0;
98 }
99
100 // init proc[i]'s vm space
101 procs[i].trapframe = (trapframe *)alloc_page(); //trapframe, used to save context
102 memset(procs[i].trapframe, 0, sizeof(trapframe));
103
104 // page directory
105 procs[i].pagetable = (pagetable_t)alloc_page();
106 memset((void *)procs[i].pagetable, 0, PGSIZE);
107
108 procs[i].kstack = (uint64)alloc_page() + PGSIZE; //user kernel stack top
109   uint64 user_stack = (uint64)alloc_page(); //physical address of user stack bottom
110 procs[i].trapframe->regs.sp = USER_STACK_TOP; //virtual address of user stack top
111
112 // allocates a page to record memory regions (segments)
113 procs[i].mapped_info = (mapped_region*)alloc_page();
114 memset( procs[i].mapped_info, 0, PGSIZE );
115
116 // map user stack in userspace
117 user_vm_map((pagetable_t)procs[i].pagetable, USER_STACK_TOP - PGSIZE, PGSIZE,
118 user_stack, prot_to_type(PROT_WRITE | PROT_READ, 1));
119 procs[i].mapped_info[0].va = USER_STACK_TOP - PGSIZE;
120 procs[i].mapped_info[0].npages = 1;
121 procs[i].mapped_info[0].seg_type = STACK_SEGMENT;
122
123 // map trapframe in user space (direct mapping as in kernel space).
124 user_vm_map((pagetable_t)procs[i].pagetable, (uint64)procs[i].trapframe, PGSIZE,
125 (uint64)procs[i].trapframe, prot_to_type(PROT_WRITE | PROT_READ, 0));
126 procs[i].mapped_info[1].va = (uint64)procs[i].trapframe;
127 procs[i].mapped_info[1].npages = 1;
128 procs[i].mapped_info[1].seg_type = CONTEXT_SEGMENT;
129
130 // map S-mode trap vector section in user space (direct mapping as in kernel space)
131 // we assume that the size of usertrap.S is smaller than a page.
132 user_vm_map((pagetable_t)procs[i].pagetable, (uint64)trap_sec_start, PGSIZE,
133 (uint64)trap_sec_start, prot_to_type(PROT_READ | PROT_EXEC, 0));
134 procs[i].mapped_info[2].va = (uint64)trap_sec_start;
135 procs[i].mapped_info[2].npages = 1;
136 procs[i].mapped_info[2].seg_type = SYSTEM_SEGMENT;
137
138 sprint("in alloc_proc. user frame 0x%lx, user stack 0x%lx, user kstack 0x%lx \n",
139 procs[i].trapframe, procs[i].trapframe->regs.sp, procs[i].kstack);
140
141 procs[i].total_mapped_region = 3;
142 // return after initialization.
143 return &procs[i];
144 }
The code above shows that, besides locating a free process structure, alloc_process() also establishes the mappings of logical addresses above KERN_BASE for the newly created process (before lab3 this code lived in the load_user_program() function of kernel/kernel.c) and saves the mapping information into the process structure.
For a given application, PKE calls load_bincode_from_host_elf() to load the segments of the application's ELF file. The elf_load() function invoked along that path examines each segment after loading it, in order to record its virtual-address mapping:
62 elf_status elf_load(elf_ctx *ctx) {
63 elf_prog_header ph_addr;
64 int i, off;
65 // traverse the elf program segment headers
66 for (i = 0, off = ctx->ehdr.phoff; i < ctx->ehdr.phnum; i++, off += sizeof(ph_addr)) {
67 // read segment headers
68 if (elf_fpread(ctx, (void *)&ph_addr, sizeof(ph_addr), off) != sizeof(ph_addr)) return EL_EIO;
69
70 if (ph_addr.type != ELF_PROG_LOAD) continue;
71 if (ph_addr.memsz < ph_addr.filesz) return EL_ERR;
72 if (ph_addr.vaddr + ph_addr.memsz < ph_addr.vaddr) return EL_ERR;
73
74 // allocate memory before loading
75 void *dest = elf_alloccb(ctx, ph_addr.vaddr, ph_addr.vaddr, ph_addr.memsz);
76
77 // actual loading
78 if (elf_fpread(ctx, dest, ph_addr.memsz, ph_addr.off) != ph_addr.memsz)
79 return EL_EIO;
80
81 // record the vm region in proc->mapped_info
82 int j;
83 for( j=0; j<PGSIZE/sizeof(mapped_region); j++ )
84 if( (process*)(((elf_info*)(ctx->info))->p)->mapped_info[j].va == 0x0 ) break;
85
86 ((process*)(((elf_info*)(ctx->info))->p))->mapped_info[j].va = ph_addr.vaddr;
87 ((process*)(((elf_info*)(ctx->info))->p))->mapped_info[j].npages = 1;
88 if( ph_addr.flags == (SEGMENT_READABLE|SEGMENT_EXECUTABLE) ){
89 ((process*)(((elf_info*)(ctx->info))->p))->mapped_info[j].seg_type = CODE_SEGMENT;
90 sprint( "CODE_SEGMENT added at mapped info offset:%d\n", j );
91 }else if ( ph_addr.flags == (SEGMENT_READABLE|SEGMENT_WRITABLE) ){
92 ((process*)(((elf_info*)(ctx->info))->p))->mapped_info[j].seg_type = DATA_SEGMENT;
93 sprint( "DATA_SEGMENT added at mapped info offset:%d\n", j );
94 }else
95 panic( "unknown program segment encountered, segment flag:%d.\n", ph_addr.flags );
96
97 ((process*)(((elf_info*)(ctx->info))->p))->total_mapped_region ++;
98 }
99
100 return EL_OK;
101 }
In the code above, lines 86--97 inspect the type (ph_addr.flags) of the loaded segment to decide whether it is a code segment or a data segment. Once the virtual-to-physical mappings above are complete, the user process's virtual address space takes the shape shown in Figure 4.5.
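Combining alloc_process() with elf_load(), after a simple application has been loaded its mapped_info typically records four regions: the user stack, the trapframe, the S-mode trap vector, and the ELF code segment. The hypothetical helper below (not part of the PKE source; it only illustrates the data structure) walks mapped_info and prints each recorded region:

```c
// Hypothetical debugging helper: dump the regions recorded in proc->mapped_info.
// It relies only on the fields used above (va, npages, seg_type) and on sprint().
void print_mapped_regions(process *proc) {
  for (int i = 0; i < proc->total_mapped_region; i++) {
    mapped_region *r = &proc->mapped_info[i];
    sprint("region %d: va 0x%lx, npages %d, seg_type %d\n",
           i, r->va, r->npages, r->seg_type);
  }
}
```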
Next, the constructed process is put into execution through the switch_to() function:
42 void switch_to(process *proc) {
43 assert(proc);
44 current = proc;
45
46 write_csr(stvec, (uint64)smode_trap_vector);
47 // set up trapframe values that smode_trap_vector will need when
48 // the process next re-enters the kernel.
49 proc->trapframe->kernel_sp = proc->kstack; // process's kernel stack
50 proc->trapframe->kernel_satp = read_csr(satp); // kernel page table
51 proc->trapframe->kernel_trap = (uint64)smode_trap_handler;
52
53 // set up the registers that strap_vector.S's sret will use
54 // to get to user space.
55
56 // set S Previous Privilege mode to User.
57 unsigned long x = read_csr(sstatus);
58 x &= ~SSTATUS_SPP; // clear SPP to 0 for user mode
59 x |= SSTATUS_SPIE; // enable interrupts in user mode
60
61 write_csr(sstatus, x);
62
63 // set S Exception Program Counter to the saved user pc.
64 write_csr(sepc, proc->trapframe->epc);
65
66 //make user page table
67 uint64 user_satp = MAKE_SATP(proc->pagetable);
68
69 // switch to user mode with sret.
70 return_to_user(proc->trapframe, user_satp);
71 }
This function actually appeared as early as lab1. Its job is to restore the trapframe stored in the process structure into the general-purpose registers of the RISC-V machine as the process context, and finally to execute the sret instruction (via the return_to_user() function) so that the process starts running.
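Putting the pieces together, a process in lab3 goes from allocation, through ELF loading and the ready queue, to execution. The sketch below shows this sequence; the wrapper name create_and_run_user_app and its exact shape are illustrative assumptions, while alloc_process(), load_bincode_from_host_elf(), insert_to_ready_queue() and schedule() are the PKE routines discussed in this chapter:

```c
// Sketch of the creation-to-execution path (the wrapper itself is an assumption,
// not PKE code; the four routines it calls are the ones discussed in this chapter).
void create_and_run_user_app(void) {
  process *proc = alloc_process();    // grab a FREE slot and build its basic mappings
  load_bincode_from_host_elf(proc);   // load the application ELF and record its segments
  insert_to_ready_queue(proc);        // mark it READY and append it to the ready queue
  schedule();                         // pick the queue head and switch_to() it
}
```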
Unlike in lab1 and lab2, the exit system call in lab3 cannot simply shut the system down, because the termination of one process does not necessarily mean that all processes in the system have finished. The following is the lab3 implementation of the exit system call:
34 ssize_t sys_user_exit(uint64 code) {
35 sprint("User exit with code:%d.\n", code);
36 // in lab3 now, we should reclaim the current process, and reschedule.
37 free_process( current );
38 schedule();
39 return 0;
40 }
As we can see, when a process invokes the exit() system call, the operating system handles it by calling free_process() to "release" the current process (i.e., the caller) and then switching to process scheduling. The implementation of free_process() is very simple:
149 int free_process( process* proc ) {
150 // we set the status to ZOMBIE, but cannot destruct its vm space immediately.
151 // since proc can be current process, and its user kernel stack is currently in use!
152 // but for proxy kernel, it (memory leaking) may NOT be a really serious issue,
153 // as it is different from regular OS, which needs to run 7x24.
154 proc->status = ZOMBIE;
155
156 return 0;
157 }
Note that free_process() merely marks the process as ZOMBIE and does not release all the resources it occupies. The reason is that a call to free_process() means the operating system is currently running in S-mode, and by PKE's design, S-mode execution uses the current process's user kernel stack; releasing the current process's memory at this point would crash the operating system itself. PKE therefore adopts a compromise when releasing a process: it only marks the process as ZOMBIE instead of immediately freeing the resources it holds. Finally, the call to schedule() selects another READY process in the system, if one exists, and puts it into execution; its logic is discussed in the next section.
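For reference, a full reclamation would have to walk the regions recorded in mapped_info and return every page the process owns, which is exactly what cannot be done safely while the kernel is still running on the dying process's own kernel stack. The sketch below only illustrates that idea; it is not the PKE implementation, and it assumes the free_page() and user_va_to_pa() helpers behave as in the lab2 memory-management code:

```c
// Illustrative sketch only (NOT PKE code): what fully destructing a process's
// memory would involve. It must not run on the victim's own kernel stack.
void destruct_process_vm(process *proc) {
  for (int i = 0; i < proc->total_mapped_region; i++) {
    mapped_region *r = &proc->mapped_info[i];
    if (r->seg_type == SYSTEM_SEGMENT) continue;  // the trap vector is shared; keep it
    for (uint64 off = 0; off < r->npages * PGSIZE; off += PGSIZE)
      free_page(user_va_to_pa(proc->pagetable, (void *)(r->va + off)));
  }
  free_page((void *)proc->mapped_info);
  free_page((void *)proc->pagetable);          // root page only; nested page-table pages ignored here
  free_page((void *)(proc->kstack - PGSIZE));  // kstack holds the top of the kernel-stack page
  proc->status = FREE;
}
```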
5.1.3 Managing and Scheduling Ready Processes
The PKE operating system implements a very simple ready-queue management scheme (since the basic exercises of lab3 do not involve process blocking, no blocked queue is provided). The queue head is defined in kernel/sched.c:
8 process* ready_queue_head = NULL;
To add a process to the ready queue, call the insert_to_ready_queue() function:
13 void insert_to_ready_queue( process* proc ) {
14 sprint( "going to insert process %d to ready queue.\n", proc->pid );
15 // if the queue is empty in the beginning
16 if( ready_queue_head == NULL ){
17 proc->status = READY;
18 proc->queue_next = NULL;
19 ready_queue_head = proc;
20 return;
21 }
22
23 // ready queue is not empty
24 process *p;
25 // browse the ready queue to see if proc is already in-queue
26 for( p=ready_queue_head; p->queue_next!=NULL; p=p->queue_next )
27 if( p == proc ) return; //already in queue
28
29 // p points to the last element of the ready queue
30 if( p==proc ) return;
31 p->queue_next = proc;
32 proc->status = READY;
33 proc->queue_next = NULL;
34
35 return;
36 }
This function first handles the case where ready_queue_head is empty (the initial state, lines 16--21); if the ready queue is not empty, the process is appended to the tail of the queue (lines 26--33).
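As a usage illustration (the process pointers here are placeholders), readying two processes in turn yields strict FIFO order, which is also what the round-robin scheduling of lab3_3 later relies on; schedule(), presented next, always dequeues the current head:

```c
// Hypothetical usage: proc_a was readied first, so schedule() below picks it.
insert_to_ready_queue(proc_a);   // queue: proc_a
insert_to_ready_queue(proc_b);   // queue: proc_a -> proc_b
schedule();                      // dequeues proc_a and switch_to()'s it
```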
The PKE kernel selects and switches in a process by calling the schedule() function:
45 void schedule() {
46 if ( !ready_queue_head ){
47 // by default, if there are no ready process, and all processes are in the status of
48 // FREE and ZOMBIE, we should shutdown the emulated RISC-V machine.
49 int should_shutdown = 1;
50
51 for( int i=0; i<NPROC; i++ )
52 if( (procs[i].status != FREE) && (procs[i].status != ZOMBIE) ){
53 should_shutdown = 0;
54 sprint( "ready queue empty, but process %d is not in free/zombie state:%d\n",
55 i, procs[i].status );
56 }
57
58 if( should_shutdown ){
59 sprint( "no more ready processes, system shutdown now.\n" );
60 shutdown( 0 );
61 }else{
62 panic( "Not handled: we should let system wait for unfinished processes.\n" );
63 }
64 }
65
66 current = ready_queue_head;
67 assert( current->status == READY );
68 ready_queue_head = ready_queue_head->queue_next;
69
70   current->status = RUNNING;
71 sprint( "going to schedule process %d to run.\n", current->pid );
72 switch_to( current );
73 }
As we can see, schedule() first checks whether the ready queue ready_queue_head is empty. If it is (lines 46--64), schedule() checks whether every process in the system is either released (FREE) or a zombie (ZOMBIE). If so, it starts the shutdown of the (emulated RISC-V) machine; otherwise the system should wait for the remaining processes to finish. Since the basic exercises of lab3 cannot actually reach that state, we simply call panic here and defer further handling to later labs where the state becomes possible.
The case where the ready queue is non-empty (lines 66--72) is much simpler: the process at the head of the ready queue is switched in. Note that during this switch, the selected process must be removed from the ready queue.
5.2 lab3_1 Process Creation (fork)
Given application
- user/app_naive_fork.c
1 /*
2 * Below is the given application for lab3_1.
3 * It forks a child process to run .
4 * Parent process will continue to run after child exits
5 * So it is a naive "fork". we will implement a better one in later lab.
6 *
7 */
8
9 #include "user/user_lib.h"
10 #include "util/types.h"
11
12 int main(void) {
13 uint64 pid = fork();
14 if (pid == 0) {
15 printu("Child: Hello world!\n");
16 } else {
17 printu("Parent: Hello world! child id %ld\n", pid);
18 }
19
20 exit(0);
21 }
The above program