In a Linux process's 4 GB address space, the 3 GB-4 GB portion is shared by everyone: it is the kernel-mode address space, holding the entire kernel code, all kernel modules, and the data the kernel maintains. The Linux architecture is divided into user mode and kernel mode (or user space and kernel space).
Note: for upper-layer applications to be able to access these resources, the kernel must provide them with an access interface: the system call.
System call: the smallest functional unit of the operating system. A system call can be seen as an operation that cannot be simplified any further (similar in spirit to an atomic operation, though a different concept). A user-mode application can access kernel-mode resources in three ways: system calls, library functions, and shell scripts.
Example: a user runs a program, and the process it creates starts out running in user mode. When it performs file operations, sends network data, and so on, the steps are as follows:
Note: the benefit of this arrangement is that user-mode programs cannot manipulate the kernel address space at will, which provides a degree of security protection.
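As an illustration (not part of the original notes), the following minimal Python sketch shows a user-mode program entering the kernel through system calls when it writes a file; os.open, os.write and os.close are thin wrappers around the open(2), write(2) and close(2) system calls.

import os

# User-mode code: each os.* call below traps into the kernel via a system call;
# the kernel performs the privileged work and then returns control to user mode.
fd = os.open("demo.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # open(2)
os.write(fd, b"hello from user mode\n")                                 # write(2)
os.close(fd)                                                            # close(2)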
The logical structure of the operating system provides two modes of operation.
User mode is the one available for the user or an application program. The alternative has a number of different names but the most often used is ‘kernel mode’.
The difference between the two is that kernel mode has sole access to part of the memory and to certain system functions that user mode cannot access.
To work properly each higher layer needs to be fully serviced by a lower layer (as in a network protocol stack).
A layered structure for an operating system is hard to achieve in practice. A more flexible approach uses a modular structure.
The operating system can arrange for I/O to pass via the CPU, but for large quantities of data it can instead arrange direct transfer between memory and an I/O device.
A CPU typically operates at GHz frequencies, so one second sees more than one billion clock cycles.
In both batch systems and time-sharing systems, the number of user processes generally exceeds the number of processors, so the processes compete with one another for the processor. System processes need the processor as well. The process scheduler must therefore, according to some policy, dynamically allocate the processor to one of the processes in the ready queue so that it can run.
High-level scheduling (High-Level Scheduling): also called job scheduling; it decides which jobs in the backlog are brought into memory to run.
Low-level scheduling (Low-Level Scheduling): also called process scheduling; it decides which process in the ready queue is given the CPU.
Intermediate-level scheduling (Intermediate-Level Scheduling): introduced together with virtual memory; it performs process swapping between main memory and the swap area on external storage.
Non-preemptive mode - once the dispatcher has allocated the processor to a process, that process keeps running until it completes or blocks on some event; only then is the processor allocated to another process.
Preemptive mode - while a process is running, the system may, based on some principle, take the processor away from it and allocate it to another process. Preemption principles include: priority, shortest process first, and time slice.
First-in, first-out algorithm - the processor is allocated to the process that entered the ready queue first; once a process has the processor it keeps running until it completes or blocks, and only then releases the processor.
Shortest-process-first algorithm - the Shortest CPU Burst First (SCBF) algorithm. It selects from the ready queue the process whose next CPU burst is shortest and allocates the processor to it.
Round robin - each process in the ready queue runs for at most one time slice, after which the processor is taken away and given to the next process in the queue (see the sketch after this list).
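To make the three policies above concrete, here is a small Python sketch (process names and burst lengths are invented for illustration) that computes the run order under first-in first-out, shortest CPU burst first, and round robin.

from collections import deque

# (process name, CPU burst in time units); all are assumed to arrive at time 0
procs = [("P1", 7), ("P2", 3), ("P3", 5)]

def fifo(procs):
    # First-in, first-out: run in arrival order, each process to completion.
    return [name for name, _ in procs]

def scbf(procs):
    # Shortest CPU Burst First: always pick the shortest burst next.
    return [name for name, _ in sorted(procs, key=lambda p: p[1])]

def round_robin(procs, quantum=2):
    # Round robin: each process runs for at most one time slice, then requeues.
    queue = deque(procs)
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)
        if remaining > quantum:
            queue.append((name, remaining - quantum))
    return order

print(fifo(procs))         # ['P1', 'P2', 'P3']
print(scbf(procs))         # ['P2', 'P3', 'P1']
print(round_robin(procs))  # ['P1', 'P2', 'P3', 'P1', 'P2', 'P3', 'P1', 'P3', 'P1']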
A process can be defined as 'a program being executed'.
A process control block (PCB) can be created in memory ready to receive data when the process is executed. Once in memory the state of the process can change.
It is possible for a process to be separated into different parts for execution. The separate parts are called threads. If this has happened, each thread is handled as an individual process.
Some interrupts are caused by errors that prematurely terminate a running process.
The interrupt mechanism is used when a process in the running state makes a system call requiring an I/O operation and has to change to the waiting state.
Whatever the reason for an interrupt, the OS kernel must invoke an interrupt-handling routine.
A scheduling algorithm can be preemptive or non-preemptive.
A preemptive algorithm can halt a process that would otherwise continue running undisturbed. If an algorithm is preemptive it may involve prioritising processes.
The memory belonging to a process stays resident in physical memory, and the CPU switches to the appropriate region when it is needed. This brings in the question of memory protection: if processes could freely read and write each other's contents, things would fall apart and be very unsafe. The operating system therefore needs a layer of abstraction over physical memory, the 'address space' (Address Space). A process's address space contains all the memory related to that process, such as its code, stack and heap.
A 16 KB address space
int x;
x = x + 3;
// assembly
128: movl 0x0(%ebx), %eax ;load 0+ebx into eax
132: addl $0x03, %eax ;add 3 to eax register
135: movl %eax, 0x0(%ebx) ;store eax back to mem
At the front is the PC (Program Counter), which indicates where we are in the code. For example, if a context switch happens when the CPU has executed instruction 128, then after switching back execution can continue from 132 (provided the PC was saved first). What follows is the assembly code, telling the CPU what to do.
From the process's point of view
From the point of view of the real physical memory
A base register and a bounds (limit) register provide simple dynamic relocation: before each memory address is sent to memory, the contents of the base register are automatically added to it.
Suppose the process starts at 32 KB and ends at 48 KB in physical memory. Can the 32 / 48 be set dynamically? Yes: just add two registers to the CPU, a base register and a bounds register; base says where the process starts and bounds marks the limit. The relationship between the real physical address and the virtual address is therefore:
physical address = virtual address + base
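A minimal Python sketch of this translation (the 32 KB base and 16 KB bounds are just the values from the example above): the hardware checks every virtual address against bounds and then adds base.

BASE = 32 * 1024    # process is loaded at 32 KB in physical memory
BOUNDS = 16 * 1024  # process may use 16 KB, so it ends at 48 KB

def translate(virtual_address):
    # Every address the process generates is checked against bounds,
    # then relocated by adding the base register.
    if virtual_address < 0 or virtual_address >= BOUNDS:
        raise MemoryError("out of bounds: protection fault")
    return virtual_address + BASE

print(translate(0))     # 32768, the start of the process's memory
print(translate(1000))  # 33768
# translate(20 * 1024) would raise, because 20 KB is beyond the 16 KB bounds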
The biggest problem with the base-and-bounds approach is wasted space: there is a block of free space between the stack and the heap that stays allocated even when unused. How can that region be freed up? This leads to virtual memory.
The abstraction satisfies the needs of multiple processes, but in many cases the available memory cannot even satisfy the requirements of a single large process. How do we cope when physical memory is not enough?
If a process is allowed to be scattered across many non-contiguous regions, memory compaction can be avoided and fragmentation reduced. Based on this idea, the logical address of a process is introduced, separating the process's address space from the actual storage space and making storage management more flexible.
Address space: the target program produced by compiling a source program exists within the range of addresses it is confined to; this range is called the address space. The address space is the set of logical addresses.
Storage space: the set of physical units in main memory that store information; the numbering of these units is the physical address, and the storage space is the set of physical addresses.
Depending on the basic unit used for allocation, non-contiguous (discrete) allocation can be managed in three ways: paged storage management, segmented storage management, and segmented-paged storage management. Segmented-paged management is a combination of the first two.
Differences between paged and segmented storage management
An early approach to memory management when different processes were loaded into memory simultaneously was to partition the memory.
An improvement was dynamic partitioning where the partition size was allowed to adjust to match the process size. The rule of one process per partition remained.
An extension of this idea which allowed for larger processes to be handled was segmentation. The large process was divided into segments. Each segment was loaded into a dynamic partition in memory.
Factors that limited the efficiency:
These two factors combined to cause fragmentation both of the memory and of the disk storage. This resulted in degradation in the performance of the system.
The modern approach is to use paging. The process is divided into equal-sized pages and memory is divided into frames of the same size. The secondary storage can also be divided into frames.
It would be possible to load all of the pages into memory at the same time.
A special case for the use of paging is when a program is so large that the address space needed for it is larger than the size of the memory. Paging now supports what is known as virtual memory management.
Paging requires the CPU to pass address values to a memory management unit, which maps each one to the corresponding address within a page. An address has to comprise two parts: the page number plus the offset from the start of the page.
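A small Python sketch of that two-part split, assuming 4 KB pages and an invented page table: the high-order bits of the virtual address give the page number, the low-order bits give the offset, and the memory management unit substitutes the frame number taken from the page table.

PAGE_SIZE = 4096  # 4 KB pages, so the offset occupies the low 12 bits

# Hypothetical page table for one process: page number -> frame number
page_table = {0: 5, 1: 2, 2: 7}

def mmu_translate(virtual_address):
    page_number = virtual_address // PAGE_SIZE  # which page of the process
    offset = virtual_address % PAGE_SIZE        # position within that page
    frame_number = page_table[page_number]      # a missing entry would be a page fault
    return frame_number * PAGE_SIZE + offset

print(mmu_translate(5000))  # page 1, offset 904 -> frame 2 -> 2 * 4096 + 904 = 9096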
The system overhead in running virtual memory can be a disadvantage. The worst problem is disk thrashing, when part of a process on one page requires another page which is on disk. When that page is loaded it almost immediately requires the original page again.
The user interface may be made available as a command line, a graphical display or a voice recognition system.
When a program involves use of a device, the operating system provides the device driver: the user just expects the device to work.
The operating system will provide a file system for a user to store data and programs.
A compiler can be described as having a 'front end' and a 'back end'. The front-end program performs analysis of the source code and, unless errors are found, produces an intermediate code that expresses completely the semantics (the meaning) of the source code. The back-end program then takes this intermediate code as input and performs synthesis of object code.
This assumes that there are no errors in the source code.
The four stages of front-end analysis are: lexical analysis, syntax analysis, semantic analysis and intermediate code generation.
The source code that is the input data for a compiler or interpreter consists of a sequence of characters.
Each meaningful individual character or collection of characters is referred to as a lexeme.
# declaration statement
Var Count : integer;
# five lexemes
Var Count : integer ;
# The assignment statement
PercentMark[Count] := Score * 10
# eight lexemes
PercentMark [ Count ] := Score * 10
For each identifier recognized there must be an entry made in the symbol table.
The symbol table contains identifier attributes such as the data type, where it is declared and where it is assigned a value.
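A toy Python sketch of these two front-end steps (the token rules and the statement are invented for illustration): it splits a statement into lexemes and records each identifier in a symbol table.

import re

# One alternative per lexeme class; ":=" must come before the single ":".
TOKEN_RE = re.compile(r"\s*(?::=|[:;\[\]*+\-/=]|\d+|[A-Za-z][A-Za-z0-9]*)")

KEYWORDS = {"Var", "integer"}

def tokenize(source):
    # Each match is one lexeme (with any leading whitespace stripped off).
    return [lexeme.strip() for lexeme in TOKEN_RE.findall(source)]

symbol_table = {}

for lexeme in tokenize("PercentMark[Count] := Score * 10"):
    if lexeme[0].isalpha() and lexeme not in KEYWORDS:
        # Attributes such as data type and declaration location would be
        # filled in by later front-end stages.
        symbol_table.setdefault(lexeme, {"type": None})

print(list(symbol_table))  # ['PercentMark', 'Count', 'Score']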
y := a + (b * c - d) / e
The assignment statement could be converted into the following four statements, each requiring at most three addresses:
temp := b * c
temp := temp - d
temp := temp / e
y := a + temp
For each programming language, there is a defined grammar. This grammar must be understood by a programmer and also by a compiler writer.
One method of presenting the grammar rules is a syntax diagram.
The figure represents the grammar rule that an identifier must start with a letter, which can be followed by any combination of zero or more letters or digits.
An alternative approach is to use Backus–Naur Form (BNF)
<Identifier> ::= <Letter>|<Identifier><Letter>|<Identifier><Digit>
<Digit> ::= 0|1|2|3|4|5|6|7|8|9
<Letter> ::= <UpperCaseLetter>|<LowerCaseLetter>
<UpperCaseLetter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<LowerCaseLetter> ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
By contrast, BNF is a general approach which can be used to describe any assembly of data. Furthermore, it can be used as the basis for an algorithm.
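As a small illustration of BNF serving as the basis for an algorithm, the following Python sketch (not from the source) checks a string against the <Identifier> rule above.

import string

LETTERS = set(string.ascii_letters)  # <UpperCaseLetter> | <LowerCaseLetter>
DIGITS = set("0123456789")           # <Digit>

def is_identifier(text):
    # <Identifier> ::= <Letter> | <Identifier><Letter> | <Identifier><Digit>
    # i.e. a letter followed by any number of further letters or digits.
    if not text or text[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])

print(is_identifier("Count2"))  # True
print(is_identifier("2Count"))  # False: must start with a letter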
If front-end analysis has established that there are syntax errors, the back-end process is the presentation of a list of these errors.
For each error, there will be an explanation and the location within the program source code.
In the absence of errors, the main back-end stage is machine code generation from the intermediate code. This may involve optimization of the code.
x := (a + b) * (a - b)
y := (a + 2 * b) * (a - b)
# The most efficient code would be:
temp := (a - b)
x := (a + b) * temp
y := x + temp * b
The expression can be evaluated by first converting the infix representation in the code to Reverse Polish Notation (RPN). RPN is a postfix representation which never requires brackets and has no rules of precedence.
# Simple Expression
a + b * c
# RPN - the first step is to convert b * c to get the intermediate form
a + b c *
# final RPN
a b c * +
# RPN expression
x 2 * y 3 * + 6 /
# step 1: converted to give an intermediate form
(x * 2) y 3 * + 6 /
# following successive versions
(x * 2)(y * 3) + 6 /
(x * 2) + (y * 3) 6 /
((x * 2) + (y * 3)) / 6
# final
(x * 2 + y * 3) / 6
# expression
a + b * c
# post-order traversal: this starts at the lowest leaf to the left of the root and then uses left-right-root ordering, which ensures that each operator comes after its operands
a b c * +
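A minimal Python sketch (the tree for a + b * c is built by hand here) showing how a post-order traversal of the expression tree yields the RPN form.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def post_order(node):
    # left - right - root: operands are emitted before their operator.
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]

# Tree for a + b * c: '*' binds more tightly, so it sits below '+'.
tree = Node("+", Node("a"), Node("*", Node("b"), Node("c")))

print(" ".join(post_order(tree)))  # a b c * +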
Shunting-yard algorithm
a + b * c
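A compact Python sketch of the shunting-yard conversion for such expressions (it handles only single-token operands and the four left-associative operators, with no brackets):

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    output, operators = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators of higher or equal precedence before pushing this one.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]:
                output.append(operators.pop())
            operators.append(token)
        else:
            output.append(token)  # operands go straight to the output
    while operators:
        output.append(operators.pop())
    return output

print(" ".join(to_rpn("a + b * c".split())))  # a b c * +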
# x has the value 3 and y has the value 4
x 2 * y 3 * + 6 /
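A stack-based evaluation of that RPN expression in Python, using the stated values x = 3 and y = 4:

import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(tokens, variables):
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()  # the second operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token.isdigit():
            stack.append(int(token))
        else:
            stack.append(variables[token])  # look up x, y, ...
    return stack.pop()

print(eval_rpn("x 2 * y 3 * + 6 /".split(), {"x": 3, "y": 4}))  # 3.0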