ALevel-CS Chapter 06 Assembly language programming

6.01 Machine code instructions

Key Terms

Machine code instructions

6.02 Assembly language

Key Terms

Assembly language

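Learning to program usually means learning a high-level language, that is, a language designed for humans. A computer cannot understand a high-level language directly; the code must first be converted into binary by a compiler before it can run.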

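What the computer actually understands is low-level language, which is used to control the hardware directly. Assembly language is a low-level language that directly describes and controls what the CPU does. If you want to know what the CPU actually does and how code is executed step by step, you need to learn assembly language.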

Machine language and assembly language

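The CPU only carries out computation and has no intelligence of its own. You give it an instruction, it executes that instruction once, then stops and waits for the next instruction.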

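These instructions are all binary and are known as opcodes (operation codes); for example, the add instruction is 00000011. The job of the compiler is to translate a program written in a high-level language into a sequence of opcodes.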

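Binary programs are unreadable to humans; you cannot tell what the machine is doing by looking at them. Assembly language was created to solve this readability problem and to allow the occasional manual edit.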

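Assembly language is the textual form of binary instructions, and the two correspond one to one. For example, the add instruction 00000011 is written as ADD in assembly language. Once converted back to binary, assembly language can be executed directly by the CPU, so it is the lowest level of the low-level languages.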
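The one-to-one correspondence between mnemonics and opcodes can be sketched as a lookup table. This is only an illustration: the ADD row uses the value 00000011 quoted above, while the other rows and the names `opcode_table`, `encode_mnemonic` and `decode_opcode` are hypothetical and do not describe any particular instruction set.

```c
#include <stddef.h>
#include <string.h>

/* One table row: a mnemonic and the single opcode it stands for. */
struct opcode_entry {
    const char *mnemonic;
    unsigned char opcode;
};

/* Only the ADD row comes from the text (00000011 = 0x03).
   The other rows are made-up values for illustration. */
static const struct opcode_entry opcode_table[] = {
    { "ADD", 0x03 },
    { "SUB", 0x2B },  /* hypothetical */
    { "NOP", 0x90 },  /* hypothetical */
};

static const size_t opcode_count = sizeof opcode_table / sizeof opcode_table[0];

/* Returns the opcode for a mnemonic, or -1 if it is not in the table. */
int encode_mnemonic(const char *mnemonic) {
    for (size_t i = 0; i < opcode_count; i++) {
        if (strcmp(opcode_table[i].mnemonic, mnemonic) == 0) {
            return opcode_table[i].opcode;
        }
    }
    return -1;
}

/* Returns the mnemonic for an opcode, or NULL if the opcode is unknown. */
const char *decode_opcode(unsigned char opcode) {
    for (size_t i = 0; i < opcode_count; i++) {
        if (opcode_table[i].opcode == opcode) {
            return opcode_table[i].mnemonic;
        }
    }
    return NULL;
}
```

Calling `encode_mnemonic("ADD")` plays the role of an assembler for a single word, and `decode_opcode(0x03)` plays the role of a disassembler; a real assembler also has to encode operands.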

Each type of CPU has its own machine instructions, so the corresponding assembly language also differs from one CPU to another.

Memory model: Heap

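When a program runs, the operating system allocates it a block of memory to hold the program itself and the data it produces while running. This block has a start address and an end address, for example from 0x1000 to 0x8000; the start address is the lower address and the end address is the higher one.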

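While the program runs, each dynamic memory request (for example, creating a new object or calling malloc) is satisfied by carving a piece out of that pre-allocated block. The rule is to allocate upward from the start address (in practice a section of static data sits at the start address; this is ignored here). For example, if the user requests 10 bytes, they are allocated from the start address 0x1000 up to address 0x100A; if a further 22 bytes are requested, the allocation extends to 0x1020.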

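A memory area carved out in response to an explicit request from the user is called the heap. It begins at the start address and grows from low addresses towards high addresses. An important property of the heap is that it does not disappear automatically: it must be released manually or reclaimed by a garbage collector.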
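The allocation arithmetic in the example (10 bytes, then 22 bytes, starting at 0x1000) can be sketched with a simple "bump" allocator. This is a simplified model under stated assumptions, not how malloc is actually implemented: the helper names `bump_alloc` and `heap_top` are hypothetical, addresses are simulated as plain integers, and alignment, bookkeeping and freeing are left out.

```c
#include <stdint.h>

/* Simulated region handed to the program (addresses from the text). */
#define REGION_START 0x1000u
#define REGION_END   0x8000u

/* Next free address; the heap grows from low addresses to high addresses. */
static uint32_t next_free = REGION_START;

/* Hypothetical helper: reserves `size` bytes and returns the start address
   of the new block, or 0 if the region does not have enough room left. */
uint32_t bump_alloc(uint32_t size) {
    if (size > REGION_END - next_free) {
        return 0;
    }
    uint32_t block = next_free;
    next_free += size;
    return block;
}

/* Address just past the last allocated byte. */
uint32_t heap_top(void) {
    return next_free;
}
```

After `bump_alloc(10)` the top of the heap is 0x100A, and after a further `bump_alloc(22)` it is 0x1020, matching the figures in the text.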

Memory model: Stack

Apart from the heap, the rest of the memory used is called the stack. Put simply, the stack is the memory area temporarily occupied while functions run.

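// Adds two integers; each call gets its own stack frame.
int add_a_and_b(int a, int b) {
   return a + b;
}

int main() {
   int a = 2;   // a and b live in main's frame
   int b = 3;
   return add_a_and_b(a, b);   // a new frame is created for this call
}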

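The main function calls add_a_and_b. When main starts running, the system creates a frame for it in memory that holds its local variables (a and b). When execution reaches the call, the system also creates a new frame for add_a_and_b to hold that function's local variables. At that moment two frames exist at once: one for main and one for add_a_and_b. In general, there are as many frames as there are levels in the call stack.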

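When add_a_and_b finishes running, its frame is reclaimed, and the system returns to the point in main where execution was interrupted and carries on from there. This mechanism is what makes nested function calls possible, with each level able to use its own local variables.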

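All frames are stored in the stack. Because frames are piled one on top of another, the area is called a stack. Creating a new frame is called a push; reclaiming a frame is called a pop. The defining property of the stack is that the frame pushed last is popped first (because the innermost function call finishes first), which makes it a "last in, first out" (LIFO) data structure. Each time a function finishes, one frame is released automatically, and when all functions have finished, the whole stack has been released.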
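The push and pop behaviour of frames can be sketched with a small array-based stack. This is a toy model for illustration only: a real frame holds local variables, a return address and saved registers, whereas here a frame records just a function name. The names `push_frame`, `pop_frame`, `stack_depth` and the limit `MAX_FRAMES` are hypothetical.

```c
#include <stddef.h>

#define MAX_FRAMES 16

/* A simplified frame: only the name of the function it belongs to. */
struct frame {
    const char *function_name;
};

static struct frame call_stack[MAX_FRAMES];
static size_t frame_count = 0;

/* Push: create a frame for a function that has just been called.
   Returns 0 on success, -1 if the stack is full. */
int push_frame(const char *function_name) {
    if (frame_count == MAX_FRAMES) {
        return -1;
    }
    call_stack[frame_count].function_name = function_name;
    frame_count++;
    return 0;
}

/* Pop: release the most recently created frame.
   Returns the name of the function whose frame was released,
   or NULL if the stack is empty. */
const char *pop_frame(void) {
    if (frame_count == 0) {
        return NULL;
    }
    frame_count--;
    return call_stack[frame_count].function_name;
}

/* Number of frames currently on the stack. */
size_t stack_depth(void) {
    return frame_count;
}
```

Pushing "main" and then "add_a_and_b" gives a depth of 2; the first pop returns "add_a_and_b" and the second returns "main", which is the last in, first out order described above.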

Registers

General-purpose registers

Assembly language

A programmer might wish to write a program where the actions taken by the processor are directly controlled.

As well as having a uniquely defined machine code language, each processor has its own assembly language.

If a program has been written in assembly language it has to be translated into machine code before it can be executed by the processor. The translation program is called an assembler.

assembler features:

6.03 Symbolic, relative and absolute addressing

Symbolic, relative and absolute addressing

The use of symbolic addressing allows a programmer to write some assembly language code without having to bother about where the code will be stored in memory when the program is run.

For relative addressing, the assumption is that a special-function base register BR contains the base address.

For absolute addressing there are again no labels for the code. The program is coded with the understanding that its first instruction is to be stored at memory address 200.
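The difference between the three kinds of addressing can be sketched as three ways of arriving at a memory address. This is an assumption-laden illustration, not the textbook's code: the `symbol` structure, the function names and the example labels are hypothetical; only the start address 200 comes from the text.

```c
#include <string.h>

/* Hypothetical symbol table entry: a label and the address
   the assembler assigned to it. */
struct symbol {
    const char *label;
    int address;
};

/* Symbolic addressing: the programmer writes a label and the assembler
   looks up its address. Returns -1 if the label is not defined. */
int resolve_symbolic(const struct symbol *table, int count, const char *label) {
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].label, label) == 0) {
            return table[i].address;
        }
    }
    return -1;
}

/* Relative addressing: the operand is an offset from the base address
   held in the base register BR. */
int resolve_relative(int base_register, int offset) {
    return base_register + offset;
}

/* Absolute addressing: the operand already is the memory address. */
int resolve_absolute(int address) {
    return address;
}
```

With BR holding 200, a hypothetical label COUNT placed ten locations into the program could be reached as the label COUNT, as offset 10 from BR, or as absolute address 210; all three resolve to the same location.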

6.04 The assembly process for a two-pass assembler

points to note:

6.05 Addressing modes

Key Terms

Addressing modes

[Figure: assembly addressing modes]

Addressing modes

6.06 Assembly language instructions

Key Terms

Data movement

These types of instruction can involve loading data into a register or storing data in memory.

Note

  1. LDD 103 - the value 110 is loaded into the accumulator
  2. LDI 106 - address 106 holds the address 101, so the value 208 from address 101 is loaded into the accumulator
  3. STO 106 - the value 208 is stored in address 106
  4. LDD INDEXVALUE - the value 3 is loaded into the accumulator
  5. MOV IX - the value 3 from the accumulator is loaded into the index register
  6. LDX 102 - the index register holds 3, so the value 206 from address 105 (102 + 3) is loaded into the accumulator
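The six steps above can be replayed on a tiny simulated machine. This is a sketch under stated assumptions: the memory contents (101 holds 208, 103 holds 110, 105 holds 206, 106 initially holds 101) are inferred from the list itself, all other cells are assumed to be 0, and the address 107 chosen for the INDEXVALUE label is hypothetical. The struct and function names are also illustrative, and no bounds checking is done.

```c
#include <stddef.h>

#define MEMORY_SIZE 256
#define INDEXVALUE  107   /* hypothetical address for the INDEXVALUE label */

struct machine {
    int acc;                   /* accumulator */
    int ix;                    /* index register */
    int memory[MEMORY_SIZE];   /* main memory, indexed by address */
};

/* Sets up the memory contents inferred from the worked steps. */
void machine_init(struct machine *m) {
    for (size_t i = 0; i < MEMORY_SIZE; i++) {
        m->memory[i] = 0;
    }
    m->acc = 0;
    m->ix = 0;
    m->memory[101] = 208;
    m->memory[103] = 110;
    m->memory[105] = 206;
    m->memory[106] = 101;
    m->memory[INDEXVALUE] = 3;
}

/* LDD: direct addressing, load the content of the given address. */
void ldd(struct machine *m, int address) { m->acc = m->memory[address]; }

/* LDI: indirect addressing, the given address holds the address to load from. */
void ldi(struct machine *m, int address) { m->acc = m->memory[m->memory[address]]; }

/* LDX: indexed addressing, load from the given address plus the index register. */
void ldx(struct machine *m, int address) { m->acc = m->memory[address + m->ix]; }

/* STO: store the accumulator at the given address. */
void sto(struct machine *m, int address) { m->memory[address] = m->acc; }

/* MOV IX: copy the accumulator into the index register. */
void mov_ix(struct machine *m) { m->ix = m->acc; }
```

Running `ldd(&m, 103)`, `ldi(&m, 106)`, `sto(&m, 106)`, `ldd(&m, INDEXVALUE)`, `mov_ix(&m)` and `ldx(&m, 102)` in that order leaves the accumulator holding 110, 208, 208, 3, 3 and 206 after the respective steps.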

Input and output

There are two instructions provided for input or output. In each case the instruction has only an opcode; there is no operand.

  1. The instruction with opcode IN is used to store in the ACC the ASCII value of a character typed at the keyboard.
  2. The instruction with opcode OUT is used to display on the screen the character for which the ASCII code is stored in the ACC.

Note

Comparisons and jumps

A program might need an unconditional jump or might need a jump if a condition is met. In the second case, a compare instruction is executed first.

Note

The comparison is restricted to asking if two values are equal.

The result of the comparison is recorded by a flag in the status register.
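The interplay between a compare instruction, the flag and the jumps can be sketched as follows. The text does not name the instructions, so the names `cmp`, `jmp`, `jpe` and `jpn` (compare, unconditional jump, jump if equal, jump if not equal) are assumptions, as is modelling the status register as a single `equal_flag`.

```c
#include <stdbool.h>

struct cpu {
    int acc;          /* accumulator */
    int pc;           /* program counter */
    bool equal_flag;  /* status flag recording the last comparison */
};

/* Compare: records whether ACC equals the value, then moves on. */
void cmp(struct cpu *c, int value) {
    c->equal_flag = (c->acc == value);
    c->pc++;
}

/* Unconditional jump: always continue at the target address. */
void jmp(struct cpu *c, int target) {
    c->pc = target;
}

/* Jump if the last comparison found the values equal. */
void jpe(struct cpu *c, int target) {
    c->pc = c->equal_flag ? target : c->pc + 1;
}

/* Jump if the last comparison found the values not equal. */
void jpn(struct cpu *c, int target) {
    c->pc = c->equal_flag ? c->pc + 1 : target;
}
```

The conditional jumps never look at the accumulator themselves; they only read the flag left behind by the preceding compare, which is why the compare must be executed first.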

Arithmetic operations

There are no instructions for general-purpose multiplication or division.

Note:

Example - A program to calculate the result of dividing 75 by 5

  1. The next three instructions are increasing the count by 1 and storing the new value.
  2. Instructions 106 to 108 add 5 to the sum.
  3. Instructions 109 and 110 check to see if the sum has reached 75 and if it has not the program begins the next iteration of the loop.
  4. Instructions 111 to 113 are only used when the sum has reached 75 which causes the value 15 stored for the count to be output.
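The loop described in these steps (increment a count, add 5 to a running sum, stop when the sum equals 75) can be sketched in C. The assembly listing itself is not reproduced here, so this is only an equivalent of the logic; the function name and the guard that rejects inputs for which an equality-only test would never succeed are additions for illustration.

```c
/* Divides by repeated addition, using only an equality test to end the loop.
   Returns the quotient, or -1 if the inputs are not positive or the dividend
   is not an exact multiple of the divisor (the loop would never terminate). */
int divide_by_repeated_addition(int dividend, int divisor) {
    if (divisor <= 0 || dividend <= 0 || dividend % divisor != 0) {
        return -1;
    }
    int count = 0;  /* number of additions made so far */
    int sum = 0;    /* running total of divisor added count times */
    do {
        count = count + 1;    /* increase the count by 1 and store it */
        sum = sum + divisor;  /* add the divisor to the sum */
    } while (sum != dividend);  /* compare; loop again if not yet equal */
    return count;
}
```

For the values in the example, `divide_by_repeated_addition(75, 5)` performs fifteen additions and returns 15.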

Shift operations

two shift instructions:

In a logical shift no consideration is given as to what the binary code in the accumulator represents.

For a left logical shift, the most significant bit is moved to the carry bit, the remaining bits are shifted left and a zero is entered for the least significant bit.

For a right logical shift, it is the least significant bit that is moved to the carry bit and a zero is entered for the most significant bit.

If the accumulator content represents an unsigned integer, the left shift operation is a fast way to multiply by two.

For an unsigned integer the right shift represents integer division by two.

Cyclic shift - a bit moves off one end into the carry bit, then one step later moves in at the other end. All bit values in the original code are retained.

Left and right arithmetic shifts - these are provided for the multiplication or division of a signed integer by two. The sign bit is always retained following the shift.
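The shift operations can be sketched on an 8-bit accumulator. The register width, the `shift_result` structure and the function names are assumptions made for illustration; the arithmetic shifts here return only the shifted value, because the text does not say which bit they place in the carry.

```c
#include <stdint.h>

/* Result of a shift: the new accumulator value and the carry bit. */
struct shift_result {
    uint8_t value;
    uint8_t carry;
};

/* Logical left shift: MSB goes to carry, zero enters at the LSB. */
struct shift_result logical_shift_left(uint8_t x) {
    struct shift_result r;
    r.carry = (uint8_t)((x >> 7) & 1u);
    r.value = (uint8_t)(x << 1);
    return r;
}

/* Logical right shift: LSB goes to carry, zero enters at the MSB. */
struct shift_result logical_shift_right(uint8_t x) {
    struct shift_result r;
    r.carry = (uint8_t)(x & 1u);
    r.value = (uint8_t)(x >> 1);
    return r;
}

/* Cyclic left shift through the carry: MSB goes to carry, and the
   previous carry bit enters at the LSB. */
struct shift_result cyclic_shift_left(uint8_t x, uint8_t carry_in) {
    struct shift_result r;
    r.carry = (uint8_t)((x >> 7) & 1u);
    r.value = (uint8_t)((x << 1) | (carry_in & 1u));
    return r;
}

/* Arithmetic right shift: the sign bit is copied back into the MSB. */
uint8_t arithmetic_shift_right(uint8_t x) {
    return (uint8_t)((x >> 1) | (x & 0x80u));
}

/* Arithmetic left shift: the lower seven bits shift left, sign bit is kept. */
uint8_t arithmetic_shift_left(uint8_t x) {
    return (uint8_t)((x & 0x80u) | ((x << 1) & 0x7Fu));
}
```

As a usage example, a logical left shift of 22 gives 44 and a logical right shift of 22 gives 11, which is the unsigned multiply and divide by two described above. For signed values, an arithmetic right shift of 0xF4 (denary -12 in two's complement) gives 0xFA (denary -6).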

Bitwise logic operation

Note:

6.07 Further consideration of assembly language instructions

Register transfer notation

ACC ← [[CIR(15:0)]]

Computer arithmetic

Example

The answer produced is denary −122. Two positive numbers have been added to get a negative number. This impossibility is detected by the combination of the negative flag and the overflow flag being set to 1. The processor examines the flags, identifies the problem and generates an interrupt.

Example 2

We get the answer +122. This impossibility is detected by the combination of the negative flag not being set and both the overflow and the carry flag being set to 1.
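The flag combinations in the two examples can be reproduced with a sketch of 8-bit two's complement addition. The operands of the original examples are not shown, so the values used in the usage note (100 + 34 and -100 + -34) are illustrative choices that happen to give the same answers, and the `add_result` structure and `add8` function are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

struct add_result {
    int value;      /* result read as a signed 8-bit two's complement number */
    bool negative;  /* negative flag: bit 7 of the result is 1 */
    bool overflow;  /* overflow flag: the sign of the result is impossible */
    bool carry;     /* carry flag: a carry came out of bit 7 */
};

/* Adds two 8-bit patterns the way an 8-bit ALU would and reports the flags. */
struct add_result add8(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;  /* keeps the ninth bit */
    uint8_t sum = (uint8_t)wide;                /* what fits in the register */
    struct add_result r;
    r.value = (sum >= 128) ? (int)sum - 256 : (int)sum;
    r.negative = (sum & 0x80u) != 0;
    r.carry = wide > 0xFFu;
    /* Overflow: both operands share a sign bit that the result does not. */
    r.overflow = (((a ^ sum) & (b ^ sum)) & 0x80u) != 0;
    return r;
}
```

With these assumed operands, `add8(100, 34)` yields -122 with the negative and overflow flags set and no carry, and `add8(0x9C, 0xDE)` (the bit patterns for -100 and -34) yields +122 with the negative flag clear and both the overflow and carry flags set, matching the two cases described above.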

Tracing an assembly language program

Example 1 - Tracing an assembly language program

The tracing is based on an initial user input of 15, a second input of 27 and a final input of 31.

Example 2

Some instructions for part of a program are contained in memory locations 100 upwards. Some 4-bit binary data values are stored in locations 200 upwards. For illustrative purposes the instructions are shown in assembly language form. At the start of a part of the program, the memory contents are as shown

The entries in the table can be explained as follows.

  1. The first row shows the stored value before execution of this part of the program. There will be a value in the accumulator resulting from an earlier instruction.
  2. The second row shows the result of the execution of the instruction in location 100 which loads a value into ACC; this is followed by the PC being automatically incremented.
  3. The next two rows show the value being changed in the ACC by the instructions in 101 and 102 and the automatic incrementing of the PC each time.
  4. The fifth row has no new value in ACC because only a comparison is being done but there is an automatic increment of the PC.
  5. The sixth row shows a new value in the PC which has resulted from the execution of the jump instruction which tested for equality and found it to be True.
  6. The seventh row shows the result of the instruction in location 106 which has incremented the ACC.
  7. The final row shows the value stored in location 203.
