ALevel-CS Chapter 06 Assembly language programming
6.01 Machine code instructions
Key Terms
- Opcode - defines the action associated with the instruction
- Operand - defines any data needed by the instruction
- Machine code instruction - a binary code with a defined number of bits that comprises an opcode and, most often, one operand
Machine code instructions
- The only language that the CPU recognises is machine code.
- Machine code consists of a sequence of instructions.
- An instruction contains an opcode.
- An instruction may have no operand, but up to three operands are possible.
- Different processors have different instruction sets associated with them.
- Different processors will have comparable instructions for the same operations, but the coding of the instructions will be different.

An example instruction format:
- The opcode is 8 bits: four bits for the operation, two bits for the address mode and two bits for addressing registers.
- This allows 16 different operations and 4 addressing modes.
- The operand is 16 bits. Because the operand is usually a memory address, it is sensible to allocate 16 bits to it, in keeping with a 16-bit address bus.
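
As a rough illustration of the field widths listed above, the Python sketch below packs and unpacks a 24-bit instruction. The order of the fields inside the opcode byte (operation in the top four bits, then address mode, then register) and the function names `encode` and `decode` are assumptions made for this example; the notes only give the widths.

```python
def encode(operation, mode, register, operand):
    """Pack the four fields into one 24-bit instruction word."""
    assert 0 <= operation < 16 and 0 <= mode < 4 and 0 <= register < 4
    assert 0 <= operand < 2 ** 16
    opcode = (operation << 4) | (mode << 2) | register  # 8-bit opcode
    return (opcode << 16) | operand


def decode(word):
    """Split a 24-bit instruction word back into its fields."""
    opcode = word >> 16
    return {
        "operation": opcode >> 4,       # 4 bits: 16 possible operations
        "mode": (opcode >> 2) & 0b11,   # 2 bits: 4 addressing modes
        "register": opcode & 0b11,      # 2 bits: register selection
        "operand": word & 0xFFFF,       # 16 bits: usually an address
    }


# Assumed example values: operation 3, address mode 1, register 0, address 200.
word = encode(operation=3, mode=1, register=0, operand=200)
fields = decode(word)
```

Decoding the encoded word gives back the original four fields, which is the property a real instruction decoder relies on.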
6.02 Assembly language
Key Terms
- Assembly language - a low-level language related to machine code where opcodes are written as mnemonics and there is a character representation for an operand
- Assembler - a program used to translate an assembly language program into machine code
- Directive - an instruction to the assembler program
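Assembly language

Learning to program usually means learning a high-level language, that is, a computer language designed for people. A computer does not understand a high-level language; the program must first be converted into binary code by a compiler before it can run.
What the computer actually works with is low-level language, which exists to control the hardware. Assembly language is a low-level language that directly describes and controls what the CPU does. To understand what the CPU is really doing, and the steps by which code runs, it is worth learning assembly language.
Machine language and assembly language

The CPU only carries out computation; it has no intelligence of its own. You give it an instruction, it executes that instruction, then stops and waits for the next one.
These instructions are all binary and are identified by an opcode; for example, the add instruction is 00000011. The job of the compiler is to translate a program written in a high-level language into a sequence of such opcodes.
Binary programs are unreadable to people: you cannot tell from them what the machine is doing. Assembly language was created to solve this readability problem and to allow the occasional edit.
Assembly language is the text form of the binary instructions, with a one-to-one correspondence between the two. For example, the add instruction 00000011 is written ADD in assembly language. Once it has been converted back to binary, an assembly language program can be executed directly by the CPU, so it is the lowest level of the low-level languages.
Each type of CPU has its own machine instructions, so the corresponding assembly language also differs.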
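Memory model: Heap
When a program runs, the operating system allocates it a region of memory to hold the program and the data it produces while running. This region has a start address and an end address, for example from 0x1000 to 0x8000; the start address is the lower address and the end address is the higher one.

While the program runs, dynamic memory requests (for example creating a new object, or calling malloc) are satisfied by carving a piece out of that pre-allocated region. Allocation begins at the start address (in practice a block of static data sits at the start address; that is ignored here). For example, if the program asks for 10 bytes, they are allocated from the start address 0x1000 up to, but not including, 0x100A; if it then asks for another 22 bytes, the allocation extends to 0x1020.

Memory set aside in this way, in response to an explicit request from the program, is called the heap. It starts at the start address and grows from low addresses towards high addresses. An important property of the heap is that it does not disappear automatically: it must be freed explicitly or reclaimed by a garbage collector.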
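
The allocation arithmetic in the heap example can be checked with a minimal Python sketch. `BumpHeap` is a hypothetical name for a toy allocator that only ever moves a "next free address" upwards; it never frees memory and is not how a real malloc is implemented.

```python
class BumpHeap:
    """Toy heap that hands out memory from low addresses upwards."""

    def __init__(self, start, end):
        self.next_free = start  # lowest address not yet handed out
        self.end = end

    def allocate(self, size):
        """Return the start address of a new block of `size` bytes."""
        if self.next_free + size > self.end:
            raise MemoryError("heap exhausted")
        address = self.next_free
        self.next_free += size
        return address


heap = BumpHeap(0x1000, 0x8000)
first = heap.allocate(10)    # occupies 0x1000 to 0x1009
second = heap.allocate(22)   # occupies 0x100A to 0x101F
```

After the two requests the next free address is 0x1020, matching the figures in the text.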
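Memory model: Stack

Besides the heap, the other memory a running program uses is the stack. Put simply, the stack is the memory temporarily occupied because functions are running.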

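int add_a_and_b(int a, int b) {
    return a + b;  /* the callee gets its own frame for a and b */
}

int main(void) {
    int a = 2;
    int b = 3;
    return add_a_and_b(a, b);
}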
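The function main calls add_a_and_b. When execution reaches that line, the system creates a new frame for add_a_and_b as well, to hold its local variables, so two frames now exist at the same time: one for main and one for add_a_and_b. In general, there are as many frames as there are levels in the call stack.
When add_a_and_b finishes, its frame is reclaimed and the system returns to the point in main where execution was interrupted and carries on from there. This mechanism is what allows functions to call one another layer upon layer, with each layer able to use its own local variables.
All frames are kept on the stack. Because frames are piled one on top of another, the structure is called a stack. Creating a new frame is called pushing; reclaiming a frame is called popping. The defining property of the stack is that the frame pushed last is popped first (because the innermost function call is the first to finish), which makes it a "last in, first out" data structure. Each time a function finishes, one frame is released automatically; when all functions have finished, the whole stack has been released.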
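
The push and pop order described above can be made visible with a small Python model. This is only a sketch: the `stack` list, the `log` list and the `call` helper are hypothetical stand-ins for what the hardware and operating system do, and each frame is reduced to a dictionary of local variables.

```python
stack = []   # the call stack: the last frame appended is the first removed
log = []     # records the order in which frames are pushed and popped


def call(name, local_vars, body):
    """Push a frame for `name`, run `body`, then pop the frame."""
    stack.append({"function": name, "locals": local_vars})  # push
    log.append(("push", name, len(stack)))
    result = body()
    stack.pop()                                             # pop
    log.append(("pop", name, len(stack)))
    return result


def add_a_and_b(a, b):
    return call("add_a_and_b", {"a": a, "b": b}, lambda: a + b)


def main():
    a, b = 2, 3
    return call("main", {"a": a, "b": b}, lambda: add_a_and_b(a, b))


result = main()
```

The log shows main pushed first and popped last, with the frame for add_a_and_b living entirely inside that interval.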
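Registers
General-purpose registers
- The 8086 has 14 registers: AX, BX, CX, DX, SI, DI, SP, BP, IP, CS, SS, DS, ES, PSW.
- AX, BX, CX and DX are normally used to hold general data and are called general-purpose registers.
- The largest value a 16-bit register can hold is 2^16 - 1 (65535).
- For compatibility, each general-purpose register of the 8086 can be used as two independent 8-bit registers. For example, AX can be split into AH and AL.
Words
- All registers in the 8086 CPU are 16 bits wide and can hold 2 bytes (one word).
- A byte consists of 8 bits and fits in an 8-bit register.
- A word is two bytes, that is, 16 bits.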
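
The relationship between a 16-bit register and its two 8-bit halves can be sketched in Python. The helper names `split_word` and `join_bytes` and the sample value 0x4E20 are assumptions chosen for illustration.

```python
def split_word(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit value."""
    assert 0 <= ax <= 0xFFFF
    return ax >> 8, ax & 0xFF


def join_bytes(ah, al):
    """Combine two 8-bit values into one 16-bit word."""
    return (ah << 8) | al


max_word = 2 ** 16 - 1          # largest value a 16-bit register can hold
ah, al = split_word(0x4E20)     # 0x4E20 is 20000 in denary
```

Joining the two halves again gives back the original word, which is why AH and AL can be treated either as separate registers or together as AX.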
Assembly language
A programmer might wish to write a program where the actions taken by the processor are directly controlled.
As well as having a uniquely defined machine code language, each processor has its own assembly language.
An assembly language instruction is made up of:
- a mnemonic (a symbolic abbreviation) for the opcode
- a character representation for the operand.
If a program has been written in assembly language it has to be translated into machine code before it can be executed by the processor. The translation program is called an assembler.
Assembler features:
- comments
- symbolic names for constants
- labels for addresses
- macros - a macro is a sequence of instructions that is to be used more than once in a program
- directives - a directive is an instruction to the assembler as to how it should construct the final executable machine code
6.03 Symbolic, relative and absolute addressing
Symbolic, relative and absolute addressing

The use of symbolic addressing allows a programmer to write some assembly language code without having to bother about where the code will be stored in memory when the program is run.

For relative addressing, the assumption is that a special-function base register, BR, contains the base address; each address in the code is then given as an offset from that base.
For absolute addressing there are again no labels in the code. The code is written on the understanding that the first instruction in the program is to be stored at memory address 200.
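
How the three forms relate can be sketched in Python. The label names and offsets in `symbol_offsets` are invented for this example; only the base register BR and the start address 200 come from the notes.

```python
# Hypothetical symbol table: label -> offset from the start of the program.
symbol_offsets = {"START": 0, "LOOP": 3, "COUNT": 10}


def relative_to_absolute(offset, base_register):
    """Relative addressing: the address is an offset from the value in BR."""
    return base_register + offset


def symbolic_to_absolute(label, base_register):
    """Symbolic addressing: look the label up, then apply the base address."""
    return relative_to_absolute(symbol_offsets[label], base_register)


BR = 200  # program loaded so that its first instruction is at address 200
loop_address = symbolic_to_absolute("LOOP", BR)
```

With BR set to 200 the label LOOP resolves to absolute address 203; loading the same program elsewhere only requires a different value in BR.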
6.04 The assembly process for a two-pass assembler



Points to note:
- Most of the instructions have an operand which is a 16-bit binary number.
- Usually this represents an address but for the SUB and LDM instructions the operand is used as a value.
- There is no operand for the IN and END instructions.
- The INC instruction is a special case. There is an operand in the assembly language code but this just identifies a register. In the machine code the register is identified within the opcode so no operand is needed.
- The machine code has been coded with the first instruction occupying address zero.
- This code is not executable in this form but it is valid output from the assembler.
- Changes will be needed for the addresses when the program is loaded into memory ready for it to be executed.
- Three memory locations following the program code have been allocated a value zero to ensure that they are available for use by the program when it is executed.
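
The idea behind two passes can be sketched in Python. The short source program below is hypothetical (it is not the program these notes refer to), each instruction is assumed to occupy one memory location, and the output keeps the mnemonics rather than producing real binary opcodes. As in the points above, the first instruction is placed at address zero and variable locations after the code are set to zero.

```python
# Hypothetical source: (label or None, mnemonic, operand or None).
source = [
    (None,   "IN",  None),
    (None,   "STO", "MAX"),
    ("LOOP", "IN",  None),
    (None,   "CMP", "MAX"),
    (None,   "JPN", "LOOP"),
    (None,   "END", None),
]
variables = ["MAX"]  # data locations placed after the program code


def assemble(source, variables):
    # Pass 1: record the address of every label and variable.
    symbols = {}
    for address, (label, _, _) in enumerate(source):
        if label is not None:
            symbols[label] = address
    for i, name in enumerate(variables):
        symbols[name] = len(source) + i

    # Pass 2: replace each symbolic operand with its numeric address.
    code = []
    for _, mnemonic, operand in source:
        if isinstance(operand, str):
            operand = symbols[operand]
        code.append((mnemonic, operand))
    code += [("DATA", 0)] * len(variables)  # variables initialised to zero
    return symbols, code


symbols, code = assemble(source, variables)
```

The instruction at address 1 refers to MAX before the assembler has reached the location of MAX, which is why the symbol table has to be completed in a first pass before operands are filled in during the second.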
6.05 Addressing modes
Key Terms
- Addressing mode - when the instruction uses a value this defines how the operand must be used to find the value
Addressing modes

6.06 Assembly language instructions
Key Terms
- Logical shift - where bits in the accumulator are shifted to the right or to the left and a zero moves into the bit position vacated
- Cyclic shift - similar to a logical shift but bits shifted from one end reappear at the other end
- Arithmetic shift - uses the shift to carry out multiplication or division of a signed integer stored in the accumulator
Data movement
These types of instruction can involve loading data into a register or storing data in memory.

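Notes:
- LDM #n - immediate addressing: the number n is loaded into ACC
- LDR #n - immediate addressing: the number n is loaded into IX
- LDD <address> - direct addressing: the contents of <address> are loaded into ACC
- LDI <address> - indirect addressing: the contents of <address> are used as the address of the value; the contents of that second address are loaded into ACC
- LDX <address> - indexed addressing: the contents of the index register are added to <address> to form the address of the value, and the contents of that address are loaded into ACC
- MOV <register> - the contents of ACC are copied into the given register
- STO <address> - the contents of ACC are stored at the given address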

- LDD 103 - the value 110 is loaded into the accumulator
- LDI 106 - address 106 holds 101, so the value 208 from address 101 is loaded into the accumulator
- STO 106 - the value 208 is stored in address 106
- LDD INDEXVALUE - the value 3 is loaded into the accumulator
- MOV IX - the value 3 from the accumulator is loaded into the index register
- LDX 102 - the index register holds 3, so the value 206 from address 102 + 3 = 105 is loaded into the accumulator
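
The six steps above can be replayed in a short Python sketch. Only the memory contents that the steps themselves mention are included, and the location of the label INDEXVALUE (107 here) is an assumption made for the example.

```python
INDEXVALUE = 107  # assumed location of the INDEXVALUE label

memory = {101: 208, 103: 110, 105: 206, 106: 101, INDEXVALUE: 3}
acc, ix = 0, 0
trace = []  # value of ACC after each load

# LDD 103: direct addressing, load the contents of address 103
acc = memory[103]
trace.append(acc)

# LDI 106: indirect addressing, address 106 holds the address 101
acc = memory[memory[106]]
trace.append(acc)

# STO 106: store ACC in address 106
memory[106] = acc

# LDD INDEXVALUE: load the contents of the labelled location
acc = memory[INDEXVALUE]
trace.append(acc)

# MOV IX: copy ACC into the index register
ix = acc

# LDX 102: indexed addressing, load from address 102 + IX
acc = memory[102 + ix]
trace.append(acc)
```

The accumulator takes the values 110, 208, 3 and 206 in turn, and address 106 ends up holding 208.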
Input and output
There are two instructions provided for input or output. In each case the instruction has only an opcode; there is no operand.
- The instruction with opcode IN is used to store in the ACC the ASCII value of a character typed at the keyboard.
- The instruction with opcode OUT is used to display on the screen the character for which the ASCII code is stored in the ACC.
Notes:
- IN and OUT take no operand.
- IN stores in ACC the ASCII code of the key the user types.
- OUT displays on the screen the character whose ASCII code is in ACC.
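
The conversion between characters and ASCII codes that IN and OUT perform can be sketched with Python's built-in `ord` and `chr`. The function names `instr_in` and `instr_out` are hypothetical, and the keyboard and screen are replaced by an argument and a return value so the example stays self-contained.

```python
def instr_in(typed_character):
    """IN: return the ASCII code that would be placed in ACC."""
    return ord(typed_character)


def instr_out(acc):
    """OUT: return the character whose ASCII code is in ACC."""
    return chr(acc)


acc = instr_in("A")          # ACC now holds 65
shown = instr_out(acc + 1)   # code 66 is displayed as "B"
```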
Comparisons and jumps
A program might need an unconditional jump or might need a jump if a condition is met. In the second case, a compare instruction is executed first.

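Notes:
- JMP <address> - jump unconditionally to the given address (used less often)
- CMP <address> - direct addressing: the contents of <address> are compared with ACC
- CMI <address> - indirect addressing: the contents of <address> are used as an address, and the value stored there is compared with ACC
- JPE <address> - follows a compare instruction: jump to <address> if the comparison was True
- JPN <address> - follows a compare instruction: jump to <address> if the comparison was False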
The comparison is restricted to asking if two values are equal.
The result of the comparison is recorded by a flag in the status register.
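
A minimal Python interpreter shows how the compare instruction sets a flag that a later jump reads. The program, the memory contents and the name `equal_flag` are assumptions for this sketch, and only a handful of instructions are handled.

```python
def run(program, memory):
    """Run a tiny program; return the final accumulator value."""
    acc, pc, equal_flag = 0, 0, False
    while True:
        opcode, operand = program[pc]
        pc += 1                          # PC moves on after each fetch
        if opcode == "LDM":
            acc = operand
        elif opcode == "INC":            # only INC ACC is modelled here
            acc += 1
        elif opcode == "CMP":            # direct addressing, equality only
            equal_flag = acc == memory[operand]
        elif opcode == "JPE" and equal_flag:
            pc = operand
        elif opcode == "JPN" and not equal_flag:
            pc = operand
        elif opcode == "JMP":
            pc = operand
        elif opcode == "END":
            return acc


program = [
    ("LDM", 0),      # 0: ACC = 0
    ("INC", "ACC"),  # 1: ACC = ACC + 1
    ("CMP", 50),     # 2: compare ACC with the contents of address 50
    ("JPN", 1),      # 3: not equal, so go round the loop again
    ("END", None),   # 4
]
final_acc = run(program, {50: 4})
```

The loop repeats until the accumulator equals the value stored at address 50, at which point JPN no longer jumps and the program ends.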
Arithmetic operations
There are no instructions for general-purpose multiplication or division. General-purpose addition and subtraction are provided.

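Notes:
- ADD <address> - the contents of <address> are added to ACC and the result is stored in ACC
- ADD #n - the number n is added to ACC and the result is stored in ACC
- SUB <address> - the contents of <address> are subtracted from ACC and the result is stored in ACC
- SUB #n - the number n is subtracted from ACC and the result is stored in ACC
- INC <register> - 1 is added to the contents of the given register
- DEC <register> - 1 is subtracted from the contents of the given register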
Example - A program to calculate the result of dividing 75 by 5

- The next three instructions increase the count by 1 and store the new value.
- Instructions 106 to 108 add 5 to the sum.
- Instructions 109 and 110 check whether the sum has reached 75; if it has not, the program begins the next iteration of the loop.
- Instructions 111 to 113 are only executed once the sum has reached 75, at which point the value 15 stored for the count is output.
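
The algorithm these bullets describe is division by repeated addition. The Python sketch below follows the same loop structure; it is a reconstruction of the logic, not the assembly listing itself, and the function and variable names are assumptions.

```python
def divide_by_repeated_addition(dividend, divisor):
    """Count how many times `divisor` must be added to reach `dividend`."""
    count = 0
    total = 0
    while True:
        count += 1              # increase the count and store it
        total += divisor        # add the divisor to the sum
        if total == dividend:   # equality is the only comparison available
            return count        # the count is the value that is output


quotient = divide_by_repeated_addition(75, 5)
```

Because the test is for equality only, this approach terminates only when the divisor divides the dividend exactly, as 5 does for 75.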
Shift operations
Two shift instructions:
- LSL #n - the bits in the accumulator are shifted logically n places to the left
- LSR #n - the bits in the accumulator are shifted logically n places to the right.
In a logical shift no consideration is given as to what the binary code in the accumulator represents.
For a left logical shift, the most significant bit is moved to the carry bit, the remaining bits are shifted left and a zero is entered for the least significant bit.
For a right logical shift, it is the least significant bit that is moved to the carry bit and a zero is entered for the most significant bit.
If the accumulator content represents an unsigned integer, the left shift operation is a fast way to multiply by two.
For an unsigned integer the right shift represents integer division by two.
Cyclic shift - a bit moves off one end into the carry bit, then one step later moves in at the other end. All bit values in the original code are retained.
Left and right arithmetic shifts - provided for the multiplication or division of a signed integer by two. The sign bit is always retained following the shift.
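
The logical shifts and the arithmetic right shift can be sketched in Python. An 8-bit accumulator is assumed for the example, and each function returns the new accumulator value together with the carry bit; `asr` is a hypothetical name, since the notes do not give a mnemonic for the arithmetic shift.

```python
WIDTH = 8                   # assumed accumulator width for this example
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def lsl(acc, n=1):
    """Logical shift left by n places; returns (acc, carry)."""
    carry = 0
    for _ in range(n):
        carry = (acc >> (WIDTH - 1)) & 1   # MSB moves to the carry bit
        acc = (acc << 1) & MASK            # zero enters at the LSB
    return acc, carry


def lsr(acc, n=1):
    """Logical shift right by n places; returns (acc, carry)."""
    carry = 0
    for _ in range(n):
        carry = acc & 1                    # LSB moves to the carry bit
        acc >>= 1                          # zero enters at the MSB
    return acc, carry


def asr(acc, n=1):
    """Arithmetic shift right: as lsr, but the sign bit is retained."""
    carry = 0
    for _ in range(n):
        carry = acc & 1
        acc = (acc >> 1) | (acc & SIGN)    # copy the sign bit back in
    return acc, carry


doubled, _ = lsl(0b00010110)    # 22 becomes 44
halved, carry = lsr(0b00010111) # 23 becomes 11, the lost bit is in carry
```

For the signed pattern 11111000 (denary -8) the arithmetic right shift gives 11111100 (denary -4), whereas the logical right shift would produce a positive value.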
Bitwise logic operation

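Notes:
- AND #Bn - bitwise AND of the contents of ACC with the binary number n; the result is stored in ACC
- AND <address> - bitwise AND of the contents of ACC with the contents of <address>; the result is stored in ACC
- XOR #Bn - bitwise XOR of the contents of ACC with the binary number n; the result is stored in ACC
- XOR <address> - bitwise XOR of the contents of ACC with the contents of <address>; the result is stored in ACC
- OR #Bn - bitwise OR of the contents of ACC with the binary number n; the result is stored in ACC
- OR <address> - bitwise OR of the contents of ACC with the contents of <address>; the result is stored in ACC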
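
A few typical uses of these operations, sketched with Python's bitwise operators. The 8-bit accumulator width and the sample bit patterns are assumptions chosen for illustration.

```python
MASK = 0xFF  # assume an 8-bit accumulator for the example

acc = 0b10110110
and_result = acc & 0b00001111            # AND #B00001111: keep the low four bits
or_result = acc | 0b00001111             # OR #B00001111: set the low four bits
xor_result = (acc ^ 0b11111111) & MASK   # XOR #B11111111: invert every bit
```

AND with a mask clears the bits where the mask is 0, OR sets the bits where the mask is 1, and XOR flips the bits where the mask is 1.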
6.07 Further consideration of assembly language instructions
Register transfer notation
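ACC ← [[CIR(15:0)]]
Here CIR(15:0) is the 16-bit operand held in bits 15 to 0 of the current instruction register. One pair of square brackets means "the contents of" that address; the second pair means the contents of the address found there, as in indirect addressing. The value obtained is copied into ACC.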
Computer arithmetic
Three flags in the status register record the outcome of an arithmetic operation:
- the carry flag, identified as C, which is set to 1 if there is a carry
- the negative flag, identified as N, which is set to 1 if a result is negative
- the overflow flag, identified as V, which is set to 1 if overflow is detected.
Example 1

The answer produced is denary −122. Two positive numbers have been added to get a negative number. This impossibility is detected by the combination of the negative flag and the overflow flag being set to 1. The processor examines the flags, identifies the problem and generates an interrupt.
Example 2

We get the answer +122. This impossibility is detected by the combination of the negative flag not being set and both the overflow and the carry flag being set to 1.
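
Both flag combinations can be reproduced with a Python sketch of 8-bit two's complement addition. The operands used here (103 + 31 and -103 + -31) are assumptions chosen because they give the results -122 and +122 quoted above; the operands of the original examples are not shown in these notes.

```python
def add8(a, b):
    """Add two 8-bit two's complement values; return the result and flags."""
    a &= 0xFF                       # keep only the 8-bit patterns
    b &= 0xFF
    raw = a + b
    result = raw & 0xFF
    carry = raw >> 8                # C: a carry out of the top bit
    negative = result >> 7          # N: the sign bit of the result
    sign_a, sign_b = a >> 7, b >> 7
    # V: both operands share a sign but the result has the other sign
    overflow = int(sign_a == sign_b and sign_a != negative)
    signed = result - 256 if negative else result
    return {"value": signed, "C": carry, "N": negative, "V": overflow}


positive_case = add8(103, 31)     # two positive numbers
negative_case = add8(-103, -31)   # two negative numbers
```

In the first case N and V are both 1 with no carry; in the second N is 0 while V and C are both 1, matching the combinations described in the two examples.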
Tracing an assembly language program
Example 1 Tracing an assembly language program
The tracing is based on an initial user input of 15, a second input of 27 and a final input of 31.


Example 2
Some instructions for part of a program are contained in memory locations 100 upwards. Some 4-bit binary data values are stored in locations 200 upwards. For illustrative purposes the instructions are shown in assembly language form. At the start of this part of the program, the memory contents are as shown.


The entries in the table can be explained as follows.
- The first row shows the stored value before execution of this part of the program. There will be a value in the accumulator resulting from an earlier instruction.
- The second row shows the result of the execution of the instruction in location 100 which loads a value into ACC; this is followed by the PC being automatically incremented.
- The next two rows show the value being changed in the ACC by the instructions in 101 and 102 and the automatic incrementing of the PC each time.
- The fifth row has no new value in ACC because only a comparison is being done but there is an automatic increment of the PC.
- The sixth row shows a new value in the PC which has resulted from the execution of the jump instruction which tested for equality and found it to be True.
- The seventh row shows the result of the instruction in location 106 which has incremented the ACC.
- The final row shows the value stored in location 203.