ALevel-CS Chapter 1 Data Representation

1.01 Number Systems

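Origins of binary and decimal (denary)

Origin of decimal

Humans most likely calculate in base 10 because we have ten fingers. Mathematics grew out of counting, and the most primitive way to count is on one's fingers.
Aristotle remarked that the widespread use of decimal is simply the result of the anatomical fact that most people are born with ten fingers.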

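Ancient Roman numerals: groupings of 5 (V = 5, L = 50, D = 500)

The Mayan base-20 system

Other bases

Binary (base 2): the simplest system, but cumbersome for people to use

Octal (base 8):

Duodecimal (base 12): a year has twelve months

Hexadecimal (base 16): sixteen liang make one jin (traditional Chinese units of weight)

Vigesimal (base 20): used by the Maya, who counted on fingers and toes, 20 digits in all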

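Origin of binary

Early mechanical calculating devices did not use binary. They used decimal or other bases, with the position of each gear representing a value, which is arguably closer to the way people think. For example, imagine a device with ten gears connected in a chain, each gear having ten positions, where one full turn of a gear advances the next gear by one position. This is a simple ten-digit decimal register that can represent the numbers 0 to 9,999,999,999. Combined with some other mechanical parts, such a gear-based device can perform simple decimal addition and subtraction.

Once electronic computers appeared, representing ten distinct states with vacuum tubes proved too complex, so electronic computers are built on just two basic states: on and off.

Why binary is used

All information in a computer is represented in binary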

The word 'Hello' is stored as the binary combination of 0100100001100101011011000110110001101111
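The bit string above can be reproduced with a short sketch. It assumes each character is stored as one 8-bit ASCII byte; the function names `text_to_bits` and `bits_to_text` are illustrative only.

```python
def text_to_bits(text: str) -> str:
    """Return the 8-bit ASCII code of every character, joined into one bit string."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Split a bit string into 8-bit groups and decode them back to ASCII text."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int(chunk, 2) for chunk in chunks).decode("ascii")


hello_bits = text_to_bits("Hello")
print(hello_bits)                # 0100100001100101011011000110110001101111
print(bits_to_text(hello_bits))  # Hello
```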

Denary, Binary, Octal, Hexadecimal

key terms

calculate

calcu_binary_denary

Overflow: a condition when the result of a calculation is too large to fit into the number of bits defined for storage
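A minimal sketch of overflow, assuming unsigned numbers stored in a fixed number of bits (8 by default). The helper name `add_unsigned` is hypothetical; it returns the stored result together with a flag saying whether the true sum did not fit.

```python
def add_unsigned(a: int, b: int, bits: int = 8) -> tuple[int, bool]:
    """Add two unsigned integers as a register of `bits` bits would."""
    limit = 1 << bits            # 256 for 8 bits: one more than the largest value
    total = a + b
    stored = total % limit       # only the low `bits` bits are kept
    overflow = total >= limit    # the true result needed an extra bit
    return stored, overflow


print(add_unsigned(100, 27))    # (127, False): fits in 8 bits
print(add_unsigned(200, 100))   # (44, True): 300 does not fit, 300 - 256 = 44 is stored
```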

Hexadecimal number system examples

html color

Mac Address

Media Access Control (MAC) addresses are 12-digit hexadecimal numbers that uniquely identify each device on a network.

00-1B-63-84-45-E6

IP address

1.02 Numbers and quantities

Key Terms

Type of numbers

prefix

decimal prefix

binary prefix

image file prefix

Image of 4160 * 3120 pixels:

R ~ 8 bits, G ~ 8 bits, B ~ 8 bits

RGB ~ 3 Bytes per pixel

Image file size: 4160 * 3120 * 3 Bytes = 38,937,600 Bytes
38,937,600 Bytes / 1024 = 38,025 KiB
38,025 KiB / 1024 = 37.13 MiB
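The same calculation as a small sketch. It assumes an uncompressed bitmap with no file header; `bitmap_size_bytes` is an illustrative name, not a library function.

```python
def bitmap_size_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size of the raw pixel data in bytes (header and compression ignored)."""
    return width * height * bits_per_pixel // 8


size = bitmap_size_bytes(4160, 3120)   # 24 bits = 3 bytes per RGB pixel
kib = size / 1024                      # binary prefix: 1 KiB = 1024 bytes
mib = kib / 1024                       # 1 MiB = 1024 KiB

print(size)            # 38937600
print(kib)             # 38025.0
print(round(mib, 2))   # 37.13
```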

1.03 Internal coding of numbers

Key Terms

One's complement

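A primary-school pupil can work out 5 - 3 but not 3 - 5. To handle this case we later introduce the idea of negative numbers.

3 - 5 = 3 + [-5] = [-2], where a number in square brackets denotes its one's complement code.

The computer's digital circuitry has an adder but no subtractor. Since subtraction can be carried out by adding a one's complement, there is no need to design a separate subtractor.

3 = [0_0000011]
5 = [0_0000101]; sign and magnitude: -5 = [1_0000101]; one's complement: -5 = [1_1111010]

3 + [-5] = [-2]
[0_0000011] + [1_1111010] = [1_1111101]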

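Why this works:

The one's complement pattern can be read as an unsigned number: -5 = [1_1111010] = 250 = 255 - 5

[3] + [-5] = 3 + 255 - 5 = 253 = 255 - 2 = [-2]

[7] + [-5] = 7 + 255 - 5 = 255 + 2 = [2], because the carry out of the top bit is added back into the lowest bit (end-around carry), which removes the extra 255

The problem with one's complement

[0_0000000] ~ +0 and [1_1111111] ~ -0, so there are two encodings of zero, which is redundant in calculations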
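The arithmetic above can be checked with a sketch of 8-bit one's complement (1 sign bit plus 7 value bits). The function names are illustrative, and the end-around carry in `ones_add` is the standard rule for one's complement addition rather than something spelled out in these notes.

```python
BITS = 8
MASK = (1 << BITS) - 1   # 255, i.e. 11111111


def ones_complement(value: int) -> int:
    """Encode an integer in -127..127 as an 8-bit one's complement pattern."""
    if not -127 <= value <= 127:
        raise ValueError("value does not fit in 8-bit one's complement")
    return value if value >= 0 else MASK + value   # negative n becomes 255 - |n|


def ones_add(x: int, y: int) -> int:
    """Add two patterns; a carry out of the top bit is added back in."""
    total = x + y
    if total > MASK:
        total = (total & MASK) + 1   # end-around carry
    return total


def ones_decode(pattern: int) -> int:
    """Turn an 8-bit one's complement pattern back into an integer."""
    return pattern if pattern < 128 else pattern - MASK


minus_five = ones_complement(-5)
print(format(minus_five, "08b"))                         # 11111010
print(format(ones_add(ones_complement(3), minus_five), "08b"))   # 11111101, i.e. -2
print(ones_decode(ones_add(ones_complement(7), minus_five)))     # 2
print(ones_decode(0b00000000), ones_decode(0b11111111))          # 0 0 (two zeros)
```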

Two's complement

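The issue is the special number 0, which is neither positive nor negative.
In one's complement the positive numbers run from +0 to +127 and the negative numbers from -127 to -0,
so there are two zeros:
[0_0000000] +0
[1_1111111] -0
A computer should have exactly one encoding for each number, so the negative codes are shifted along by one place, which makes the ranges -128 to -1 and 0 to 127. The one's complement code plus 1 is called the two's complement.

Two's complement:
The two's complement of a positive number is unchanged: 3 = {0_0000011}
For a negative number, take the one's complement and then add 1: -5 = [1_1111010] + 1 = {1_1111011}

3 + {-5} = {-2}
{0_0000011} + {1_1111011} = {1_1111110}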
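A sketch of the same example in 8-bit two's complement. The names `twos_complement`, `twos_add` and `twos_decode` are illustrative; masking with `& 255` gives the same pattern as "invert the bits, then add 1" for negative values.

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Encode a signed integer as a two's complement bit pattern."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # -128..127 for 8 bits
    if not low <= value <= high:
        raise ValueError("value does not fit in the given number of bits")
    return value & ((1 << bits) - 1)


def twos_add(x: int, y: int, bits: int = 8) -> int:
    """Add two patterns; any carry out of the top bit is simply discarded."""
    return (x + y) & ((1 << bits) - 1)


def twos_decode(pattern: int, bits: int = 8) -> int:
    """Turn a two's complement pattern back into a signed integer."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


minus_five = twos_complement(-5)
total = twos_add(twos_complement(3), minus_five)
print(format(minus_five, "08b"))             # 11111011
print(format(total, "08b"))                  # 11111110
print(twos_decode(total))                    # -2
print(format(twos_complement(-128), "08b"))  # 10000000: the code that was -0 is reused
```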

Binary coded decimal

This is useful in applications that require single denary digits to be stored or transmitted.

The BCD code uses a nibble to represent a denary digit.
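A minimal sketch of BCD, in which every denary digit is coded separately as a 4-bit nibble. The helper names `to_bcd` and `from_bcd` are illustrative, and the nibbles are separated by spaces purely for readability.

```python
def to_bcd(number: int) -> str:
    """Code each denary digit of a non-negative integer as one 4-bit nibble."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(nibbles: str) -> int:
    """Decode space-separated nibbles back into a denary integer."""
    digits = []
    for nibble in nibbles.split():
        value = int(nibble, 2)
        if value > 9:                # 1010 to 1111 are not valid BCD digits
            raise ValueError("invalid BCD nibble: " + nibble)
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(8503))                      # 1000 0101 0000 0011
print(from_bcd("1000 0101 0000 0011"))   # 8503
```

Note that 8503 needs 16 bits in BCD but only 14 bits as a pure binary number, which is the price paid for keeping each digit separate.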

1.04 Internal coding of text

ASCII Code

In ASCII each character takes up 1 byte (8 bits) of storage space.

The 7-bit version of the code (often referred to as US ASCII) was standardised many years ago by ANSI (American National Standards Institute).

In the 16-bit Unicode encoding a character takes up 2 bytes; other Unicode encodings, such as UTF-8, use between 1 and 4 bytes per character.

It should be noted that Unicode codes have been developed in tandem with the Universal Character Set (UCS) scheme, standardised as ISO/IEC 10646.

Note that in UTF-8, for the two-byte, three-byte and four-byte representations, all continuation bytes have the two most significant bits set to 10.
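The byte pattern can be inspected with a short sketch, assuming the multi-byte representations referred to above are UTF-8. `utf8_bits` is an illustrative helper built on Python's standard `str.encode`.

```python
def utf8_bits(char: str) -> list[str]:
    """Return the UTF-8 bytes of a character as 8-bit binary strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]


# One-, two-, three- and four-byte characters.
for char in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(char):04X}", utf8_bits(char))

# U+0041 ['01000001']
# U+00E9 ['11000011', '10101001']
# U+20AC ['11100010', '10000010', '10101100']
# U+1F600 ['11110000', '10011111', '10011000', '10000000']
```

In every multi-byte case the first byte starts with as many 1s as there are bytes in the sequence, and every following byte starts with 10.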

Recommended reading on character sets: the history of ASCII, Unicode and UTF-8, and the differences between them

1.05 Images

Key Terms

Vector graphics

Bitmap

An image in which each pixel is represented by a binary value. Each bit position across all the pixels forms a "bit-plane". 1 bit per pixel gives 2 colours, 2 bits give 4 colours, 3 bits give 8 colours; in general n bits give 2^n colours.
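The relationship between bits per pixel and the number of available colours can be sketched as follows; `colours_for_depth` is an illustrative name.

```python
def colours_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct colours that a given colour depth can represent."""
    return 2 ** bits_per_pixel


for depth in (1, 2, 3, 8, 24):
    print(depth, "bits:", colours_for_depth(depth), "colours")

# 1 bits: 2 colours
# 2 bits: 4 colours
# 3 bits: 8 colours
# 8 bits: 256 colours
# 24 bits: 16777216 colours
```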

RGB color

compare

Overall findings

Vector:

Bitmap:

format compare

vector:

Includes AI, CDR, CMX (Corel Metafile Exchange Image), SVG, CGM (Computer Graphics Metafile), DXF, and WMF (Windows Metafile).

Bitmap:

Includes GIF, JPG, PNG, TIFF, and PSD.

Ease of use: Vectors Are More Robust

vector:

Bitmap:

file header

A bitmap file has to store the pixel data that defines the graphic, but the file must also have a file header that contains information on how the graphic has been constructed. Because of this, the bitmap file size is larger than the size of the graphic alone. At the very least the header will define the colour depth (bit depth) and the resolution.

1.06 Sound

Key Terms:

digital data and analogue data

Analogue Devices

All analogue devices use analogue data. Examples of analogue devices include:

Digital Devices

All digital devices use digital data. Examples of digital devices include:

Analogue to Digital Converter (ADC)

Digital to Analogue Converter (DAC)

IOT

sound

The sound we hear is also analogue. Computers work digitally and can only process binary, so sound has to be converted to digital data before it can be stored or processed.

Sound is recorded at set timed intervals; this process is known as sampling.
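A rough sketch of sampling, using a sine wave to stand in for the analogue signal. The function name `sample_wave` and its parameters (frequency, sampling rate, duration, bit depth) are assumptions made for illustration; each sample is rounded to the nearest level that the chosen number of bits can store.

```python
import math


def sample_wave(frequency_hz: float, sample_rate_hz: int,
                duration_s: float, bit_depth: int = 8) -> list[int]:
    """Measure a sine wave at fixed time intervals and store each value as an integer."""
    levels = 2 ** bit_depth                      # number of distinct stored values
    count = int(sample_rate_hz * duration_s)     # how many samples are taken
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                   # time of the n-th sample
        amplitude = math.sin(2 * math.pi * frequency_hz * t)    # between -1 and 1
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


# One cycle of a 1 Hz wave, sampled 8 times per second with 4 bits per sample.
samples = sample_wave(1, 8, 1, bit_depth=4)
print(len(samples), min(samples), max(samples))   # 8 0 15
```

A higher sampling rate or bit depth follows the original wave more closely, at the cost of more data to store.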

1.07 Data compression

Key Terms

Huffman coding

Instead of having each character coded in one byte, the text is analysed to find the most often used characters. These are then given shorter codes. The original stream of bytes becomes a bit stream.
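The idea can be sketched with Python's standard `heapq` and `collections.Counter`. This is one simple way to build the codes, not a production encoder: `huffman_codes` and `encode` are illustrative names, and the exact 0/1 assignment depends on how ties are broken.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Give frequent characters short codes and rare characters longer ones."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                           # a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {character: code built so far}).
    heap = [(weight, i, {ch: ""}) for i, (ch, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)        # the two least frequent groups
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def encode(text: str, codes: dict[str, str]) -> str:
    """Replace every character by its code, producing a bit stream."""
    return "".join(codes[ch] for ch in text)


text = "aaaabbc"
codes = huffman_codes(text)
bits = encode(text, codes)
print(codes)
print(len(bits), "bits instead of", 8 * len(text))   # 10 bits instead of 56
```

No code is a prefix of another, so the bit stream can be decoded without separators between characters.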

vector graphic file compression

If a vector graphic file needs to be compressed it is best converted to the Scalable Vector Graphics (SVG) format. This uses a markup language description of the image which is suitable for lossless compression.

Image and sound compression

Lossy compression can be used in circumstances where a sound file or an image file can have some of the detailed coding removed or modified. This can happen when it is likely that the human ear or eye will hardly notice any difference.

For a bitmap a simple lossy compression technique is to establish a coding scheme with reduced colour depth. Then for each pixel in the original bitmap the code is changed to the one in the new scheme which represents the closest colour.
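A sketch of this technique, assuming 24-bit RGB pixels are reduced to a hypothetical 8-colour palette (3 bits per pixel). "Closest" is taken here to mean the smallest squared distance between RGB values, which is an assumption; other measures of colour difference are possible.

```python
# Hypothetical reduced scheme: codes 0 to 7 fit in 3 bits per pixel.
PALETTE = [
    (0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255),
    (255, 255, 0), (255, 0, 255), (0, 255, 255), (255, 255, 255),
]


def closest_code(pixel: tuple[int, int, int], palette: list) -> int:
    """Return the palette code whose colour is nearest to the pixel."""
    def distance(colour):
        return sum((a - b) ** 2 for a, b in zip(pixel, colour))
    return min(range(len(palette)), key=lambda code: distance(palette[code]))


def reduce_colour_depth(pixels: list, palette: list) -> list[int]:
    """Replace every pixel by the code of its closest palette colour (lossy)."""
    return [closest_code(pixel, palette) for pixel in pixels]


original = [(250, 10, 5), (12, 8, 240), (200, 210, 190)]
print(reduce_colour_depth(original, PALETTE))   # [1, 3, 7]
```

Each pixel now needs 3 bits instead of 24, but the original colours cannot be recovered exactly, which is what makes the compression lossy.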

Image formats