ALevel-CS Chapter 11

Database

11.01 Limitations of a file-based approach

txt -> csv -> excel -> database

txt file

CSV file

excel file - open csv by excel

Database

存在的问题
  1. incorrect data - 文件无法限制具体年月日类型,也无法控制会员数量,可能写错为a
  2. BandName 用户可以随便输入,无法限制指定的BandName。所以可能输入了不存在的
  3. wrong order - 文件系统无法对会员数进行排序
  4. The data privacy issue with a single file - 安全问题,任何人都可以查看txt文件
  5. data redundancy - 数据冗余

Data integrity problems in a single file

limitation of a file-based approach

  • There is no validation technique that could detect the original error.
  • There is also possibly an error that is not evident from looking at the file contents.
  • A file-based approach can lead to data integrity problems in an individual file.

The data privacy issue with a single file

A different problem is a lack of data privacy.

The problem is that there cannot be any control of access to part of a file.

Data redundancy and possible inconsistency in multiple files

There is now data duplication across the files. This is commonly referred to as data redundancy.

Data dependency concerns

When file structures have been defined to suit specific programs, they may not be suited to supporting new applications.

11.02 The relational database

Key Terms

  • Relation - the special type of table which is used in a relational database
  • Attribute - a column in a relation that contains values
  • Tuple - a row in a relation storing data for one instance of the relation
  • Primary key - an attribute or a combination of attributes for which there is a value in each tuple and that value is unique
  • Candidate key - a key that could be chosen as the primary key
  • Secondary key - a candidate key that has not been chosen as the primary key
  • Foreign key - an attribute in one table that refers to the primary key in another table
  • Referential integrity - the use of a foreign key to ensure that a value can only be entered in one table when the same value already exists in the referenced table

The relational database

In the relational database model, each item of data is stored in a relation which is a special type of table. A relational database is a collection of relational tables.

In a database design, a table would be given a name with the attribute names listed in brackets after the table name.

Member(MemberID, MemberGivenName, MemberFamilyName, BandName, ...) Band(BandName, AgentID, ...)

A row in a relation should be referred to as a tuple but this formal name is not always used. Often a row is called a ‘record’ and the attribute values ‘fields’. The tuple is the collection of data stored for one ‘instance’ of the relation.

  1. The most important feature of the relational database concept is the primary key. A primary key may be a single attribute or a combination of attributes. Every table must have a primary key and each tuple in the table must have a value for the primary key and that value must be unique.
  2. Once a table and its attributes have been defined, the next task is to choose the primary key. In some cases there may be more than one attribute for which unique values are guaranteed. In this case, each one is a candidate key and one will be selected as the primary key.
  3. A candidate key that is not selected as the primary key is then referred to as a secondary key. Often there is no candidate key and so a primary key has to be created.
  4. A database can contain stand-alone tables, but it is more usual for each table to have some relationship to another table. This relationship is implemented by using a foreign key.
  5. This provides referential integrity which is another reason why the relational database model helps to ensure data integrity.

11.03 Entity-relationship modelling

Entity-relationship modelling

  • We can use a top-down method called stepwise refinement to break down the process of database design into simple steps
  • In database design this approach uses an entity-relationship (E–R) diagram.
  • An entity (strictly speaking an entity type) could be a thing, a type of person, an event, a transaction or an organisation.
  • there must be a number of ‘instances’ of the entity.

modelling

Example

Requirement

The agency needs a database to handle bookings for bands. Each band has a number of members. Each booking is for a venue. Each booking might be for one or more bands.

Step 1: Choose the entities

You look for the nouns. You ignore ‘agency’ because there is only the one. You choose Booking, Band, Member and Venue. For each of these there will be more than one instance. You are aware that each booking is for a gig at a venue but you ignore this because you think that the Booking entity will be sufficient to hold the required data about a gig.

Step 2: Identify the relationships

This requires experience, but the aim is not to define too many. You choose the following three::

  • Booking with Venue
  • Booking with Band
  • Band with Member.

Step 3: Decide the cardinalities of the relationships
  • one-to-one or 1:1
  • one-to-many or 1:M
  • many-to-one or M:1
  • many-to-many or M:M.

The M:1 relationship between Member and Band. One Band has more than one Member so it has many.

The M:1 relationship with more detail. A member must belong to a Band and a Band must have more than one Member.

The M:1 relationship between Booking and Venue. One Booking is for one Venue (there must be a venue and there cannot be more than one) and that one Venue can be used for many Bookings so the relationship between Booking and Venue is M::1. A Venue might exist that has so far never had a booking.

The M:M relationship between Band and Booking. One Booking can be for many Bands and that one Band has many Bookings.

The E–R diagram for the theatrical agency’s booking database

11.04 A logical entity–relationship model

A logical entity–relationship model

A link entity inserted to resolve a M::M relationship

1:M - have a foreign key referencing the primary key of the entity at the one end. M:M - Another way of looking at this problem is to argue that a foreign key is required in each entity but neither table could be created first because the other table needed to exist for the foreign key to be defined. The solution for the M::M relationship is to create a link entity.

11.05 Normalisation

Key Terms

  • Repeating group - a set of attributes that have more than one set of values when the other attributes each have a single value

范式

1、第一范式(1NF) 所谓第一范式(1NF)是指在关系模型中,对于添加的一个规范要求,所有的域都应该是原子性的,即数据库表的每一列都是不可分割的原子数据项,而不能是集合,数组,记录等非原子数据项。即实体中的某个属性有多个值时,必须拆分为不同的属性。在符合第一范式(1NF)表中的每个域值只能是实体的一个属性或一个属性的一部分。简而言之,第一范式就是无重复的域。

范式

2、第二范式(2NF) 在1NF的基础上,非码属性必须完全依赖于候选码(在1NF基础上消除非主属性对主码的部分函数依赖) 第二范式(2NF)是在第一范式(1NF)的基础上建立起来的,即满足第二范式(2NF)必须先满足第一范式(1NF)。第二范式(2NF)要求数据库表中的每个实例或记录必须可以被唯一地区分。选取一个能区分每个实体的属性或属性组,作为实体的唯一标识。例如在员工表中的身份证号码即可实现每个一员工的区分,该身份证号码即为候选键,任何一个候选键都可以被选作主键。在找不到候选键时,可额外增加属性以实现区分,如果在员工关系中,没有对其身份证号进行存储,而姓名可能会在数据库运行的某个时间重复,无法区分出实体时,设计辟如ID等不重复的编号以实现区分,被添加的编号或ID选作主键。(该主键的添加是在ER设计时添加,不是建库时随意添加)

范式

3、第三范式(3NF) 在2NF基础上,任何非主属性不依赖于其它非主属性(在2NF基础上消除传递依赖) 第三范式(3NF)是第二范式(2NF)的一个子集,即满足第三范式(3NF)必须满足第二范式(2NF)。简而言之,第三范式(3NF)要求一个关系中不包含已在其它关系已包含的非主关键字信息。例如,存在一个部门信息表,其中每个部门有部门编号(dept_id)、部门名称、部门简介等信息。那么在员工信息表中列出部门编号后就不能再将部门名称、部门简介等与部门有关的信息再加入员工信息表中。如果不存在部门信息表,则根据第三范式(3NF)也应该构建它,否则就会有大量的数据冗余。简而言之,第三范式就是属性不依赖于其它非主属性,也就是在满足2NF的基础上,任何非主属性不得传递依赖于主属性。

一范式 (原子性)

字段为 ' 家庭信息 ' 和 '学校信息 '违反了原子性,

二范式 (不存在部分依赖)

‘订单号’ 和 ‘产品号’ 是表的主键(主码),这两个字段一起决定了 ‘产品数量‘ ’产品折扣‘ ’产品价格‘ ,而订单金额和订单时间仅由订单号决定,不是由 主键(‘订单号’ 和 ‘产品号’)共同决定,所以 订单金额和订单时间是部份依赖于主键。

三范式 (不存在传递依赖)

“班主任性别”和“班主任年龄”虽然间接依赖于学号(主键),但直接依赖的是“班主任姓名”。 注:间接依赖是属性与主键一一对应,但只是间接决定,不是直接决定,所以称为间接依赖

Normalisation

Normalisation is a design technique for constructing a set of table designs from a list of data items. It can also be used to improve on existing table designs.

First normal form (1NF)

The conversion to first normal form (1NF) requires splitting the data into two groups

Second normal form (2NF)

Examine each non-key attribute and ask if it is dependent on both parts of the compound key. Any attributes that are dependent on only one of the attributes in the compound key must be moved out into a new table.

Third normal form (3NF)

Examined to see if there are any non-key dependencies; that means we must look for any non-key attribute that is dependent on another non-key attribute. If there is, a new table must be defined.

11.06 The Database Management System (DBMS)

Key Terms

  • Data management system (DBMS) - software that controls access to data in a database
  • Database administrator (DBA) - a person who uses the DBMS to customise the database to suit user and programmer requirements
  • Developer interface - gives access to software tools provided by a DBMS for creating tables
  • Query processor - software tools provided by a DBMS to allow creation and execution of a query
  • Query - used to select data from a database subject to defined conditions
  • Index - a small secondary table used for rapid searching which contains one attribute from the table being searched and pointers to the tuples in that table

The database approach

  • the external level
  • the conceptual level
  • the internal level

The physical storage of the data is represented here as being on disk. The details of the storage (the internal schema) are known only at the internal level, the lowest level in the ANSI architecture. This is controlled by the database management system (DBMS) software.

At the next level, the conceptual level, there is a single universal view of the database. This is controlled by the database administrator (DBA) who has access to the DBMS. In the ANSI architecture the conceptual level has a conceptual schema describing the organisation of the data as perceived by a user or programmer. This may also be described as a logical schema.

The facilities provided by a DBMS

  • The DBMS provides soft ware tools through a developer interface. These allow for tables to be created and attributes to be defined together with their data types.
  • The DBMS provides facilities for a programmer to develop a user interface.
  • It also provides a query processor that allows a query to be created and processed. The query is the mechanism for extracting and manipulating data from the database.
  • The other feature likely to be provided by the DBMS is the capability for creating a report to present formatted output. A programmer can incorporate access to queries and reports in the user interface.

DBMS functions likely to be used by a DBA

An important feature of the DBMS is the data dictionary which is part of the database that is hidden from view from everyone except the DBA. It contains metadata about the data. This includes details of all the definitions of tables, attributes and so on but also of how the physical storage is organised.

There are a number of features that can improve performance. Of special note is the capability to create an index for a table.

11.07 Structured Query Language (SQL)

Structured Query Language (SQL)

Data Definition Language (DDL)

CREATE DATABASE BandBooking; CREATE TABLE Band (
    BandName varchar(25),
    NumberOfMembers integer); 
ALTER TABLE Band ADD PRIMARY KEY (BandName); 
ALTER TABLE Band-Booking ADD FOREIGN KEY (BandName REFERENCES Band(BandName);
  • The SQL consists of a sequence of commands.
  • Each command is terminated by;
  • A command can occupy more than one line.
  • There is no case sensitivity.
  • There has been a decision made here to use upper case for the commands and lower case for table names, attribute names and datatypes.
  • When a command contains a list of items these are separated by a comma.
  • For the CREATE TABLE command this list is enclosed in parentheses

Data Manipulation Language (DML)

There are three categories of use for Data Manipulation Language (DML)

  • The insertion of data into the tables when the database is created
  • The modification or removal of data in the database
  • The reading of data stored in the database

Data Manipulation Language (DML)


SELECT BandName FROM Band;

SELECT BandName, NumberOfMembers FROM Band;

SELECT * FROM Band;

SELECT BandName, NumberOfMembers FROM Band ORDER BY BandName;

SELECT BandName FROM Band-Booking GROUP BY BandName;

SELECT BandName FROM Band-Booking WHERE Headlining = 'Y' GROUP BY BandName;

SELECT BandName, NumberOfMembers FROM Band WHERE NumberOfMembers > 2 ORDER BY BandName;

Data Manipulation Language (DML)

SELECT Count(*) FROM Band;

SELECT AVG(NumberOfMembers) FROM Band;

SELECT SUM(NumberOfMembers) FROM Band;

SELECT VenueName, Date 
FROM Booking 
WHERE Band-Booking.BookingID = Booking.BookingID AND Band-Booking.BandName = 'ComputerKidz';

SELECT table1.column1, table2.column2... 
FROM table1 
INNER JOIN table2 
ON table1.common _ field = table2.common _ field;

Data Manipulation Language (DML)

UPDATE Band SET NumberOfMembers = 6 WHERE BandName = 'ComputerKidz';

DELETE FROM Band-Booking WHERE BandName = 'ITWizz'; DELETE FROM Band WHERE BandName = 'ITWizz';

辅助阅读

工具下载

mysql

mysql 可以下载 Mysql Community Server和 MySQL Workbench 来运行