Chapter 16
Data Representation
16.01 Data Types
Key Terms
- User-defined data type - a data type for which the programmer includes the definition in the program
- Non-composite data type - a data type defined without reference to another data type
- Enumerated data type - a non-composite user-defined data type for which the definition identifies all possible values
- Pointer variable - one for which the value is the address in memory of a different variable
- Set - a collection of data items that lacks any structure; contains no duplicates and has a number of defined operations that can be performed on it
Data Types
- Built-in data types
- User-defined data types
- Non-composite data types
- Enumerated data type
- Composite user-defined data types
- Pointer data type
- Set data type
Built-in data types
- the programming language defines the range of possible values that can be assigned to a variable when its type has been chosen.
- the programming language defines the operations that are available for manipulating values assigned to the variable.
Python
int, float, bool, str, bytes
List, Tuple, Set, Dictionary
Java
byte, short, int, long, float, double, boolean, char
User-defined data types
A user-defined data type is a data type for which the programmer has included the definition in the program.
Python - class
Java - class
pseudocode - Type
Non-composite data types
A non-composite data type is one which has a definition which does not involve a reference to another data type.
Python
int, float, bool, str, bytes
Java
byte, short, int, long, float, double, boolean, char
Enumerated data type
An enumerated data type is an example of a user-defined non-composite data type.
# Declare type
TYPE
TDirections = (North, East, South, West)
TDays = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)
# Declare variable
DECLARE Direction1 : TDirections
DECLARE StartDay : TDays
Direction1 ← North
StartDay ← Wednesday
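The pseudocode above has no direct Python equivalent, but a rough sketch is possible with the standard-library enum module. The names Direction, Day, direction1 and start_day are illustrative stand-ins for TDirections, TDays, Direction1 and StartDay, not part of the original notes.

```python
from enum import Enum

# Illustrative counterpart of TDirections: every possible value is listed
# in the definition itself.
class Direction(Enum):
    NORTH = 1
    EAST = 2
    SOUTH = 3
    WEST = 4

# Illustrative counterpart of TDays, using Enum's functional form;
# members are numbered from 1 in the order given.
Day = Enum("Day", "MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY")

# Declare variables and assign values from the enumerated types.
direction1 = Direction.NORTH
start_day = Day.WEDNESDAY

# The values have an implied order, so their positions can be compared.
print(direction1.name)
print(start_day.value < Day.FRIDAY.value)
```

Assigning a value that is not listed in the definition is not possible, for example Direction.UP raises an AttributeError.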
Composite user-defined data types
DECLARE
TYPE <type_name> IS RECORD
(
<column1> <datatype>,
...
...
)
A composite user-defined data type has a definition with reference to at least one other type.
- The record data type
- This allows the programmer to create record data types with components that precisely match the data requirements of the particular program.
- Python does not have a built-in record data type; a class is typically used in its place.
- The class
- A class is a data type which is used for an object in object-oriented programming.
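As a sketch of how a record-like composite type might be written in Python, the standard-library dataclasses module can be used. The type name StudentRecord and its fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical record type: each field refers to another (built-in) data type,
# which is what makes the type composite.
@dataclass
class StudentRecord:
    name: str
    year_group: int
    average_mark: float

# Create a record and update one of its fields.
student = StudentRecord("Ada", 12, 87.5)
student.average_mark = 90.0
print(student)
```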
Pointer data type
A pointer variable is one for which the value is a reference to a memory location.
# declare a pointer type
TYPE TIntegerPointer ← ^Integer
# declare a variable of the pointer data type
DECLARE MyIntegerPointer : TIntegerPointer
# declare two ordinary variables of type integer
DECLARE Number1, Number2 : INTEGER
# assign a value
Number1 ← 100
# assign to the pointer variable the address of a different variable
MyIntegerPointer ← @Number1
# assigns the value 200 to Number2
Number2 ← MyIntegerPointer^ * 2
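Python does not expose pointers in ordinary code, but the pseudocode above can be approximated with the standard-library ctypes module. This is only a sketch: the variable names mirror the pseudocode, and ctypes.c_int is assumed as the stand-in for INTEGER.

```python
import ctypes

# Number1 ← 100 (a C-style integer that has an address in memory)
number1 = ctypes.c_int(100)

# MyIntegerPointer ← @Number1 (the pointer holds the address of number1)
my_integer_pointer = ctypes.pointer(number1)

# Number2 ← MyIntegerPointer^ * 2 (dereference the pointer, then double)
number2 = my_integer_pointer.contents.value * 2
print(number2)  # 200

# The pointer refers to number1 itself, so a later change is visible through it.
number1.value = 7
print(my_integer_pointer.contents.value)  # 7
```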
Set data type
A set data type allows a program to create sets and to apply the mathematical operations defined in set theory.
- It contains a collection of data values.
- There is no organisation of the data values within the set.
- Duplicate values are not allowed.
- Operations that can be performed on a set include:
- checking if a value exists in a set
- adding a new data value
- removing an existing data value
- adding one set to another set.
Python
List, Tuple, Set, Dictionary
Java
Array, List, ArrayList, Map, HashMap
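The set operations listed above can be tried with Python's built-in set type. The set names and values below are made up for illustration.

```python
# Two illustrative sets
primes = {2, 3, 5, 7}
evens = {2, 4, 6, 8}

# Check if a value exists in a set
has_five = 5 in primes

# Add a new data value; adding it twice has no effect (no duplicates)
primes.add(11)
primes.add(11)

# Remove an existing data value
evens.remove(8)

# Add one set to another set (union)
combined = primes | evens
print(sorted(combined))  # [2, 3, 4, 5, 6, 7, 11]
```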
16.02 File organisation
Key Terms
- Binary file - a file designed for storing data to be used by a computer program
- Record - a collection of fields containing data values
File Organisation
From the viewpoint of a computer program, there are only two defined file types:
- text file - contains data stored according to a defined character code
- binary file - stores data in its internal representation
The organisation of a binary file is based on the concept of a record. A file contains records and each record contains fields.
data = {
    'a': [1, 2.0, 3, 4 + 6],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

def write_file_text():
    with open('data.txt', 'wt') as f:
        for item in data.values():
            f.write(str(item) + "\n")

write_file_text()
import pickle

data = {
    'a': [1, 2.0, 3, 4 + 6],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

# write to file
def write_data_binary():
    with open('data.pickle', 'wb') as f:
        pickle.dump(data, f)

write_data_binary()

# read from file
def load_data_from_pickle():
    with open('data.pickle', 'rb') as f:
        file_data = pickle.load(f)
    # print what was read back, not the original in-memory object
    print(file_data)

load_data_from_pickle()
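To make the idea of records and fields in a binary file more concrete, here is a minimal sketch using the standard-library struct module. The record layout (an integer key, a 10-byte name and a float balance) is an assumption made for illustration, and io.BytesIO stands in for a file opened in binary mode.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte integer key, 10-byte name,
# 8-byte float balance. "<" means little-endian with no padding.
RECORD = struct.Struct("<i10sd")

records = [(45, b"Ann", 12.5), (2005, b"Bob", 0.0)]

# io.BytesIO stands in for a real binary file here.
buffer = io.BytesIO()
for rec in records:
    buffer.write(RECORD.pack(*rec))

# Every record has the same size, so record n starts at n * RECORD.size.
buffer.seek(1 * RECORD.size)
key, name, balance = RECORD.unpack(buffer.read(RECORD.size))
name = name.rstrip(b"\x00").decode()  # strip the padding added by "10s"
print(key, name, balance)
```

Fixed-length records are a design choice here: they let a program jump straight to any record by position instead of reading from the start.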
Serial files
A serial file contains records that have not been organised in any defined order.
A serial file stores records as a series of items, one after the other.
Sequential files
A sequential file has records that are ordered.
A sequential file stores data in some sort of order, perhaps based on an account number.
Storing data in a sequential file means that you can run binary searches, provided the records can be addressed by position (for example, fixed-length records).
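A binary search over records ordered by key can be sketched with the standard-library bisect module. The account records and the function name find_account are hypothetical, and the records are assumed to have already been read into a list.

```python
from bisect import bisect_left

# Hypothetical records, ordered by account number (the key field).
accounts = [(1001, "Ann"), (1004, "Bob"), (1010, "Cy"), (1023, "Dee")]
keys = [key for key, _ in accounts]

def find_account(account_number):
    """Return the record with this key, or None if the key is absent."""
    i = bisect_left(keys, account_number)  # binary search for the position
    if i < len(keys) and keys[i] == account_number:
        return accounts[i]
    return None

print(find_account(1010))
print(find_account(1005))
```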
Direct-access files
Direct-access files are sometimes referred to as ‘random-access’ files but, as with random-access memory, the randomness is only that any record in the file can be accessed without reading the file sequentially.
The address of a record is calculated from its key field using a hashing algorithm. A simple one divides the numeric key by a suitable number and uses the remainder as the address.
For simplicity this can be illustrated for 4-digit values in the key field where 1000 is used as the dividing number. The following represent three calculations:
0045/1000 gives remainder 45 for the address in the file
2005/1000 gives remainder 5 for the address in the file
3005/1000 gives remainder 5 for the address in the file
The last two keys give the same address, which is a collision. Options for handling a collision include:
- use a sequential search to look for a vacant address following the calculated one
- keep a number of overflow addresses at the end of the file
- have a linked list accessible from each address.
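The remainder calculation, combined with the option of searching sequentially for the next vacant address, can be sketched as follows. The list slots stands in for the direct-access file, and the names FILE_SIZE, hash_address and insert are illustrative only.

```python
FILE_SIZE = 1000            # number of addresses; also the dividing number
slots = [None] * FILE_SIZE  # stands in for the direct-access file

def hash_address(key):
    """Address is the remainder after dividing the key by FILE_SIZE."""
    return key % FILE_SIZE

def insert(key):
    """Store key at its calculated address, or at the next vacant one."""
    start = hash_address(key)
    for offset in range(FILE_SIZE):
        address = (start + offset) % FILE_SIZE  # wrap around at the end
        if slots[address] is None:
            slots[address] = key
            return address
    raise RuntimeError("file is full")

# Keys 2005 and 3005 both hash to address 5, so 3005 moves on to address 6.
addresses = [insert(key) for key in (45, 2005, 3005)]
print(addresses)  # [45, 5, 6]
```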
File access
Once a file organisation has been chosen and the data has been entered into a file, you need to consider how this data is to be accessed.
For a serial file, the normal usage is to read the whole file record by record.
If the data is stored in a sequential file and a particular value is needed, searching may have to be done in the same way.
Choice of file organisation
Serial file organisation is well suited to batch processing or for backing up data on magnetic tape. A direct access file is used if rapid access to an individual record in a large file is required.
A sequential file is suitable for applications when multiple records are required from one search of the file.
16.03 Real numbers
Key Terms
- Floating-point representation - a representation of real numbers that stores a value for the mantissa and a value for the exponent
real numbers
exponential notation
$0.253 \times 10^{2}$ or $2.53 \times 10^{1}$ or $25.3 \times 10^{0}$ or $253.0 \times 10^{-1}$
Floating-point and fixed-point representations
floating-point representation
$\pm M \times R^{E}$
- $\pm M$ - significand or mantissa
- $E$ - exponent or exrad
- $R$ - radix
fixed-point representation
Precision and normalisation
You have to decide about the format of a floating-point representation in two respects. You have to decide the total number of bits to be used and decide on the split between those representing the mantissa and those representing the exponent.
Conversion of representations
- Convert the whole-number part
- Add the 0 sign bit.
- Convert the fractional part
- Combine the whole number and fractional parts and enter these into the most significant of the bits allocated for the representation of the mantissa.
- Fill the remaining bits for the mantissa and the bits for the exponent with zeros.
- Adjust the position of the binary point by changing the exponent value to achieve a normalised representation.
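The steps above can be sketched in Python for positive values. This assumes an 8-bit mantissa and 4-bit exponent, both in two's complement, with the binary point immediately after the mantissa sign bit. These sizes and the function name to_floating_point are assumptions, not taken from the notes.

```python
def to_floating_point(value, mantissa_bits=8, exponent_bits=4):
    """Return (mantissa, exponent) bit strings for a positive real value."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    whole = int(value)
    fraction = value - whole
    # Convert the whole-number part (empty if it is zero).
    digits = bin(whole)[2:] if whole else ""
    exponent = len(digits)  # the binary point sits after the whole-number bits
    # Convert the fractional part by repeated doubling.
    while fraction and len(digits.lstrip("0")) < mantissa_bits - 1:
        fraction *= 2
        bit = int(fraction)
        digits += str(bit)
        fraction -= bit
    # Normalise: drop leading zeros and adjust the exponent to compensate.
    significant = digits.lstrip("0")
    exponent -= len(digits) - len(significant)
    if not -(2 ** (exponent_bits - 1)) <= exponent < 2 ** (exponent_bits - 1):
        raise OverflowError("exponent does not fit in the bits allowed")
    # Add the 0 sign bit and fill the remaining mantissa bits with zeros.
    body = significant[:mantissa_bits - 1].ljust(mantissa_bits - 1, "0")
    # Two's complement bit pattern for the exponent.
    exponent_pattern = format(exponent % 2 ** exponent_bits, f"0{exponent_bits}b")
    return "0" + body, exponent_pattern

print(to_floating_point(2.5))    # ('01010000', '0010')
print(to_floating_point(0.125))  # ('01000000', '1110')
```

A normalised positive mantissa always starts with 01, which both results show. Bits that do not fit in the mantissa are simply truncated in this sketch.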
Problems with using floating-point numbers
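Because only a limited number of bits is available for the mantissa, many values can only be stored approximately, and the resulting rounding errors can build up over repeated calculations. The only way of preventing the errors becoming a serious problem is to increase the precision of the floating-point representation by using more bits for the mantissa.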
Programming languages therefore offer options to work in ‘double precision’ or ‘quadruple precision’.
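A short experiment shows both the rounding error and the benefit of extra mantissa bits. Python's float is double precision. Single precision is simulated here by a round trip through the struct module, and Fraction is used only to measure the exact stored values.

```python
import struct
from fractions import Fraction

# 0.1, 0.2 and 0.3 cannot be stored exactly in binary, so the sum is slightly off.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(abs(total - 0.3) < 1e-9)   # True: compare with a tolerance instead

# Store 0.1 in single precision (fewer mantissa bits) and read it back.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

# Exact size of the representation error in each precision.
error_double = abs(Fraction(0.1) - Fraction(1, 10))
error_single = abs(Fraction(single) - Fraction(1, 10))
print(error_double < error_single)  # True
```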
The other potential problem relates to the range of numbers that can be stored.
For floating-point values there is also the possibility of underflow: if a very small number is divided by a number greater than 1, the result may be smaller than the smallest value that can be stored.
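Both range problems can be observed with Python's double-precision float. The variable names are illustrative. The value 5e-324 is the smallest positive double that can be stored.

```python
import sys

# Underflow: halving the smallest positive double leaves nothing to store.
smallest = 5e-324
underflowed = smallest / 2
print(underflowed)  # 0.0

# Overflow: doubling the largest double exceeds the range and gives infinity.
largest = sys.float_info.max
overflowed = largest * 2
print(overflowed)   # inf
```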