Chapter 16

Data Representation

16.01 Data Types

Key Terms

  • User-defined data type - where the programmer includes the definition in the program
  • Non-composite data type - a data type defined without reference to another data type
  • Enumerated data type - a non-composite user-defined data type for which the definition identifies all possible values
  • Pointer variable - one for which the value is the address in memory of a different variable
  • Set - a collection of data items that lacks any structure; contains no duplicates and has a number of defined operations that can be performed on it

Data Types

  • Built-in data types
  • User-defined data types
  • Non-composite data types
  • Enumerated data type
  • Composite user-defined data types
  • Pointer data type
  • Set data type

Built-in data types

  • The programming language defines the range of possible values that can be assigned to a variable when its type has been chosen.
  • The programming language defines the operations that are available for manipulating values assigned to the variable.

Python

int, float, bool, str, bytes

List, Tuple, Set, Dictionary

Java

byte, short, int, long, float, double, boolean, char

User-defined data types

A user-defined data type is a data type for which the programmer has included the definition in the program.

Python - class

Java - class

Pseudocode - TYPE

Non-composite data types

A non-composite data type is one which has a definition which does not involve a reference to another data type.

Python

int, float, bool, str, bytes

Java

byte, short, int, long, float, double, boolean, char

Enumerated data type

An enumerated data type is an example of a user-defined non-composite data type.

# Declare type

TYPE 
TDirections = (North, East, South, West) 
TDays = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)

# Declare variable

DECLARE Direction1 : TDirections 
DECLARE StartDay : TDays 
Direction1 ← North 
StartDay ← Wednesday
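
The pseudocode above can be approximated in Python with the standard enum module. This is a minimal sketch: the class names Direction and Day and the numeric values given to each member are choices made for illustration, not part of the pseudocode.

```python
from enum import Enum


class Direction(Enum):
    # The definition lists every possible value of the type
    NORTH = 1
    EAST = 2
    SOUTH = 3
    WEST = 4


class Day(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


# Equivalent of: Direction1 ← North, StartDay ← Wednesday
direction1 = Direction.NORTH
start_day = Day.WEDNESDAY

print(direction1.name, start_day.name)
```

A variable of an enumerated type can only hold one of the listed members, so a misspelt name such as Direction.NORHT fails with an AttributeError instead of silently creating a new value.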

Composite user-defined data types

A composite user-defined data type has a definition with reference to at least one other type.

# Declare a record type
DECLARE
TYPE <type_name> IS RECORD
(
    <column1>  <datatype>,
    ...
    ...
)

  • The record data type
    • This allows the programmer to create record data types with components that precisely match the data requirements of the particular program.
    • Python is a language that does not support the use of a record data type.
  • The class
    • A class is a data type which is used for an object in object-oriented programming.
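
Because Python has no record type as such, a small class can play the same role. The sketch below uses the standard dataclasses module. The type name StudentRecord and its three fields are hypothetical, chosen only to show a composite type built from other types.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Each field refers to another data type, which makes the type composite
    name: str
    year_group: int
    average_mark: float


# Create one record and update a single field
student = StudentRecord(name="Ada", year_group=12, average_mark=81.5)
student.average_mark = 84.0

print(student)
```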

Pointer data type

A pointer variable is one for which the value is a reference to a memory location.

# declare a pointer type
TYPE TIntegerPointer ← ^INTEGER

# declare a variable of the pointer data type
DECLARE MyIntegerPointer : TIntegerPointer

# declare two ordinary variables of type integer
DECLARE Number1, Number2 : INTEGER 

# assign a value 
Number1 ← 100

# assign to the pointer variable the address of a different variable
MyIntegerPointer ← @Number1

# assigns the value 200 to Number2
Number2 ← MyIntegerPointer^ * 2
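
Python has no pointer data type of its own, so the sketch below borrows the standard ctypes module purely to mirror the pseudocode. Treat it as an illustration of the idea, not as typical Python style. The variable names follow the pseudocode.

```python
import ctypes

# An ordinary integer variable holding 100 (Number1 ← 100)
number1 = ctypes.c_int(100)

# The pointer stores the address of number1 (MyIntegerPointer ← @Number1)
my_integer_pointer = ctypes.pointer(number1)

# Dereference the pointer to read the value (MyIntegerPointer^ * 2)
number2 = my_integer_pointer.contents.value * 2
print(number2)

# Changing number1 is visible through the pointer, because both
# refer to the same memory location
number1.value = 7
value_seen_through_pointer = my_integer_pointer.contents.value
print(value_seen_through_pointer)
```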

Set data type

A set data type allows a program to create sets and to apply the mathematical operations defined in set theory.

  • It contains a collection of data values.
  • There is no organisation of the data values within the set.
  • Duplicate values are not allowed.
  • Operations that can be performed on a set include:
    • checking if a value exists in a set
    • adding a new data value
    • removing an existing data value
    • adding one set to another set.

Python

List, Tuple, Set, Dictionary

Java

Array, List, ArrayList, Map, HashMap
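
The set operations listed above can be tried with Python's built-in set type. The values used here are arbitrary examples.

```python
vowels = {"a", "e", "i"}

# Adding a new data value
vowels.add("o")

# Adding a value that is already present changes nothing: no duplicates
vowels.add("a")

# Removing an existing data value
vowels.remove("i")

# Checking if a value exists in the set
has_e = "e" in vowels

# Adding one set to another set (set union)
combined = vowels | {"u", "y"}

print(has_e, len(vowels), len(combined))
```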

16.02 File organisation

Key Terms

  • Binary file - a file designed for storing data to be used by a computer program
  • Record - a collection of fields containing data values

File Organisation

From the point of view of a computer program, there are only two defined file types:

  • text file - contains data stored according to a defined character code
  • binary file - stores data in its internal representation

The organisation of a binary file is based on the concept of a record. A file contains records and each record contains fields.

import pickle

data = {
    'a': [1, 2.0, 3, 4 + 6],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

# Write to a text file: each value is converted to its string form
def write_file_text():
    with open('data.txt', 'wt') as f:
        for item in data.values():
            f.write(str(item) + "\n")

write_file_text()

# Write to a binary file: pickle stores the objects themselves
def write_data_binary():
    with open('data.pickle', 'wb') as f:
        pickle.dump(data, f)

write_data_binary()

# Read from the binary file and print what was loaded
def load_data_from_pickle():
    with open('data.pickle', 'rb') as f:
        file_data = pickle.load(f)
        print(file_data)

load_data_from_pickle()
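
The idea that a binary file contains records, and each record contains fields, can also be sketched with the standard struct module, which packs every record into the same number of bytes. The field layout (an integer account number, a 10-byte name and a float balance), the sample data and the file name accounts.dat are all assumptions made for this example.

```python
import os
import struct
import tempfile

# One record: 4-byte integer, 10-byte name, 4-byte float (no padding)
RECORD_FORMAT = "<i10sf"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

records = [(1, b"Ann", 10.5), (2, b"Bob", 20.25), (3, b"Cy", 0.0)]


def write_records(path):
    # Each record is written in its internal (binary) representation
    with open(path, "wb") as f:
        for record in records:
            f.write(struct.pack(RECORD_FORMAT, *record))


def read_record(path, index):
    # Fixed-length records let the program jump straight to record number index
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        account, name, balance = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
    # Short names are padded with zero bytes when packed, so strip them
    return account, name.rstrip(b"\x00").decode("ascii"), balance


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "accounts.dat")
    write_records(path)
    second = read_record(path, 1)

print(RECORD_SIZE, second)
```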

Serial files

A serial file contains records that have not been organised in any defined order. The records are stored as a series of items, one after the other, in the order in which they were added.

Sequential files

A sequential file has records that are ordered, for example by an account number held in a key field.

Storing data in a sequential file means that you can run a binary search on the key field.
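
A binary search over records ordered by key can be sketched with the standard bisect module. The records here live in a list in memory, which stands in for a sequential file. The account numbers and names are invented for the example.

```python
from bisect import bisect_left

# Records ordered by account number, as they would be in a sequential file
records = [(1003, "Ann"), (1010, "Bob"), (1042, "Cy"), (1077, "Dee")]
keys = [account_number for account_number, _ in records]


def find_record(account_number):
    # bisect_left halves the search range at each step (binary search)
    index = bisect_left(keys, account_number)
    if index < len(keys) and keys[index] == account_number:
        return records[index]
    return None  # no record has this key


print(find_record(1042))
print(find_record(1000))
```

The search only works because the keys are in order. The same call on an unordered (serial) list of keys could miss a record that is present.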

Direct-access files

Direct-access files are sometimes referred to as ‘random-access’ files but, as with random-access memory, the randomness is only that the access can be to any record in the file without sequential reading of the file.

One way to decide where a record is stored is to apply a hashing algorithm to its key field: the key is divided by a chosen number and the remainder is used as the address in the file. For simplicity this can be illustrated for 4-digit values in the key field where 1000 is used for the dividing number. The following represent three calculations:

0045/1000 gives remainder 45 for the address in the file
2005/1000 gives remainder 5 for the address in the file
3005/1000 gives remainder 5 for the address in the file

The last two keys produce the same address, which is called a collision. Options for handling a collision include:

  • use a sequential search to look for a vacant address following the calculated one
  • keep a number of overflow addresses at the end of the file
  • have a linked list accessible from each address.
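
The remainder calculation, together with the first collision option (a sequential search for the next vacant address), can be sketched as follows. A Python list stands in for the file. The names FILE_SIZE, slots, hash_address and store are hypothetical.

```python
FILE_SIZE = 1000                 # the dividing number from the example
slots = [None] * FILE_SIZE       # one slot per address in the "file"


def hash_address(key):
    # The remainder after division gives the calculated address
    return key % FILE_SIZE


def store(key):
    address = hash_address(key)
    # Search sequentially from the calculated address for a vacant slot
    for offset in range(FILE_SIZE):
        candidate = (address + offset) % FILE_SIZE
        if slots[candidate] is None:
            slots[candidate] = key
            return candidate
    raise RuntimeError("no vacant address left in the file")


addresses = [store(key) for key in (45, 2005, 3005)]
print(addresses)
```

Keys 2005 and 3005 both hash to address 5, so the second of them is placed at the next vacant address, 6.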

File access

Once a file organisation has been chosen and the data has been entered into a file, you need to consider how this data is to be accessed.

For a serial file, the normal usage is to read the whole file record by record.

If the data is stored in a sequential file and a particular value is needed, searching may have to be done in the same way.

Choice of file organisation

Serial file organisation is well suited to batch processing or for backing up data on magnetic tape. A direct access file is used if rapid access to an individual record in a large file is required.

A sequential file is suitable for applications when multiple records are required from one search of the file.

16.03 Real numbers

Key Terms

  • Floating-point representation - a representation of real numbers that stores a value for the mantissa and a value for the exponent

Real numbers

Exponential notation

$0.253 \times 10^2$ or $2.53 \times 10^1$ or $25.3 \times 10^0$ or $253.0 \times 10^{-1}$

Floating-point and fixed-point representations

Floating-point representation

$\pm M \times R^E$

$\pm M$ - significand or mantissa

$E$ - exponent or exrad

$R$ - radix
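
For radix 2, Python's standard math.frexp function splits a float into a mantissa and an exponent of exactly this form. The value 8.75 is an arbitrary example.

```python
import math

# frexp returns (m, e) such that value == m * 2**e and 0.5 <= |m| < 1
mantissa, exponent = math.frexp(8.75)
print(mantissa, exponent)

# Rebuilding the value from M, R = 2 and E gives back the original number
rebuilt = mantissa * 2 ** exponent
print(rebuilt)
```

Here the mantissa is 0.546875 (binary 0.100011) and the exponent is 4, so the value is 0.546875 × 2^4 = 8.75.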

Fixed-point representation

Precision and normalisation

You have to decide about the format of a floating-point representation in two respects. You have to decide the total number of bits to be used and decide on the split between those representing the mantissa and those representing the exponent.
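
The effect of the split can be explored with a short calculation. The sketch below assumes a format that these notes do not spell out: a two's complement mantissa with the binary point straight after the sign bit, and a two's complement exponent. The function names are hypothetical.

```python
from fractions import Fraction


def largest_positive(mantissa_bits, exponent_bits):
    # Largest mantissa is 0.111...1 and largest exponent is 0111...1
    largest_mantissa = 1 - Fraction(1, 2 ** (mantissa_bits - 1))
    largest_exponent = 2 ** (exponent_bits - 1) - 1
    return largest_mantissa * 2 ** largest_exponent


def fraction_bits(mantissa_bits):
    # Bits left for the value once the sign bit is taken: a measure of precision
    return mantissa_bits - 1


# Two ways of splitting the same 12 bits
for mantissa_bits, exponent_bits in [(8, 4), (4, 8)]:
    print(mantissa_bits, exponent_bits,
          fraction_bits(mantissa_bits),
          float(largest_positive(mantissa_bits, exponent_bits)))
```

Under these assumptions, giving more bits to the mantissa improves precision but limits the range (largest value 127 for the 8/4 split), while giving more bits to the exponent extends the range greatly at the cost of precision.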

Conversion of representations

  1. Convert the whole-number part
  2. Add the 0 sign bit.
  3. Convert the fractional part
  4. Combine the whole number and fractional parts and enter these into the most significant of the bits allocated for the representation of the mantissa.
  5. Fill the remaining bits for the mantissa and the bits for the exponent with zeros.
  6. Adjust the position of the binary point by changing the exponent value to achieve a normalised representation.
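
The steps above can be sketched in Python for a positive value. The sketch assumes an 8-bit two's complement mantissa (binary point after the sign bit) and a 4-bit two's complement exponent. These sizes, the function name and the 32-bit cut-off for non-terminating fractions are illustrative choices. Bits that do not fit in the mantissa are simply truncated.

```python
def to_floating_point(value, mantissa_bits=8, exponent_bits=4):
    if value <= 0:
        raise ValueError("this sketch only handles positive values")

    # Step 1: convert the whole-number part
    whole = int(value)
    whole_bits = format(whole, "b") if whole else ""

    # Step 3: convert the fractional part by repeated doubling
    fraction = value - whole
    fraction_bits = ""
    while fraction and len(fraction_bits) < 32:
        fraction *= 2
        bit = int(fraction)
        fraction_bits += str(bit)
        fraction -= bit

    # Step 4: combine the parts; the point sits after the whole-number bits
    digits = whole_bits + fraction_bits
    exponent = len(whole_bits)

    # Step 6: normalise so the first bit after the point is 1
    significant = digits.lstrip("0")
    exponent -= len(digits) - len(significant)
    if not -(2 ** (exponent_bits - 1)) <= exponent < 2 ** (exponent_bits - 1):
        raise OverflowError("exponent does not fit in the bits available")

    # Steps 2 and 5: add the 0 sign bit and pad the mantissa with zeros
    mantissa = "0" + significant[:mantissa_bits - 1].ljust(mantissa_bits - 1, "0")
    exponent_field = format(exponent % 2 ** exponent_bits, f"0{exponent_bits}b")
    return mantissa, exponent_field


print(to_floating_point(8.75))
print(to_floating_point(0.1875))
```

For 8.75 the binary form is 1000.11, which normalises to 0.100011 × 2^4. For 0.1875 the binary form is 0.0011, which normalises to 0.11 × 2^-2, so the exponent is stored as the two's complement pattern 1110.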

Problems with using floating-point numbers

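Because only a limited number of bits is available for the mantissa, many real values can only be stored approximately, and the resulting rounding errors can build up over repeated calculations. The only way of preventing the errors becoming a serious problem is to increase the precision of the floating-point representation by using more bits for the mantissa.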

Programming languages therefore offer options to work in ‘double precision’ or ‘quadruple precision’.

The other potential problem relates to the range of numbers that can be stored.

For floating-point values there is also a possibility that if a very small number is divided by a number greater than 1 the result is a value smaller than the smallest that can be stored (underflow).
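
Both kinds of problem can be observed with Python's built-in float type, which is double precision on the usual implementations. The specific values are chosen only to trigger each effect.

```python
import sys

# Rounding error: 0.1 and 0.2 cannot be stored exactly in binary
total = 0.1 + 0.2
print(total == 0.3)
print(total)

# Underflow: dividing the smallest positive float by 2 gives zero
tiny = 5e-324
underflowed = tiny / 2
print(underflowed)

# Overflow: doubling the largest float exceeds the range and gives infinity
overflowed = sys.float_info.max * 2
print(overflowed)
```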