Figure 4.9 Evolution of files and databases.
The transition from alphanumeric data to multimedia documents also
has consequences for the way in which data are stored in files and databases.
In the sixties, alphanumeric data are initially stored in sequential
files. Data are grouped in records - for instance customer data in customer
records - and are sorted in the file according to a key, for example
customer number or customer name. Sequential files suit the tape
storage that was common in those days.
Then disks come into use for storage, and records become directly
accessible by key. An example is the index-sequential file, in which
customer records, for instance, can be accessed by customer number via
special indexes. Instead of having to read through all the
customers, one can now immediately find the right customer by means
of the customer number. This corresponds with the transition from batch
processing to on-line processing, in which it must be possible to get
the customer data directly on the terminal to edit them.
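The difference between the two access methods can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the list stands in for a file of customer records sorted by customer
number, and the names records, index, sequential_lookup and
indexed_lookup are hypothetical.

```python
# Sketch only: an in-memory stand-in for a file of customer records
# sorted by customer number. All names and data are hypothetical.
from bisect import bisect_left

records = [
    (1001, "Jansen"),
    (1004, "De Vries"),
    (1010, "Bakker"),
    (1023, "Visser"),
]


def sequential_lookup(customer_number):
    """Read records from the start until the key is found (tape style)."""
    reads = 0
    for number, name in records:
        reads += 1
        if number == customer_number:
            return name, reads
    return None, reads


# The index holds only the keys, in the same order as the records.
index = [number for number, _ in records]


def indexed_lookup(customer_number):
    """Search the index, then fetch a single record directly (disk style)."""
    position = bisect_left(index, customer_number)
    if position < len(index) and index[position] == customer_number:
        return records[position][1], 1
    return None, 0
```

Both functions return the customer name together with the number of
records that had to be read. The sequential version reads every record
up to the one wanted, whereas the indexed version reads only one.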
At the beginning of the seventies we see the advent of databases, with
hierarchical databases such as IBM's IMS. Subsequently, network databases
emerge, such as IDMS, and then relational databases, such as DB2 and
Oracle. Currently, relational databases are the common standard.
Relational databases
Data storage in relational databases is based on a data model according
to the entity/relationship principle. Entities are for instance 'customer'
and 'order'. Between entities, there are relationships. An order always
belongs to one customer, whereas a customer may have several orders.
In a relational database, each entity is converted into a table with
records (called rows). The entity 'customer' becomes a table with a
row for each customer. The relationships between the entities are represented
by keys. Each order record contains the number of the customer the order
belongs to. Relational databases have the advantage that records in
a table are not only accessible through a key, but also through other
search criteria such as address or place of residence.
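As an illustration, the customer and order tables might be set up as
follows. This is a minimal sketch that uses SQLite through Python's
sqlite3 module rather than DB2 or Oracle; the table names, column names
and sample rows are assumptions made for the example.

```python
import sqlite3

# Sketch only: an in-memory SQLite database with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        city            TEXT NOT NULL
    );
    -- Each order row carries the number of the customer it belongs to.
    CREATE TABLE customer_order (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL
                        REFERENCES customer(customer_number),
        description     TEXT NOT NULL
    );
    INSERT INTO customer VALUES
        (1, 'Jansen', 'Utrecht'),
        (2, 'Bakker', 'Delft');
    INSERT INTO customer_order VALUES
        (10, 1, 'printer'),
        (11, 1, 'paper'),
        (12, 2, 'monitor');
""")

# Access through a search criterion other than the key: all orders
# of customers living in a given place of residence.
rows = conn.execute(
    "SELECT o.order_number, o.description "
    "FROM customer_order AS o "
    "JOIN customer AS c ON c.customer_number = o.customer_number "
    "WHERE c.city = ? "
    "ORDER BY o.order_number",
    ("Utrecht",),
).fetchall()
```

The query finds the orders by place of residence, although neither
table is keyed on it.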
Databases can do more than just store and retrieve data. They also
handle matters such as:
- authorisation: which users are allowed to retrieve and edit certain
data;
- securing data through regular back-up procedures and by keeping a log
of the updates;
- restoring the contents of the database after breakdowns.
Because of these extra facilities we use the name Database Management
System (DBMS).
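How a back-up and an update log work together when the database has to
be restored can be sketched as follows. This is a strongly simplified
illustration and not the mechanism of any particular DBMS; the class
TinyStore and its methods are hypothetical, and a real DBMS would keep
the back-up and the log on separate, durable storage.

```python
import copy


class TinyStore:
    """Hypothetical key/value store with a back-up and an update log."""

    def __init__(self):
        self.data = {}    # the live contents of the "database"
        self.backup = {}  # copy made at the last back-up
        self.log = []     # updates applied since the last back-up

    def update(self, key, value):
        # Record the update in the log before applying it.
        self.log.append((key, value))
        self.data[key] = value

    def take_backup(self):
        # After a back-up, the log can start again from empty.
        self.backup = copy.deepcopy(self.data)
        self.log.clear()

    def restore(self):
        # Start from the last back-up and re-apply the logged updates.
        self.data = copy.deepcopy(self.backup)
        for key, value in self.log:
            self.data[key] = value


store = TinyStore()
store.update("customer 1", "Jansen")
store.take_backup()
store.update("customer 2", "Bakker")
store.data = {}   # simulate a breakdown that loses the live contents
store.restore()   # both customers are available again
```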
Storage of multimedia documents
With the extensive use of personal computers and multimedia, we now
also have to store unstructured data such as text, pictures, images and
sound, besides structured, alphanumeric data. At the present time,
most users store this type of data as separate documents, each document
being a single file. The user therefore has separate files for each
text and each picture. It is also possible to include pictures in a
text, but then the picture is copied into the text file. The user also
has the option to group the documents in directories or folders.
Accessing multimedia documents
A number of packages for text retrieval and hypertext are currently
available. With this software, the user can improve access to documents
and to the relationships between documents. Text retrieval software supports
content-based searching for text (so-called full text retrieval). It
supports searching texts of different formats and searching on different
computers in the network.
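One common way to implement full text retrieval is an inverted index,
which records for each word the documents it occurs in. The sketch
below assumes this technique; the text does not say how any specific
package works, and the document names and contents are made up.

```python
import re
from collections import defaultdict

# Hypothetical documents, each held here as a name and its text.
documents = {
    "letter.txt": "The customer asked about the late order.",
    "memo.txt": "Order forms must list the customer number.",
    "notes.txt": "Tape storage is slow.",
}


def tokenize(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: word -> names of the documents containing that word.
index = defaultdict(set)
for name, text in documents.items():
    for word in tokenize(text):
        index[word].add(name)


def search(query):
    """Return the names of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

A call such as search("customer order") returns the documents that
contain both words, wherever in the text they occur.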
Hypertext software can be used to place references and relationships
within and between texts. Hypertext supports interactive document navigation.
The user can recognise the so-called hyperlinks in documents for instance
by a different colour of the words. If the user wants more information
about a certain word, he has to point at it with the mouse and click
on it. Text relating to that particular word is then automatically shown.
From there, the user has the possibility to return to the original text.
The documents consist of text and illustrations. Illustrations may be
incorporated in the text, or the text may contain references to separately
stored illustrations.
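The navigation described above, following a hyperlink and returning to
the original text, can be sketched with a simple history list. The
texts, the link words and the Browser class are hypothetical and serve
only to illustrate the principle.

```python
# Hypothetical hypertext: each text has a body and a set of hyperlinks,
# where a link maps a word in the body to the name of another text.
pages = {
    "overview": {
        "text": "Records are kept in a database.",
        "links": {"database": "database"},
    },
    "database": {
        "text": "A database is managed by a DBMS.",
        "links": {"DBMS": "dbms"},
    },
    "dbms": {
        "text": "A DBMS stores, retrieves and secures data.",
        "links": {},
    },
}


class Browser:
    """Keeps track of the current text and the way back."""

    def __init__(self, start):
        self.current = start
        self.history = []

    def follow(self, word):
        # Clicking on a linked word shows the text relating to that word.
        target = pages[self.current]["links"][word]
        self.history.append(self.current)
        self.current = target

    def back(self):
        # Return to the text from which the last link was followed.
        if self.history:
            self.current = self.history.pop()


browser = Browser("overview")
browser.follow("database")
browser.follow("DBMS")
browser.back()
```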
Multimedia databases
The next phase is the storage of multimedia documents in relational
databases. An example of such a relational database is Oracle Media
Server, a relational database in which text, still and moving images
and sound can also be stored. There are two advantages to storing documents
in the database. The first advantage is that relationships can be made
between structured data and documents. The second advantage is that
the documents can also use the various security facilities of the DBMS.
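The first advantage, relating documents to structured data, can be
illustrated with a small sketch. It uses SQLite through Python's
sqlite3 module, not Oracle Media Server, and stores the document as a
binary column (BLOB); the table names, column names and the placeholder
bytes are assumptions made for the example.

```python
import sqlite3

# Sketch only: an in-memory SQLite database with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    );
    -- A document row holds the raw bytes of a text, image or sound,
    -- together with the number of the customer it relates to.
    CREATE TABLE document (
        document_id     INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL
                        REFERENCES customer(customer_number),
        media_type      TEXT NOT NULL,
        content         BLOB NOT NULL
    );
""")

conn.execute("INSERT INTO customer VALUES (?, ?)", (1, "Jansen"))

# Placeholder bytes standing in for a scanned image.
scan = b"\x89PNG placeholder bytes"
conn.execute(
    "INSERT INTO document (customer_number, media_type, content) "
    "VALUES (?, ?, ?)",
    (1, "image", scan),
)

# Retrieve the document together with the structured customer data.
row = conn.execute(
    "SELECT c.name, d.media_type, d.content "
    "FROM document AS d "
    "JOIN customer AS c ON c.customer_number = d.customer_number "
    "WHERE c.customer_number = ?",
    (1,),
).fetchone()
```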
Object bases
The databases described thus far only store data. The corresponding
applications are stored separately as programmes in files or programme
libraries. This has to do with the procedure in which the user first
starts an application in the form of a programme, and then looks up the
data or documents he wants
to edit. With the object-oriented approach, it works the other way around.
The user first looks up the documents containing the objects he wants
to edit, which are then displayed in front of him. To support this way
of working, special object-oriented databases are required, in which
applications are stored as documents and objects with references to
the functions (methods) with which they can be edited.
Suppliers of relational DBMSs, such as Oracle, IBM and Sybase, have
begun to extend their DBMSs to support multimedia documents. There will
also be object-oriented extensions to make it possible to store data
and documents as objects with references to the functions that can be
used on these objects. The software is included in the database as functions
(methods) of objects, instead of as separate programmes. Eventually,
the RDBMS will evolve into an Object Base Management System
for all types of objects.
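The idea of storing objects together with references to their methods
can be sketched as follows. This is a toy illustration and not the
design of any existing product; the ObjectBase class, the type name
'letter' and the two methods are hypothetical.

```python
class ObjectBase:
    """Hypothetical store holding objects and the methods of their type."""

    def __init__(self):
        self.methods = {}  # type name -> {method name: function}
        self.objects = {}  # object id -> (type name, state)

    def register_method(self, type_name, method_name, function):
        self.methods.setdefault(type_name, {})[method_name] = function

    def store(self, object_id, type_name, state):
        self.objects[object_id] = (type_name, state)

    def available_methods(self, object_id):
        # The user looks up the object first and then sees what can
        # be done with it.
        type_name, _ = self.objects[object_id]
        return sorted(self.methods.get(type_name, {}))

    def invoke(self, object_id, method_name, *args):
        type_name, state = self.objects[object_id]
        return self.methods[type_name][method_name](state, *args)


def append_text(state, extra):
    """Add text to the end of a letter and return the new text."""
    state["text"] += extra
    return state["text"]


def word_count(state):
    """Count the words in a letter."""
    return len(state["text"].split())


base = ObjectBase()
base.register_method("letter", "append", append_text)
base.register_method("letter", "word_count", word_count)
base.store("letter-1", "letter", {"text": "Dear customer,"})
```

Here the user starts from the object 'letter-1' and asks the object
base which methods are available, instead of starting a programme first.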
Parallel Processing
A number of suppliers have turned to parallel processing to speed up
the searching and editing of data in databases. Oracle, for example,
already has a version for parallel nCube computers, and IBM has
announced a parallel version of DB2. Parallel processing is ideally
suited for the future heavy database servers on which enormous quantities
of data are stored.
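The principle behind a parallel search is that the data are divided
into partitions which are searched at the same time, after which the
partial results are combined. The sketch below shows only this
structure. It uses Python threads, which will not give a real speed-up
for this kind of work, and the partitions and function names are
hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical customer data (customer number, place of residence),
# divided over three partitions as if over three processors.
partitions = [
    [(1, "Utrecht"), (2, "Delft")],
    [(3, "Utrecht"), (4, "Leiden")],
    [(5, "Delft"), (6, "Utrecht")],
]


def scan(partition, city):
    """Search one partition for customers in the given city."""
    return [number for number, place in partition if place == city]


def parallel_search(city):
    """Search all partitions concurrently and combine the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(lambda part: scan(part, city), partitions)
        return sorted(number for part in results for number in part)
```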
Distributed databases
All DBMS suppliers are working on, or already supply, a distributed DBMS
in which the data are distributed over various computers. The
problem with the current distributed DBMSs is the fact that they are
based too much on the principle of one large database for all users,
where the user does not have to know where data are located or who they
belong to. This very 'transparency' of location and
organisation, however, is an important reason why the current distributed
databases are unsatisfactory in actual practice.
In itself it is true that the user does not need to know where data
or objects are physically located, but it is of great importance that
the ownership of data and objects is properly established. This also applies
to software such as programmes, packages and object methods.
Now that there is a growth towards a world-wide network in which users
(at home and at work) must be able to use each other's data, documents
and software, the distributed DBMS must support the legal ownership
of these objects. A number of conditions must be met in order to achieve
this.
- Each user (private individual, company or other organisation) must
have his own, personal database with data, objects and software.
- The owner decides which other users have access to the DBMS and
which rights they have.
- The DBMS must offer methods for navigation with which the users
can search for available data, objects and software.
- The DBMS can also handle the version management of objects.
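The four conditions above can be made concrete in a small sketch. It is
an illustration under stated assumptions and not a design for a real
distributed DBMS: the classes PersonalDatabase and Catalogue, the
right 'read' and the user names are all hypothetical.

```python
class PersonalDatabase:
    """A database belonging to exactly one owner."""

    def __init__(self, owner):
        self.owner = owner
        self.versions = {}  # object name -> successive contents
        self.grants = {}    # user -> set of rights, for example {"read"}

    def grant(self, acting_user, user, right):
        # Only the owner decides who has access and with which rights.
        if acting_user != self.owner:
            raise PermissionError("only the owner can grant rights")
        self.grants.setdefault(user, set()).add(right)

    def put(self, acting_user, name, content):
        # Every change is kept as a new version; numbering starts at 1.
        if acting_user != self.owner:
            raise PermissionError("only the owner can store objects")
        self.versions.setdefault(name, []).append(content)
        return len(self.versions[name])

    def read(self, acting_user, name, version=None):
        allowed = (acting_user == self.owner
                   or "read" in self.grants.get(acting_user, set()))
        if not allowed:
            raise PermissionError("no read access")
        history = self.versions[name]
        return history[-1] if version is None else history[version - 1]


class Catalogue:
    """Navigation aid listing the objects that owners have published."""

    def __init__(self):
        self.entries = []  # (owner, object name) pairs

    def publish(self, database, name):
        self.entries.append((database.owner, name))

    def find(self, keyword):
        return [entry for entry in self.entries if keyword in entry[1]]


db = PersonalDatabase("anna")
db.put("anna", "report", "draft")
db.put("anna", "report", "final")
db.grant("anna", "bert", "read")
catalogue = Catalogue()
catalogue.publish(db, "report")
```

In this sketch, the user 'bert' can find the report through the
catalogue and read any version of it, whereas a user without rights is
refused.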
Summary
When all the conditions are met and the final problems with distribution
have been solved, the eventual Digital Highway may contain one
world-wide DBMS in which everybody
can organise his own database with data, objects and software functions.
If parallel processing is used, many users will be able to simultaneously
access huge databases. Through publication in the navigation system,
one can give others access to one's database and give them the right
to copy objects. The DBMS keeps track of the versions of objects. It
also keeps track of who is currently using copies and supports the payment
of copyright to the owners of objects.