Unit 9 Database Design using ORACLE and SQL.PPT

Database Systems
Database Design
Asif Sohail
University of the Punjab
Punjab University College of Information Technology (PUCIT)

Database Design
• It is a process of creating a design that will
support the organization’s objectives for
the required database system.
• There are two main approaches to
database design:
a) Bottom-up Approach
b) Top-down Approach

Bottom-up Approach (design by synthesis)
• It begins with the fundamental level of attributes, which,
through analysis of the associations between them, are
grouped into relations.
• The process of Normalization is used to find the normalized
relations based on functional dependencies between the
attributes.
• The bottom-up approach is appropriate for the design of simple
databases with a relatively small number of attributes.
• This approach becomes difficult to use with a larger number of
attributes, where it is difficult to establish all the functional
dependencies between the attributes.
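
The role of functional dependencies in the bottom-up approach can be illustrated with a small attribute-closure routine. This is only a sketch: the function name and the sample dependencies are assumptions made for illustration, not part of the lecture.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency when its left side is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical dependencies: empID -> eName, postcode and postcode -> city.
fds = [
    (("empID",), ("eName", "postcode")),
    (("postcode",), ("city",)),
]
all_attrs = {"empID", "eName", "postcode", "city"}

# empID is a candidate key if its closure covers every attribute.
print(closure({"empID"}, fds) == all_attrs)
print(sorted(closure({"postcode"}, fds)))
```

With many attributes the list of dependencies grows quickly, which is why the approach becomes hard to apply to larger designs.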

Top-down Approach (design by analysis)
• It starts with a number of groupings of attributes into
relations that exist together naturally, for example, an
invoice.
• The relations are then analyzed through
normalization.
• One may start with the development of an ER model,
beginning with the identification of the entities and
the relationships between them that are of
interest to the organization.
• The ER model is then transformed into a set of
relations using the appropriate rules for conversion.

Phases of Database Design
• Database design is an iterative process.
• Database design is made up of three
main phases:
1. Conceptual database design
2. Logical database design
3. Physical database design

Conceptual database design
• The process of constructing a model of the data
used in an enterprise, independent of all
physical considerations, such as the target
DBMS software, application programs,
programming language, hardware platform etc.
• The data model is built using the user’s
requirements specification document.

Steps of Conceptual database design
1. Identify entity types
2. Identify relationship types
3. Identify and associate attributes with entity or relationship
types
4. Determine attribute domains
5. Determine candidate, primary, and alternate key attributes
6. Consider use of enhanced modeling concepts (optional step)
7. Check model for redundancy
8. Validate conceptual data model against user transactions
9. Review conceptual data model with user

Logical database design
• The process of constructing a model of the data
used in an enterprise based on a specific data
model, but independent of a particular DBMS and
other physical considerations.
• The logical data model is based on the target
model for the database, such as the relational
data model.
• The technique of normalization is used to test
the correctness of the logical data model.

Steps of Logical database design
1. Derive relations for logical data model
2. Validate relations using normalization
3. Validate relations against user transactions
4. Check integrity constraints
5. Review logical data model with user
6. Merge logical data models into global model (optional step)
7. Check for future growth

Physical database design
• The process of producing the description of the
database on the secondary storage; it describes
the base relations, file organizations, and
indexes used to achieve efficient access to the
data.
• Physical database design is tailored to a specific
DBMS.
• Logical database design is concerned with the
what; physical database design is concerned
with the how.

Steps of Physical database design
1. Design base relations
2. Design representation of derived data
3. Design general constraints
4. Design file organization and indexes
5. Coding and Compression Techniques
6. Analyze transactions
7. De-normalization
8. Partitioning
9. Estimate disk space requirements
10. Design user views
11. Design security mechanism

Design base relations
• For each relation defined in the logical data model, we have a
definition consisting of:
– The name of the relation and its attributes
– Key attributes
– Integrity constraints
• For each attribute, we have:
– Its domain, consisting of its data type, length, and any
constraint on the domain
– An optional default value for the attribute
– Whether the attribute can hold nulls
– Whether the attribute is derived and, if so, how it should be
computed
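
A base relation definition of this kind might look as follows. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; the table, column names and values are assumptions. Note that SQLite accepts the declared lengths but does not enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,          -- key attribute
        ename   VARCHAR(30) NOT NULL,         -- nulls not allowed
        job     VARCHAR(20) DEFAULT 'CLERK',  -- optional default value
        sal     NUMERIC(8, 2)                 -- nulls allowed
    )
""")

# Only the key and the name are supplied; job takes its default, sal is null.
conn.execute("INSERT INTO emp (emp_id, ename) VALUES (1, 'Ali')")
row = conn.execute("SELECT emp_id, ename, job, sal FROM emp").fetchone()
print(row)

# A row without a name violates the NOT NULL rule and is rejected.
try:
    conn.execute("INSERT INTO emp (emp_id) VALUES (2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```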

Derived data
• Whether a derived attribute is stored in the database or
calculated every time it is needed is a tradeoff. The designer
should calculate:
– The additional cost to store the derived data and keep it
consistent with operational data.
– The cost to calculate each time it is required.
• The less expensive option is chosen subject to performance
constraints.
• A derived attribute should be stored when:
– A frequent query is made against it.
– The DBMS’s query language cannot easily calculate the
derived attribute.
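
The tradeoff can be written as a rough cost comparison. The function and its cost units are hypothetical; a real design would also weigh storage cost and performance constraints.

```python
def choose_derived_strategy(queries_per_day, updates_per_day,
                            cost_per_calculation, cost_per_maintenance):
    """Pick the cheaper way to handle a derived attribute (illustrative model)."""
    # Cost of recalculating the value every time it is queried.
    calculate_cost = queries_per_day * cost_per_calculation
    # Cost of keeping a stored copy consistent after each update.
    store_cost = updates_per_day * cost_per_maintenance
    return "store" if store_cost < calculate_cost else "calculate"


# Frequently queried, rarely updated: storing the value is cheaper.
print(choose_derived_strategy(1000, 10, 5, 8))
# Rarely queried, frequently updated: calculating on demand is cheaper.
print(choose_derived_strategy(5, 500, 5, 8))
```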

Design general constraints
• Constraints are designed to enforce business
rules and the correctness of data.
• The constraints defined at physical database
design can greatly reduce the programming
effort at the application level.
• We can introduce constraints such as DEFAULT
value, CHECK constraint, Referential Integrity,
database triggers etc.
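
The sketch below shows a CHECK constraint, referential integrity and a trigger doing work that would otherwise fall to the application. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle; all table, column and trigger names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE dept (
        dept_no INTEGER PRIMARY KEY,
        dname   TEXT NOT NULL
    );
    CREATE TABLE emp (
        emp_no  INTEGER PRIMARY KEY,
        ename   TEXT NOT NULL,
        sal     REAL CHECK (sal > 0),               -- business rule
        dept_no INTEGER REFERENCES dept (dept_no)   -- referential integrity
    );
    CREATE TABLE sal_audit (emp_no INTEGER, old_sal REAL, new_sal REAL);
    CREATE TRIGGER trg_sal_audit AFTER UPDATE OF sal ON emp
    BEGIN
        INSERT INTO sal_audit VALUES (OLD.emp_no, OLD.sal, NEW.sal);
    END;
    INSERT INTO dept VALUES (10, 'ACCOUNTING');
    INSERT INTO emp VALUES (1, 'Ali', 3000, 10);
""")


def is_rejected(sql):
    """Return True when the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


bad_salary = is_rejected("INSERT INTO emp VALUES (2, 'Sara', -50, 10)")
bad_dept = is_rejected("INSERT INTO emp VALUES (3, 'Omar', 2000, 99)")

# The trigger records the salary change without any application code.
conn.execute("UPDATE emp SET sal = 3500 WHERE emp_no = 1")
audit_rows = conn.execute("SELECT * FROM sal_audit").fetchall()
print(bad_salary, bad_dept, audit_rows)
```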

File organization and indexes
• The aim is to determine the optimal file organizations for storing the base
relations, and the indexes required to achieve acceptable performance; that is,
the way in which relations and tuples will be held on secondary storage.
• In many cases, an RDBMS gives little or no choice of file organization,
although some may be established as indexes are specified.
• An index should be considered on an attribute that:
– Is used most often for join operations.
– Is used most often to access the tuples in a relation.
– Is frequently involved in ORDER BY and GROUP BY clauses.
– Contains a wide range of values.
• An index on the primary key is called a primary index, and an index on a
non-key attribute is called a secondary index.
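
To see a secondary index being used, one can ask the DBMS for its query plan. The sketch below does this with SQLite via Python's sqlite3 module rather than Oracle; the table, index name and data are assumptions, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, ename TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(i, f"emp{i}", f"city{i % 50}") for i in range(1000)],
)

# Secondary index on a non-key column that is often used to access tuples.
conn.execute("CREATE INDEX idx_emp_city ON emp (city)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ename FROM emp WHERE city = 'city7'"
).fetchall()
# The last field of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```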

Coding and Compression Techniques
• Coding techniques are useful for compression of data values,
by replacing those data values with the smaller sized codes.
• Consider the following relation:
• EMP (EmpNo, Ename, Job, City,…….)
• With a large number of records, storing the full job title and city
name in every record wastes a small amount of space per record,
which adds up to a large amount of storage space overall.
• To avoid this problem, we can replace the values with short codes
and use the following relations:
– EMP (EmpNo, Ename, JobCode, CityCode,…….)
– Job (JobCode, JobTitle)
– City (CityCode, CityName)
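
A minimal sketch of the coded design, using SQLite through Python's sqlite3 module as a stand-in for Oracle. The column names and sample rows are assumptions; the full titles are recovered with joins when they are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job  (job_code  INTEGER PRIMARY KEY, job_title TEXT);
    CREATE TABLE city (city_code INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE emp (
        emp_no    INTEGER PRIMARY KEY,
        ename     TEXT,
        job_code  INTEGER REFERENCES job (job_code),    -- short code, not the title
        city_code INTEGER REFERENCES city (city_code)   -- short code, not the name
    );
    INSERT INTO job  VALUES (1, 'Software Engineer');
    INSERT INTO city VALUES (1, 'Lahore');
    INSERT INTO emp  VALUES (100, 'Ali', 1, 1), (101, 'Sara', 1, 1);
""")

# Each long value is stored once; employees carry only the codes.
rows = conn.execute("""
    SELECT e.emp_no, e.ename, j.job_title, c.city_name
    FROM emp e
    JOIN job  j ON j.job_code  = e.job_code
    JOIN city c ON c.city_code = e.city_code
    ORDER BY e.emp_no
""").fetchall()
print(rows)
```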

Analyze transactions
• To understand the functionality of the transactions that will run on
the database and to analyze the important transactions.
• For efficient database design, it is necessary to have knowledge of the
transactions or queries that will run on the database. It includes the
identification of frequent transactions, the transactions that are
critical, peak load time etc.
• We use the transaction information to select appropriate file
organizations and indexes.
• For transaction analysis, we can use a transaction / relation cross-
reference matrix, which shows the operation a transaction performs
on a certain relation.
• The above information will be used to determine the indexes that are
required.

Transaction / relation cross-reference matrix
[Matrix: one row per relation (R1, R2, ..., Rn) and one column group per
transaction (T1, T2, ..., Tm); each group has the four columns I, R, U and D,
and an X marks each operation the transaction performs on the relation.]
I = Insert; R = Read; U = Update; D = Delete
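
Such a matrix can be kept as a simple nested mapping and queried when deciding on indexes. The transactions, relations and operations below are hypothetical examples.

```python
from collections import defaultdict

# Hypothetical usage: (transaction, relation, operations it performs).
usage = [
    ("T1", "EMP", "IR"),
    ("T1", "PROJ", "R"),
    ("T2", "EMP", "RU"),
    ("T2", "WORK", "ID"),
]

# matrix[relation][transaction] is the set of operations (I, R, U, D).
matrix = defaultdict(dict)
for txn, relation, ops in usage:
    matrix[relation][txn] = set(ops)


def transactions_reading(relation):
    """List the transactions that read the given relation."""
    return sorted(t for t, ops in matrix[relation].items() if "R" in ops)


for relation in sorted(matrix):
    cells = {t: "".join(sorted(ops)) for t, ops in matrix[relation].items()}
    print(relation, cells)
print(transactions_reading("EMP"))
```

A relation that is read by many transactions but seldom inserted into or updated is a stronger candidate for additional indexes.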

Denormalization
• Denormalization is a technique to move from higher to lower
normal forms of database modeling in order to speed up
database access.
• A fully normalized database schema can fail to provide
adequate system response time due to excessive table join
operations.
• Denormalization results in combining two relations into one
relation. More specifically, in this step, we consider duplicating
certain attributes or combining relations together to reduce the
number of joins required to perform a query.
• As a general rule of thumb, if performance is unsatisfactory and
a relation has a low update rate and very high query rate,
denormalization may be a viable option.

Denormalization
• Denormalization Situation 1:
• Many to many binary relationships mapped to three relations.
– EMP (empID, eName, job, Sal)
– PROJ (pjId, pjName)
– WORK (empID, pjId, dtHired)
• By de-normalizing these relations, we merge the WORK relation
into the PROJ relation. The merged relation violates 2NF, because
pjName depends on pjId alone, which is only part of the key
(pjId, empID), so the anomalies of a 2NF violation appear. In return,
a query across employees and projects needs only one join between
two tables instead of two joins, which improves efficiency.
– EMP (empID, eName, job, Sal)
– PROJ (pjId, pjName, empID, dtHired)
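
The effect of merging WORK into PROJ can be tried out with SQLite through Python's sqlite3 module (a stand-in for Oracle). The sample rows are assumptions. The query needs a single join, but the project name is now stored once per assigned employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empID INTEGER PRIMARY KEY, eName TEXT, job TEXT, sal REAL);
    -- De-normalized PROJ: WORK has been merged in, so the key is (pjId, empID).
    CREATE TABLE proj (
        pjId INTEGER, pjName TEXT, empID INTEGER, dtHired TEXT,
        PRIMARY KEY (pjId, empID)
    );
    INSERT INTO emp VALUES (1, 'Ali', 'Analyst', 5000), (2, 'Sara', 'Designer', 5500);
    INSERT INTO proj VALUES (10, 'Payroll', 1, '2020-01-15'),
                            (10, 'Payroll', 2, '2020-03-01');
""")

# One join is enough to list employees with their project names.
rows = conn.execute("""
    SELECT e.eName, p.pjName, p.dtHired
    FROM emp e JOIN proj p ON p.empID = e.empID
    ORDER BY e.empID
""").fetchall()

# The price: the same project name is repeated for every assigned employee.
name_copies = conn.execute(
    "SELECT COUNT(*) FROM proj WHERE pjId = 10"
).fetchone()[0]
print(rows, name_copies)
```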

Denormalization
• Denormalization Situation 2:
• When attributes that are almost always used together in an application
end up in different relations, a join operation is needed every time
they are retrieved.
• Example: Consider the following relation:
– EMP (empID, eName, street, postcode, city)
• The relation isn’t in 3NF, as it has a transitive dependency postcode -> city.
• After converting the above relation into 3NF, we have
– EMP1 (empID, eName, street, postcode)
– EMP2 (postcode, city)
• However, this means a join is needed whenever we want the complete
address of an employee. In this case, we would settle for 2NF
and implement the original EMP relation.

Partitioning
• De-normalization merges different relations,
whereas partitioning splits one relation into two or more.
• The general aims of data partitioning and placement in
a database are to
– Reduce workload (e.g. data access, communication costs,
search space)
– Balance workload
– Improve efficiency
• There are two types of partitioning:
– Horizontal Partitioning
– Vertical Partitioning

Partitioning
Horizontal Partitioning
• The table is split on the basis of rows, which means a larger table is
split into smaller tables.
• The advantage is that accessing records in a smaller table takes
much less time than in a larger one.
• It also helps in the maintenance of tables, security,
authorization and backup.
• These smaller partitions can also be placed on different disks to
reduce disk contention.
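
The idea can be sketched in plain Python: rows are distributed by the value of one column, and the union of the partitions gives back the original table. The rows and the choice of partitioning column are assumptions for illustration.

```python
# Hypothetical employee rows.
rows = [
    {"emp_no": 1, "ename": "Ali", "city": "Lahore"},
    {"emp_no": 2, "ename": "Sara", "city": "Karachi"},
    {"emp_no": 3, "ename": "Omar", "city": "Lahore"},
]


def partition_by(rows, column):
    """Split rows into smaller tables, one per value of the given column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


parts = partition_by(rows, "city")

# The union of all partitions rebuilds the original table.
rebuilt = sorted(
    (row for part in parts.values() for row in part),
    key=lambda row: row["emp_no"],
)
print({city: len(part) for city, part in parts.items()})
```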

Partitioning
Vertical Partitioning
• It is done on the basis of attributes. The same table is split into
different physical records depending on the nature of accesses.
• The primary key is repeated in every vertical partition of a table so
that the original table can be rebuilt by joining the partitions.
• A limited subset of a table's columns can also be replicated to other
machines.
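
A plain Python sketch of the same idea: each partition keeps the primary key plus a group of columns, and joining on the key rebuilds the table. The rows and the column groups are assumptions.

```python
# Hypothetical employee rows; emp_no is the primary key.
rows = [
    {"emp_no": 1, "ename": "Ali", "sal": 3000, "photo": "ali.png"},
    {"emp_no": 2, "ename": "Sara", "sal": 4000, "photo": "sara.png"},
]


def split_columns(rows, key, column_groups):
    """Create one partition per column group, repeating the key in each."""
    return [
        [{col: row[col] for col in (key, *cols)} for row in rows]
        for cols in column_groups
    ]


def rejoin(partitions, key):
    """Rebuild the original rows by joining the partitions on the key."""
    merged = {}
    for part in partitions:
        for row in part:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Frequently accessed columns in one partition, rarely accessed in another.
frequent, rare = split_columns(rows, "emp_no", [("ename", "sal"), ("photo",)])
print(frequent)
print(rare)
```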

Estimate disk space requirements
• This step is required so that the appropriate
disk size can be selected.
• In general, the estimate is based on the size of
each tuple and the number of tuples in a
relation.
• The growth of the relation should also be taken
into consideration.
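
A back-of-the-envelope version of this estimate is shown below. The function and the figures are hypothetical, and the model ignores index space and storage overhead.

```python
def estimate_bytes(tuple_size, tuple_count, annual_growth=0.0, years=0):
    """Estimate relation size from tuple size, tuple count and compound growth."""
    projected_tuples = tuple_count * (1 + annual_growth) ** years
    return round(tuple_size * projected_tuples)


# 1,000,000 tuples of 200 bytes each, today and after two years of 10% growth.
today = estimate_bytes(200, 1_000_000)
in_two_years = estimate_bytes(200, 1_000_000, annual_growth=0.10, years=2)
print(today, in_two_years)
```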

Design user views
• A view is a subset of the database that is
presented to one or more users.
• A view is often referred to as a virtual table.
• Views are created to satisfy the requirements of
multiple users in an efficient way.
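
The sketch below creates a view over a base table using SQLite through Python's sqlite3 module (a stand-in for Oracle); the table, view name and rows are assumptions. Because the view is virtual, it reflects later changes to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, ename TEXT, sal REAL, dept_no INTEGER);
    INSERT INTO emp VALUES (1, 'Ali', 3000, 10), (2, 'Sara', 4000, 20);
    -- Users of this view see only department 10 and never the salary column.
    CREATE VIEW emp_dept10 AS
        SELECT emp_no, ename FROM emp WHERE dept_no = 10;
""")

before = conn.execute("SELECT * FROM emp_dept10 ORDER BY emp_no").fetchall()

# The view stores no data of its own; new base rows appear in it immediately.
conn.execute("INSERT INTO emp VALUES (3, 'Omar', 3500, 10)")
after = conn.execute("SELECT * FROM emp_dept10 ORDER BY emp_no").fetchall()
print(before, after)
```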

Design security mechanism
• The aim is to design the security mechanism for the database as
specified by the user during requirements specification.
• An Authorization Table is used to restrict access to data
and to restrict the actions that database users can
perform.
• Fernandez, Summers and Wood developed this
conceptual model of database security.
• It expresses the Authorization Rules in the form of a
table or matrix that includes Subject, Object and
Privileges.

                     OBJECT
SUBJECT   Student      Course          Result   Faculty
Entry     Read         Read            Read     Null
Middle    Read/Write   Read/Write      Read     Read
Admin     All          Update/Delete   All      Read/Write/Update

• The column headings represent database objects,
which may be tables, views, sequences, indexes etc.
• The subjects are written on the left side of the table. A
subject may be an individual or a group of users.
• The cell entries of the table specify the privileges.
These include INSERT, UPDATE, DELETE, ALTER and SELECT.
• Once the Authorization Table or Access Control Matrix
is complete, the DBA grants the privileges accordingly.
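
The authorization table can be represented as a nested mapping and consulted before an action is allowed. The sketch reuses the subjects, objects and privileges from the table in this unit; the data structure and the is_allowed function are assumptions for illustration, with None standing for "Null".

```python
# Subject -> object -> privileges. "All" grants everything, None grants nothing.
AUTH = {
    "Entry": {
        "Student": {"Read"}, "Course": {"Read"},
        "Result": {"Read"}, "Faculty": None,
    },
    "Middle": {
        "Student": {"Read", "Write"}, "Course": {"Read", "Write"},
        "Result": {"Read"}, "Faculty": {"Read"},
    },
    "Admin": {
        "Student": "All", "Course": {"Update", "Delete"},
        "Result": "All", "Faculty": {"Read", "Write", "Update"},
    },
}


def is_allowed(subject, obj, privilege):
    """Return True when the matrix grants the privilege to the subject."""
    granted = AUTH.get(subject, {}).get(obj)
    if granted is None:
        return False
    return granted == "All" or privilege in granted


print(is_allowed("Entry", "Student", "Read"))
print(is_allowed("Entry", "Faculty", "Read"))
print(is_allowed("Admin", "Result", "Delete"))
```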

Thank you for your attention.
Asif Sohail
Assistant Professor
University of the Punjab
Punjab University College of Information Technology (PUCIT)
Allama Iqbal (Old) Campus, Anarkali
Lahore, Pakistan
Tel: +92-(0)42-111-923-923 Ext. 154
E-mail: asif@pucit.edu.pk
