Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 1 
Paper: BCA-302 
DATABASE 
MANAGEMENT 
SYSTEM 
DEPARTMENT OF COMPUTER SCIENCE 
DEV SANSKRITI VISHWAVIDYALAYA, SHANTIKUNJ,HARIDWAR (UK) 
July-Dec 2014. Notes-ization @ DSVV.
PREAMBLE 
ACKNOWLEDGEMENTS 
The Department of Computer Science at Dev Sanskriti Vishwavidyalaya, Shantikunj, 
Haridwar (Uttarakhand) was established in the year 2006. The Department started the Bachelor 
of Computer Applications (BCA) programme in 2012. The serene and vibrant environment of 
the university is a boon for the students. Academically the students learn new things every day, 
and along with that the curriculum of life management instils the virtues of humanity in them. 
It was an initiative taken by the students of the BCA (2013-2016) batch to work as a team and, 
instead of doing revision only, to do a "prevision" of the subject. They gave it the name 
"Notes-ization". Everyone contributed to it as per his/her own caliber, but it is finally the sincere 
effort of Manan Singh (Student, BCA III Sem) that made the work presentable and reliable, and 
the effort of his teammates fruitful and significant. Special thanks to all the web sources. Thank 
you everyone for this inspirational work. Hope it will benefit one and all. Thanks again for 
carrying the spirit of SHARE-CARE-PROSPER
TABLE OF CONTENTS 
UNIT TOPICS 
UNIT 1 Introduction to Database: Definition of Database, Components 
of DBMS, Three Level of Architecture proposal for DBMS, 
Advantage & Disadvantage of DBMS, Data independence, 
Purpose of Database Management Systems, Structure of DBMS, 
DBA and its responsibilities, Data Dictionary, Advantages of 
Data Dictionary. 
UNIT 2 Data Models: Introduction to Data Models, Object Based 
Logical Model, Record Based Logical Model- Relational Model, 
Network Model, Hierarchical Model. Entity Relationship 
Model, Entity Set, Attribute, Relationship Set. Entity 
Relationship Diagram (ERD), Extended features of ERD. 
UNIT 3.1 Relational Databases: Introduction to Relational Databases and 
Terminology- Relation, Tuple, Attribute, Cardinality, Degree, 
Domain. Keys- Super Key, Candidate Key, Primary Key, 
Foreign Key. 
UNIT 3.2 Relational Algebra: Operations- Select, Project, Union, 
Difference, Intersection, Cartesian Product, Join, Natural Join. 
UNIT 4 Structured Query Language (SQL): Introduction to SQL, 
History of SQL, Concept of SQL, DDL Commands, DML 
Commands, DCL Commands, Simple Queries, Nested Queries, 
Normalization: Benefits of Normalization, Normal Forms- 
1NF, 2NF, 3NF, BCNF, and Functional Dependency. 
UNIT 5 Relational Database Design: Introduction to Relational 
Database Design, DBMS v/s RDBMS. Integrity rule, Concept of 
Concurrency Control and Database Security.
UNIT 1 
INTRODUCTION TO DATABASE 
Introduction to Database: Definition of Database, Components of DBMS, Three 
Level of Architecture proposal for DBMS, Advantage & Disadvantage of DBMS, 
Data independence, Purpose of Database Management Systems, Structure of 
DBMS, DBA and its responsibilities, Data Dictionary, Advantages of Data 
Dictionary.
DEFINITION OF DATABASE 
A database can be summarily described as a repository for data. A database is a structured 
collection of data. Thus, card indices, printed catalogues of archaeological artifacts and 
telephone directories are all examples of databases. A database may be stored on a computer and 
examined using a program. Such programs are often called `databases', but are more strictly 
database management systems (DBMS). 
Computer-based databases are usually organized into one or more tables. A table stores data in a 
format similar to a published table and consists of a series of rows and columns. To carry the 
analogy further, just as a published table will have a title at the top of each column, so each 
column in a database table will have a name, often called a field name. The term field is often 
used instead of column. Each row in a table will represent one example of the type of object 
about which data has been collected.
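The table analogy above can be made concrete with a few lines of Python and SQLite; the table, field names, and rows below are invented purely for illustration:

```python
import sqlite3

# A tiny table: each column has a field name, each row is one example
# of the object being described (names here are invented).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artifact (
        id     INTEGER PRIMARY KEY,
        name   TEXT,    -- field name "name"
        period TEXT     -- field name "period"
    )
""")
conn.execute("INSERT INTO artifact (name, period) VALUES ('amphora', 'Roman')")
conn.execute("INSERT INTO artifact (name, period) VALUES ('flint axe', 'Neolithic')")

rows = list(conn.execute("SELECT name, period FROM artifact ORDER BY id"))
print(rows)
```

Each tuple returned corresponds to one row of the table, just as each row in a published table describes one item.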
COMPONENTS OF DBMS 
A database management system (DBMS) consists of several components, and each plays a very 
important role in the database management system environment. The major components of a 
database management system are: 
 Software 
 Hardware 
 Data 
 Procedures 
 Database Access Language 
 Users 
Software 
The main component of a DBMS is the software: the set of programs used to handle the 
database and to control and manage the overall computerized database environment. 
1. DBMS software itself is the most important software component in the overall system. 
2. Operating system including network software being used in network, to share the data of 
database among multiple users. 
3. Application programs, developed in programming languages such as C++ or Visual Basic, 
that are used to access the database. Each program contains statements that request the 
DBMS to perform operations on the database, such as retrieving, updating, or deleting data. 
The application programs may be conventional batch programs or online programs run from 
workstations or terminals. 
Hardware 
Hardware consists of the physical electronic devices such as computers (together with 
associated I/O devices like disk drives), storage devices, I/O channels, and the electromechanical 
devices that interface computers with real-world systems. It is impossible to implement a DBMS 
without hardware. In a network, a powerful computer with high data processing speed and a 
storage device with large storage capacity are required as the database server. 
Characteristics: 
It is helpful to categorize computer memory into two classes: internal memory and external 
memory. Although some internal memory is permanent, such as ROM, we are interested here 
only in memory that can be changed by programs. This memory is often known as RAM. This 
memory is volatile, and any electrical interruption causes the loss of data. 
By contrast, magnetic disks and tapes are common forms of external memory. They are 
non-volatile and retain their content for practically unlimited amounts of time. The physical 
characteristics of magnetic tapes force them to be accessed sequentially, making them useful for 
backup purposes, but not for quick access to specific data. 
In examining the memory needs of a DBMS, we need to consider the following issues: 
• Data of a DBMS must have a persistent character; in other words, data must remain available 
long after any program that is using it has completed its work. Also, data must remain intact 
even if the system breaks down. 
• A DBMS must access data at a relatively high rate. 
• Such a large quantity of data needs to be stored that the storage medium must be low cost. 
These requirements are satisfied at the present stage of technological development only by 
magnetic disks. 
Data 
Data is the most important component of the DBMS. The main purpose of DBMS is to process 
the data. In DBMS, databases are defined, constructed and then data is stored, updated and 
retrieved to and from the databases. The database contains both the actual (or operational) data 
and the metadata (data about data or description about data). 
Procedures 
Procedures refer to the instructions and rules that help to design the database and to use the 
DBMS. The users who operate and manage the DBMS require documented procedures on how 
to use or run the database management system. These may include: 
1. Procedure to install the new DBMS. 
2. To log on to the DBMS. 
3. To use the DBMS or application program. 
4. To make backup copies of database. 
5. To change the structure of database. 
6. To generate the reports of data retrieved from database. 
Database Access Language 
The database access language is used to access the data to and from the database. The users use 
the database access language to enter new data, change the existing data in database and to 
retrieve required data from databases. The user writes a set of appropriate commands in the 
database access language and submits these to the DBMS. The DBMS translates the user's 
commands and sends them to a specific part of the DBMS called the database engine. The 
database engine generates a set of results according to the commands submitted by the user, 
converts these into a user-readable form called an inquiry report, and then displays them on the 
screen. 
The administrators may also use the database access language to create and maintain the 
databases. 
The most popular database access language is SQL (Structured Query Language). Relational 
databases are required to have a database query language. 
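As a minimal sketch of how the access language is used, the snippet below issues SQL data-definition, data-manipulation, and query statements through Python's sqlite3 module; the table and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: enter new data, then change existing data
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha Verma' WHERE roll = 1")

# Query: retrieve the required data
result = conn.execute("SELECT name FROM student WHERE roll = 1").fetchone()
print(result)
```

The DBMS translates each submitted command, executes it against the database, and returns the result set to the caller.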
Users 
The users are the people who manage the databases and perform different operations on the 
databases in the database system. Three kinds of people play different roles in the database 
system: 
1. Application Programmers 
2. Database Administrators 
3. End-Users 
Application Programmers 
The people who write application programs in programming languages (such as Visual Basic, 
Java, or C++) to interact with databases are called application programmers. 
Database Administrators 
A person who is responsible for managing the overall database management system is called 
database administrator or simply DBA. 
End-Users 
The end-users are the people who interact with database management system to perform 
different operations on database such as retrieving, updating, inserting, deleting data etc.
THREE LEVEL ARCHITECTURE PROPOSAL FOR DBMS 
The logical architecture, also known as the ANSI/SPARC architecture, was elaborated at the 
beginning of the 1970s. It distinguishes three layers of data abstraction: 
1. The physical layer contains specific and detailed information that describes how data are 
stored: addresses of various data components, lengths in bytes, etc. DBMSs aim to achieve 
data independence, which means that the database organization at the physical level should 
not matter to application programs. 
2. The logical layer describes data in a manner that is similar to, say, definitions of 
structures in C. This layer has a conceptual character; it shields the user from the tedium 
of details contained in the physical layer, but is essential in formulating queries for the 
DBMS. 
3. The user layer contains each user's perspective of the content of the database. 
The logical architecture describes how data in the database is perceived by users. It is not 
concerned with how the data is handled and processed by the DBMS, but only with how it looks. 
The method of data storage on the underlying file system is not revealed, and the users can 
manipulate the data without worrying about where it is located or how it is actually stored. This 
results in the database having different levels of abstraction. 
The majority of commercial database management systems available today are based on the 
ANSI/SPARC generalized DBMS architecture, as proposed by the ANSI/SPARC Study Group 
on Data Base Management Systems; hence this is also called the ANSI/SPARC model. It 
divides the system into three levels of abstraction: the internal or physical level, the conceptual 
level, and the external or view level. 
The External or View Level: 
The external or view level is the highest level of abstraction of database. It provides a window on 
the conceptual view, which allows the user to see only the data of interest to them. The user can 
be either an application program or an end user. There can be many external views as any 
number of external schemas can be defined and they can overlap each other. It consists of the 
definition of logical records and relationships in the external view. It also contains the method of 
deriving the objects such as entities, attributes and relationships in the external view from the 
conceptual view. 
The Conceptual Level or Global Level: 
The conceptual level presents a logical view of the entire database as a unified whole. It allows 
the user to bring all the data in the database together and see it in a consistent manner. Hence, 
there is only one conceptual schema per database. The first stage in the design of a database is to 
define the conceptual view, and a DBMS provides a data definition language for this purpose. It 
describes all the records and relationships included in the database. 
The data definition language used to create the conceptual level must not specify any physical 
storage considerations that should be handled by the physical level. It does not provide any 
storage or access details, but defines the information content only.
The Internal or Physical Level: 
The collection of files permanently stored on secondary storage devices is known as the physical 
database. The physical or internal level is the one closest to physical storage. It provides a 
low-level description of the physical database and an interface between the operating system's 
file system and the record structures used in the higher levels of abstraction. It is at this level that 
record types and methods of storage are defined, as well as how stored fields are represented, 
what physical sequence the stored records are in, and what other physical structures exist. 
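One way to picture the external level's "window on the conceptual view" is an SQL view, which exposes only the data of interest while the conceptual schema stays unchanged. A hedged sketch using Python's sqlite3, with all table and column names invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical schema of the database
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 30000), (2, 'Meera', 45000)")

# External level: one user's view, hiding the salary column
conn.execute("CREATE VIEW staff_directory AS SELECT id, name FROM employee")

names = [name for _, name in conn.execute("SELECT * FROM staff_directory ORDER BY id")]
print(names)
```

Many such views can be defined over the same conceptual schema, and they may overlap, which matches the "many external views, one conceptual schema" rule above.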
ADVANTAGES & DISADVANTAGES OF DBMS 
Advantages of the DBMS: 
The DBMS serves as the intermediary between the user and the database. The database structure 
itself is stored as a collection of files, and the only way to access the data in those files is through 
the DBMS. The DBMS receives all application requests and translates them into the complex 
operations required to fulfill those requests. The DBMS hides much of the database's internal 
complexity from the application programs and users. 
The different advantages of DBMS are as follows: 
1. Improved data sharing. 
The DBMS helps create an environment in which end users have better access to more and 
better-managed data. Such access makes it possible for end users to respond quickly to changes 
in their environment. 
2. Improved data security. 
The more users access the data, the greater the risks of data security breaches. Corporations 
invest considerable amounts of time, effort, and money to ensure that corporate data are used 
properly. A DBMS provides a framework for better enforcement of data privacy and security 
policies. 
3. Better data integration. 
Wider access to well-managed data promotes an integrated view of the organization's operations 
and a clearer view of the big picture. It becomes much easier to see how actions in one segment 
of the company affect other segments. 
4. Minimized data inconsistency. 
Data inconsistency exists when different versions of the same data appear in different places. 
For example, data inconsistency exists when a company's sales department stores a sales 
representative's name as "Bill Brown" and the company's personnel department stores that same 
person's name as "William G. Brown," or when the company's regional sales office shows the 
price of a product as $45.95 and its national sales office shows the same product's price as 
$43.95. The probability of data inconsistency is greatly reduced in a properly designed database. 
5. Improved data access. 
The DBMS makes it possible to produce quick answers to ad hoc queries. From a database 
perspective, a query is a specific request issued to the DBMS for data manipulation—for 
example, to read or update the data. Simply put, a query is a question, and an ad hoc query is a 
spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to 
the application. For example, end users, when dealing with large amounts of sales data, might 
want quick answers to questions (ad hoc queries) such as: 
- What was the dollar volume of sales by product during the past six months? 
- What is the sales bonus figure for each of our salespeople during the past three months? 
- How many of our customers have credit balances of $3,000 or more?
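The first of these ad hoc questions might be phrased in SQL as a grouped aggregate. A minimal sketch with sqlite3, where the table and figures are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("widget", 100.0), ("widget", 50.0), ("gadget", 75.0)])

# Ad hoc query: dollar volume of sales by product
totals = dict(conn.execute("SELECT product, SUM(amount) FROM sales GROUP BY product"))
print(totals)
```

The dictionary returned here plays the role of the query result set the DBMS sends back to the application.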
6. Improved decision making. 
Better-managed data and improved data access make it possible to generate better-quality 
information, on which better decisions are based. The quality of the information generated 
depends on the quality of the underlying data. Data quality is a comprehensive approach to 
promoting the accuracy, validity, and timeliness of the data. While the DBMS does not guarantee 
data quality, it provides a framework to facilitate data quality initiatives. 
7. Increased end-user productivity. 
The availability of data, combined with the tools that transform data into usable information, 
empowers end users to make quick, informed decisions that can make the difference between 
success and failure in the global economy. 
Disadvantages of the DBMS: 
Although the database system yields considerable advantages over previous data management 
approaches, database systems do carry significant disadvantages. For example: 
1. Increased costs. 
Database systems require sophisticated hardware and software and highly skilled personnel. The 
cost of maintaining the hardware, software, and personnel required to operate and manage a 
database system can be substantial. Training, licensing, and regulation compliance costs are 
often overlooked when database systems are implemented. 
2. Management complexity. 
Database systems interface with many different technologies and have a significant impact on a 
company's resources and culture. The changes introduced by the adoption of a database system 
must be properly managed to ensure that they help advance the company's objectives. Given the 
fact that database systems hold crucial company data that are accessed from multiple sources, 
security issues must be assessed constantly. 
3. Maintaining currency. 
To maximize the efficiency of the database system, you must keep your system current. 
Therefore, you must perform frequent updates and apply the latest patches and security measures 
to all components. Because database technology advances rapidly, personnel training costs tend 
to be significant. 
4. Vendor dependence. 
Given the heavy investment in technology and personnel training, companies might be reluctant 
to change database vendors. As a consequence, vendors are less likely to offer pricing point 
advantages to existing customers, and those customers might be limited in their choice of 
database system components. 
5. Frequent upgrade/replacement cycles. 
DBMS vendors frequently upgrade their products by adding new functionality. Such new 
features often come bundled in new upgrade versions of the software. Some of these versions 
require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs 
money to train database users and administrators to properly use and manage the new features.
DATA INDEPENDENCE 
A major objective of the three-level architecture is to provide data independence, which means 
that the upper levels are unaffected by changes in the lower levels. 
There are two kinds of data independence: 
• Logical data independence 
• Physical data independence 
Logical Data Independence 
Logical data independence indicates that the conceptual schema can be changed without 
affecting the existing external schemas. The change would be absorbed by the mapping between 
the external and conceptual levels. Logical data independence also insulates application 
programs from operations such as combining two records into one or splitting an existing record 
into two or more records. This would require a change in the external/conceptual mapping so as 
to leave the external view unchanged. 
Physical Data Independence 
Physical data independence indicates that the physical storage structures or devices can be 
changed without affecting the conceptual schema. The change would be absorbed by the mapping 
between the conceptual and internal levels. Physical data independence is achieved by the 
presence of the internal level of the database and the mapping or transformation from the 
conceptual level of the database to the internal level. Conceptual level to internal level mapping, 
therefore provides a means to go from the conceptual view (conceptual records) to the internal 
view and hence to the stored data in the database (physical records). 
If there is a need to change the file organization or the type of physical device used as a result of 
growth in the database or new technology, a change is required in the conceptual/ internal 
mapping between the conceptual and internal levels. This change is necessary to maintain the 
conceptual level invariant. The physical data independence criterion requires that the conceptual 
level does not specify storage structures or the access methods (indexing, hashing etc.) used to 
retrieve the data from the physical storage medium. Making the conceptual schema physically 
data independent means that the external schema, which is defined on the conceptual schema, is 
in turn physically data independent. 
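Physical data independence can be glimpsed even in a small SQLite session: adding an index changes the access method at the physical level, while the query and its result are untouched. A sketch with invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'A'), (2, 'B')")

query = "SELECT id FROM orders WHERE customer = 'B'"
before = conn.execute(query).fetchall()

# Change the physical level: add an index (a new access method)
conn.execute("CREATE INDEX idx_customer ON orders(customer)")

after = conn.execute(query).fetchall()
print(before == after)  # the conceptual-level query is unaffected
```

The query text never mentions the index; the mapping from the conceptual level to the internal level absorbs the change.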
Logical data independence is more difficult to achieve than physical data independence, as it 
requires flexibility in the design of the database, and the programmer has to foresee future 
requirements or modifications in the design. 
PURPOSE OF DBMS 
Database management systems were developed to handle the following difficulties of typical 
file-processing systems supported by conventional operating systems: data redundancy and 
inconsistency; difficulty in accessing data; data isolation (multiple files and formats); integrity 
problems; atomicity of updates; concurrent access by multiple users; and security problems. 
 In the early days, database applications were built directly on top of the 
file system. 
 Drawbacks of using file systems to store data: 
- Data redundancy and inconsistency: multiple file formats and duplication of information in different files. 
- Difficulty in accessing data: a new program must be written to carry out each new task. 
- Data isolation: multiple files and formats. 
- Integrity constraints: hard to add new constraints or change existing ones. 
These problems and others led to the development of database management systems.
STRUCTURE OF DBMS 
The components in the structure of DBMS are described below: 
DBA: - DBA means Database Administrator. He/she is the person responsible for the 
installation, configuration, upgrading, administration, monitoring, maintenance, and security of 
databases in an organization. 
Database Schema: - A database schema defines the entities of the database and the 
relationships among them. It is a descriptive detail of the database, which can be depicted by 
means of schema diagrams. The database designer prepares the schema to help programmers 
understand all aspects of the database. 
DDL Processor: - The DDL Processor or Compiler converts the data definition statements into a 
set of tables. These tables contain the metadata concerning the database and are in a form that 
can be used by other components of DBMS. 
Data Dictionary: - Information pertaining to the structure and usage of data contained in the 
database, the metadata, is maintained in a data dictionary. The term system catalog also describes 
this metadata. The data dictionary, which is a database itself, documents the data. Each database 
user can consult the data dictionary to learn what each piece of data and the various synonyms of 
the data fields mean. 
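The statement that the data dictionary "is a database itself" is easy to see in SQLite, whose system catalog sqlite_master can be queried like any other table (the table name below is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")

# sqlite_master is SQLite's catalog: a table describing the tables
row = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'book'").fetchone()
print(row)
```

Other DBMS components (the DDL compiler, query optimizer, and so on) consult exactly this kind of catalog when processing statements.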
Integrity Checker: - It checks the integrity constraints so that only valid data can be entered into 
the database. 
User: - The users are either application programmers or on-line terminal users of any degree of 
sophistication. Each user has a language at his or her disposal. For the application programmer it 
will be a conventional programming language, such as COBOL or PL/I; for the terminal user it 
will be either a query language or a special purpose language tailored to that user's requirements 
and supported by an on-line application program. 
Queries: - In a DBMS, a search question that instructs the program to locate records that meet 
specific criteria is called a query. 
Query Processor: - The query processor transforms user queries into a series of low level 
instructions. It is used to interpret the online user's query and convert it into an efficient series of 
operations in a form capable of being sent to the run time data manager for execution. The query 
processor uses the data dictionary to find the structure of the relevant portion of the database and 
uses this information in modifying the query and preparing an optimal plan to access the 
database. 
Programmer:- Programmer can manipulate the database in all possible ways. 
Application Program: - A complete, self-contained computer program that performs a specific 
useful task, other than system maintenance functions. 
DML Processor: - The DML processor processes data manipulation statements such as select, 
update, and delete that are passed by the application program, and converts them into the 
operations that carry out the task the programmer specified, such as deleting rows from a table. 
Authorization Control: - The authorization control module checks the authorization of users in 
terms of various privileges to users. 
Command Processor: - The command processor processes the queries passed to it by the 
authorization control module. 
Query Optimizer: - The query optimizer determines an optimal strategy for query execution. 
Transaction Manager: - The transaction manager ensures that the transaction properties are 
maintained by the system. 
Scheduler: - It provides an environment in which multiple users can work on the same piece of 
data at the same time; in other words, it supports concurrency. 
Buffer Manager: - The buffer manager is the software layer responsible for bringing pages from 
disk to main memory as needed. The buffer manager manages the available main memory by 
partitioning it into a collection of pages, which we collectively refer to as the buffer pool. 
Recovery Manager: - The recovery manager is responsible for maintaining a log and restoring 
the system to a consistent state after a crash. It is responsible for ensuring transaction atomicity 
and durability. 
Physical Database: - The physical database specifies additional storage details. We must decide 
what file organization to use to store the relations, and create auxiliary data structures called 
indexes. 
DBA & ITS RESPONSIBILITIES 
A Database Administrator (acronym: DBA) is an IT professional responsible for the installation, 
configuration, upgrading, administration, monitoring, maintenance, and security of databases in 
an organization. 
The database administrator's responsibilities are as follows: 
1. Database Installation and upgrading 
2. Database configuration including configuration of background Processes 
3. Database performance optimization & fine tuning 
4. Configuring the Database in Archive log mode 
5. Maintaining Database in archive log mode 
6. Devising Database backup strategy 
7. Monitoring & checking the Database backup & recovery process 
8. Database troubleshooting 
9. Database recovery in case of crash 
10. Database security 
11. Enabling auditing features wherever required 
12. Table space management 
13. Database Analysis report 
14. Database health monitoring 
15. Centralized control 
The skills required to become a database administrator are: 
 Communication skills 
 Knowledge of database theory 
 Knowledge of database design 
 Knowledge about the RDBMS itself, e.g. Oracle Database, IBM DB2, Microsoft SQL 
Server, Adaptive Server Enterprise, MaxDB, PostgreSQL 
 Knowledge of Structured Query Language (SQL) e.g. SQL/PSM, Transact-SQL 
 General understanding of distributed computing architectures, e.g. Client/Server, 
Internet/Intranet, Enterprise 
 General understanding of the underlying operating system, e.g. Windows, Unix, Linux. 
 General understanding of storage technologies, memory management, disk arrays, 
NAS/SAN, networking 
 General understanding of routine maintenance, recovery, and handling failover of a 
Database
DATA DICTIONARY & ITS ADVANTAGES 
A data dictionary, or metadata repository, as defined in the Dictionary of Computing, is a 
"centralized repository of information about data such as meaning, relationships to other data, 
origin, usage, and format." The term may have one of several closely related meanings pertaining 
to databases and database management systems (DBMS): 
 a document describing a database or collection of databases. 
 an integral component of a DBMS that is required to determine its structure. 
 a piece of middleware that extends or supplants the native data dictionary of a DBMS. 
The terms data dictionary and data repository indicate a more general software 
utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the 
information stored in it to the user and the DBA, but it is mainly accessed by the various 
software modules of the DBMS itself, such as DDL and DML compilers, the query optimizer, 
the transaction processor, report generators, and the constraint enforcer. On the other hand, a 
data dictionary is a data structure that stores metadata, i.e., (structured) data about data. 
Any well-designed database will include a data dictionary, as it gives database 
administrators and other users easy access to the type of data that they should expect to see in 
every table, row, and column of the database, without actually accessing the database. 
Since a database is meant to be built and used by multiple users, making sure that everyone is 
aware of the types of data each field will accept becomes a challenge, especially when there is a 
lack of consistency when assigning data types to fields. A data dictionary is a simple yet 
effective add-on to ensure data consistency. 
Some of the typical components of a data dictionary entry are: 
• Name of the table 
• Name of the fields in each table 
• Data type of the field (integer, date, text, ...) 
• Brief description of the expected data for each field 
• Length of the field 
• Default value for that field 
• Whether the field is nullable or not nullable 
• Constraints that apply to each field, if any 
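Several of the entry components listed above (field name, data type, nullability, and default value) can be read directly from a live SQLite database with PRAGMA table_info; a hedged sketch, with the table and fields invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL DEFAULT 'unknown'
    )
""")

# Each row: (cid, name, type, notnull, default value, primary key flag)
info = conn.execute("PRAGMA table_info(member)").fetchall()
for cid, name, dtype, notnull, default, pk in info:
    print(name, dtype, "NOT NULL" if notnull else "NULLABLE", default)
```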
Not all of these fields (and many others) will apply to every single entry in the data dictionary. 
For example, if the entry were about the root description of the table, it might not require any 
information regarding fields. Some data dictionaries also include location details, such as each 
field's current location, where it actually came from, and details of the physical location such as 
the IP address or DNS of the server. 
Format and Storage 
There is no standard format for creating a data dictionary; the metadata differs from table to 
table. Some database administrators prefer to create simple text files, while others use diagrams 
and flow charts to display all their information. The only prerequisite for a data dictionary is that 
it should be easily searchable. 
Again, the only applicable rule for data dictionary storage is that it should be kept at a convenient 
location that is easily accessible to all database users. The types of files used to store data 
dictionaries range from text files, XML files, and spreadsheets to an additional table in the database 
itself, or even handwritten notes. It is the database administrator's duty to make sure that this 
document is always up to date, accurate, and easily accessible. 
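One of the storage options mentioned above is an additional table in the database itself. A minimal sketch of that idea using Python's built-in sqlite3 module (the table and column names are hypothetical):

```python
import sqlite3

# Sketch: keeping the data dictionary as an extra table inside the
# database itself. Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE data_dictionary (
        table_name  TEXT,
        field_name  TEXT,
        data_type   TEXT,
        description TEXT
    )
""")
con.execute(
    "INSERT INTO data_dictionary VALUES (?, ?, ?, ?)",
    ("student", "roll_no", "INTEGER", "Unique roll number of the student"),
)

# The only real requirement is that the dictionary is easily searchable:
row = con.execute(
    "SELECT data_type FROM data_dictionary WHERE field_name = ?",
    ("roll_no",),
).fetchone()
print(row[0])  # INTEGER
```

Storing the dictionary alongside the data has the advantage that it is always accessible to every database user through the same query language they already use.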
Creating the Data Dictionary 
First, all the information required to create the data dictionary must be identified and recorded in 
the design documents. If the design documents are in a compatible format, it should be possible 
to directly export the data in them to the desired format for the data dictionary. For example, 
applications like Microsoft Visio allow database creation directly from the design structure and 
would make creation of the data dictionary simpler. Even without the use of such tools, scripts 
can be deployed to export data from the database to the document. There is always the option of 
manually creating these documents as well. 
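As noted above, scripts can export metadata from the database into the dictionary document. A small sketch of such a script, using SQLite's PRAGMA table_info introspection (the sample table is hypothetical):

```python
import sqlite3

# Sketch of a script that generates data-dictionary rows directly from
# an existing database, via SQLite's PRAGMA table_info. The sample
# table is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

dictionary = []
# table_info yields (cid, name, type, notnull, dflt_value, pk) per column
for cid, name, dtype, notnull, default, pk in con.execute("PRAGMA table_info(employee)"):
    dictionary.append({
        "field": name,
        "data_type": dtype,
        "not_null": bool(notnull),
        "default": default,
        "primary_key": bool(pk),
    })

for item in dictionary:
    print(item)
```

A script like this keeps the dictionary accurate with almost no manual effort, which directly supports the administrator's duty of keeping the document up to date.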
Advantages of a Data Dictionary 
The primary advantage of creating an informative and well-designed data dictionary is that it 
lends clarity to the rest of the database documentation. Also, when a new user is introduced to 
the system or a new administrator takes over the system, identifying table structures and types 
becomes simpler. In scenarios involving large databases, where it is impossible for an 
administrator to remember specific bits of information about thousands of fields, a 
data dictionary becomes a crucial necessity. 
UNIT 2 
DATA MODELS 
Data Models: Introduction to Data Models, Object Based Logical Model, Record 
Base Logical Model- Relational Model, Network Model, Hierarchical Model. 
Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity 
Relationship Diagram (ERD), Extended features of ERD.
INTRODUCTION TO DATA MODELS 
A data model can be defined as an integrated collection of concepts for describing and 
manipulating data, relationships between data, and constraints on the data in an organization. 
Data models are important because they facilitate interaction among the designer, 
the application programmer, and the end user. Also, a well-developed data model can even foster 
improved understanding of the organization for which the database design is developed. Data 
models are a communication tool as well. 
A data model comprises three components: 
• A structural part, consisting of a set of rules according to which databases can be constructed. 
• A manipulative part, defining the types of operation that are allowed on the data (this includes 
the operations that are used for updating or retrieving data from the database and for changing 
the structure of the database). 
• Possibly a set of integrity rules, which ensures that the data is accurate. 
The purpose of a data model is to represent data and to make the data understandable. There 
have been many data models proposed in the literature. They fall into three broad categories: 
• Object Based Data Models 
• Physical Data Models 
• Record Based Data Models
OBJECT BASED LOGICAL MODEL 
Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct 
object (a person, place, concept, or event) in the organization that is to be represented in the database. 
An attribute is a property that describes some aspect of the object that we wish to record, and a 
relationship is an association between entities. 
Some of the more common types of object based data model are: 
• Entity-Relationship 
• Object Oriented 
• Semantic 
• Functional
RECORD BASED LOGICAL MODEL & ITS TYPES 
Record based logical models are used in describing data at the logical and view levels. In 
contrast to object based data models, they are used to specify the overall logical structure of the 
database and to provide a higher-level description of the implementation. Record based models 
are so named because the database is structured in fixed format records of several types. Each 
record type defines a fixed number of fields, or attributes, and each field is usually of a fixed 
length. 
The three most widely accepted record based data models are: 
• Hierarchical Model 
• Network Model 
• Relational Model
RELATIONAL MODEL 
The relational model is a database model based on first-order predicate logic, first 
formulated and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all 
data is represented in terms of tuples, grouped into relations. A database organized in terms of 
the relational model is a relational database. 
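The idea of data represented as tuples grouped into relations can be sketched with plain Python (the attributes and rows are hypothetical):

```python
# In the relational model, a relation can be viewed as a set of tuples
# over a fixed list of attributes. A minimal illustration:
attributes = ("id", "name", "city")          # the relation schema
relation = {                                  # the relation: a set of tuples
    (1, "Asha", "Haridwar"),
    (2, "Ravi", "Delhi"),
}

# Being a set, the relation cannot contain duplicate tuples, and every
# tuple has one value per attribute.
relation.add((1, "Asha", "Haridwar"))        # adding a duplicate changes nothing
print(len(relation))  # 2
```

This set-of-tuples view is what makes the model mathematically simple: there is no ordering among rows and no duplicate rows, only values over attributes.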
Advantages of the Relational Model: 
Conceptual Simplicity: Both the hierarchical and network models are conceptually simple, 
but the relational model is simpler than either of them. 
Structural Independence: In the relational model, changes in the structure do not affect 
data access. 
Design Implementation: The relational model achieves both data independence and structural 
independence. 
Ad hoc query capability: The presence of a very powerful, flexible, and easy-to-use query capability is 
one of the main reasons for the immense popularity of the relational database model. 
Disadvantages of the Relational Model: 
Hardware overheads: Relational database systems hide the implementation complexities and the 
physical data storage details from the user. To do this, a relational database system needs 
more powerful computers and data storage devices. 
Ease of design can lead to bad design: The relational database is easy to design and use, and the 
user need not know the complexities of data storage. This very ease of design and use can lead 
to the development and implementation of very poorly designed database management 
systems. 
NETWORK MODEL 
The network model is a database model conceived as a flexible way of representing objects and 
their relationships. Its distinguishing feature is that the schema, viewed as a graph in which 
object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or 
lattice. 
While the hierarchical database model structures data as a tree of records, with each record 
having one parent record and many children, the network model allows each record to have 
multiple parent and child records, forming a generalized graph structure. 
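The distinguishing feature described above, a record with more than one parent, can be sketched with plain Python dicts (the record names are hypothetical):

```python
# Sketch of the network model's graph structure: a record may have
# several parents, so the schema is a general graph rather than a tree.
# Record names here are hypothetical.
records = {"supplier_s1": {}, "project_p1": {}, "part_x": {}}

# Parent-to-children links (the arcs of the graph); note that the same
# child record appears under two different parents.
links = {
    "supplier_s1": ["part_x"],
    "project_p1": ["part_x"],
}

parents_of_part_x = [p for p, children in links.items() if "part_x" in children]
print(parents_of_part_x)  # ['supplier_s1', 'project_p1']
```

In a hierarchical model the same lookup could return at most one parent; allowing two or more is exactly what generalizes the tree into a graph.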
Advantages of the Network Model: 
Conceptual Simplicity: Just like the hierarchical model, it is simple and easy to implement. 
Capability to handle more relationship types: The network model can handle one-to-one (1:1) 
and many-to-many (N:N) relationships. 
Ease of data access: Data access is easier than in the hierarchical model. 
Data Integrity: Since it is based on the parent-child relationship, there is always a link between 
a parent segment and the child segments under it. 
Data Independence: The network model offers better data independence than the hierarchical 
model. 
Disadvantages of the Network Model: 
System Complexity: All the records have to be maintained using pointers, so the database 
structure becomes more complex. 
Operational Anomalies: As discussed earlier, the network model requires a large number of 
pointers, so insertion, deletion, and updating are more complex. 
Absence of structural independence: There is a lack of structural independence because when we 
change the structure, it becomes compulsory to change the applications too. 
HIERARCHICAL MODEL 
A hierarchical database model is a data model in which the data is organized into a tree-like 
structure. The data is stored as records which are connected to one another through links. A 
record is a collection of fields, with each field containing only one value. The entity type of a 
record defines which fields the record contains. 
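The tree-of-records organization described above can be sketched with nested Python dicts (the department/employee data is hypothetical):

```python
# Sketch of the hierarchical model: records organized as a tree, each
# record a collection of single-valued fields, and each child record
# having exactly one parent. The sample data is hypothetical.
tree = {
    "record": {"dept_name": "Sales"},
    "children": [
        {"record": {"emp_name": "Asha"}, "children": []},
        {"record": {"emp_name": "Ravi"}, "children": []},
    ],
}

def count_records(node):
    """Walk the tree from the root, counting every record."""
    return 1 + sum(count_records(child) for child in node["children"])

print(count_records(tree))  # 3
```

Because every record is reached by walking down from a single root, access paths are fixed in advance, which is the source of both the model's efficiency and its rigidity.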
Advantages of the Hierarchical Model: 
1. Simplicity: Since the database is based on a hierarchical structure, the relationship between 
the various layers is logically simple. 
2. Data Security: The hierarchical model was the first database model to offer the data security 
that is provided by the DBMS. 
3. Data Integrity: Since it is based on the parent-child relationship, there is always a link 
between a parent segment and the child segments under it. 
4. Efficiency: It is very efficient when the database contains a large number of 1:N 
relationships and the users require a large number of transactions. 
Disadvantages of the Hierarchical Model: 
1. Implementation complexity: Although it is simple and easy to design, it is quite complex to 
implement. 
2. Database management problems: If you make any changes in the database structure, you 
need to make changes in every application program that accesses the database. 
3. Lack of structural independence: There is a lack of structural independence because when we 
change the structure, it becomes compulsory to change the applications too. 
4. Operational anomalies: The hierarchical model suffers from insert, delete, and update 
anomalies, and retrieval operations are also difficult. 
ENTITY RELATIONSHIP MODEL 
In DBMS, an entity-relationship model (ER model) is a data model for describing the data or 
information aspects of a business domain or its process requirements, in an abstract way that 
lends itself to ultimately being implemented in a database such as a relational database. The main 
components of ER models are entities (things) and the relationships that can exist among 
them. 
Entity–relationship modeling was developed by Peter Chen and published in a 1976 paper. 
However, variants of the idea existed previously, and have been devised subsequently such as 
supertype and subtype data entities and commonality relationships. 
The ER model represents real-world situations using concepts that are commonly used by 
people. It allows a representation of the real world to be defined at the logical level; the ER model 
has no facilities to describe machine-related aspects. 
In the ER model, the logical structure of data is captured by indicating the grouping of data into 
entities. The ER model also supports a top-down approach, by which details can be given in 
successive stages. 
Entity:- An entity is something which is described in the database by storing its data; it 
may be a concrete entity or a conceptual entity. 
Entity set:- An entity set is a collection of similar entities. 
Attribute:- An attribute describes a property associated with entities. An attribute has a 
name and a value for each entity. 
Domain:- A domain defines the set of permitted values for an attribute. 
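These four notions can be sketched together in a few lines of Python (all names and values are hypothetical):

```python
# Sketch of the ER notions above: an entity set of similar entities,
# each entity described by named attributes, and a domain restricting
# the values an attribute may take. All names here are hypothetical.
gender_domain = {"M", "F", "O"}              # permitted values for 'gender'

entity_set = [                               # each dict is one entity
    {"name": "Mira", "gender": "F"},
    {"name": "Arun", "gender": "M"},
]

# Every entity's 'gender' value must come from the attribute's domain:
valid = all(e["gender"] in gender_domain for e in entity_set)
print(valid)  # True
```

The domain check is the key point: an attribute is not just a name but a name plus a set of allowed values, and every entity in the set must respect it.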
ENTITY SET 
Entity set:- An entity set is a collection of similar entities. 
A database can be modeled as: 
• a collection of entities, 
• relationships among entities. 
An entity is an object that exists and is distinguishable from other objects. 
Ex:- a specific person, company, event, plant. 
Entities have attributes. 
Ex:- people have names and addresses. 
An entity set is a set of entities of the same type that share the same properties. 
Ex:- the set of all persons, companies, trees, holidays. 
An entity is a thing in the real world with an independent existence, and an entity set is the 
collection of all entities of a particular entity type at any point of time. Take an example: a 
company has many employees, and each employee is an entity (e1, e2, e3, …). All these entities, 
having the same attributes, are defined under the entity type EMPLOYEE, and the set {e1, e2, …} 
is called the entity set. We can also understand this by an analogy: an entity type is like "fruit", 
which is a class. No one has seen the abstract "fruit" itself, only instances of it such as an apple, 
a banana, or a mango. Hence: fruit = entity type = EMPLOYEE; apple = entity = e1 (or e2, or e3); 
and the bucket of apples, bananas, mangoes, etc. = entity set = {e1, e2, …}. 
ATTRIBUTE 
In a database management system (DBMS), an attribute may describe a component of the 
database, such as a table or a field, or may itself be used as another term for a field. 
A table contains one or more columns, and these columns are the attributes in a DBMS. For 
example, say you have a table named "employee information" which has the columns 
ID, NAME, and ADDRESS; then id, name, and address are the attributes of employee. 
RELATIONSHIP SET 
The association among entities is called a relationship. For example, the employee entity has the 
relation "works at" with department. Another example is a student who enrolls in some course. 
Here, "works at" and "enrolls" are called relationships. 
Relationship Set 
A set of relationships of a similar type is called a relationship set. Like entities, a relationship 
too can have attributes. These attributes are called descriptive attributes. 
Degree of Relationship 
The number of participating entities in a relationship defines the degree of the relationship. 
Binary = degree 2 
Ternary = degree 3 
n-ary = degree n 
Mapping Cardinalities 
Cardinality defines the number of entities in one entity set which can be associated with 
the number of entities of the other set via the relationship set. 
One-to-one: One entity from entity set A can be associated with at most one entity of 
entity set B, and vice versa. 
One-to-many: One entity from entity set A can be associated with more than one entity of 
entity set B, but an entity from entity set B can be associated with at most one entity of A. 
Many-to-one: More than one entity from entity set A can be associated with at most one entity 
of entity set B, but one entity from entity set B can be associated with more than one entity from 
entity set A. 
Many-to-many: One entity from A can be associated with more than one entity from B, and vice 
versa. 
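A mapping cardinality can be checked mechanically over the pairs of a relationship set. A small sketch for the one-to-many case, with hypothetical student/course pairs (here each student may appear with at most one course, while a course may have many students):

```python
# Sketch: verifying a one-to-many mapping over a relationship set,
# represented as (student, course) pairs. The data is hypothetical.
enrolls = [
    ("s1", "dbms"),
    ("s2", "dbms"),
    ("s3", "maths"),
]

def is_one_to_many(pairs):
    """True if no student is paired with more than one distinct course."""
    seen = {}
    for student, course in pairs:
        if student in seen and seen[student] != course:
            return False          # this student has two courses: violation
        seen[student] = course
    return True

print(is_one_to_many(enrolls))  # True
```

Swapping which side of the pair is checked gives the many-to-one test, and checking both sides at once gives one-to-one.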
ENTITY RELATIONSHIP DIAGRAM (ERD) 
Definition: An entity-relationship (ER) diagram is a specialized graphic that illustrates the 
relationships between entities in a database. ER diagrams often use symbols to represent three 
different types of information. Boxes are commonly used to represent entities. Diamonds are 
normally used to represent relationships and ovals are used to represent attributes.
Components of ER Diagram 
The ER diagram has three main components: 
1) Entity 
An entity can be an object, place, person, or class. In an ER diagram, an entity is represented 
using a rectangle. Consider the example of an organization: Employee, Manager, Department, 
Product, and many more can be taken as entities of the organization. 
Weak Entity 
A weak entity is an entity that must be defined by a foreign key relationship with another entity, 
as it cannot be uniquely identified by its own attributes alone. A weak entity depends on 
another entity and doesn't have a key attribute of its own. A double rectangle represents a 
weak entity. 
2) Attribute 
An attribute describes a property or characteristic of an entity. For example, Name, Age, and 
Address can be attributes of a Student. Databases contain information about each entity. This 
information is tracked in individual fields known as attributes, which normally correspond to the 
columns of a database table. An attribute is represented using an ellipse. 
Key Attribute 
A key attribute is the unique, distinguishing characteristic of the entity. For example, an 
employee's social security number might be the employee's key attribute. The key attribute
represents the main characteristic of an entity and is used to represent the primary key. An 
ellipse with the attribute name underlined represents a key attribute. 
Composite Attribute 
An attribute can also have its own attributes. These attributes are known as composite 
attributes. 
3) Relationship 
Relationships illustrate how two entities share information in the database structure. A 
relationship describes relations between entities and is represented using a diamond. 
There are three types of relationships that exist between entities: 
 Binary Relationship 
 Recursive Relationship 
 Ternary Relationship 
Binary Relationship 
Binary Relationship means relation between two Entities. This is further divided into three types.
1. One to One: This type of relationship is rarely seen in the real world. 
The example describes that one student can enroll for only one course, and the course 
also has only one student. This is not what you will usually see in a relationship. 
2. One to Many: It reflects a business rule in which one entity is associated with many 
instances of another entity. For example, a student enrolls for only one course, but a 
course can have many students. 
The arrows in the diagram describe that one student can enroll for only one course. 
3. Many to Many: 
The diagram represents that many students can enroll for more than one course. 
Recursive Relationship 
In some cases, entities can be self-linked. For example, employees can supervise other 
employees. 
Ternary Relationship 
Relationship of degree three is called Ternary relationship.
EXTENDED FEATURES OF ERD 
The ER model can express database entities in a conceptual hierarchical manner: as we go up 
the hierarchy it generalizes the view of entities, and as we go deeper into the hierarchy it gives 
us the details of every entity included. 
Going up in this structure is called generalization, where entities are clubbed together to 
represent a more generalized view. For example, a particular student named Mira can be 
generalized along with all the students; the entity shall be Student, and further, a student is a Person. 
The reverse is called specialization, where a person is a student, and that student is Mira. 
Generalization 
As mentioned above, the process of generalizing entities, where the generalized entity contains 
the properties of all the entities it generalizes, is called generalization. In generalization, a number 
of entities are brought together into one generalized entity based on their similar characteristics. 
For example, pigeon, house sparrow, crow, and dove can all be generalized as Birds. 
Specialization 
Specialization is a process opposite to generalization, as mentioned above. In 
specialization, a group of entities is divided into sub-groups based on their characteristics. Take the 
group Person, for example. A person has a name, date of birth, gender, etc. These properties are 
common to all persons. But in a company, a person can be identified as an employee, 
employer, customer, or vendor, based on the role they play in the company. 
Similarly, in a school database, a person can be specialized as a teacher, student, or staff member, 
based on the role they play in the school as entities. 
Inheritance 
We use all the above features of the ER model to create classes of objects in object-oriented 
programming. This makes it easier for the programmer to concentrate on what she is 
programming. The details of entities are generally hidden from the user; this process is known as 
abstraction. 
One of the important features of generalization and specialization is inheritance: the 
attributes of higher-level entities are inherited by the lower-level entities. 
For example, attributes of a person such as name, age, and gender can be inherited by lower-level 
entities such as student and teacher. 
Aggregation 
The E-R model cannot express relationships among relationships. 
When would we need such a thing? 
Consider a DB with information about employees who work on a particular project and use a 
number of machines in doing that work. We get the E-R diagram shown in the figure below. 
Figure 2.20: E-R diagram with redundant relationships 
Relationship sets work and uses could be combined into a single set. However, they shouldn't be, 
as this would obscure the logical structure of this scheme. 
The solution is to use aggregation. 
 An abstraction through which relationships are treated as higher-level entities. 
 For our example, we treat the relationship set work and the entity sets employee and 
project as a higher-level entity set called work. 
 Figure below shows the E-R diagram with aggregation. 
Figure 2.21: E-R diagram with aggregation 
Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for 
each entity and relationship set as before.
The table for relationship set uses contains a column for each attribute in the primary key of 
machinery and work. 
Aggregation is an abstraction in which relationship sets are treated as higher level entity sets. 
Here a relationship set is embedded inside an entity set, and these entity sets can participate in 
relationships.
UNIT 3.1 
RELATIONAL DATABASES 
Relational Databases: Introduction to Relational Databases and Terminology- 
Relation, Tuple, Attribute, Cardinality, Degree, Domain. Keys- Super Key, 
Candidate Key, Primary Key, Foreign Key.
INTRODUCTION TO RELATIONAL DATABASES 
The relational database model was proposed by Edgar Codd (of IBM Research) around 1969. It has since 
become the dominant database model for commercial applications (in comparison with other 
database models such as the hierarchical, network, and object models). Today, there are many 
commercial Relational Database Management Systems (RDBMS), such as Oracle, IBM DB2, and 
Microsoft SQL Server. There are also many free and open-source RDBMS, such as MySQL, 
mSQL (mini-SQL), and the embedded JavaDB. 
A relational database organizes data in tables (or relations). A table is made up of rows and 
columns. A row is also called a record (or tuple). A column is also called a field (or attribute). A 
database table is similar to a spreadsheet. However, the relationships that can be created among 
the tables enable a relational database to efficiently store huge amounts of data and effectively 
retrieve selected data. 
A language called SQL (Structured Query Language) was developed to work with relational 
databases. 
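A small end-to-end illustration of a relational table queried with SQL, using Python's built-in sqlite3 module (the table and data are hypothetical):

```python
import sqlite3

# Sketch: a relational table created, populated, and queried with SQL.
# Table name and rows are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, city TEXT)")
con.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Haridwar"), (2, "Ravi", "Delhi")],
)

# SQL retrieves selected data declaratively: we say what we want,
# not how to scan the rows.
rows = con.execute("SELECT name FROM student WHERE city = 'Haridwar'").fetchall()
print(rows)  # [('Asha',)]
```

The point of the example is the declarative style: the SELECT names the desired columns and a condition, and the RDBMS decides how to retrieve the matching rows.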
Features of RDBMS 
The features and characteristics of an RDBMS are best understood through Codd's 12 rules. 
Codd's 12 Rules 
Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. 
Codd, a pioneer of the relational model for databases, designed to define what is required from a 
database management system in order for it to be considered relational, i.e., a relational database 
management system (RDBMS). They are sometimes jokingly referred to as "Codd's Twelve 
Commandments". They are as follows: 
Rule 0: The Foundation rule: 
A relational database management system must manage its stored data using only its 
relational capabilities. The system must qualify as relational, as a database, and as a 
management system. For a system to qualify as a relational database management system 
(RDBMS), that system must use its relational facilities (exclusively) to manage the 
database. 
Rule 1: The information rule: 
All information in a relational database (including table and column names) is 
represented in only one way, namely as a value in a table. 
Rule 2: The guaranteed access rule:
All data must be accessible. It says that every individual scalar value in the database must 
be logically addressable by specifying the name of the containing table, the name of the 
containing column and the primary key value of the containing row. 
Rule 3: Systematic treatment of null values: 
The DBMS must allow each field to remain null (or empty). Specifically, it must support 
a representation of "missing information and inapplicable information" that is systematic, 
distinct from all regular values (for example, "distinct from zero or any other number", in 
the case of numeric values), and independent of data type. It is also implied that such 
representations must be manipulated by the DBMS in a systematic way. 
Rule 4: Active online catalog based on the relational model: 
The system must support an online, inline, relational catalog that is accessible to 
authorized users by means of their regular query language. That is, users must be able to 
access the database's structure (catalog) using the same query language that they use to 
access the database's data. 
Rule 5: The comprehensive data sublanguage rule: 
The system must support at least one relational language that 
1. Has a linear syntax 
2. Can be used both interactively and within application programs, 
3. Supports data definition operations (including view definitions), data 
manipulation operations (update as well as retrieval), security and integrity 
constraints, and transaction management operations (begin, commit, and 
rollback). 
Rule 6: The view updating rule: 
All views that are theoretically updatable must be updatable by the system. 
Rule 7: High-level insert, update, and delete: 
The system must support set-at-a-time insert, update, and delete operators. This means 
that data can be retrieved from a relational database in sets constructed of data from 
multiple rows and/or multiple tables. This rule states that insert, update, and delete 
operations should be supported for any retrievable set rather than just for a single row in a 
single table. 
Rule 8: Physical data independence: 
Changes to the physical level (how the data is stored, whether in arrays or linked lists 
etc.) must not require a change to an application based on the structure. 
Rule 9: Logical data independence: 
Changes to the logical level (tables, columns, rows, and so on) must not require a change 
to an application based on the structure. Logical data independence is more difficult to 
achieve than physical data independence. 
Rule 10: Integrity independence:
Integrity constraints must be specified separately from application programs and stored in 
the catalog. It must be possible to change such constraints as and when appropriate 
without unnecessarily affecting existing applications. 
Rule 11: Distribution independence: 
The distribution of portions of the database to various locations should be invisible to 
users of the database. Existing applications should continue to operate successfully: 
1. when a distributed version of the DBMS is first introduced; and 
2. when existing distributed data are redistributed around the system. 
Rule 12: The non-subversion rule: 
If the system provides a low-level (record-at-a-time) interface, then that interface cannot 
be used to subvert the system, for example, bypassing a relational security or integrity 
constraint. 
Advantages of RDBMS 
An RDBMS offers an extremely structured way of managing data (although a good database design 
is needed), as everything in an RDBMS is represented as values in relations (i.e., tables). Also, 
many further advantages are evident from Codd's rules stated above. 
Disadvantages of RDBMS 
An RDBMS is very good for related data, but unorganized and unrelated data creates only chaos 
within it. That is a reason why emerging trends such as Big Data (where a lot of data 
from various sources must be analyzed) favour non-relational (NoSQL) DBMSs instead of an 
RDBMS for their purpose. 
TERMINOLOGIES: (RELATION, TUPLE, ATTRIBUTE, 
CARDINALITY, DEGREE, DOMAIN) 
Relation: 
Definition- 
A database relation is a predefined row/column format for storing information in a relational 
database. Relations are equivalent to tables; a relation is also known as a table. 
Example- 
Tuple: 
Definition- 
In the context of databases, a tuple is one record (one row). 
Example-
Attribute: 
Definition- 
In general, an attribute is a characteristic. In a database management system (DBMS), an 
attribute refers to a database component, such as a table, or may refer to a database field. 
Attributes describe the instances in the rows of a database. 
Example- 
Degree: 
Definition- 
The degree of a relation is the number of attributes in its relation schema. (For a relationship, 
the degree is the number of participating entity sets.) 
Example-
Cardinality: 
Definition- 
In the context of databases, cardinality refers to the uniqueness of data values contained in a 
column. 
It is not common, but cardinality also sometimes refers to the relationships between tables. 
Cardinality between tables can be one-to-one, many-to-one, or many-to-many. 
Example- 
Domain 
Definition- 
In database technology, domain refers to the description of an attribute's allowed values. The 
physical description is a set of values the attribute can have, and the semantic, or logical, 
description is the meaning of the attribute. 
Example-
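The terms above can be made concrete on a small relation held in Python (the table and its contents are hypothetical):

```python
# A tiny relation, to illustrate degree, number of tuples, and the
# cardinality of one column. All data here is hypothetical.
attributes = ("roll_no", "name", "city")     # attributes of the relation
tuples = [                                   # each row is one tuple
    (1, "Asha", "Haridwar"),
    (2, "Ravi", "Delhi"),
    (3, "Mira", "Delhi"),
]

degree = len(attributes)                     # number of attributes
no_of_tuples = len(tuples)                   # number of rows
city_cardinality = len({row[2] for row in tuples})  # distinct 'city' values

print(degree, no_of_tuples, city_cardinality)  # 3 3 2
```

Note how the cardinality of the city column (2) is lower than the number of tuples (3), because two rows share the value "Delhi".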
KEYS: (SUPER KEYS, CANDIDATE KEY, PRIMARY 
KEY, FOREIGN KEY) 
Definition of a Key- 
A key simply consists of one or more attributes that determine other attributes. 
A key is a column or attribute of a database table. For example, if a table has id, 
name, and address as its column names, then each one is known as a key for that table. We can 
also say that the table has three keys: id, name, and address. Keys are also used to identify each 
record in the database table. 
The following are the various types of keys available in the DBMS system. 
 Super key 
 Candidate key 
 Primary key 
 Foreign key 
Super Key- 
A superkey is a combination of columns that uniquely identifies any row within a relational 
database management system (RDBMS) table. A candidate key is a closely related concept 
where the superkey is reduced to the minimum number of columns required to uniquely identify 
each row. 
For example, imagine a table used to store customer master details that contains columns such 
as: 
 customer name 
 customer id 
 social security number (SSN) 
 address 
 date of birth 
A certain set of columns may be extracted and guaranteed unique to each customer. Examples of 
superkeys are as follows: 
 Name, SSN, Birthdate 
 ID, Name, SSN 
However, this process may be further reduced. It can be assumed that each customer id is unique 
to each customer. So, the superkey may be reduced to just one field, customer id, which is the 
candidate key. However, to ensure absolute uniqueness, a composite candidate key may be
formed by combining customer id with SSN. 
A primary key is a special term for candidate keys designated as unique identifiers for all table 
rows. Until this point, only columns have been considered for suitability and are thus termed 
candidate keys. Once a candidate key is decided, it may be defined as the primary key at the 
point of table creation. 
Candidate key- 
A candidate key is a column, or set of columns, in a table that can uniquely identify any database 
record without referring to any other data. Each table may have one or more candidate keys, but 
one candidate key is special, and it is called the primary key. This is usually the best among the 
candidate keys. 
When a key is composed of more than one column, it is known as a composite key. 
The best way to define candidate keys is with an example. For example, a bank‘s database is 
being designed. To uniquely define each customer‘s account, a combination of the customer‘s ID 
or social security number (SSN) and a sequential number for each of his or her accounts can be 
used. So, Mr. Andrew Smith‘s checking account can be numbered 223344-1, and his savings 
account 223344-2. A candidate key has just been created. 
In this case, the bank's database can instead issue its own account numbers, which are 
guaranteed to be unique. For good measure, these account numbers can have some built-in 
logic. For example, checking accounts can begin with a 'C', followed by the year and month of 
creation and, within that month, a sequential number. 
Note that it was possible to uniquely identify each account using the aforementioned SSN plus a 
sequential number (assuming no government mix-up, in which the same number is issued to 
two people), so that combination is a candidate key that can potentially be used to identify 
records. The bank-issued account number, however, is a better candidate key for the same 
purpose. If the chosen candidate key is guaranteed to uniquely identify each and every record, 
then it should be used as the primary key. All databases allow the definition of one, and only 
one, primary key per table. 
Primary key- 
It is a candidate key chosen by the database designer to identify entities within an entity set. A 
primary key is a minimal super key. In an ER diagram, the primary key is represented by 
underlining the primary key attribute. Ideally a primary key is composed of only a single 
attribute, but it is possible to have a primary key composed of more than one attribute. 
A primary key is a special relational database table column (or combination of columns) 
designated to uniquely identify all table records.
A primary key‘s main features are: 
 It must contain a unique value for each row of data. 
 It cannot contain null values. 
A primary key is either an existing table column or a column that is specifically generated by the 
database according to a defined sequence. 
For example, students are routinely assigned unique identification (ID) numbers, and U.S. 
citizens are assigned uniquely identifiable Social Security numbers. 
For example, a database must hold all of the data stored by a commercial bank. Two of the 
database tables include the CUSTOMER_MASTER, which stores basic and static customer data 
(e.g., name, date of birth, address and Social Security number, etc.) and the 
ACCOUNTS_MASTER, which stores various bank account data (e.g., account creation date, 
account type, withdrawal limits or corresponding account information, etc.). 
To uniquely identify customers, a column or combination of columns is selected to guarantee 
that two customers never have the same unique value. Thus, certain columns are immediately 
eliminated, e.g., surname and date of birth. A good primary key candidate is the column that is 
designated to hold unique and government-assigned Social Security numbers. However, some 
account holders (e.g., children) may not have Social Security numbers, and this column‘s 
candidacy is eliminated. The next logical option is to use a combination of columns, such as 
surname, date of birth and email address, but this results in a long and cumbersome primary 
key. 
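The choice described above is declared in SQL with a PRIMARY KEY constraint, which enforces both uniqueness and non-null values. A minimal sketch using Python's sqlite3 (table and column names are illustrative, not from the notes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CUSTOMER_MASTER with a database-generated surrogate key as the primary key.
con.execute("""CREATE TABLE customer_master (
    customer_id INTEGER PRIMARY KEY,   -- unique and non-null by definition
    surname     TEXT,
    birth_date  TEXT,
    ssn         TEXT)""")
con.execute("INSERT INTO customer_master VALUES (1, 'Smith', '1980-01-01', '123-45-6789')")

# A second row with the same customer_id violates the primary key constraint.
try:
    con.execute("INSERT INTO customer_master VALUES (1, 'Jones', '1990-05-05', '987-65-4321')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```

The database itself rejects the duplicate, so uniqueness does not depend on application code.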
Foreign Key- 
A foreign key is a column or group of columns in a relational database table that provides a link 
between data in two tables. It acts as a cross-reference between tables because it references the 
primary key of another table, thereby establishing a link between them. 
In complex databases, data in a domain must be added across multiple tables, thus maintaining a 
relationship between them. The concept of referential integrity is derived from foreign key 
theory. 
Foreign keys and their implementation are more complex than primary keys. 
For any column acting as a foreign key, a corresponding value should exist in the link table. 
Special care must be taken while inserting data and removing data from the foreign key column, 
as a careless deletion or insertion might destroy the relationship between the two tables. 
For instance, if there are two tables, customer and order, a relationship can be created between 
them by introducing a foreign key into the order table that refers to the customer ID in the 
customer table. The customer ID column exists in both customer and order tables. The customer 
ID in the order table becomes the foreign key, referring to the primary key in the customer table. 
To insert an entry into the order table, the foreign key constraint must be satisfied.
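The customer/order link described above can be sketched with a REFERENCES clause. Note two SQLite-specific details: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and "order" must be quoted because it is a reserved word (the tables are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE "order" (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id))""")

con.execute("INSERT INTO customer VALUES (1, 'Andrew Smith')")
con.execute('INSERT INTO "order" VALUES (10, 1)')       # satisfies the constraint

try:
    con.execute('INSERT INTO "order" VALUES (11, 99)')  # customer 99 does not exist
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)  # False
```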
Some referential actions associated with a foreign key action include the following: 
 Cascade: When rows in the parent table are deleted, the matching foreign key columns in the 
child table are also deleted, creating a cascading delete. 
 Set Null: When a referenced row in the parent table is deleted or updated, the foreign key values 
in the referencing row are set to null to maintain the referential integrity. 
 Triggers: Referential actions are normally implemented as triggers. In many ways foreign key 
actions are similar to user-defined triggers. To ensure proper execution, ordered referential 
actions are sometimes replaced with their equivalent user-defined triggers. 
 Set Default: This referential action is similar to "set null." The foreign key values in the child 
table are set to the default column value when the referenced row in the parent table is deleted or 
updated. 
 Restrict: This is the normal referential action associated with a foreign key. A value in the parent 
table cannot be deleted or updated as long as it is referred to by a foreign key in another table. 
 No Action: This referential action is similar in function to the "restrict" action, except that the 
check is performed only after an attempt to alter the table. 
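The Cascade action above can be sketched the same way: with ON DELETE CASCADE, deleting a parent row automatically deletes the matching child rows (table names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
con.execute("""CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)""")

con.execute("INSERT INTO parent VALUES (1)")
con.execute("INSERT INTO child VALUES (100, 1)")
con.execute("DELETE FROM parent WHERE id = 1")   # cascades to the child row

remaining = con.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(remaining)  # 0
```

Swapping `ON DELETE CASCADE` for `ON DELETE SET NULL` or `ON DELETE RESTRICT` demonstrates the other referential actions in the list.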
UNIT 3.2 
RELATIONAL ALGEBRA 
Relational Algebra: Operations, Select, Project, Union, Difference, Intersection 
Cartesian product, Join, Natural Join.
INTRODUCTION 
Relational algebra, first described by E.F. Codd while at IBM, is a family of algebras with a 
well-founded semantics used for modelling the data stored in relational databases, and for 
defining queries on it. 
In relational algebra, queries are composed using a collection of operators, and each query 
describes a step-by-step procedure for computing the desired result. 
Because queries are specified in this operational, procedural manner, relational algebra is also 
called a procedural language. 
There are many operations included in relational algebra. 
Each relational query describes a step-by-step procedure for computing the desired answer, 
based on the order in which operators are applied in the query. 
The procedural nature of the algebra allows us to think of an algebra expression as a recipe, or a 
plan, for evaluating a query, and relational systems in fact use algebra expressions to represent 
query evaluation plans. 
Relational algebra expression 
An expression composed of these operators, forming a complex query, is called a relational 
algebra expression. 
A unary algebra operator is applied to a single expression, and a binary algebra operator is 
applied to two expressions. 
Fundamental operations of Relational algebra: 
 Select 
 Project 
 Union 
 Set difference 
 Cartesian product 
 Rename
SELECT 
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a 
relation based on a selection condition. 
 The selection condition acts as a filter 
 Keeps only those tuples that satisfy the qualifying condition 
 Tuples satisfying the condition are selected whereas the other tuples are discarded 
(filtered out) 
Examples: 
A. Select the STUDENT tuples whose age is 18 
σage=18 (STUDENT) 
B. Select the STUDENT tuples whose course is BCA 
σcourse=BCA (STUDENT) 
C. Select the STUDENT tuples whose gender is male 
σgender=M (STUDENT) 
Student name Age gender course 
Ritika 18 F BCA 
Prerna 19 F Bsc. 
Ankush 20 M BA 
Preeti 18 F Bsc. 
Pragyan 20 M BA 
Ritu 18 F BCA 
Janvi 20 F BCA 
Answer of the first select statement is : 
A. 
Student name Age gender course 
Ritika 18 F BCA 
Preeti 18 F Bsc. 
Ritu 18 F BCA
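In SQL, the σ operation corresponds to the WHERE clause. The selection in example A can be checked with Python's sqlite3; this sketch rebuilds the STUDENT relation from the table shown above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (name TEXT, age INT, gender TEXT, course TEXT)")
con.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    ("Ritika", 18, "F", "BCA"), ("Prerna", 19, "F", "Bsc."),
    ("Ankush", 20, "M", "BA"),  ("Preeti", 18, "F", "Bsc."),
    ("Pragyan", 20, "M", "BA"), ("Ritu", 18, "F", "BCA"),
    ("Janvi", 20, "F", "BCA"),
])

# sigma_age=18(STUDENT) is the WHERE clause acting as a filter.
rows = con.execute("SELECT name FROM student WHERE age = 18").fetchall()
print(rows)  # [('Ritika',), ('Preeti',), ('Ritu',)]
```

The three tuples returned match the answer table for example A.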
PROJECT 
The PROJECT operation is denoted by π (pi). 
If we are interested in only certain attributes of a relation, we use PROJECT. 
This operation keeps certain columns (attributes) from a relation and discards the other columns. 
Example: 
To list only the name and course of each student in the student relation: 
πstudent_name, course (STUDENT) 
(output, using the STUDENT table shown earlier) 
Student-name Course 
Ritika BCA 
Prerna Bsc. 
Ankush BA 
Preeti Bsc. 
Pragyan BA 
Ritu BCA 
Janvi BCA
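In SQL, π corresponds to the column list of a SELECT. One difference worth noting: relational algebra projection removes duplicate tuples, while SQL keeps them unless DISTINCT is used. A sketch with a few of the rows above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (name TEXT, age INT, gender TEXT, course TEXT)")
con.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    ("Ritika", 18, "F", "BCA"), ("Prerna", 19, "F", "Bsc."), ("Janvi", 20, "F", "BCA"),
])

# pi_name,course(STUDENT): keep only the name and course columns.
rows = con.execute("SELECT name, course FROM student").fetchall()
print(rows)  # [('Ritika', 'BCA'), ('Prerna', 'Bsc.'), ('Janvi', 'BCA')]

# True projection semantics would be: SELECT DISTINCT name, course FROM student
```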
UNION 
UNION is a binary operation, denoted by ∪ (the union sign from set theory). The result of R ∪ S 
is a relation that includes all tuples that are in R, in S, or in both R and S. Duplicate tuples are 
eliminated. 
The two operand relations R and S must be "type compatible" (UNION compatible): R and S 
must have the same number of attributes, and each pair of corresponding attributes must be type 
compatible (have the same or compatible domains). For example, in a bank enterprise the 
depositor and borrower relations have almost identical attributes and types. 
Customer name Id no. 
RITA 301 
GITA 302 
RAM 303 
(DEPOSITOR‘S RELATIONAL MODEL) 
Customer name Id no. 
Sham 300 
Surbhi 304 
Rita 301 
Ram 303 
(Borrower‘s relational model) 
(Output: a union b) 
Customer_name Id no 
Rita 301 
Gita 302 
Ram 303 
Sham 300 
Surbhi 304
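The depositor/borrower union above can be checked in SQL, where UNION eliminates duplicates exactly as the algebra definition requires. A sketch with sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE depositor (name TEXT, id INT)")
con.execute("CREATE TABLE borrower (name TEXT, id INT)")
con.executemany("INSERT INTO depositor VALUES (?,?)",
                [("Rita", 301), ("Gita", 302), ("Ram", 303)])
con.executemany("INSERT INTO borrower VALUES (?,?)",
                [("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)])

# R union S: Rita and Ram appear in both relations but only once in the result.
rows = con.execute("SELECT * FROM depositor UNION SELECT * FROM borrower").fetchall()
print(len(rows))  # 5
```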
DIFFERENCE 
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by −. The result of R − S is a 
relation that includes all tuples that are in R but not in S. The attribute names in the result are 
the same as the attribute names in R. The two operand relations R and S must be "type 
compatible". 
Output: a - b 
Customer name Id no 
Gita 302 
Only one tuple (Gita) belongs to a but not to b, so the result contains a single tuple. 
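SQL's EXCEPT operator implements set difference directly. A sketch reproducing the depositor/borrower result with sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (name TEXT, id INT)")
con.execute("CREATE TABLE b (name TEXT, id INT)")
con.executemany("INSERT INTO a VALUES (?,?)",
                [("Rita", 301), ("Gita", 302), ("Ram", 303)])
con.executemany("INSERT INTO b VALUES (?,?)",
                [("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)])

# a - b: tuples in a that do not appear in b.
rows = con.execute("SELECT * FROM a EXCEPT SELECT * FROM b").fetchall()
print(rows)  # [('Gita', 302)]
```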
INTERSECTION 
INTERSECTION: The result of the operation R intersection S, is a relation that includes all 
tuples that are in both R and S. 
 The attribute names in the result will be the same as the attribute names in R 
 The two operand relations R and S must be ―type compatible‖
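SQL's INTERSECT operator implements this directly. A sketch with the same depositor/borrower data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (name TEXT, id INT)")
con.execute("CREATE TABLE b (name TEXT, id INT)")
con.executemany("INSERT INTO a VALUES (?,?)",
                [("Rita", 301), ("Gita", 302), ("Ram", 303)])
con.executemany("INSERT INTO b VALUES (?,?)",
                [("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)])

# a intersection b: tuples present in both relations.
rows = con.execute("SELECT * FROM a INTERSECT SELECT * FROM b").fetchall()
print(sorted(rows))  # [('Ram', 303), ('Rita', 301)]
```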
CARTESIAN PRODUCT 
The CARTESIAN PRODUCT (also called CROSS PRODUCT) operation is denoted by x. 
The resulting relation state has one tuple for each combination of tuples: one from R and one 
from S. Hence, if R has nR tuples (denoted as |R| = nR) and S has nS tuples, then R x S will have 
nR * nS tuples. 
The two operands do NOT have to be "type compatible‖. 
Example: 
R. 
A 1 
B 2 
D 3 
F 4 
S. 
D 3 
E 4 
Output: R x S 
A 1 D 3 
A 1 E 4 
B 2 D 3 
B 2 E 4 
D 3 D 3 
D 3 E 4 
F 4 D 3 
F 4 E 4
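The 4 x 2 = 8 rows above can be reproduced with SQL's CROSS JOIN. A sketch (column names are illustrative, since the notes leave them unnamed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE r (letter TEXT, num INT)")
con.execute("CREATE TABLE s (letter TEXT, num INT)")
con.executemany("INSERT INTO r VALUES (?,?)",
                [("A", 1), ("B", 2), ("D", 3), ("F", 4)])
con.executemany("INSERT INTO s VALUES (?,?)", [("D", 3), ("E", 4)])

# R x S: every tuple of R paired with every tuple of S, so |R| * |S| = 4 * 2 = 8 rows.
rows = con.execute("SELECT * FROM r CROSS JOIN s").fetchall()
print(len(rows))  # 8
```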
JOIN 
The JOIN operation is denoted by the bowtie symbol, ⋈. 
It is a Cartesian product of two relations, restricted by a join condition. 
 Join allows you to evaluate a join condition between the attributes of the relations on 
which the join operation is undertaken. 
 It is used to combine related tuples from two relations. 
 The join condition is called theta. 
Notation:- 
R ⋈join condition S 
Let us take an instance:-
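As a concrete instance (the tables here are illustrative, not from the notes), SQL expresses the theta condition in an ON clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept_no INT)")
con.execute("CREATE TABLE department (dept_no INT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?,?)", [("Ritika", 1), ("Ankush", 2)])
con.executemany("INSERT INTO department VALUES (?,?)", [(1, "CS"), (2, "Arts")])

# employee JOIN_{employee.dept_no = department.dept_no} department
rows = con.execute("""SELECT e.name, d.dept_name
                      FROM employee e JOIN department d
                      ON e.dept_no = d.dept_no""").fetchall()
print(rows)  # [('Ritika', 'CS'), ('Ankush', 'Arts')]
```

Because the condition here is an equality test, this particular join is an equi-join.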
NATURAL JOIN 
Another variation of JOIN called NATURAL JOIN — denoted by * 
Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such 
joins result in two attributes in the resulting relation having exactly the same value. A 'natural 
join' will remove the duplicate attribute(s). 
 In most systems a natural join will require that the attributes have the same name to 
identify the attribute(s) to be used in the join. This may require a renaming mechanism. 
 If you do use natural joins make sure that the relations do not have two attributes with the 
same name by accident. 
Example: 
The following query results refer to this database state.
A simple database:
Example Natural Join Operations on the sample database above:
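Since the sample database figures are not reproduced here, a small illustrative example: SQLite's NATURAL JOIN matches on the identically named dept_no column and keeps it only once in the result (table names are assumptions, not from the notes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept_no INT)")
con.execute("CREATE TABLE department (dept_no INT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?,?)", [("Ritika", 1), ("Ankush", 2)])
con.executemany("INSERT INTO department VALUES (?,?)", [(1, "CS"), (2, "Arts")])

# employee * department: the join column is chosen by name (dept_no),
# and the duplicate copy of it is removed from the result.
rows = con.execute("SELECT * FROM employee NATURAL JOIN department").fetchall()
print(rows)  # each row is (name, dept_no, dept_name), with dept_no appearing once
```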
SUMMARY OF OPERATIONS
UNIT 4 
STRUCTURED QUERY LANGUAGE (SQL) 
& 
NORMALIZATION 
Structured Query Language (SQL): Introduction to SQL, History of SQL, 
Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple 
Queries, Nested Queries, 
Normalization: Benefits of Normalization, Normal Forms- 1NF, 2NF, 3NF, 
BCNF & and Functional Dependency.
INTRODUCTION TO SQL 
Introduction & Brief History: 
SQL is a special-purpose programming language designed for managing data held in a relational 
database management system (RDBMS). Originally based upon relational algebra and tuple 
relational calculus, SQL consists of a data definition language and a data manipulation language. 
The scope of SQL includes data insert, query, update and delete, schema creation and 
modification, and data access control. 
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as 
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data 
Banks." Despite not entirely adhering to the relational model as described by Codd, it became the 
most widely used database language. 
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the 
International Organization for Standardization (ISO) in 1987. Since then, the standard has been 
revised to include a larger set of features. 
Why SQL? 
 Allows users to access data in relational database management systems. 
 Allows users to describe the data. 
 Allows users to define the data in database and manipulate that data. 
 Allows embedding within other languages using SQL modules, libraries & pre-compilers. 
 Allows users to create and drop databases and tables. 
 Allows users to create view, stored procedure, functions in a database. 
 Allows users to set permissions on tables, procedures and views 
Advantages of SQL: 
 High Speed: SQL Queries can be used to retrieve large amounts of records from a 
database quickly and efficiently. 
 Well Defined Standards Exist: SQL databases use long-established standards, adopted by 
ANSI and ISO. Non-SQL databases do not adhere to any clear standard. 
 No Coding Required: Using standard SQL, it is easier to manage database systems 
without having to write a substantial amount of code. 
 Emergence of ORDBMS: Previously, SQL databases were synonymous with relational 
databases. With the emergence of object-oriented DBMSs, object storage capabilities have 
been extended to relational databases. 
Disadvantages of SQL: 
 Difficulty in Interfacing: Interfacing an SQL database is more complex than adding a few 
lines of code. 
 More Features Implemented in Proprietary way: Although SQL databases conform to 
ANSI &ISO standards, some databases go for proprietary extensions to standard SQL to 
ensure vendor lock-in.
HISTORY OF SQL 
 In 1970, Edgar F. Codd, a member of the IBM research lab, published the classic paper 'A 
Relational Model of Data for Large Shared Data Banks'. 
 With Codd's paper, a great deal of research and experimentation started, which led to the 
design and prototype implementation of a number of relational languages. 
 One such language was the Structured English Query Language (SEQUEL), defined by 
Donald D. Chamberlin and Raymond F. Boyce. 
 The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark 
of the UK-based Hawker Siddeley aircraft company. 
 A revised version of SEQUEL, called SEQUEL/2 (or SQL), was released in 1976-77. 
 In the late 1970s, IBM developed Codd's ideas into a research prototype named System R. 
 In 1979, Relational Software, Inc. (which later became Oracle) released the first 
commercially available relational database product. 
 In 1986, ANSI published an SQL standard called 'SQL-86', and ISO adopted it in 1987. 
 The next versions of the standard were SQL-89 and SQL-92, followed by SQL:1999, 
SQL:2003, SQL:2006 and SQL:2008. 
Going by industry trends, it is clear that the relational model and SQL will continue to 
strengthen their position in the near future. 
CONCEPT BEHIND SQL 
SQL Process 
When you execute an SQL command in any RDBMS, the system determines the best way 
to carry out your request, and the SQL engine figures out how to interpret the task. 
There are various components included in the process. These components are:- 
 Query Dispatcher 
 Optimization Engines 
 Classic Query Engine 
 SQL Query Engine 
The classic query engine handles all non-SQL queries, but the SQL query engine won't handle 
logical files. 
SQL Architecture 
Types of SQL Commands 
The following sections discuss the basic categories of commands used in SQL to perform various 
functions. The main categories are:- 
 DDL (Data Definition Language) 
 DML (Data Manipulation Language) 
 DQL (Data Query Language) 
 DCL (Data Control Language) 
 TCL (Transactional Control Language)
DDL COMMANDS 
DDL (Data Definition Language) Commands of SQL allow the Data Definition functions like 
creating, altering and dropping the tables. 
The following are the various DDL Commands, along with their syntax, use and examples: 
#1. CREATE 
USE: creates a new table, view of a table, or other objects in database. 
SYNTAX: 
CREATE TABLE table_name( 
Column_name1 data_type(size), 
Column_name2 data_type(size), 
…. 
); 
EXAMPLE : 
CREATE TABLE Persons ( 
PersonID int, 
LastName varchar(255), 
FirstName varchar(255), 
Address varchar(255), 
City varchar(255) 
); 
#2. ALTER 
USE : modifies an existing database object such as a table. 
SYNTAX : 
ALTER TABLE table_name 
ADD column_name datatype; 
or 
ALTER TABLE table_name 
DROP COLUMN column_name; 
or 
ALTER TABLE table_name 
MODIFY COLUMN column_name datatype; 
EXAMPLE : 
ALTER TABLE Persons 
ADD DateOfBirth date; 
or
ALTER TABLE Persons 
DROP COLUMN DateOfBirth; 
or 
ALTER TABLE Persons 
ALTER COLUMN DateOfBirth year; 
#3. DROP 
USE : deletes an entire table, a view of a table, or other object in the database. 
SYNTAX : DROP TABLE table_name; 
EXAMPLE : DROP TABLE Persons; 
#4. TRUNCATE 
USE : removes all records from a table, releasing the space allocated for the 
records; it also reinitializes any auto-generated key counter. 
SYNTAX : TRUNCATE TABLE table_name; 
EXAMPLE : TRUNCATE TABLE persons; 
#5. COMMENT 
USE : Add comments to the data dictionary.
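The DDL commands above can be tried end-to-end. A sketch with Python's sqlite3 (SQLite supports CREATE, ALTER ... ADD and DROP, but not TRUNCATE or COMMENT, so those two are omitted here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CREATE: define the Persons table from the example above.
con.execute("""CREATE TABLE Persons (
    PersonID  int,
    LastName  varchar(255),
    FirstName varchar(255),
    Address   varchar(255),
    City      varchar(255))""")
# ALTER: add the DateOfBirth column.
con.execute("ALTER TABLE Persons ADD COLUMN DateOfBirth date")

cols = [row[1] for row in con.execute("PRAGMA table_info(Persons)")]
print(cols)  # ['PersonID', 'LastName', 'FirstName', 'Address', 'City', 'DateOfBirth']

# DROP: remove the table and its data entirely.
con.execute("DROP TABLE Persons")
```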
DML COMMANDS 
DML (Data Manipulation Language) Commands of SQL allow the Data Manipulation functions 
like inserting, updating and deleting data values in the tables created using DDL Commands. 
The following are the various DML Commands, along with their syntax, use and examples: 
#1. INSERT 
USE : creates a record. 
SYNTAX : 
INSERT INTO table_name 
VALUES (value1,value2,value3,...); 
or 
INSERT INTO table_name (column1,column2,column3,...) 
VALUES (value1,value2,value3,...); 
EXAMPLE : 
INSERT INTO Persons (PersonID, FirstName, DateOfBirth) VALUES (1, 'manan', '07-08-1994'); 
#2. UPDATE 
USE : modifies records. 
SYNTAX : 
UPDATE table_name 
SET column1=value1,column2=value2,... 
WHERE some_column=some_value; 
EXAMPLE : 
UPDATE Students 
SET Fine=0 
WHERE Stu_ID=404; 
#3. DELETE 
USE : deletes records (but the table structure remains intact). 
SYNTAX : 
DELETE FROM table_name 
WHERE some_column=some_value; 
EXAMPLE : 
DELETE FROM Persons 
WHERE Stu_ID=21;
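The INSERT, UPDATE and DELETE examples above can be run together as one sketch with sqlite3 (the students table is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (stu_id INT, name TEXT, fine INT)")

con.execute("INSERT INTO students VALUES (404, 'Manan', 50)")     # INSERT: create a record
con.execute("UPDATE students SET fine = 0 WHERE stu_id = 404")    # UPDATE: modify it
fine = con.execute("SELECT fine FROM students WHERE stu_id = 404").fetchone()[0]
print(fine)  # 0

con.execute("DELETE FROM students WHERE stu_id = 404")            # DELETE: remove it
count = con.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(count)  # 0 rows left, but the table structure remains
```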
#4. CALL 
USE : call a PL/SQL or java subprogram. 
#5. EXPLAIN PLAN 
USE : explain access path to data. 
SYNTAX : 
EXPLAIN PLAN FOR 
SQL_Statement; 
EXAMPLE : 
EXPLAIN PLAN FOR 
SELECT last_name FROM employees; 
#6. LOCK TABLE 
USE : control concurrency. 
SYNTAX : 
LOCK TABLE table_name 
IN EXCLUSIVE MODE 
NOWAIT; 
This locks the table in exclusive mode but does not wait if another user already has locked the table: 
EXAMPLE : 
LOCK TABLE employees 
IN EXCLUSIVE MODE 
NOWAIT;
DCL COMMANDS 
DCL (Data Control Language) Commands of SQL allow Data Control functions like granting 
and revoking permissions, committing changes, rolling back, etc. 
The following are the various DCL Commands, along with their syntax, use and examples: 
#1. GRANT 
USE : gives a privilege to user(s). 
SYNTAX : 
GRANT permission [, ...] 
ON [schema_name.]object_name [(column [, ...])] 
TO database_principal[, ...] 
[WITH GRANT OPTION] 
EXAMPLE : 
GRANT SELECT 
ON Invoices 
TO AnneRoberts; 
#2. REVOKE 
USE : takes back privileges/grants from users. 
SYNTAX : 
REVOKE [GRANT OPTION FOR] permission [, ...] 
ON [schema_name.]object_name [(column [, ...])] 
FROM database_principal[, ...] 
[CASCADE] 
EXAMPLE : 
REVOKE SELECT 
ON Invoices 
FROM AnneRoberts; 
#3. COMMIT 
USE : save work done. 
SYNTAX : COMMIT;
#4. ROLLBACK 
USE : restores the database to its original state since the last COMMIT. 
SYNTAX : ROLLBACK; 
#5. SAVEPOINT 
USE : identify a point in a transaction in which you can later rollback. 
SYNTAX : 
SAVEPOINT SAVEPOINT_NAME; 
& then, 
ROLLBACK TO SAVEPOINT_NAME; 
RELEASE SAVEPOINT SAVEPOINT_NAME; 
#6. SET TRANSACTION 
USE : sets options for the current transaction, such as read-only or read-write 
mode, or which rollback segment to use. 
SYNTAX : SET TRANSACTION [ READ WRITE | READ ONLY ];
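COMMIT and ROLLBACK behaviour can be sketched with sqlite3, which opens an implicit transaction around data changes (the accounts table is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INT, balance INT)")
con.commit()

con.execute("INSERT INTO accounts VALUES (1, 100)")
con.rollback()    # ROLLBACK: undo everything since the last COMMIT
after_rollback = con.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(after_rollback)  # 0 -- the insert was undone

con.execute("INSERT INTO accounts VALUES (1, 100)")
con.commit()      # COMMIT: make the insert permanent
after_commit = con.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(after_commit)  # 1
```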
SIMPLE QUERIES & NESTED QUERIES 
A Simple Query is a query that searches using just one parameter. A simple query might use all 
of the fields in a table, or just the fields from which the required information comes, but it will 
still use just one parameter (search criterion). 
The following are some types of queries: 
• A select query retrieves data from one or more of the tables in your database, or other 
queries there, and displays the results in a datasheet. You can also use a select query to 
group data, and to calculate sums, averages, counts, and other types of totals. 
• A parameter query is a type of select query that prompts you for input before it runs. The 
query then uses your input as criteria that control your results. For example, a typical 
parameter query asks you for starting high and low values, and only returns records that 
fall within those values. 
• A cross-tab query uses row headings and column headings so you can see your data in 
terms of two categories at once. 
• An action query alters your data or your database. For example, you can use an action 
query to create a new table, or add, delete, or change your data. 
A Nested Query (also called a subquery or inner query) is a query within a query. 
A subquery is usually added in the WHERE clause of an SQL statement. Most of the time, a 
subquery is used when you know how to search for a value using a SELECT statement, but do 
not know the exact value. 
A subquery is also called an inner query or inner select, while the statement containing a 
subquery is also called an outer query or outer select. 
A query result can be used in the condition of a WHERE clause; in such a case the inner query 
is called a subquery and the complete SELECT statement is called a nested query. A subquery 
can also be placed within a HAVING clause, but a subquery cannot be used in an ORDER BY 
clause. 
Subqueries are queries nested inside other queries, marked off with parentheses, and sometimes 
referred to as "inner" queries within "outer" queries. Most often, you see subqueries in WHERE 
or HAVING clauses. 
A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT, 
UPDATE, or DELETE statement, or inside another subquery.
A subquery can appear anywhere an expression can be used, if it returns a single value. 
Statements that include a subquery usually take one of these formats: 
 WHERE expression [NOT] IN (subquery). 
 WHERE expression comparison_operator [ANY | ALL] (subquery). 
 WHERE [NOT] EXISTS (subquery). 
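A minimal nested query, sketched with sqlite3: the inner SELECT computes a value the outer WHERE compares against (the table and data are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (name TEXT, salary INT)")
con.executemany("INSERT INTO employees VALUES (?,?)",
                [("Asha", 30000), ("Ravi", 45000), ("Meena", 60000)])

# The scalar subquery returns a single value (the average salary, 45000);
# the outer query keeps only the rows above it.
rows = con.execute("""SELECT name FROM employees
                      WHERE salary > (SELECT AVG(salary) FROM employees)""").fetchall()
print(rows)  # [('Meena',)]
```

This matches the "expression comparison_operator (subquery)" format above: the search works even though the exact average is not known in advance.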
Following are the TYPES of Nested Queries: 
Single - Row Subqueries 
The single-row subquery returns one row. A special case is the scalar subquery, which returns a 
single row with one column. Scalar subqueries are acceptable (and often very useful) in virtually 
any situation where you could use a literal value, a constant, or an expression. A single-row 
subquery can be used with any comparison operator (=, <=, >=, <>, <, >). If any of these 
operators is used with a subquery that returns more than one row, the query will fail. 
Multiple-row subqueries 
Multiple-row subqueries return sets of rows. These queries are commonly used to generate result 
sets that will be passed to a DML or SELECT statement for further processing. Both single-row 
and multiple-row subqueries are evaluated once, before the parent query is run. Since a 
multiple-row subquery returns multiple values, the query must use the set comparison operators 
(IN, ALL, ANY). If you use a multi-row subquery with the equals comparison operator, the 
database will return an error if more than one row is returned. The operators in the following 
table can be used with multiple-row subqueries: 
Symbol Meaning 
IN equal to any member in a list 
ANY returns rows that match any value on a list 
ALL returns rows that match all the values in a list 
Multiple–Column Subquery 
A subquery that compares more than one column between the parent query and subquery is 
called the multiple column subqueries. In multiple-column subqueries, rows in the subquery 
results are evaluated in the main query in pair-wise comparison. That is, column-to-column 
comparison and row-to-row comparison.
Correlated Subquery 
A correlated subquery has a more complex method of execution than single- and multiple-row 
subqueries and is potentially much more powerful. If a subquery references columns in the 
parent query, then its result will be dependent on the parent query. This makes it impossible to 
evaluate the subquery before evaluating the parent query. 
Some points to remember about the subquery are: 
• Subqueries are queries nested inside other queries, marked off with parentheses. 
• The result of inner query will pass to outer query for the preparation of final result. 
• ORDER BY clause is not supported for Nested Queries. 
• The BETWEEN operator cannot be used with a subquery (although it can be used within 
the subquery itself). 
• A subquery used with a plain comparison operator must return only a single value to the 
outer query. 
• A subquery must be placed on the right-hand side of the comparison operator. 
• A query can contain more than one sub-query.
NORMALIZATION 
Normalization is the process of efficiently organizing data in a database. There are two goals of 
the normalization process: eliminating redundant data (for example, storing the same data in 
more than one table) and ensuring data dependencies make sense (only storing related data in a 
table). Both of these are worthy goals as they reduce the amount of space a database consumes 
and ensure that data is logically stored. 
Normalization is a process, in which we systematically examine relations for anomalies and, 
when detected, remove those anomalies by splitting up the relation into two new, related 
relations. 
Normalization is an important part of the database development process: Often during 
normalization, the database designers get their first real look into how the data are going to 
interact in the database. 
Finding problems with the database structure at this stage is strongly preferred to finding 
problems further along in the development process because at this point it is fairly easy to cycle 
back to the conceptual model (Entity Relationship model) and make changes. Normalization can 
also be thought of as a trade-off between data redundancy and performance. Normalizing a 
relation reduces data redundancy but introduces the need for joins when all of the data is required 
by an application such as a report query. 
 Problems without Normalization 
Without normalization, it becomes difficult to handle and update the database without facing 
data loss. Insertion, updation and deletion anomalies are very frequent if the database is not 
normalized. To understand these anomalies, let us take the example of a student table. 
S_id S_name S_address Subject_opted 
401 Adam Noida Bio 
402 Alex Panipat Maths 
403 Stuart Jammu Maths 
404 Adam Noida Physic
 Updation Anomaly: 
To update the address of a student who occurs twice or more in the table, we have to update 
the S_address column in all of those rows; otherwise the data will become inconsistent. 
 Insertion Anomaly: 
Suppose, for a new admission, we have the S_id, name and address of a student, but the 
student has not opted for any subject yet. Then we have to insert NULL there, leading to an 
insertion anomaly. 
 Deletion Anomaly: 
If student 401 has only one subject and temporarily drops it, then when we delete that row the 
entire student record will be deleted along with it. 
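The anomalies above disappear once the table is split. A sketch with sqlite3 (assuming, for illustration, that the two "Adam, Noida" rows refer to the same student), with one table for student facts and one for opted subjects:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (s_id INT PRIMARY KEY, s_name TEXT, s_address TEXT)")
con.execute("CREATE TABLE subject_opted (s_id INT, subject TEXT)")
con.executemany("INSERT INTO student VALUES (?,?,?)",
                [(401, "Adam", "Noida"), (402, "Alex", "Panipat"), (403, "Stuart", "Jammu")])
con.executemany("INSERT INTO subject_opted VALUES (?,?)",
                [(401, "Bio"), (401, "Physics"), (402, "Maths"), (403, "Maths")])

# Updation anomaly gone: each address lives in exactly one row.
con.execute("UPDATE student SET s_address = 'Delhi' WHERE s_id = 401")

# Deletion anomaly gone: dropping all of Adam's subjects keeps his student record.
con.execute("DELETE FROM subject_opted WHERE s_id = 401")
still_there = con.execute("SELECT s_name FROM student WHERE s_id = 401").fetchone()
print(still_there)  # ('Adam',)
```

The insertion anomaly is also gone: a new student can be added to the student table before opting for any subject, with no NULL placeholder needed.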
BENEFITS OF NORMALIZATION 
Normalization produces smaller tables with smaller rows: 
 More rows per page (less logical I/O) 
 More rows per I/O (more efficient) 
 More rows fit in cache (less physical I/O) 
The benefits of normalization include: 
 Searching, sorting, and creating indexes is faster, since tables are narrower, and more 
rows fit on a data page. 
 You usually have more tables. 
 You can have more clustered indexes (one per table), so you get more flexibility in tuning 
queries. 
 Index searching is often faster, since indexes tend to be narrower and shorter. 
 More tables allow better use of segments to control physical placement of data. 
 You usually have fewer indexes per table, so data modification commands are faster. 
 Fewer null values and less redundant data, making your database more compact. 
 Triggers execute more quickly if you are not maintaining redundant data. 
 Data modification anomalies are reduced. 
 Normalization is conceptually cleaner and easier to maintain and change as your needs 
change.
NORMAL FORMS (1NF, 2NF, 3NF, BCNF) 
Relations can fall into one or more categories (or classes) called Normal Forms. 
Normal Form: A class of relations free from a certain set of modification anomalies. 
Normal forms are given names such as: 
1. First Normal Form 
2. Second Normal Form 
3. Third Normal Form 
4. BCNF 
These forms are cumulative: a relation in Third Normal Form is also in 2NF and 1NF. 
The Normalization Process for a given relation consists of: 
 Apply the definition of each normal form (starting with 1NF). 
 If a relation fails to meet the definition of a normal form, change the relation (most often by 
splitting the relation into two new relations) until it meets the definition. 
 Re-test the modified/new relations to ensure they meet the definitions of each normal form. 
First Normal Form (1NF) 
 A relation is in first normal form if it meets the definition of a relation: 
1. Each attribute (column) value must be a single value only. 
2. All values for a given attribute (column) must be of the same type. 
3. Each attribute (column) name must be unique. 
4. The order of attributes (columns) is insignificant. 
5. No two tuples (rows) in a relation can be identical. 
6. The order of the tuples (rows) is insignificant. 
Each table should be organized into rows, and each row should have a primary key that 
distinguishes it as unique. The primary key is usually a single column, but sometimes more than 
one column can be combined to create a single primary key. 
For example, consider a table that is not in first normal form.
In First Normal Form, no row may contain a column in which more than one value is saved (for 
example, values separated with commas); instead, we must separate such data into multiple rows. 
Table in first Normal Form 
Student Age Subject 
Adam 15 Biology 
Adam 15 Maths 
Alex 14 Maths 
Stuart 17 Maths 
With First Normal Form, data redundancy increases, as there will be many columns with the 
same data in multiple rows, but each row as a whole will be unique. 
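The conversion to First Normal Form can be sketched in a few lines of plain Python, holding the table above as a list of tuples and splitting the comma-separated subject lists into one row per value:

```python
# Split comma-separated subject lists into one row per (student, subject),
# as First Normal Form requires. The data is taken from the example above.
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

first_normal_form = [
    (student, age, subject.strip())
    for student, age, subjects in unnormalized
    for subject in subjects.split(",")
]

print(first_normal_form)
# [('Adam', 15, 'Biology'), ('Adam', 15, 'Maths'),
#  ('Alex', 14, 'Maths'), ('Stuart', 17, 'Maths')]
```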
Second Normal Form (2NF) 
 A relation is in second normal form (2NF) if all of its non-key attributes are dependent on 
all of the key. 
 Another way to say this: a relation is in second normal form if it is free from partial-key 
dependencies. 
 Relations that have a single attribute for a key are automatically in 2NF. 
 This is one reason why we often use artificial identifiers (non-composite keys) as keys. 
As per the second normal form, there must not be any partial dependency of any column on the 
primary key. For a table that has a concatenated (composite) primary key, each column that is 
not part of the primary key must depend upon the entire concatenated key for its existence. If 
any column depends on only one part of the concatenated key, then the table fails second 
normal form. 
In the example of First Normal Form there are two rows for Adam, to include the multiple subjects 
that he has opted for. Compare it with the original, un-normalized table: 
Student Age Subject 
Adam 15 Biology, Maths 
Alex 14 Maths 
Stuart 17 Maths 
While the 1NF table is searchable and follows First Normal Form, it is an inefficient use of 
space. Also, in the 1NF table the candidate key is {Student, Subject}, yet the Age of a student 
depends only on the Student column, which is incorrect as per second normal form. To achieve 
second normal form, it would be helpful to split the subjects out into an independent table and 
match them up using the student names as foreign keys. 
New student table following second normal form will be: 
Student Age 
Adam 15 
Alex 14 
Stuart 17 
In the student table the candidate key will be the Student column, because all other columns 
(i.e. Age) depend on it. 
New subject table introduced for second normal form will be: 
Student Subject 
Adam Biology 
Adam Maths 
Alex Maths 
Stuart Maths 
In the subject table the candidate key will be the {Student, Subject} columns. Both of the above 
tables now qualify for second normal form and will not suffer from update anomalies. 
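The 2NF decomposition above can be verified with a small sketch using Python's built-in sqlite3: each student's age is stored exactly once, and a join still recovers the original 1NF rows (table and column names follow the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2NF decomposition from the example above: Age depends only on Student,
# so it moves to its own table; the subject table keeps {Student, Subject}.
conn.execute("CREATE TABLE student (student TEXT PRIMARY KEY, age INT)")
conn.execute("CREATE TABLE subject (student TEXT, subject TEXT, PRIMARY KEY (student, subject))")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [("Adam", "Biology"), ("Adam", "Maths"),
                  ("Alex", "Maths"), ("Stuart", "Maths")])

# Each student's age is now stored once, yet the original 1NF table can
# still be recovered with a join.
recovered = conn.execute(
    "SELECT s.student, s.age, j.subject FROM student s "
    "JOIN subject j ON s.student = j.student ORDER BY s.student, j.subject"
).fetchall()
print(recovered)
# [('Adam', 15, 'Biology'), ('Adam', 15, 'Maths'),
#  ('Alex', 14, 'Maths'), ('Stuart', 17, 'Maths')]
```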
Third Normal Form (3NF) 
 A relation is in third normal form (3NF) if it is in second normal form and it contains no 
transitive dependencies. 
 Consider relation R containing attributes A, B and C. R(A, B, C) 
 If A → B and B → C then A → C 
 Transitive Dependency: Three attributes with the above dependencies 
Third normal form requires that every non-prime attribute of the table be directly dependent on 
the primary key; transitive functional dependencies should be removed from the table. The table 
must also be in Second Normal Form. For example, consider the following fields in the 
Student_Detail table:
Student_id Student_name DOB Street City State Zip 
In this table Student_id is the primary key, but Street, City, and State depend upon Zip. The 
dependency between Zip and these other fields is a transitive dependency. Hence, to satisfy third 
normal form we move Street, City, and State to a new table, with Zip as the primary key. 
New Student_Detail Table: 
Student_id Student_name DOB Zip 
Address_Table: 
Zip Street City State 
The advantages of removing transitive dependencies are: 
1. The amount of data duplication is reduced. 
2. Data integrity is achieved.
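A sketch of this 3NF decomposition using Python's built-in sqlite3 (the actual zip, street, and city values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# After 3NF: Street/City/State depend on Zip (a transitive dependency via
# Student_id -> Zip -> City), so the address attributes get their own table.
conn.execute("CREATE TABLE student_detail (student_id INT PRIMARY KEY, "
             "student_name TEXT, dob TEXT, zip TEXT)")
conn.execute("CREATE TABLE address (zip TEXT PRIMARY KEY, street TEXT, city TEXT, state TEXT)")
conn.execute("INSERT INTO address VALUES ('249411', 'Shantikunj Rd', 'Haridwar', 'UK')")  # sample data
conn.executemany("INSERT INTO student_detail VALUES (?, ?, ?, ?)",
                 [(1, "Adam", "2000-01-01", "249411"),
                  (2, "Alex", "2001-05-20", "249411")])

# A city name is now stored once per zip code instead of once per student:
# correcting it means updating a single address row.
conn.execute("UPDATE address SET city = 'Hardwar' WHERE zip = '249411'")
cities = conn.execute(
    "SELECT DISTINCT a.city FROM student_detail s JOIN address a ON s.zip = a.zip"
).fetchall()
print(cities)  # every student now sees the updated city
```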
Boyce-Codd Normal Form (BCNF) 
 Boyce-Codd normal form (BCNF) 
 A relation is in BCNF if, and only if, every determinant is a candidate key. 
 The difference between 3NF and BCNF is that for a functional dependency A -> B, 
3NF allows this dependency in a relation if B is a primary-key attribute and A is not a 
candidate key, 
 whereas BCNF insists that for this dependency to remain in a relation, A must be a 
candidate key. 
Boyce-Codd normal form is a stricter version of Third Normal Form. It deals with a certain type 
of anomaly that is not handled by third normal form. A Third Normal Form table which does not 
have multiple overlapping candidate keys is already in BCNF. 
Client Interview 
ClientNo interviewDate InterviewTime StaffNo roomNo 
CR76 13/5/02 10:30 SG5 G101 
CR76 13/5/02 12:00 SG5 G101 
CR74 13/5/02 12:00 SG37 G102
CR56 1/7/02 10:30 SG5 G102 
1. FD1: clientNo, interviewDate -> interviewTime, staffNo, roomNo (Primary Key) 
2. FD2: staffNo, interviewDate, interviewTime- > clientNo (Candidate key) 
3. FD3: roomNo, interviewDate, interviewTime -> clientNo, staffNo (Candidate key) 
4. FD4: staffNo, interviewDate- > roomNo (not a candidate key) 
 As a consequence the ClientInterview relation may suffer from update anomalies. 
 For example, two tuples have to be updated if the roomNo needs to be changed for staffNo 
SG5 on 13-May-02. 
 To transform the ClientInterview relation to BCNF, we must remove the violating 
functional dependency by creating two new relations called Interview and StaffRoom as 
shown below: 
1. Interview (clientNo, interviewDate, interviewTime, staffNo) 
2. StaffRoom (staffNo, interviewDate, roomNo) 
Interview 
ClientNo InterviewDate InterviewTime StaffNo 
CR76 13/5/02 10:30 SG5 
CR76 13/5/02 12:00 SG5 
CR74 13/5/02 12:00 SG37 
CR56 1/7/02 10:30 SG5 
StaffRoom 
staffNo InterviewDate RoomNo 
SG5 13/5/02 G101 
SG37 13/5/02 G102 
SG5 1/7/02 G102
FUNCTIONAL DEPENDENCY 
A functional dependency is a relationship that exists when one attribute in a relation uniquely 
determines another attribute. This can be written A -> B, which is the same as stating "B is 
functionally dependent upon A". 
Example: If R is a relation with attributes X and Y, a functional dependency between the 
attributes is represented as X->Y, which specifies Y is functionally dependent on X. Here X is a 
determinant set and Y is a dependent attribute. Each value of X is associated precisely with one 
Y value. 
Functional dependency in a database serves as a constraint between two sets of attributes. 
Defining functional dependencies is an important part of relational database design and 
contributes to normalization. 
Consider an Example: 
REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade) Where: 
 Student#-Student Number 
 Course#-Course Number 
 CourseName -CourseName 
 IName- Name of the instructor who delivered the course 
 Room#-Room number which is assigned to respective instructor 
 Marks- Scored in Course Course# by student Student # 
 Grade –Obtained by student Student# in course Course #
 Student# and Course# together (called a composite attribute) define EXACTLY 
ONE value of Marks. This can be symbolically represented as 
Student#, Course# -> Marks 
REMARK: This type of dependency is called a functional dependency. In the above example, Marks 
is functionally dependent on {Student#, Course#}. 
Other functional dependencies in the above example are: 
• Course# -> CourseName 
• Course#-> IName(Assuming one course is taught by one and only one instructor ) 
• IName -> Room# (Assuming each instructor has his /her own and non-shared room) 
• Marks ->Grade 
• Formally we can define functional dependency as: in a given relation R, X and Y are 
attributes. Attribute Y is functionally dependent on attribute X if each value of X 
determines exactly one value of Y. This is represented as X -> Y; note that X may be 
composite in nature.
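The formal definition above can be turned into a small checker. This is an illustrative Python sketch; the relation, attribute names, and marks are made up, loosely following the REPORT example:

```python
# A small checker for X -> Y: the dependency holds when every value of X
# is associated with exactly one value of Y.
def holds(relation, x, y):
    """Return True if the functional dependency x -> y holds in `relation`
    (a list of dicts): equal x-values must always imply equal y-values."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        # setdefault records the first y-value seen for this x-value;
        # any later, different y-value for the same x refutes x -> y.
        if seen.setdefault(key, val) != val:
            return False
    return True

report = [
    {"student": 1, "course": "C1", "course_name": "DBMS", "marks": 80},
    {"student": 2, "course": "C1", "course_name": "DBMS", "marks": 65},
    {"student": 1, "course": "C2", "course_name": "Maths", "marks": 70},
]

print(holds(report, ["course"], ["course_name"]))       # Course# -> CourseName holds
print(holds(report, ["student"], ["marks"]))            # Student# alone does not determine Marks
print(holds(report, ["student", "course"], ["marks"]))  # the composite attribute does
```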
UNIT 5 
RELATIONAL DATABASE DESIGN 
Relational Database Design: Introduction to Relational Database Design, DBMS 
v/s RDBMS. Integrity rule, Concept of Concurrency Control and Database 
Security.
INTRODUCTION TO RELATIONAL DATABASE 
DESIGN 
Just as a house without a foundation will fall over, a database with poorly designed tables and 
relationships will fail to meet the needs of its users. And hence, the need of a sound relational 
database design originates. 
The History of Relational Database Design 
Dr. E. F. Codd first introduced formal relational database design in 1969 while he was at IBM. 
Relational theory, which is based on set theory, applies to both databases and database 
applications. Codd developed 12 rules that determine how well an application and its data adhere 
to the relational model. Since Codd first conceived these 12 rules, the number of rules has 
expanded into the hundreds. 
Goals of Relational Database Design 
The number one goal of relational database design is to, as closely as possible, develop a 
database that models some real-world system. This involves breaking the real-world system into 
tables and fields and determining how the tables relate to each other. Although on the surface 
this task might appear to be trivial, it can be an extremely cumbersome process to translate a 
real-world system into tables and fields. 
A properly designed database has many benefits. The processes of adding, editing, deleting, and 
retrieving table data are greatly facilitated by a properly designed database. In addition, reports 
are easier to build. Most importantly, the database becomes easy to modify and maintain. 
Rules of Relational Database Design 
To adhere to the relational model, tables must follow certain rules. These rules determine what is 
stored in tables and how the tables are related. 
1. The Rules of Tables 
Each table in a system must store data about a single entity. An entity usually represents a real-life 
object or event. Examples of objects are customers, employees, and inventory items. 
Examples of events include orders, appointments, and doctor visits. 
2. The Rules of Uniqueness and Keys 
Tables are composed of rows and columns. To adhere to the relational model, each table must 
contain a unique identifier. Without a unique identifier, it becomes programmatically impossible 
to uniquely address a row. You guarantee uniqueness in a table by designating a primary key, 
which is a single column or a set of columns that uniquely identifies a row in a table. 
Each column or set of columns in a table that contains unique values is considered a candidate 
key. One candidate key becomes the primary key. The remaining candidate keys become
alternate keys. A primary key made up of one column is considered a simple key. A primary key 
comprising multiple columns is considered a composite key. It is generally a good idea to pick a 
primary key that is 
 Minimal (has as few columns as possible) 
 Stable (rarely changes) 
 Simple (is familiar to the user) 
Following these rules greatly improves the performance and maintainability of your database 
application, particularly if you are dealing with large volumes of data. 
3. The Rules of Foreign Keys and Domains 
A foreign key in one table is the field that relates to the primary key in a second table. For 
example, the CustomerID is the primary key in the Customers table. It is the foreign key in 
the Orders table. A domain is a pool of values from which columns are drawn. A simple 
example of a domain is the specific data range of employee hire dates. In the case of the 
Orders table, the domain of the CustomerID column is the range of values for the 
CustomerID in the Customers table. 
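A sketch of a foreign key and a domain constraint using Python's built-in sqlite3. The table layout and the date rule are illustrative, and note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# CustomerID in Orders is a foreign key into Customers; the CHECK constraint
# gives OrderDate a restricted domain (an illustrative business rule).
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
                  order_id INTEGER PRIMARY KEY,
                  customer_id INTEGER REFERENCES customers(customer_id),
                  order_date TEXT CHECK (order_date >= '2000-01-01'))""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (10, 1, '2014-07-01')")  # valid row

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# An order for a non-existent customer violates referential integrity,
# and a date outside the declared domain violates the CHECK constraint.
print(rejected("INSERT INTO orders VALUES (11, 99, '2014-07-02')"))  # True
print(rejected("INSERT INTO orders VALUES (12, 1, '1990-01-01')"))   # True
```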
4. Normalization and Normal Forms 
Some of the most difficult decisions that you face as a developer are what tables to create and 
what fields to place in each table, as well as how to relate the tables that you create. 
Normalization is the process of applying a series of rules to ensure that your database achieves 
optimal structure. Normal forms are a progression of these rules. Each successive normal form 
achieves a better database design than the previous form did. Although there are several levels of 
normal forms, it is generally sufficient to apply only the first three levels of normal forms. 
5. Denormalization—Purposely Violating the Rules 
Although the developer's goal is normalization, often it makes sense to deviate from normal 
forms. We refer to this process as denormalization. The primary reason for applying 
denormalization is to enhance performance. If you decide to denormalize, document your 
decision. Make sure that you make the necessary application adjustments to ensure that you 
properly maintain the denormalized fields. Finally, test to ensure that the denormalization 
process actually improves performance. 
6. Integrity Rules 
Although integrity rules are not part of normal forms, they are definitely part of the database 
design process. Integrity rules are broken into two categories. They include overall integrity rules 
and database-specific integrity rules. 
7. Database-Specific Rules 
The other set of rules applied to a database are not applicable to all databases but are, instead, 
dictated by business rules that apply to a specific application. Database-specific rules are as 
important as overall integrity rules. They ensure that only valid data is entered into a database. 
An example of a database-specific integrity rule is that the delivery date for an order must fall 
after the order date.
(Also, see Codd’s 12 rules) 
Examining the Types of Relationships 
Three types of relationships can exist between tables in a database: one-to-many, one-to-one, and 
many-to-many. Setting up the proper type of relationship between two tables in your database is 
imperative. The right type of relationship between two tables ensures 
 Data integrity 
 Optimal performance 
 Ease of use in designing system objects 
The reasons behind these benefits are covered throughout this chapter. Before you can 
understand the benefits of relationships, though, you must understand the types of relationships 
available. 
One-to-Many 
A one-to-many relationship is by far the most common type of relationship. In a one-to-many 
relationship, a record in one table can have many related records in another table. A common 
example is a relationship set up between a Customers table and an Orders table. For each 
customer in the Customers table, you want to have more than one order in the Orders table. 
On the other hand, each order in the Orders table can belong to only one customer. The 
Customers table is on the one side of the relationship, and the Orders table is on the many 
side. For you to implement this relationship, the field joining the two tables on the one side of the 
relationship must be unique. 
One-to-One 
In a one-to-one relationship, each record in the first table can have at most one matching record 
in the second table, and vice versa. This relationship is not common and is used only in special 
circumstances. Usually, if you have set up a one-to-one relationship, you should have combined 
the fields from both tables into one table. 
Many-to-Many 
In a many-to-many relationship, records in both tables have matching records in the other table. 
An example is an Orders table and a Products table. Each order probably will contain 
multiple products, and each product is found on many different orders. The solution is to create a 
third table called OrderDetails. You relate the OrderDetails table to the Orders table 
in a one-to-many relationship based on the OrderID field. You relate it to the Products table 
in a one-to-many relationship based on the ProductID field.
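The Orders/Products/OrderDetails design above can be sketched with Python's built-in sqlite3 (the product names and IDs are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Orders and Products are in a many-to-many relationship, resolved by the
# OrderDetails junction table as described above.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_details (
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    PRIMARY KEY (order_id, product_id));
""")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(100, "Pen"), (200, "Notebook")])
# Order 1 contains both products; the notebook appears on both orders.
conn.executemany("INSERT INTO order_details VALUES (?, ?)",
                 [(1, 100), (1, 200), (2, 200)])

products_in_order_1 = [r[0] for r in conn.execute(
    "SELECT p.name FROM order_details d JOIN products p "
    "ON d.product_id = p.product_id WHERE d.order_id = 1 ORDER BY p.name")]
orders_with_notebook = [r[0] for r in conn.execute(
    "SELECT order_id FROM order_details WHERE product_id = 200 ORDER BY order_id")]
print(products_in_order_1, orders_with_notebook)
# ['Notebook', 'Pen'] [1, 2]
```

Each side of the junction table is an ordinary one-to-many relationship, which is exactly how the text above describes resolving a many-to-many.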
DBMS VS RDBMS 
History of DBMS and RDBMS 
Database management systems first appeared on the scene in the 1960s as computers began to grow 
in power and speed. By the mid-1960s, there were several commercial applications on the 
market that were capable of producing "navigational" databases. These navigational databases 
maintained records that could only be processed sequentially, which required a lot of computer 
resources and time. 
Relational database management systems were first suggested by Edgar Codd in the 1970s. 
Because navigational databases could not be "searched", Edgar Codd suggested another model 
that could be followed to construct a database. This was the relational model, which allowed users 
to "search" it for data. It included the integration of the navigational model, along with a tabular 
and hierarchical model. 
Difference between DBMS and RDBMS:- 
DBMS: 
A DBMS is a storage area that persists data in files. To perform database 
operations, the file must be in use. 
Relationships can be established between two files. 
There are limitations on storing records in a single database file, depending upon the 
database manager used. 
Data is stored in flat files along with metadata. 
DBMS does not support client/server architecture. 
DBMS does not follow normalization; only a single user can access the data at a time. 
DBMS does not impose integrity constraints. 
The ACID properties of the database must be implemented by the user or the developer. 
DBMS is used for simpler applications. 
Small sets of data can be managed by a DBMS. 
RDBMS:- 
An RDBMS stores the data in tabular form. 
It imposes additional conditions supporting a tabular structure for data and enforces 
relationships among tables. 
RDBMS supports client/server architecture. 
RDBMS follows normalization. 
RDBMS allows simultaneous access to data tables by multiple users. 
RDBMS imposes integrity constraints. 
The ACID properties of the database are defined in the integrity constraints. 
RDBMS is used for more complex applications. 
An RDBMS solution is required for large sets of data.
INTEGRITY RULE 
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its 
entire life-cycle and is a critical aspect to the design, implementation and usage of any system 
which stores, processes, or retrieves data. 
Data integrity is the opposite of data corruption, which is a form of data loss. The overall intent 
of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a 
database correctly rejecting mutually exclusive possibilities) and, upon later retrieval, ensure the 
data is the same as it was when it was originally recorded. In short, data integrity aims to prevent 
unintentional changes to information. Data integrity is not to be confused with data security, the 
discipline of protecting data from unauthorized parties. 
Any unintended change to data as the result of a storage, retrieval or processing operation, 
including malicious intent, unexpected hardware failure, and human error, is a failure of data 
integrity. If the changes are the result of unauthorized access, it may also be a failure of data 
security. 
TYPES OF INTEGRITY RULES/CONSTRAINTS 
Data integrity is normally enforced in a database system by a series of integrity constraints or 
rules. Three types of integrity constraints are an inherent part of the relational data model: entity 
integrity, referential integrity and domain integrity: 
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule 
which states that every table must have a primary key and that the column or columns 
chosen to be the primary key should be unique and not null. 
Referential integrity concerns the concept of a foreign key. The referential integrity rule 
states that any foreign-key value can only be in one of two states. The usual state of 
affairs is that the foreign key value refers to a primary key value of some table in the 
database. Occasionally, and this will depend on the rules of the data owner, a foreign-key 
value can be null. In this case we are explicitly saying that either there is no relationship 
between the objects represented in the database or that this relationship is unknown. 
Domain integrity specifies that all columns in a relational database must be declared upon 
a defined domain. The primary unit of data in the relational data model is the data item. 
Such data items are said to be non-decomposable or atomic. A domain is a set of values 
of the same type. Domains are therefore pools of values from which actual values 
appearing in the columns of a table are drawn. 
User-defined integrity refers to a set of rules specified by a user, which do not belong to 
the entity, domain and referential integrity categories.
If a database supports these features, it is the responsibility of the database to ensure data integrity 
as well as the consistency model for the data storage and retrieval. If a database does not support 
these features it is the responsibility of the applications to ensure data integrity while the 
database supports the consistency model for the data storage and retrieval. 
Having a single, well-controlled, and well-defined data-integrity system increases: 
 stability (one centralized system performs all data integrity operations) 
 performance (all data integrity operations are performed in the same tier as the 
consistency model) 
 re-usability (all applications benefit from a single centralized data integrity system) 
 maintainability (one centralized system for all data integrity administration) 
Many companies, and indeed many database systems themselves, offer products and services to 
migrate out-dated and legacy systems to modern databases to provide these data-integrity 
features. This offers organizations substantial savings in time, money, and resources because 
they do not have to develop per-application data-integrity systems that must be re-factored each 
time business requirements change. 
Example 
An example of a data-integrity mechanism is the parent-and-child relationship of related records. 
If a parent record owns one or more related child records, all of the referential integrity processing 
is handled by the database itself, which automatically ensures the accuracy and integrity of the 
data, so that no child record can exist without a parent (also called being orphaned) and no 
parent loses its child records. It also ensures that no parent record can be deleted while that 
parent still owns any child records. All of this is handled at the database level and does not 
require coding integrity checks into each application.
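A minimal sketch of this parent-and-child mechanism, using Python's built-in sqlite3 with generic table names. SQLite's default foreign-key action rejects both orphaned inserts and parent deletes once PRAGMA foreign_keys = ON is set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable SQLite's FK enforcement
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER REFERENCES parent(id))")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")

# No child without a parent: inserting an orphan is rejected...
try:
    conn.execute("INSERT INTO child VALUES (11, 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# ...and deleting a parent that still owns children is rejected too.
try:
    conn.execute("DELETE FROM parent WHERE id = 1")
    delete_allowed = True
except sqlite3.IntegrityError:
    delete_allowed = False

print(orphan_allowed, delete_allowed)  # False False
```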
CONCEPT OF CONCURRENCY CONTROL 
Definition 
Concurrency control is a database management systems (DBMS) concept that is used to address 
conflicts with the simultaneous accessing or altering of data that can occur with a multi-user 
system. Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous 
transactions while preserving data integrity. In short, concurrency control is about managing 
multi-user access to the database. 
Illustrative Example 
To illustrate the concept of concurrency control, consider two travelers who go to electronic 
kiosks at the same time to purchase a train ticket to the same destination on the same train. 
There's only one seat left in the coach, but without concurrency control, it's possible that both 
travelers will end up purchasing a ticket for that one seat. However, with concurrency control, 
the database wouldn't allow this to happen. Both travelers would still be able to access the train 
seating database, but concurrency control would preserve data accuracy and allow only one 
traveler to purchase the seat. 
This example also illustrates the importance of addressing this issue in a multi-user database. 
Obviously, one could quickly run into problems with the inaccurate data that can result from 
several transactions occurring simultaneously and writing over each other. The following section 
provides strategies for implementing concurrency control. 
Database transaction and the ACID rules 
The concept of a database transaction (or atomic transaction) has evolved in order to enable 
both a well understood database system behavior in a faulty environment where crashes can 
happen any time, and recovery from a crash to a well understood database state. A database 
transaction is a unit of work, typically encapsulating a number of operations over a database 
(e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in 
database and also other systems. Each transaction has well defined boundaries in terms of which 
program/code executions are included in that transaction (determined by the transaction's 
programmer via special transaction commands). Every database transaction obeys the following 
rules (by support in the database system; i.e., a database system is designed to guarantee them for 
the transactions it runs): 
 Atomicity - Either the effects of all or none of its operations remain ("all or nothing" 
semantics) when a transaction is completed (committed or aborted respectively). In other 
words, to the outside world a committed transaction appears (by its effects on the 
database) to be indivisible (atomic), and an aborted transaction does not affect the 
database at all, as if it had never happened.
 Consistency - Every transaction must leave the database in a consistent (correct) state, 
i.e., maintain the predetermined integrity rules of the database (constraints upon and 
among the database's objects). A transaction must transform a database from one 
consistent state to another consistent state (however, it is the responsibility of the 
transaction's programmer to make sure that the transaction itself is correct, i.e., performs 
correctly what it intends to perform (from the application's point of view) while the 
predefined integrity rules are enforced by the DBMS). Thus since a database can be 
normally changed only by transactions, all the database's states are consistent. 
 Isolation - Transactions cannot interfere with each other (as an end result of their 
executions). Moreover, usually (depending on concurrency control method) the effects of 
an incomplete transaction are not even visible to another transaction. Providing isolation 
is the main goal of concurrency control. 
 Durability - Effects of successful (committed) transactions must persist through crashes 
(typically by recording the transaction's effects and its commit event in a non-volatile 
memory). 
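Atomicity can be illustrated with a short sketch in Python's built-in sqlite3; the account table and the simulated failure are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

# Atomicity: a transfer is two writes. If anything fails before COMMIT,
# the whole transaction is rolled back and the first write never takes effect.
try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated crash before crediting account B")
    # (on success we would credit B here and then conn.commit())
except RuntimeError:
    conn.rollback()  # all or nothing: the debit disappears

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 0} -- as if the transaction never happened
```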
Why is concurrency control needed? 
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction 
concurrency exists. However, if concurrent transactions with interleaving operations are allowed 
in an uncontrolled manner, some unexpected, undesirable results may occur, such as: 
1. The lost update problem: A second transaction writes a second value of a data-item 
(datum) on top of a first value written by a first concurrent transaction, and the first value 
is lost to other transactions running concurrently which need, by their precedence, to read 
the first value. The transactions that have read the wrong value end with incorrect results. 
2. The dirty read problem: Transactions read a value written by a transaction that has been 
later aborted. This value disappears from the database upon abort, and should not have 
been read by any transaction ("dirty read"). The reading transactions end with incorrect 
results. 
3. The incorrect summary problem: While one transaction takes a summary over the values 
of all the instances of a repeated data-item, a second transaction updates some instances 
of that data-item. The resulting summary does not reflect a correct result for any (usually 
needed for correctness) precedence order between the two transactions (if one is executed 
before the other), but rather some random result, depending on the timing of the updates, 
and whether certain update results have been included in the summary or not. 
Most high-performance transactional systems need to run transactions concurrently to meet their 
performance requirements. Thus, without concurrency control such systems can neither provide 
correct results nor keep their databases consistent. 
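The lost update problem above can be shown with a deterministic, hand-simulated interleaving in plain Python (the balance and amounts are invented for illustration):

```python
# The lost-update problem, simulated deterministically: two transactions
# read the same data item, then both write back, and the first update is lost.
balance = 100  # shared data item

# T1 and T2 each intend to add an amount to the balance.
t1_read = balance          # T1 reads 100
t2_read = balance          # T2 reads 100 (before T1 has written)
balance = t1_read + 20     # T1 writes 120
balance = t2_read + 30     # T2 writes 130, overwriting T1's update

print(balance)  # 130, not the correct 150 -- T1's update is lost
```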
Concurrency Control Locking Strategies 
Pessimistic Locking: This concurrency control strategy involves keeping an entity in a database 
locked the entire time it exists in the database's memory. This limits or prevents users from
altering the data entity that is locked. There are two types of locks that fall under the category of 
pessimistic locking: write lock and read lock. 
With write lock, everyone but the holder of the lock is prevented from reading, updating, or 
deleting the entity. With read lock, other users can read the entity, but no one except for the lock 
holder can update or delete it. 
Optimistic Locking: This strategy can be used when instances of simultaneous transactions, or 
collisions, are expected to be infrequent. In contrast with pessimistic locking, optimistic locking 
doesn't try to prevent the collisions from occurring. Instead, it aims to detect these collisions and 
resolve them on the chance occasions when they occur. 
Pessimistic locking provides a guarantee that database changes are made safely. However, it 
becomes less viable as the number of simultaneous users or the number of entities involved in a 
transaction increase because the potential for having to wait for a lock to release will increase. 
Optimistic locking can alleviate the problem of waiting for locks to release, but then users have 
the potential to experience collisions when attempting to update the database. 
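A common way to implement optimistic locking is a version column checked in the UPDATE's WHERE clause. This is a hedged sketch using Python's built-in sqlite3, reusing the train-seat scenario from earlier; the table layout and function names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Optimistic locking sketch: each row carries a version number; an update
# succeeds only if the version is unchanged since it was read.
conn.execute("CREATE TABLE seat (seat_no INT PRIMARY KEY, holder TEXT, version INT)")
conn.execute("INSERT INTO seat VALUES (1, NULL, 0)")

def book(conn, seat_no, traveller, read_version):
    """Try to book; returns True only if no one updated the row meanwhile."""
    cur = conn.execute(
        "UPDATE seat SET holder = ?, version = version + 1 "
        "WHERE seat_no = ? AND version = ?",
        (traveller, seat_no, read_version))
    return cur.rowcount == 1  # 0 rows touched means the version moved on

# Both travellers read the row at version 0, then both try to book.
v = conn.execute("SELECT version FROM seat WHERE seat_no = 1").fetchone()[0]
first = book(conn, 1, "traveller A", v)    # succeeds, version becomes 1
second = book(conn, 1, "traveller B", v)   # collision detected, fails

holder = conn.execute("SELECT holder FROM seat WHERE seat_no = 1").fetchone()[0]
print(first, second, holder)  # True False traveller A
```

The losing transaction detects the collision (rowcount 0) and can re-read the row and retry, which is exactly the detect-and-resolve behavior described above.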
Lock Problems: 
Deadlock: 
When dealing with locks two problems can arise, the first of which being deadlock. Deadlock 
refers to a particular situation where two or more processes are each waiting for another to 
release a resource, or more than two processes are waiting for resources in a circular chain. 
Deadlock is a common problem in multiprocessing where many processes share a specific type 
of mutually exclusive resource. Some computers, usually those intended for the time-sharing 
and/or real-time markets, are often equipped with a hardware lock, or hard lock, which 
guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly 
disconcerting because there is no general solution to avoid them. 
A fitting analogy of the deadlock problem could be a situation like when you go to unlock your 
car door and your passenger pulls the handle at the exact same time, leaving the door still locked. 
If you have ever been in a situation where the passenger is impatient and keeps trying to open the 
door, it can be very frustrating. Basically you can get stuck in an endless cycle, and since both 
actions cannot be satisfied, deadlock occurs. 
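In a DBMS, deadlocks among transactions are commonly detected with a wait-for graph: an edge T1 -> T2 means "T1 is waiting for a lock held by T2", and a deadlock exists exactly when the graph contains a cycle. A minimal sketch:

```python
# Sketch of deadlock detection with a wait-for graph. A deadlock exists
# exactly when the graph contains a cycle. Illustrative only.

def has_deadlock(wait_for):
    """Detect a cycle in the wait-for graph via depth-first search."""
    visiting, done = set(), set()

    def dfs(txn):
        if txn in visiting:
            return True          # back edge: we found a circular chain
        if txn in done:
            return False
        visiting.add(txn)
        if any(dfs(nxt) for nxt in wait_for.get(txn, [])):
            return True
        visiting.remove(txn)
        done.add(txn)
        return False

    return any(dfs(txn) for txn in wait_for)


# T1 waits for T2 and T2 waits for T1: the circular chain from the text.
assert has_deadlock({"T1": ["T2"], "T2": ["T1"]})
# A plain waiting chain with no cycle is not a deadlock.
assert not has_deadlock({"T1": ["T2"], "T2": ["T3"]})
```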
Livelock: 
Livelock is a special case of resource starvation. A livelock is similar to a deadlock, except that 
the states of the processes involved constantly change with regard to one another while never 
progressing. The general definition states only that a specific process is not progressing. For 
example, the system keeps selecting the same transaction for rollback causing the transaction to 
never finish executing. Another livelock situation can come about when the system is deciding 
which transaction gets a lock and which waits in a conflict situation.
An illustration of livelock occurs when numerous people arrive at a four way stop, and are not 
quite sure who should proceed next. If no one makes a solid decision to go, and all the cars just 
keep creeping into the intersection afraid that someone else will possibly hit them, then a kind of 
livelock can happen. 
Basic Timestamping: 
Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method 
doesn't use locks to control concurrency, so it is impossible for deadlock to occur. In this 
method a unique timestamp is assigned to each transaction, usually recording when it was 
started. This effectively assigns an age, and hence an ordering, to transactions. 
Data items have both a read-timestamp and a write-timestamp. These timestamps are updated 
each time the data item is read or updated respectively. 
Problems arise in this scheme when a transaction tries to read a data item that has been written 
by a younger transaction. This is called a late read: the data item has changed since the 
transaction started, so the transaction is rolled back and restarted with a new timestamp. 
The symmetric problem occurs when a transaction tries to write a data item that has already been 
read by a younger transaction. This is called a late write: the data item has been read by 
another transaction since the start time of the transaction that is altering it. The solution is 
the same as for the late read: the transaction is rolled back and restarted with a new timestamp. 
Adhering to the rules of the basic timestamping process allows the transactions to be serialized, 
and a chronological schedule of transactions can then be created. Timestamping may not be 
practical for larger databases with high transaction volumes, since a large amount of 
storage space would have to be dedicated to storing the timestamps. 
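The late-read and late-write rules can be sketched as follows (an illustrative model; a real scheduler would also restart the rolled-back transaction with a fresh timestamp):

```python
# Sketch of basic timestamp ordering. Each data item keeps a read- and a
# write-timestamp; a transaction that reads or writes "too late" is
# rejected (rolled back and restarted with a new timestamp). Illustrative.

class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, txn_ts):
    """Return False on a late read (item written by a younger txn)."""
    if txn_ts < item.write_ts:
        return False                      # roll back, acquire a new timestamp
    item.read_ts = max(item.read_ts, txn_ts)
    return True

def write(item, txn_ts):
    """Return False on a late write (item read or written by a younger txn)."""
    if txn_ts < item.read_ts or txn_ts < item.write_ts:
        return False                      # roll back, acquire a new timestamp
    item.write_ts = txn_ts
    return True


x = Item()
assert write(x, txn_ts=5)      # the transaction with timestamp 5 writes x
assert not read(x, txn_ts=3)   # late read: the older transaction must restart
assert read(x, txn_ts=7)       # a younger transaction may read x
assert not write(x, txn_ts=6)  # late write: x was already read at timestamp 7
```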
DATABASE SECURITY 
"Secret passwords, iron bolts, gated driveways, access cards, etc. - layers of physical security in 
the real world are found in the database world as well. Creating and enforcing security 
procedures helps to protect what is rapidly becoming the most important corporate asset: 
DATA." 
Database security concerns the use of a broad range of information security controls to protect 
databases (potentially including the data, the database applications or stored functions, the 
database systems, the database servers and the associated network links) against compromises of 
their confidentiality, integrity and availability. The three main objectives of database security 
are: 
1. Secrecy / confidentiality: Information is not disclosed to unauthorized users. Private 
data remains private. 
2. Integrity: Ensuring data is accurate; data must be protected from unauthorized 
modification/destruction (only authorized users can modify data). 
3. Availability: Ensuring data is accessible whenever needed by the organization. 
(Authorized users should not be denied access) 
In order to achieve these objectives, following are employed: 
1. A clear and consistent security policy. (about security measures to be enforced; What 
data is to be protected, and which users get access to which portion of data) 
2. Security mechanisms of underlying DBMS & OS; also external mechanisms, as securing 
access to buildings. i.e. Security measures at various levels, must be taken to ensure 
proper security. 
Authorization and Authentication are the two A's of security that every secure system must be 
good at. 
The Sources of External Security Threats are: 
1. Physical threats: threats to the hardware of the database system. They may arise from 
dangers to buildings or the network, or from human error (e.g. privileged accounts 
left logged in). 
2. Hackers & Crackers: 
White hat hackers: the "good guys", hired to fix and test systems; they do not release 
information about a system's vulnerabilities to the public until they are fixed. 
Script kiddies: hacker "wannabes" with little programming skill, who rely on tools written by 
others. 
Black hat hackers: hackers motivated by greed or a desire to cause harm; the most 
dangerous, as they are very knowledgeable and their activities are often undetectable. 
Cyber-terrorists: hackers motivated by a political, religious or philosophical agenda. They 
may try to deface websites that support opposing positions. In the current global climate it is 
feared that they may even attempt to disable networks that handle utilities such as nuclear 
plants and water systems. 
3. Types of Attacks: 
Denial of Service (DoS) attack: A denial-of-service (DoS) or distributed denial-of-service 
(DDoS) attack is an attempt to make a machine or network resource unavailable to its intended 
users. Although the means to carry out, the motives for, and targets of a DoS attack vary, it 
generally consists of efforts to temporarily or indefinitely interrupt or suspend services of a host 
connected to the Internet. As clarification, a distributed denial-of-service attack is launched by two or 
more persons (or bots), while a denial-of-service attack is launched by a single person or system. 
Buffer Overflow: This attack exploits a loophole left by a programming error in the system. 
(Another well-known attack that exploits programming errors is SQL injection.) 
A buffer overflow occurs when data written to a buffer also corrupts data values in memory 
addresses adjacent to the destination buffer due to insufficient bounds checking. This can occur 
when copying data from one buffer to another without first checking that the data fits within the 
destination buffer. 
Malware: Malware, short for malicious software, is any software used to disrupt computer 
operation, gather sensitive information, or gain access to private computer systems. It can appear 
in the form of executable code, scripts, active content, and other software. 'Malware' is a general 
term used to refer to a variety of forms of hostile or intrusive software. 
Social Engineering: The psychological manipulation of people into performing actions or 
divulging confidential information. A type of confidence trick for the purpose of information 
gathering, fraud, or system access, it differs from a traditional "con" in that it is often one of 
many steps in a more complex fraud scheme. 
Brute force: A cryptanalytic attack that can, in theory, be used against any encrypted data 
(except for data encrypted in an information-theoretically secure manner). Such an attack might 
be used when it is not possible to take advantage of other weaknesses in an encryption system (if 
any exist) that would make the task easier. It consists of systematically checking all possible 
keys or passwords until the correct one is found. In the worst case, this would involve traversing 
the entire search space. 
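As a toy illustration of systematically checking an entire search space, the sketch below brute-forces a 4-digit PIN from its hash (the PIN and the "stolen" hash are made up for the example):

```python
import hashlib

# Sketch of a brute-force attack: systematically try every possible
# 4-digit PIN against a stolen password hash. Illustrative only; the
# PIN and hash here are invented for the example.

def brute_force_pin(target_hash):
    for pin in range(10000):                      # the whole search space
        guess = f"{pin:04d}"
        if hashlib.sha256(guess.encode()).hexdigest() == target_hash:
            return guess                          # correct key found
    return None                                   # exhausted without a match

stolen = hashlib.sha256(b"4217").hexdigest()      # hypothetical stolen hash
assert brute_force_pin(stolen) == "4217"
```

This also shows why the worst case matters: with no other weakness to exploit, the attacker's cost grows with the size of the key space.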
Now, as we have seen the sources of external security threats, let us study the Sources of 
Internal Security Threats. 
Employee threats may be either intentional or accidental. 
Intentional Employee threat: 
 personnel who employ hacking techniques to upgrade their legitimate access to root or 
administrator.
 personnel who take advantage of legitimate access to divulge trade secrets, steal money, 
personal / political gain. 
 family members of employees who are visiting the office and have been given access. 
 personnel who break into a secure machine room to gain physical access to mainframes and 
other large-system consoles. 
 former employees, seeking revenge. 
Unintentional / Accidental Employee threat: 
 becoming victim to social engineering attack (unknowingly helping a hacker) 
 unknowingly revealing confidential information 
 physical damage (accidental) leading to data loss 
 inaccurate / improper usage 
Other threats: 
 electrical power fluctuations 
 hardware failures 
 Natural disasters: fires, flood. 
Now, knowing the sources of both external and internal source of security threats, let us move to 
the solutions. They are also both external and internal. 
Some External solutions to the security issues are: 
1. Securing the perimeter: Firewall 
2. Handling Malware 
3. fixing buffer overflows 
4. Physical server security: 
security cameras; smart locks; removal of signs from machine/server room or hallways 
(so that no one can locate sensitive hardware rooms easily); privileged accounts must 
never be left logged in. 
5. User Authentication: 
Positive User Identification requires 3 things: 
a) something the user knows: user IDs and passwords 
b) something the user has: physical login devices, e.g. for $5, PayPal sends a small device 
that generates a one-time password. 
c) something the user is: biometrics 
6. VPNs: 
provides encryption for data transmissions over the Internet; uses IPSec protocol. 
7. Combating Social Engineering
8. Handling other employee threats: 
policies; employee training sessions; when an employee is fired, his/her account is properly 
erased or disabled, etc. 
Some Internal solutions to the security threats are: 
1. Internal database User-IDs & passwords 
2. Providing control of access rights to tables, views and their components: 
Types of Access Rights: A typical SQL-based DBMS provides six types of access 
rights: SELECT (to retrieve), INSERT, UPDATE, DELETE, REFERENCES (to reference a 
table via a foreign key), and ALL PRIVILEGES. 
3. Using an authorization matrix: a set of roles required for a business user. It is 
typically a spreadsheet document listing the roles, together with the list of 
transactions in every role. When a new user joins the organization, he/she can find the 
roles for which access is required based on the FUG (Functional User Group) in the 
authorization matrix. 
4. Database Implementations (Data dictionary): A data dictionary is one tool organizations 
can use to help ensure data accuracy. 
GRANTING & REVOKING ACCESS RIGHTS: 
Granting and revoking access rights is one of the most visible security features of a DBMS. 
Using the corresponding commands, permissions on various objects of the database can be granted or 
revoked. The following SQL commands can be used to grant and revoke access rights on a table 
or a view to or from user(s). 
Granting Rights: 
Syntax: 
GRANT type_of_rights ON table_or_view_name TO user_id 
Examples: 
 GRANT SELECT ON order_summary TO acctg_mgr 
 GRANT SELECT ON order_summary TO acctg_mgr WITH GRANT 
OPTION 
(now user can also grant / pass rights to others) 
 GRANT SELECT, UPDATE (retail_price, distributor_name) ON 
item TO intern1, intern2, intern3 
 GRANT SELECT ON order_summary TO PUBLIC
Revoking Rights: 
Syntax: 
REVOKE type_of_rights ON table_or_view_name FROM user_id 
Examples: 
The examples are similar to those for granting rights. 
If the user has already passed the rights on to others, then: 
 REVOKE SELECT ON order_summary FROM acctg_mgr RESTRICT 
(fails if the rights have been passed on to other users) 
 REVOKE SELECT ON order_summary FROM acctg_mgr CASCADE 
(also revokes the rights from all users to whom they have been passed)
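The difference between RESTRICT and CASCADE can be modeled as operations on a grant graph (an illustrative sketch, not how any particular DBMS implements revocation; the user names are hypothetical):

```python
# Sketch of REVOKE ... RESTRICT vs CASCADE on a grant graph. An edge
# grantor -> grantee records that a right was passed on (WITH GRANT
# OPTION). Illustrative model only.

def revoke(grants, user, mode):
    """grants: dict mapping grantor -> set of grantees.
    Returns the set of users whose right was revoked."""
    children = grants.get(user, set())
    if mode == "RESTRICT" and children:
        return set()                # right was passed on: refuse to revoke
    revoked = {user}
    for child in children:          # CASCADE: revoke from all descendants
        revoked |= revoke(grants, child, "CASCADE")
    grants.pop(user, None)
    return revoked


# acctg_mgr passed SELECT on to two interns (hypothetical users).
g = {"acctg_mgr": {"intern1", "intern2"}}
assert revoke(dict(g), "acctg_mgr", "RESTRICT") == set()  # refused
assert revoke(dict(g), "acctg_mgr", "CASCADE") == {"acctg_mgr", "intern1", "intern2"}
```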
End.

More Related Content

PDF
Introduction to Database Management System
PDF
Micro project co 3 i 22 question
PDF
Unit 2 rdbms study_material
PPTX
Database Management System
PPTX
data manipulation language
PPTX
Basic Concept Of Database Management System (DBMS) [Presentation Slide]
PDF
Database Lecture Notes
PDF
Unit1 rdbms study_materials
Introduction to Database Management System
Micro project co 3 i 22 question
Unit 2 rdbms study_material
Database Management System
data manipulation language
Basic Concept Of Database Management System (DBMS) [Presentation Slide]
Database Lecture Notes
Unit1 rdbms study_materials

What's hot (20)

PDF
Complete dbms notes
PPT
Chapter02
PPTX
Basic Concept of Database
PDF
Database system architecture
PPT
Fundamentals of Database system
PDF
Database Management System And Design Questions
PPT
Dbms
PDF
Unit 3 rdbms study_materials-converted
PPTX
A concept of dbms
DOCX
DBMS FOR STUDENTS MUST DOWNLOAD AND READ
PPTX
DOCX
Database management system by Gursharan singh
PPT
L7 data model and dbms architecture
DOCX
Dbms notes
PDF
Mba 758 database management system
PPT
11 Database Concepts
PDF
Unit 4 rdbms study_material
PDF
Chapter 01 Fundamental of Database Management System (DBMS)
PDF
Complete dbms notes
Chapter02
Basic Concept of Database
Database system architecture
Fundamentals of Database system
Database Management System And Design Questions
Dbms
Unit 3 rdbms study_materials-converted
A concept of dbms
DBMS FOR STUDENTS MUST DOWNLOAD AND READ
Database management system by Gursharan singh
L7 data model and dbms architecture
Dbms notes
Mba 758 database management system
11 Database Concepts
Unit 4 rdbms study_material
Chapter 01 Fundamental of Database Management System (DBMS)
Ad

Viewers also liked (18)

PPTX
Enclave: A Book By Ann Aguirre
PDF
Database management system
PPTX
Ami
 
PDF
FINAL PAPER FP304 DATABASE SYSTEM
PDF
Chapter 2 Relational Data Model-part 2
PDF
Chapter 2 Relational Data Model-part1
PDF
Chapter 3 Entity Relationship Model
PPTX
Types of Database Models
PPT
Ipr, Intellectual Property Rights
PDF
Enterprise SEO & Content Strategy: STOP THE PAIN!
PPTX
Environmental ethics
PPTX
Intellectual Property Rights (IPR)
PPTX
Ethics In Research
PDF
Database design & Normalization (1NF, 2NF, 3NF)
PPTX
Erd practice exercises
PPTX
Ehical and social issues
PPSX
Research problem
Enclave: A Book By Ann Aguirre
Database management system
Ami
 
FINAL PAPER FP304 DATABASE SYSTEM
Chapter 2 Relational Data Model-part 2
Chapter 2 Relational Data Model-part1
Chapter 3 Entity Relationship Model
Types of Database Models
Ipr, Intellectual Property Rights
Enterprise SEO & Content Strategy: STOP THE PAIN!
Environmental ethics
Intellectual Property Rights (IPR)
Ethics In Research
Database design & Normalization (1NF, 2NF, 3NF)
Erd practice exercises
Ehical and social issues
Research problem
Ad

Similar to Dbms notesization 2014 (20)

PPTX
Components and Advantages of DBMS
PPTX
Database Management System-Data, Components, Application
PPT
DBMS Lecture 1.ppt
PPT
Lecture 1 =Unit 1 Part 1.ppt
PPT
Unit01 dbms
PPTX
BCA Database Management Systems Unit - 1.pptx
PPTX
DBMS introduction
PPTX
Presentation on Database management system
DOCX
Database
PDF
DBMS NOTES.pdf
PPTX
DATABASE MANAGEMENT SYSTEMS_module1.pptx
PDF
Database system
PDF
Clifford Sugerman
PDF
Clifford sugerman
PPTX
Basic of Database Management System(DBMS)
PPTX
MIS-3rd Unit.pptx
PPTX
MIS-3rd Unit.pptx
DOCX
DBMS PART 1.docx
PDF
Lect_2_dbms_its_rnvironment_and_components
PPTX
Components and Advantages of DBMS
Database Management System-Data, Components, Application
DBMS Lecture 1.ppt
Lecture 1 =Unit 1 Part 1.ppt
Unit01 dbms
BCA Database Management Systems Unit - 1.pptx
DBMS introduction
Presentation on Database management system
Database
DBMS NOTES.pdf
DATABASE MANAGEMENT SYSTEMS_module1.pptx
Database system
Clifford Sugerman
Clifford sugerman
Basic of Database Management System(DBMS)
MIS-3rd Unit.pptx
MIS-3rd Unit.pptx
DBMS PART 1.docx
Lect_2_dbms_its_rnvironment_and_components

More from Dev Sanskriti Vishwavidyalaya (University) (20)

PDF
How to write a research paper
PDF
एक अनजानी दौड़ में
PDF
Tech giants & top it companies india manan manish
PDF
Idea gopal-krishna-sharma-dsvv
PDF
Metadata describes about data
PDF
Syllabus of cybershiksha 2
PDF
Bachelor of computer applications
PDF
Management information systems and decision
PDF

Recently uploaded (20)

PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
Orientation - ARALprogram of Deped to the Parents.pptx
PPTX
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
PPTX
GDM (1) (1).pptx small presentation for students
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
Yogi Goddess Pres Conference Studio Updates
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Computing-Curriculum for Schools in Ghana
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
RMMM.pdf make it easy to upload and study
PDF
VCE English Exam - Section C Student Revision Booklet
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Orientation - ARALprogram of Deped to the Parents.pptx
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
GDM (1) (1).pptx small presentation for students
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
STATICS OF THE RIGID BODIES Hibbelers.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Yogi Goddess Pres Conference Studio Updates
O7-L3 Supply Chain Operations - ICLT Program
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Final Presentation General Medicine 03-08-2024.pptx
Computing-Curriculum for Schools in Ghana
Final Presentation General Medicine 03-08-2024.pptx
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Abdominal Access Techniques with Prof. Dr. R K Mishra
RMMM.pdf make it easy to upload and study
VCE English Exam - Section C Student Revision Booklet

Dbms notesization 2014

  • 1. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 1 Paper: BCA-302 DATABASE MANAGEMENT SYSTEM DEPARTMENT OF COMPUTER SCIENCE DEV SANSKRITI VISHWAVIDYALAYA, SHANTIKUNJ,HARIDWAR (UK) July-Dec 2014. Notes-ization @ DSVV.
  • 2. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 2 PREAMBLE ACKNOWLEDGEMENTS Department of Computer Science at Dev Sanskriti Vishwavidyalaya, Shantikunj, Haridwar (Uttarakhand) was established in year 2006. Department started Bachelor of Computer Applications (BCA) in year 2012. The serene and vibrant environment of the university is a boon for the students. Academically they learn new things everyday but along with that the curriculum of life management induces virtues of humanities in them. It was an initiative taken by students of BCA (2013-2016) batch to work in a team and instead of doing revision only to do a prevision on the subject. They gave it a name ―Notes-ization‖. Every one contributed to it as per his/her own caliber. But finally it‘s an sincere effort by Manan Singh (Student BCA III Sem) to finally make the work presentable and reliable to make the effort of his team mates fruitful and worth significant. Special thanks to all the web sources. Thank you every one for this inspirational work. Hope it will benefit one an all. Thanks again for carrying the spirit of SHARE-CARE-PROSPER
  • 3. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 3 TABLE OF CONTENTS UNIT TOPICS UNIT 1 Introduction to Database: Definition of Database, Components of DBMS, Three Level of Architecture proposal for DBMS, Advantage & Disadvantage of DBMS, Data independence, Purpose of Database Management Systems, Structure of DBMS, DBA and its responsibilities, Data Dictionary, Advantages of Data Dictionary. UNIT 2 Data Models: Introduction to Data Models, Object Based Logical Model, Record Base Logical Model- Relational Model, Network Model, Hierarchical Model. Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity Relationship Diagram (ERD), Extended features of ERD. UNIT 3.1 Relational Databases: Introduction to Relational Databases and Terminology- Relation, Tuple, Attribute, Cardinality, Degree, Domain. Keys- Super Key, Candidate Key, Primary Key, Foreign Key. UNIT 3.2 Relational Algebra: Operations, Select, Project, Union, Difference, Intersection Cartesian product, Join, Natural Join. UNIT 4 Structured Query Language (SQL): Introduction to SQL, History of SQL, Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple Queries, Nested Queries, Normalization: Benefits of Normalization, Normal Forms- 1NF, 2NF, 3NF, BCNF & and Functional Dependency. UNIT 5 Relational Database Design: Introduction to Relational Database Design, DBMS v/s RDBMS. Integrity rule, Concept of Concurrency Control and Database Security.
  • 4. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 4 UNIT 1 INTRODUCTION TO DATABASE Introduction to Database: Definition of Database, Components of DBMS, Three Level of Architecture proposal for DBMS, Advantage & Disadvantage of DBMS, Data independence, Purpose of Database Management Systems, Structure of DBMS, DBA and its responsibilities, Data Dictionary, Advantages of Data Dictionary.
  • 5. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 5 DEFINITION OF DATABASE A database can be summarily described as a repository for data. A database is structured collection of data. Thus, card indices, printed catalogues of archaeological artifacts and telephone directories are all examples of databases. It may be stored on a computer and examined using a program. These programs are often called `databases', but more strictly are database management systems (DMS). Computer-based databases are usually organized into one or more tables. A table stores data in a format similar to a published table and consists of a series of rows and columns. To carry the analogy further, just as a published table will have a title at the top of each column, so each column in a database table will have a name, often called a field name. The term field is often used instead of column. Each row in a table will represent one example of the type of object about which data has been collected.
  • 6. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 6 COMPONENTS OF DBMS A database management system (DBMS) consists of several components. Each component plays very important role in the database management system environment. The major components of database management system are:  Software  Hardware  Data  Procedures  Database Access Language Software The main component of a DBMS is the software. It is the set of programs used to handle the database and to control and manage the overall computerized database 1. DBMS software itself is the most important software component in the overall system. 2. Operating system including network software being used in network, to share the data of database among multiple users. 3. Application programs developed in programming languages such as C++, Visual Basic that are used to access database in database management system. Each program contains statements that request the DBMS to perform operation on database. The operations may include retrieving, updating, deleting data etc. The application program may be conventional or online workstations or terminals Hardware Hardware consists of a set of physical electronic devices such as computers (together with associated I/O devices like disk drives), storage devices, I/O channels, electromechanical devices that make interface between computers and the real world systems etc. and so on. It is impossible to implement the DBMS without the hardware devices. In a network, a powerful computer with high data processing speed and a storage device with large storage capacity are required as database server. Characteristics: It is helpful to categorize computer memory into two classes: internal memory and external memory. Although some internal memory is permanent, such as ROM, we are interested here only in memory that can be changed by programs. This memory is often known as RAM. This memory is volatile, and any electrical interruption causes the loss of data. 
By contrast, magnetic disks and tapes are common forms of external memory. They are
  • 7. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 7 Non-volatile memory and they retain their content for practically unlimited amounts of time. The physical characteristics of magnetic tapes force them to be accessed sequentially, making them useful for backup purposes, but not for quick access to specific data. In examining the memory needs of a DBMS, we need to consider the following issues: •Data of a DBMS must have a persistent character; in other words, data must remain available long after any program that is using it has completed its work. Also, data must remain intact even if the system breaks down. •A DBMS must access data at a relatively high rate. •Such a large quantity of data needs to be stored that the storage medium must be low cost.These requirements are satisfied at the present stage of technological development only by magnetic disks. Data Data is the most important component of the DBMS. The main purpose of DBMS is to process the data. In DBMS, databases are defined, constructed and then data is stored, updated and retrieved to and from the databases. The database contains both the actual (or operational) data and the metadata (data about data or description about data). Procedures Procedures refer to the instructions and rules that help to design the database and to use the DBMS. The users that operate and manage the DBMS require documented procedures on hot use or run the database management system. These may include. 1. Procedure to install the new DBMS. 2. To log on to the DBMS. 3. To use the DBMS or application program. 4. To make backup copies of database. 5. To change the structure of database. 6. To generate the reports of data retrieved from database. Database Access Language The database access language is used to access the data to and from the database. The users use the database access language to enter new data, change the existing data in database and to retrieve required data from databases. 
The user writes a set of appropriate commands in a database access language and submits these to the DBMS. The DBMS translates the user commands and sends it to a specific part of the DBMS called the Database Jet Engine. The database engine generates a set of results according to the commands submitted by user, converts
  • 8. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 8 these into a user readable form called an Inquiry Report and then displays them on the screen. The administrators may also use the database access language to create and maintain the databases. The most popular database access language is SQL (Structured Query Language). Relational databases are required to have a database query language. Users The users are the people who manage the databases and perform different operations on the databases in the database system. There are three kinds of people who play different roles in database system 1. Application Programmers 2. Database Administrators 3. End-Users Application Programmers The people who write application programs in programming languages (such as Visual Basic, Java, or C++) to interact with databases are called Application Programmer. Database Administrators A person who is responsible for managing the overall database management system is called database administrator or simply DBA. End-Users The end-users are the people who interact with database management system to perform different operations on database such as retrieving, updating, inserting, deleting data etc.
  • 9. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 9 3 LEVEL OF ARCHITECTURE PROPOSAL OF DBMS The logical architecture, also known as the ANSI/SPARC architecture, was elaborated at the beginning of the 1970s. It distinguishes three layers of data abstraction: 1. The physical layer contains specific and detailed information that describe show data are stored: addresses of various data components, lengths in bytes, etc. DBMSs aim to achieve data independence, which means that the database organization at the physical level should be indifferent to application programs. 2. The logical layer describes data in a manner that is similar to, say, definitions of structures in C. This layer has a conceptual character; it shields the user from the tedium of details contained by the physical layer, but is essential in formulating queries for the DMBS. 3. The user layer contains each user‘s perspective of the content of the database. The logical architecture describes how data in the database is perceived by users. It is not concerned with how the data is handled and processed by the DBMS, but only with how it looks. The method of data storage on the underlying file system is not revealed, and the users can manipulate the data without worrying about where it is located or how it is actually stored. This results in the database having different levels of abstraction. The majority of commercial Database Management System available today is based on the ANSI/SPARC generalized DBMS architecture, as proposed by the ANSI/SPARC Study Group on Data Base Management Systems. Hence this is also called as the ANSI/SPARC model. It divides the system into three levels of abstraction: the internal or physical level, the conceptual level, and the external or view level. The External or View Level: The external or view level is the highest level of abstraction of database. It provides a window on the conceptual view, which allows the user to see only the data of interest to them. 
The user can be either an application program or an end user. There can be many external views as any number of external schemas can be defined and they can overlap each other. It consists of the definition of logical records and relationships in the external view. It also contains the method of deriving the objects such as entities, attributes and relationships in the external view from the conceptual view. The Conceptual Level or Global Level: The conceptual level presents a logical view of the entire database as a unified whole. It allows the user to bring all the data in the database together and see it in a consistent manner. Hence, there is only one conceptual schema per database. The first stage in the design of a database is to define the conceptual view, and a DBMS provides a data definition language for this purpose. it describes all the records and relationships included in the database. The data definition language used to create the conceptual level must not specify any physical storage considerations that should be handled by the physical level. It does not provide any storage or access details, but defines the information content only.
The Internal or Physical Level: The collection of files permanently stored on secondary storage devices is known as the physical database. The physical or internal level is the one closest to physical storage. It provides a low-level description of the physical database, and an interface between the operating system's file system and the record structures used at higher levels of abstraction. It is at this level that record types and methods of storage are defined, as well as how stored fields are represented, what physical sequence the stored records are in, and what other physical structures exist.
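The three levels can be made concrete with a small sketch using Python's built-in sqlite3 module; the employee table, the view, and the data are invented for illustration. The base table plays the role of the conceptual schema, a view acts as one external schema, and the storage pages behind the database belong to the internal level:

```python
import sqlite3

# Conceptual level: one logical schema for the whole database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
con.execute("INSERT INTO employee VALUES (1, 'Asha', 52000), (2, 'Ravi', 61000)")

# External level: a view exposes only the data of interest to one user group
# (here, names without salaries).
con.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")
rows = con.execute("SELECT name FROM employee_public ORDER BY id").fetchall()
print(rows)  # [('Asha',), ('Ravi',)]

# Internal/physical level: page size, file layout, etc. are handled by the
# engine; users normally never touch them directly.
page_size = con.execute("PRAGMA page_size").fetchone()[0]
print(page_size)
```

Users of `employee_public` never see (and cannot depend on) the salary column or the storage details, which is exactly the insulation the architecture aims for.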
ADVANTAGES & DISADVANTAGES OF DBMS

Advantages of the DBMS: The DBMS serves as the intermediary between the user and the database. The database structure itself is stored as a collection of files, and the only way to access the data in those files is through the DBMS. The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests. The DBMS hides much of the database's internal complexity from the application programs and users. The different advantages of a DBMS are as follows:
1. Improved data sharing. The DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment.
2. Improved data security. The more users access the data, the greater the risk of data security breaches. Corporations invest considerable amounts of time, effort, and money to ensure that corporate data are used properly. A DBMS provides a framework for better enforcement of data privacy and security policies.
3. Better data integration. Wider access to well-managed data promotes an integrated view of the organization's operations and a clearer view of the big picture. It becomes much easier to see how actions in one segment of the company affect other segments.
4. Minimized data inconsistency. Data inconsistency exists when different versions of the same data appear in different places. For example, data inconsistency exists when a company's sales department stores a sales representative's name as "Bill Brown" and the company's personnel department stores that same person's name as "William G. Brown," or when the company's regional sales office shows the price of a product as $45.95 and its national sales office shows the same product's price as $43.95.
The probability of data inconsistency is greatly reduced in a properly designed database.
5. Improved data access. The DBMS makes it possible to produce quick answers to ad hoc queries. From a database perspective, a query is a specific request issued to the DBMS for data manipulation (for example, to read or update the data). Simply put, a query is a question, and an ad hoc query is a spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to the application. For example, end users dealing with large amounts of sales data might want quick answers to ad hoc questions such as:
- What was the dollar volume of sales by product during the past six months?
- What is the sales bonus figure for each of our salespeople during the past three months?
- How many of our customers have credit balances of $3,000 or more?
6. Improved decision making. Better-managed data and improved data access make it possible to generate better-quality information, on which better decisions are based. The quality of the information generated depends on the quality of the underlying data. Data quality is a comprehensive approach to promoting the accuracy, validity, and timeliness of the data. While the DBMS does not guarantee data quality, it provides a framework to facilitate data quality initiatives.
7. Increased end-user productivity. The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can make the difference between success and failure in the global economy.

Disadvantages of the DBMS: Although the database system yields considerable advantages over previous data management approaches, database systems do carry significant disadvantages. For example:
1. Increased costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial. Training, licensing, and regulation compliance costs are often overlooked when database systems are implemented.
2. Management complexity. Database systems interface with many different technologies and have a significant impact on a company's resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company's objectives. Given that database systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly.
3. Maintaining currency. To maximize the efficiency of the database system, you must keep your system current.
Therefore, you must perform frequent updates and apply the latest patches and security measures to all components. Because database technology advances rapidly, personnel training costs tend to be significant.
4. Vendor dependence. Given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers, and those customers might be limited in their choice of database system components.
5. Frequent upgrade/replacement cycles. DBMS vendors frequently upgrade their products by adding new functionality. Such new features often come bundled in new upgrade versions of the software. Some of these versions require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs money to train database users and administrators to properly use and manage the new features.
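The ad hoc sales questions listed under "Improved data access" above can be expressed directly as queries. A minimal sketch with Python's built-in sqlite3 module (the sales table and figures are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("widget", 100.0), ("widget", 250.0), ("gadget", 75.0)])

# An ad hoc query: dollar volume of sales, grouped by product.
result = con.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(result)  # [('gadget', 75.0), ('widget', 350.0)]
```

The result set comes back immediately, without anyone writing a new application program for this particular question, which is the point of ad hoc query capability.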
DATA INDEPENDENCE

A major objective of the three-level architecture is to provide data independence, which means that upper levels are unaffected by changes in lower levels. There are two kinds of data independence:
• Logical data independence
• Physical data independence
Logical Data Independence: Logical data independence means that the conceptual schema can be changed without affecting the existing external schemas. The change is absorbed by the mapping between the external and conceptual levels. Logical data independence also insulates application programs from operations such as combining two records into one or splitting an existing record into two or more records. Such a change would require only a change in the external/conceptual mapping, so as to leave the external view unchanged.
Physical Data Independence: Physical data independence means that the physical storage structures or devices can be changed without affecting the conceptual schema. The change is absorbed by the mapping between the conceptual and internal levels. Physical data independence is achieved by the presence of the internal level of the database and the mapping, or transformation, from the conceptual level of the database to the internal level. The conceptual-to-internal mapping therefore provides a means to go from the conceptual view (conceptual records) to the internal view and hence to the stored data in the database (physical records). If there is a need to change the file organization or the type of physical device used, as a result of growth in the database or new technology, only the conceptual/internal mapping needs to change. This change is necessary to keep the conceptual level invariant. The physical data independence criterion requires that the conceptual level does not specify storage structures or access methods (indexing, hashing, etc.)
used to retrieve the data from the physical storage medium. Making the conceptual schema physically data independent means that the external schema, which is defined on the conceptual schema, is in turn physically data independent. Logical data independence is more difficult to achieve than physical data independence, as it requires flexibility in the design of the database, and the programmer has to foresee future requirements or modifications of the design.
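Logical data independence can be demonstrated with a small SQLite sketch: an external view defined with an explicit column list keeps working unchanged after the conceptual schema is modified. The table, view, and column names are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Conceptual schema: the base table.
con.execute("CREATE TABLE student (roll INTEGER, name TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Meera')")
# External schema: a view with an explicit column list.
con.execute("CREATE VIEW student_names AS SELECT roll, name FROM student")

before = con.execute("SELECT * FROM student_names").fetchall()

# Change the conceptual schema: add a new column to the base table.
con.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The external view, and any program written against it, is unaffected.
after = con.execute("SELECT * FROM student_names").fetchall()
assert before == after == [(1, 'Meera')]
```

The external/conceptual mapping (here, the view definition) absorbs the schema change, so applications that use only the view never notice it.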
PURPOSE OF DBMS

Database management systems were developed to handle the following difficulties of typical file-processing systems supported by conventional operating systems: data redundancy and inconsistency, difficulty in accessing data, data isolation (multiple files and formats), integrity problems, atomicity of updates, concurrent access by multiple users, and security problems.
In the early days, database applications were built directly on top of the file system. Drawbacks of using file systems to store data:
- Data redundancy and inconsistency: multiple file formats and duplication of information in different files.
- Difficulty in accessing data: the need to write a new program to carry out each new task.
- Data isolation: multiple files and formats.
- Integrity constraints: hard to add new constraints or change existing ones.
These problems and others led to the development of database management systems.
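The integrity-constraint drawback above largely disappears once the constraint is declared in the database schema: the DBMS enforces it centrally, instead of every application program re-coding the check. A sketch using Python's sqlite3 module (the account table is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The constraint is declared once, in the schema, rather than re-coded in
# every application program that writes this data (as in file processing).
con.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)
    )
""")
con.execute("INSERT INTO account VALUES (101, 500.0)")

try:
    con.execute("INSERT INTO account VALUES (102, -50.0)")  # violates the CHECK
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS itself refuses the invalid row
print("rejected:", rejected)
```

Adding or changing such a rule later means altering one schema declaration, not hunting through every program that touches the file.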
STRUCTURE OF DBMS

The components in the structure of a DBMS are described below:
DBA: DBA means Database Administrator. He or she is the person responsible for the installation, configuration, upgrading, administration, monitoring, maintenance, and security of databases in an organization.
Database Schema: A database schema defines the database's entities and the relationships among them. The database schema is a descriptive detail of the database, which can be depicted by means of
schema diagrams. All these activities are done by the database designer to help programmers understand all aspects of the database.
DDL Processor: The DDL processor, or compiler, converts the data definition statements into a set of tables. These tables contain the metadata concerning the database and are in a form that can be used by other components of the DBMS.
Data Dictionary: Information pertaining to the structure and usage of data contained in the database, the metadata, is maintained in a data dictionary. The term system catalog also describes this metadata. The data dictionary, which is a database itself, documents the data. Each database user can consult the data dictionary to learn what each piece of data and the various synonyms of the data fields mean.
Integrity Checker: It checks the integrity constraints so that only valid data can be entered into the database.
User: The users are either application programmers or on-line terminal users of any degree of sophistication. Each user has a language at his or her disposal. For the application programmer it will be a conventional programming language, such as COBOL or PL/I; for the terminal user it will be either a query language or a special-purpose language tailored to that user's requirements and supported by an on-line application program.
Queries: In a DBMS, a search question that instructs the program to locate records that meet specific criteria is called a query.
Query Processor: The query processor transforms user queries into a series of low-level instructions. It is used to interpret the on-line user's query and convert it into an efficient series of operations in a form capable of being sent to the run-time data manager for execution.
The query processor uses the data dictionary to find the structure of the relevant portion of the database and uses this information in modifying the query and preparing an optimal plan to access the database.
Programmer: A programmer can manipulate the database in all possible ways.
Application Program: A complete, self-contained computer program that performs a specific useful task, other than system maintenance functions, is called an application program.
DML Processor: The DML processor processes the data manipulation statements (select, update, delete, etc.) issued by the application programmer, turning them into operations that carry out the specified task, such as deleting rows from a table.
Authorization Control: The authorization control module checks the authorization of users in terms of the various privileges granted to them.
Command Processor: The command processor processes the queries passed on by the authorization control module.
Query Optimizer: The query optimizer determines an optimal strategy for query execution.
Transaction Manager: The transaction manager ensures that the transaction properties are maintained by the system.
Scheduler: It provides an environment in which multiple users can work on the same piece of data at the same time; in other words, it supports concurrency.
Buffer Manager: The buffer manager is the software layer responsible for bringing pages from disk to main memory as needed. The buffer manager manages the available main memory by partitioning it into a collection of pages, which we collectively refer to as the buffer pool.
Recovery Manager: The recovery manager is responsible for maintaining a log and restoring the system to a consistent state after a crash. It is responsible for ensuring transaction atomicity and durability.
Physical Database: The physical database specifies additional storage details. We must decide what file organization to use to store the relations and create auxiliary data structures called indexes.
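The interplay of the query processor and the query optimizer described above can be observed in practice with SQLite's EXPLAIN QUERY PLAN. A sketch using Python's sqlite3 module (table and index names are invented; the exact plan wording varies between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, dept TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(i, "d%d" % (i % 3)) for i in range(100)])

query = "SELECT id FROM emp WHERE dept = 'd1'"

# Without an index, the planner must scan the whole table...
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_before)

# ...once an index exists, the optimizer picks the cheaper access path.
con.execute("CREATE INDEX idx_dept ON emp(dept)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_after)
```

The query text is identical both times; only the execution strategy changes, which illustrates that plan selection belongs to the optimizer, not to the application.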
DBA & ITS RESPONSIBILITIES

A Database Administrator (DBA) is an IT professional responsible for the installation, configuration, upgrade, administration, monitoring, maintenance, and security of databases in an organization. Database administrator responsibilities are as follows:
1. Database installation and upgrading
2. Database configuration, including configuration of background processes
3. Database performance optimization and fine tuning
4. Configuring the database in archive-log mode
5. Maintaining the database in archive-log mode
6. Devising a database backup strategy
7. Monitoring and checking the database backup and recovery process
8. Database troubleshooting
9. Database recovery in case of a crash
10. Database security
11. Enabling auditing features wherever required
12. Tablespace management
13. Database analysis reports
14. Database health monitoring
15. Centralized control
The skills required to become a database administrator are:
• Communication skills
• Knowledge of database theory
• Knowledge of database design
• Knowledge about the RDBMS itself, e.g. Oracle Database, IBM DB2, Microsoft SQL Server, Adaptive Server Enterprise, MaxDB, PostgreSQL
• Knowledge of Structured Query Language (SQL), e.g. SQL/PSM, Transact-SQL
• General understanding of distributed computing architectures, e.g. client/server, internet/intranet, enterprise
• General understanding of the underlying operating system, e.g. Windows, Unix, Linux
• General understanding of storage technologies, memory management, disk arrays, NAS/SAN, networking
• General understanding of routine maintenance, recovery, and handling failover of a database
DATA DICTIONARY & ITS ADVANTAGES

A data dictionary, or metadata repository, as defined in the Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):
• a document describing a database or collection of databases;
• an integral component of a DBMS that is required to determine its structure;
• a piece of middleware that extends or supplants the native data dictionary of a DBMS.
The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as the DDL and DML compilers, the query optimizer, the transaction processor, report generators, and the constraint enforcer. A data dictionary, on the other hand, is a data structure that stores metadata, i.e., (structured) data about data. Any well-designed database will include a data dictionary, as it gives database administrators and other users easy access to the type of data that they should expect to see in every table, row, and column of the database, without actually accessing the database. Since a database is meant to be built and used by multiple users, making sure that everyone is aware of the types of data each field will accept becomes a challenge, especially when there is a lack of consistency in assigning data types to fields. A data dictionary is a simple yet effective add-on to ensure data consistency.
Some of the typical components of a data dictionary entry are:
• Name of the table
• Names of the fields in each table
• Data type of the field (integer, date, text, ...)
• Brief description of the expected data for each field
• Length of the field
• Default value for the field
• Whether the field is nullable or not nullable
• Constraints that apply to each field, if any
Not all of these fields (and many others) will apply to every single entry in the data dictionary. For example, if the entry were about the root description of the table, it might not require any
information regarding fields. Some data dictionaries also include location details, such as each field's current location, where it actually came from, and details of the physical location such as the IP address or DNS name of the server.
Format and Storage: There is no standard format for creating a data dictionary; metadata differs from table to table. Some database administrators prefer to create simple text files, while others use diagrams and flow charts to display their information. The only prerequisite for a data dictionary is that it should be easily searchable. Likewise, the only applicable rule for data dictionary storage is that it should be at a convenient location that is easily accessible to all database users. The types of files used to store data dictionaries range from text files, XML files, and spreadsheets to an additional table in the database itself, or even handwritten notes. It is the database administrator's duty to make sure that this document is always up to date, accurate, and easily accessible.
Creating the Data Dictionary: First, all the information required to create the data dictionary must be identified and recorded in the design documents. If the design documents are in a compatible format, it should be possible to export the data in them directly to the desired format for the data dictionary. For example, applications like Microsoft Visio allow database creation directly from the design structure and would make creation of the data dictionary simpler. Even without such tools, scripts can be deployed to export data from the database to the document. There is always the option of creating these documents manually as well.
Advantages of a Data Dictionary: The primary advantage of creating an informative and well-designed data dictionary is that it brings clarity to the rest of the database documentation.
Also, when a new user is introduced to the system or a new administrator takes over, identifying table structures and types becomes simpler. In scenarios involving large databases, where it is impossible for an administrator to remember specific bits of information about thousands of fields, a data dictionary becomes a crucial necessity.
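In SQLite, much of this per-field information is already maintained by the system catalog; the PRAGMA table_info statement plays the role of a minimal data dictionary entry. A sketch (the book table is invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE book (
        isbn  TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        price REAL DEFAULT 0.0
    )
""")

# Each catalog row describes one field: its name, declared data type,
# nullability, default value, and whether it belongs to the primary key.
entries = con.execute("PRAGMA table_info(book)").fetchall()
for cid, name, dtype, notnull, default, pk in entries:
    print(name, dtype, "NOT NULL" if notnull else "NULLABLE", default, pk)
```

A fuller data dictionary would add the human-written parts (descriptions, provenance, constraints in prose) that no catalog can generate automatically.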
UNIT 2
DATA MODELS

Data Models: Introduction to Data Models, Object Based Logical Model, Record Based Logical Model (Relational Model, Network Model, Hierarchical Model), Entity Relationship Model, Entity Set, Attribute, Relationship Set, Entity Relationship Diagram (ERD), Extended features of ERD.
INTRODUCTION TO DATA MODELS

A data model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization. The importance of data models is that they facilitate interaction among the designer, the application programmer, and the end user. A well-developed data model can even foster improved understanding of the organization for which the database design is developed. Data models are a communication tool as well. A data model comprises three components:
• A structural part, consisting of a set of rules according to which databases can be constructed.
• A manipulative part, defining the types of operation that are allowed on the data (this includes the operations that are used for updating or retrieving data from the database and for changing the structure of the database).
• Possibly a set of integrity rules, which ensures that the data are accurate.
The purpose of a data model is to represent data and to make the data understandable. Many data models have been proposed in the literature. They fall into three broad categories:
• Object Based Data Models
• Physical Data Models
• Record Based Data Models
OBJECT BASED LOGICAL MODEL

Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct object (a person, place, concept, or event) in the organization that is to be represented in the database. An attribute is a property that describes some aspect of the object that we wish to record, and a relationship is an association between entities. Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Semantic
• Functional
RECORD BASED LOGICAL MODEL & ITS TYPES

Record based logical models are used in describing data at the logical and view levels. In contrast to object based data models, they are used to specify the overall logical structure of the database and to provide a higher-level description of the implementation. Record based models are so named because the database is structured in fixed-format records of several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length. The three most widely accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model
RELATIONAL MODEL

The relational model is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. In the relational model, all data are represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.
Advantages of the Relational Model:
Conceptual simplicity: both the hierarchical and network models are conceptually simple, but the relational model is simpler than either of them.
Structural independence: in the relational model, changes in the structure do not affect data access.
Design and implementation: the relational model achieves both data independence and structural independence.
Ad hoc query capability: the presence of a very powerful, flexible, and easy-to-use query capability is one of the main reasons for the immense popularity of the relational database model.
Disadvantages of the Relational Model:
Hardware overheads: relational database systems hide the implementation complexities and the physical data storage details from the user. To do this, they need more powerful hardware and data storage devices.
Ease of design can lead to bad design: the relational database is easy to design and use, and the user does not need to know the complexities of data storage. This very ease of design and use can lead to the development and implementation of very poorly designed databases.
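The idea that "all data are represented in terms of tuples, grouped into relations" can be mimicked in plain Python, treating a relation as a set of tuples over a fixed heading (the employee relation and its attributes are invented for illustration):

```python
# A relation is a set of tuples over named attributes; here every tuple
# follows the heading (id, name, dept). Being a set, it has no duplicate
# tuples and no inherent row order, just as in the relational model.
employee = {
    (1, "Asha", "Sales"),
    (2, "Ravi", "HR"),
    (3, "Mira", "Sales"),
}

# A selection (keep rows where dept = 'Sales') combined with a projection
# (keep only the name attribute), two basic relational-algebra operations:
sales_names = {name for (eid, name, dept) in employee if dept == "Sales"}
print(sorted(sales_names))  # ['Asha', 'Mira']
```

A real relational DBMS adds types, keys, and a declarative query language on top of this set-of-tuples core, but the underlying mathematical object is the same.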
NETWORK MODEL

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice. While the hierarchical database model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a generalized graph structure.
Advantages of the Network Model:
Conceptual simplicity: just like the hierarchical model, it is simple and easy to implement.
Capability to handle more relationship types: the network model can handle one-to-one (1:1) and many-to-many (N:N) relationships.
Ease of data access: data access is easier than in the hierarchical model.
Data integrity: since it is based on the parent-child relationship, there is always a link between a parent segment and the child segments under it.
Data independence: the network model is better than the hierarchical model with respect to data independence.
Disadvantages of the Network Model:
System complexity: all the records have to be maintained using pointers, so the database structure becomes more complex.
Operational anomalies: because a large number of pointers is required, insertion, deletion, and updating become more complex.
Absence of structural independence: when we change the structure, it becomes compulsory to change the application too.
HIERARCHICAL MODEL

A hierarchical database model is a data model in which the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value. The entity type of a record defines which fields the record contains.
Advantages of the Hierarchical Model:
1. Simplicity: since the database is based on a hierarchical structure, the relationship between the various layers is logically simple.
2. Data security: the hierarchical model was the first database model to offer data security enforced by the DBMS.
3. Data integrity: since it is based on the parent-child relationship, there is always a link between a parent segment and the child segments under it.
4. Efficiency: it is very efficient when the database contains a large number of 1:N relationships and the users require a large number of transactions.
Disadvantages of the Hierarchical Model:
1. Implementation complexity: although it is simple and easy to design, it is quite complex to implement.
2. Database management problems: if you make any change in the database structure, you need to make changes in every application program that accesses the database.
3. Lack of structural independence: when we change the structure, it becomes compulsory to change the application too.
4. Operational anomalies: the hierarchical model suffers from insert, delete, and update anomalies; retrieval operations are also difficult.
ENTITY RELATIONSHIP MODEL

In DBMS, an entity-relationship model (ER model) is a data model for describing the data or information aspects of a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented in a database such as a relational database. The main components of ER models are entities (things) and the relationships that can exist among them. Entity-relationship modeling was developed by Peter Chen and published in a 1976 paper. However, variants of the idea existed previously and have been devised subsequently, such as supertype and subtype data entities and commonality relationships.
The ER model represents real-world situations using concepts that are commonly used by people. It allows a representation of the real world to be defined at the logical level; the ER model has no facilities for describing machine-related aspects. In the ER model, the logical structure of data is captured by indicating the grouping of data into entities. The ER model also supports a top-down approach by which details can be given in successive stages.
Entity: An entity is something which is described in the database by storing its data; it may be a concrete entity or a conceptual entity.
Entity set: An entity set is a collection of similar entities.
Attribute: An attribute describes a property associated with entities. An attribute has a name and a value for each entity.
Domain: A domain defines the set of permitted values for an attribute.
ENTITY SET

An entity set is a collection of similar entities. A database can be modeled as:
• a collection of entities,
• relationships among entities.
An entity is an object that exists and is distinguishable from other objects, e.g. a specific person, company, event, or plant. Entities have attributes, e.g. people have names and addresses. An entity set is a set of entities of the same type that share the same properties, e.g. the set of all persons, companies, trees, or holidays.
An entity is a thing in the real world with an independent existence, and an entity set is the collection, or set, of all entities of a particular entity type at any point of time. Take an example: a company has many employees, and these employees are defined as entities (e1, e2, e3, ...); all these entities having the same attributes are defined under the ENTITY TYPE employee, and the set {e1, e2, ...} is called the entity set.
We can also understand this by an analogy. An entity type is like "fruit", which is a class; we have never seen "fruit" itself, though we have seen instances of fruit such as apple, banana, and mango. Hence: fruit = entity type = EMPLOYEE; apple = entity = e1 or e2 or e3; entity set = the bucket of apples, bananas, mangoes, etc. = {e1, e2, ...}.
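The employee example above can be sketched in Python, with a class standing in for the entity type and a set of instances for the entity set (the names and attributes are invented for illustration):

```python
from dataclasses import dataclass

# Entity type: the common structure (attributes) shared by all employees.
@dataclass(frozen=True)
class Employee:
    emp_id: int
    name: str

# Entities: individual instances e1, e2, ...
e1 = Employee(1, "Asha")
e2 = Employee(2, "Ravi")

# Entity set: the collection of all entities of this type at a point in time.
employees = {e1, e2}
print(len(employees))  # 2
```

The class definition corresponds to the entity type (which never appears in the data itself), while the set `employees` is the entity set whose membership changes over time.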
ATTRIBUTE

In a database management system (DBMS), an attribute may describe a component of the database, such as a table or a field, or may itself be used as another term for a field. A table contains one or more columns, and these columns are the attributes in the DBMS. For example, say you have a table named "employee information" which has the columns ID, NAME, and ADDRESS; then id, name, and address are the attributes of employee.
RELATIONSHIP SET

The association among entities is called a relationship. For example, the employee entity has the relation "works at" with department. Another example is a student who enrolls in some course. Here, "works at" and "enrolls" are called relationships.
Relationship Set: A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.
Degree of Relationship: The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities: Cardinality defines the number of entities in one entity set which can be associated with the number of entities of the other set via a relationship set.
One-to-one: one entity from entity set A can be associated with at most one entity of entity set B, and vice versa.
One-to-many: one entity from entity set A can be associated with more than one entity of entity set B, but an entity from entity set B can be associated with at most one entity of A.
  • 32. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 32 Many-to-one: more than one entity from entity set A can be associated with at most one entity of entity set B, but one entity from entity set B can be associated with more than one entity from entity set A. Many-to-many: one entity from A can be associated with more than one entity from B, and vice versa.
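The mapping cardinalities above are what foreign keys encode in a relational schema. A hedged sqlite3 sketch of a one-to-many mapping (one department, many employees; all table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# One-to-many: each employee row points at exactly one department,
# while a department may be referenced by many employee rows.
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employee (
                 emp_id  INTEGER PRIMARY KEY,
                 name    TEXT,
                 dept_id INTEGER REFERENCES department(dept_id))""")
cur.execute("INSERT INTO department VALUES (1, 'Sales')")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, 'Asha', 1), (2, 'Ravi', 1)])
conn.commit()
# Both employees map to the single 'Sales' department.
count = cur.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 1").fetchone()[0]
```

A one-to-one mapping would additionally make `dept_id` unique in `employee`; many-to-many needs a third (linking) table holding pairs of keys.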
  • 33. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 33 ENTITY RELATIONSHIP DIAGRAM (ERD) Definition: An entity-relationship (ER) diagram is a specialized graphic that illustrates the relationships between entities in a database. ER diagrams often use symbols to represent three different types of information. Boxes are commonly used to represent entities. Diamonds are normally used to represent relationships and ovals are used to represent attributes.
  • 34. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 34 Components of ER Diagram The ER diagram has three main components: 1) Entity An Entity can be an object, place, person or class. In an ER Diagram, an entity is represented using a rectangle. Consider the example of an Organization: Employee, Manager, Department, Product and many more can be taken as entities. Weak Entity A weak entity is an entity that must be defined through a foreign key relationship with another entity, as it cannot be uniquely identified by its own attributes alone. A weak entity depends on another entity and does not have a key attribute of its own. A double rectangle represents a weak entity. 2) Attribute An Attribute describes a property or characteristic of an entity. For example, Name, Age and Address can be attributes of a Student. Databases contain information about each entity; this information is tracked in individual fields known as attributes, which normally correspond to the columns of a database table. An attribute is represented using an ellipse. Key Attribute A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute. The key attribute
  • 35. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 35 represents the main characteristic of an Entity. It is used to represent the primary key. An ellipse with an underline represents a key attribute. Composite Attribute An attribute can itself have attributes; such attributes are known as composite attributes. 3) Relationship Relationships illustrate how two entities share information in the database structure. A Relationship describes relations between entities and is represented using a diamond. There are three types of relationship that exist between Entities.  Binary Relationship  Recursive Relationship  Ternary Relationship Binary Relationship Binary Relationship means a relation between two Entities. This is further divided into three types.
  • 36. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 36 1. One to One : This type of relationship is rarely seen in the real world. The above example describes that one student can enroll only for one course and a course will also have only one student. This is not what you will usually see in a relationship. 2. One to Many : It reflects the business rule that one entity is associated with many instances of another entity. For example, a Student enrolls for only one Course, but a Course can have many Students. The arrows in the diagram describe that one student can enroll for only one course. 3. Many to Many : The above diagram represents that many students can enroll for more than one course. Recursive Relationship In some cases, entities can be self-linked; for example, employees can supervise other employees. Ternary Relationship A relationship of degree three is called a ternary relationship.
  • 37. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 37 EXTENDED FEATURES OF ERD The ER Model can express database entities in a conceptual hierarchical manner: as the hierarchy goes up, it generalizes the view of entities, and as we go deeper, it gives us the detail of every entity included. Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized view. For example, a particular student named Mira can be generalized along with all the students; the entity shall be Student, and further, a Student is a Person. The reverse is called specialization, where a person is a student, and that student is Mira. Generalization As mentioned above, the process of generalizing entities, where the generalized entity contains the properties of all the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds. Specialization Specialization is the process opposite to generalization, as mentioned above. In specialization, a group of entities is divided into sub-groups based on their characteristics. Take a group Person, for example. A person has a name, date of birth, gender, etc. These properties are common to all persons. But in a company, a person can be identified as an employee, employer, customer or vendor, based on the role they play in the company. Similarly, in a school database, a person can be specialized as a teacher, student or staff member, based on the role they play in the school as entities.
  • 38. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 38 Inheritance We use all the above features of the ER Model in order to create classes of objects in object-oriented programming. This makes it easier for the programmer to concentrate on what she is programming. Details of entities are generally hidden from the user; this process is known as abstraction. One of the important features of generalization and specialization is inheritance: the attributes of higher-level entities are inherited by the lower-level entities. For example, the attributes of a Person, like name, age and gender, can be inherited by lower-level entities like Student and Teacher. Aggregation The E-R model cannot express relationships among relationships. When would we need such a thing? Consider a DB with information about employees who work on a particular project and use a number of machines doing that work. We get the E-R diagram shown in the figure below.
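The generalization/specialization hierarchy and its inheritance of attributes map naturally onto class inheritance in object-oriented code. A minimal sketch (the Person/Student classes and sample values are invented for this example):

```python
# Generalization: Person is the higher-level entity holding shared attributes.
class Person:
    def __init__(self, name, age, gender):
        self.name = name
        self.age = age
        self.gender = gender

# Specialization: Student inherits name/age/gender and adds its own attribute.
class Student(Person):
    def __init__(self, name, age, gender, course):
        super().__init__(name, age, gender)
        self.course = course

s = Student("Mira", 20, "F", "BCA")  # a Student is also a Person
```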
  • 39. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 39 Figure 2.20: E-R diagram with redundant relationships Relationship sets work and uses could be combined into a single set. However, they shouldn't be, as this would obscure the logical structure of this scheme. The solution is to use aggregation.  An abstraction through which relationships are treated as higher-level entities.  For our example, we treat the relationship set work and the entity sets employee and project as a higher-level entity set called work.  Figure below shows the E-R diagram with aggregation. Figure 2.21: E-R diagram with aggregation Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for each entity and relationship set as before.
  • 40. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 40 The table for relationship set uses contains a column for each attribute in the primary key of machinery and work. Aggregation is an abstraction in which relationship sets are treated as higher level entity sets. Here a relationship set is embedded inside an entity set, and these entity sets can participate in relationships.
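Transforming the aggregation example into tables can be sketched as follows: the work relationship gets its own table whose primary key combines the keys of employee and project, and uses references that composite key. A hedged sqlite3 sketch with invented table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE employee  (emp_id     INTEGER PRIMARY KEY);
CREATE TABLE project   (proj_id    INTEGER PRIMARY KEY);
CREATE TABLE machinery (machine_id INTEGER PRIMARY KEY);
-- 'work' relates employee and project; aggregation treats it as a
-- higher-level entity, so its key (emp_id, proj_id) can itself be referenced.
CREATE TABLE work (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
-- 'uses' holds a column for each attribute of the primary keys of
-- machinery and work.
CREATE TABLE uses (
    emp_id     INTEGER,
    proj_id    INTEGER,
    machine_id INTEGER REFERENCES machinery(machine_id),
    FOREIGN KEY (emp_id, proj_id) REFERENCES work(emp_id, proj_id)
);
""")
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```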
  • 41. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 41 UNIT 3.1 RELATIONAL DATABASES Relational Databases: Introduction to Relational Databases and Terminology- Relation, Tuple, Attribute, Cardinality, Degree, Domain. Keys- Super Key, Candidate Key, Primary Key, Foreign Key.
  • 42. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 42 INTRODUCTION TO RELATIONAL DATABASES The relational database was proposed by Edgar Codd (of IBM Research) around 1969. It has since become the dominant database model for commercial applications (in comparison with other database models such as the hierarchical, network and object models). Today, there are many commercial Relational Database Management Systems (RDBMS), such as Oracle, IBM DB2 and Microsoft SQL Server. There are also many free and open-source RDBMS, such as MySQL, mSQL (mini-SQL) and the embedded JavaDB. A relational database organizes data in tables (or relations). A table is made up of rows and columns. A row is also called a record (or tuple); a column is also called a field (or attribute). A database table is similar to a spreadsheet. However, the relationships that can be created among the tables enable a relational database to efficiently store huge amounts of data and effectively retrieve selected data. A language called SQL (Structured Query Language) was developed to work with relational databases. Features of RDBMS The features and characteristics of an RDBMS can best be understood through Codd's 12 rules. Codd's 12 Rules Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases, designed to define what is required from a database management system in order for it to be considered relational, i.e., a relational database management system (RDBMS). They are sometimes jokingly referred to as "Codd's Twelve Commandments". They are as follows: Rule 0: The foundation rule: A relational database management system must manage its stored data using only its relational capabilities. The system must qualify as relational, as a database, and as a management system. 
For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database. Rule 1: The information rule: All information in a relational database (including table and column names) is represented in only one way, namely as a value in a table. Rule 2: The guaranteed access rule:
  • 43. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 43 All data must be accessible. Every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row. Rule 3: Systematic treatment of null values: The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number" in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way. Rule 4: Active online catalog based on the relational model: The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data. Rule 5: The comprehensive data sublanguage rule: The system must support at least one relational language that 1. has a linear syntax, 2. can be used both interactively and within application programs, and 3. supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback). Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system. Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. 
This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table. Rule 8: Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure. Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence. Rule 10: Integrity independence:
  • 44. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 44 Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications. Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully: 1. when a distributed version of the DBMS is first introduced; and 2. when existing distributed data are redistributed around the system. Rule 12: The non-subversion rule: If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, by bypassing a relational security or integrity constraint. Advantages of RDBMS An RDBMS offers an extremely structured way of managing data (although a good database design is needed), as everything in an RDBMS is represented as values in relations (i.e. tables). Many further advantages are visible within the 13 rules stated by Codd. Disadvantages of RDBMS An RDBMS is very good for related data, but unorganized and unrelated data creates only chaos within an RDBMS. That is a reason why emerging trends such as Big Data (where a lot of data from various sources is to be analyzed) favour non-relational (NoSQL) DBMSs over an RDBMS for their purpose.
  • 45. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 45 TERMINOLOGIES: (RELATION, TUPLE, ATTRIBUTE, CARDINALITY, DEGREE, DOMAIN) Relation: Definition- A database relation is a predefined row/column format for storing information in a relational database. A relation is equivalent to, and also known as, a table. Example- Tuple: Definition- In the context of databases, a tuple is one record (one row). Example-
  • 46. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 46 Attribute: Definition- In general, an attribute is a characteristic. In a database management system (DBMS), an attribute refers to a database component, such as a table, or to a database field. Attributes describe the instances in the rows of a database. Example- Degree: Definition- The degree of a relation is the number of attributes in its relation schema. (In the ER context, the degree of a relationship is the number of entities participating in it.) Example-
  • 47. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 47 Cardinality: Definition- In the context of databases, cardinality refers to the uniqueness of data values contained in a column. It is not common, but cardinality also sometimes refers to the relationships between tables. Cardinality between tables can be one-to-one, many-to-one, or many-to-many. Example- Domain Definition- In database technology, domain refers to the description of an attribute's allowed values. The physical description is a set of values the attribute can have, and the semantic, or logical, description is the meaning of the attribute. Example-
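The terms above can be made concrete with a tiny relation held in plain Python (the schema and sample rows are invented for this example): the degree is the number of attributes, the number of tuples is the row count, and the cardinality of a column is the number of distinct values it holds.

```python
# A relation as a schema (attribute names) plus a list of tuples (rows).
schema = ("id", "name", "course")            # the attributes
student = [
    (1, "Ritika", "BCA"),
    (2, "Prerna", "BSc"),
    (3, "Ritu",   "BCA"),
]

degree = len(schema)                         # number of attributes
num_tuples = len(student)                    # number of rows (tuples)
# Cardinality of the 'course' column: how many distinct values it contains.
course_cardinality = len({row[2] for row in student})
```

The domain of `course` would be the set of allowed values for that attribute (here, the valid course codes), which is a constraint on the data rather than a property of the stored rows.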
  • 48. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 48 KEYS: (SUPER KEYS, CANDIDATE KEY, PRIMARY KEY, FOREIGN KEY) Definition of a Key- A key simply consists of one or more attributes that determine other attributes. A key is defined over one or more columns (attributes) of a database table, and keys are used to identify each record in the table. For example, if a table has id, name and address as its columns, a key is one or more of these columns whose values identify each row, such as id. The following are the various types of keys available in the DBMS system.  Super key  Candidate key  Primary key  Foreign key Super Key- A superkey is a combination of columns that uniquely identifies any row within a relational database management system (RDBMS) table. A candidate key is a closely related concept where the superkey is reduced to the minimum number of columns required to uniquely identify each row. For example, imagine a table used to store customer master details that contains columns such as: customer name, customer id, social security number (SSN), address, date of birth. A certain set of columns may be extracted and guaranteed unique to each customer. Examples of superkeys are as follows:  Name, SSN, Birthdate  ID, Name, SSN However, this may be further reduced. It can be assumed that each customer id is unique to each customer, so the superkey may be reduced to just one field, customer id, which is a candidate key. However, to ensure absolute uniqueness, a composite candidate key may be
  • 49. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 49 formed by combining customer id with SSN. A primary key is a special term for the candidate key designated as the unique identifier for all table rows. Until this point, only columns have been considered for suitability, and are thus termed candidate keys. Once a candidate key is decided, it may be defined as the primary key at the point of table creation. Candidate key- A candidate key is a column, or set of columns, in a table that can uniquely identify any database record without referring to any other data. Each table may have one or more candidate keys, but one candidate key is special: it is called the primary key. This is usually the best among the candidate keys. When a key is composed of more than one column, it is known as a composite key. The best way to define candidate keys is with an example. Suppose a bank's database is being designed. To uniquely define each customer's account, a combination of the customer's ID or social security number (SSN) and a sequential number for each of his or her accounts can be used. So, Mr. Andrew Smith's checking account can be numbered 223344-1, and his savings account 223344-2; a candidate key has just been created. Note that it is possible to uniquely identify each account using the SSN and a sequential number (assuming no government mess-up in which the same number is issued to two people), so this is a candidate key that can potentially be used to identify records. However, a better alternative is for the bank's database to issue its own unique account numbers, which are guaranteed to avoid the problem just highlighted. For good measure, these account numbers can have some built-in logic: for example, checking accounts can begin with a 'C', followed by the year and month of creation, and, within that month, a sequential number. 
In fact, if the chosen candidate key is so good that it can certainly uniquely identify each and every record, then it should be used as the primary key. All databases allow the definition of one, and only one, primary key per table. Primary key- It is a candidate key chosen by the database designer to identify entities within an entity set. A primary key is a minimal super key. In an ER diagram the primary key is represented by underlining the primary key attribute. Ideally a primary key is composed of only a single attribute, but it is possible to have a primary key composed of more than one attribute. A primary key is a special relational database table column (or combination of columns) designated to uniquely identify all table records.
  • 50. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 50 A primary key's main features are:  It must contain a unique value for each row of data.  It cannot contain null values. A primary key is either an existing table column or a column that is specifically generated by the database according to a defined sequence. For example, students are routinely assigned unique identification (ID) numbers, and citizens have uniquely identifiable Social Security numbers. As another example, suppose a database must hold all of the data stored by a commercial bank. Two of the database tables include the CUSTOMER_MASTER, which stores basic and static customer data (e.g., name, date of birth, address and Social Security number) and the ACCOUNTS_MASTER, which stores various bank account data (e.g., account creation date, account type, withdrawal limits or corresponding account information). To uniquely identify customers, a column or combination of columns is selected to guarantee that two customers never have the same unique value. Thus, certain columns are immediately eliminated, e.g., surname and date of birth. A good primary key candidate is the column that is designated to hold unique, government-assigned Social Security numbers. However, some account holders (e.g., children) may not have Social Security numbers, and this column's candidacy is eliminated. The next logical option is to use a combination of columns, such as the surname plus the date of birth plus the email address, resulting in a long and cumbersome primary key. Foreign Key- A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them. In complex databases, data in a domain must be added across multiple tables, thus maintaining a relationship between them. 
The concept of referential integrity is derived from foreign key theory. Foreign keys and their implementation are more complex than primary keys. For any column acting as a foreign key, a corresponding value should exist in the link table. Special care must be taken while inserting data and removing data from the foreign key column, as a careless deletion or insertion might destroy the relationship between the two tables. For instance, if there are two tables, customer and order, a relationship can be created between them by introducing a foreign key into the order table that refers to the customer ID in the customer table. The customer ID column exists in both customer and order tables. The customer ID in the order table becomes the foreign key, referring to the primary key in the customer table. To insert an entry into the order table, the foreign key constraint must be satisfied.
  • 51. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 51 Some referential actions associated with a foreign key action include the following:  Cascade: When rows in the parent table are deleted, the matching foreign key columns in the child table are also deleted, creating a cascading delete.  Set Null: When a referenced row in the parent table is deleted or updated, the foreign key values in the referencing row are set to null to maintain the referential integrity.  Triggers: Referential actions are normally implemented as triggers. In many ways foreign key actions are similar to user-defined triggers. To ensure proper execution, ordered referential actions are sometimes replaced with their equivalent user-defined triggers.  Set Default: This referential action is similar to "set null." The foreign key values in the child table are set to the default column value when the referenced row in the parent table is deleted or updated.  Restrict: This is the normal referential action associated with a foreign key. A value in the parent table cannot be deleted or updated as long as it is referred to by a foreign key in another table.  No Action: This referential action is similar in function to the "restrict" action except that a no-action check is performed only after trying to alter the table.
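The primary-key, foreign-key and Cascade ideas above can be sketched with sqlite3 (note that SQLite enforces foreign keys only when PRAGMA foreign_keys is switched on; the customer/order tables are invented for illustration, following the example in the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked
cur = conn.cursor()

# Primary key: uniquely identifies each customer row.
cur.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
# Foreign key: each order references the primary key of customer.
# ON DELETE CASCADE implements the 'Cascade' referential action.
cur.execute("""CREATE TABLE "order" (
                 order_id INTEGER PRIMARY KEY,
                 cust_id  INTEGER NOT NULL
                          REFERENCES customer(cust_id) ON DELETE CASCADE)""")
cur.execute("INSERT INTO customer VALUES (1, 'Andrew Smith')")
cur.execute('INSERT INTO "order" VALUES (100, 1)')

# Cascade: deleting the parent row removes its child orders as well.
cur.execute("DELETE FROM customer WHERE cust_id = 1")
remaining = cur.execute('SELECT COUNT(*) FROM "order"').fetchone()[0]
```

Replacing `ON DELETE CASCADE` with `ON DELETE SET NULL`, `SET DEFAULT`, `RESTRICT` or `NO ACTION` would give the other referential actions listed above.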
  • 52. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 52 UNIT 3.2 RELATIONAL ALGEBRA Relational Algebra: Operations, Select, Project, Union, Difference, Intersection, Cartesian product, Join, Natural Join.
  • 53. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 53 INTRODUCTION Relational algebra, first described by E.F. Codd while at IBM, is a family of algebras with a well-founded semantics used for modeling the data stored in relational databases and for defining queries on it. In relational algebra, queries are composed using a collection of operators, and each query describes a step-by-step procedure for computing the desired result. Because queries are specified in an operational, procedural manner, relational algebra is also called a procedural language. There are many operations included in relational algebra. Each relational query describes a step-by-step procedure for computing the desired answer, based on the order in which operators are applied in the query. The procedural nature of the algebra allows us to think of an algebraic expression as a recipe, or a plan, for evaluating a query, and relational systems in fact use algebraic expressions to represent query evaluation plans. Relational algebra expression An expression which is a composition of operators forms a complex query called a relational algebra expression. A unary algebra operator is applied to a single expression, and a binary algebra operator is applied to two expressions. Fundamental operations of relational algebra:  Select  Project  Union  Set difference  Cartesian product  Rename
  • 54. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 54 SELECT The SELECT operation (denoted by σ, sigma) is used to select a subset of the tuples from a relation based on a selection condition.  The selection condition acts as a filter  Keeps only those tuples that satisfy the qualifying condition  Tuples satisfying the condition are selected; the other tuples are discarded (filtered out) Examples: A. Select the STUDENT tuples whose age is 18: σ age=18 (STUDENT) B. Select the STUDENT tuples whose course is BCA: σ course=BCA (STUDENT) C. Select the students from the student relation instance whose gender is female: σ gender=F (STUDENT)

Student name | Age | Gender | Course
Ritika | 18 | F | BCA
Prerna | 19 | F | BSc.
Ankush | 20 | M | BA
Preeti | 18 | F | BSc.
Pragyan | 20 | M | BA
Ritu | 18 | F | BCA
Janvi | 20 | F | BCA

Answer of the first select statement (A):

Student name | Age | Gender | Course
Ritika | 18 | F | BCA
Preeti | 18 | F | BSc.
Ritu | 18 | F | BCA
  • 55. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 55 PROJECT The PROJECT operation is denoted by π (pi). If we are interested in only certain attributes of a relation, we use PROJECT. This operation keeps certain columns (attributes) from a relation and discards the other columns. Example: To list only the names and courses of all the students in the student relation: π student_name, course (STUDENT) (output from the table above)

Student-name | Course
Ritika | BCA
Prerna | BSc.
Ankush | BA
Preeti | BSc.
Pragyan | BA
Ritu | BCA
Janvi | BCA
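The σ (select) and π (project) operations can be sketched as plain Python functions over the STUDENT relation from the slides (a minimal illustration, representing each tuple as a dict; the attribute names are shortened for the example):

```python
# The STUDENT relation from the slides, one dict per tuple.
student = [
    {"name": "Ritika",  "age": 18, "gender": "F", "course": "BCA"},
    {"name": "Prerna",  "age": 19, "gender": "F", "course": "BSc"},
    {"name": "Ankush",  "age": 20, "gender": "M", "course": "BA"},
    {"name": "Preeti",  "age": 18, "gender": "F", "course": "BSc"},
    {"name": "Pragyan", "age": 20, "gender": "M", "course": "BA"},
    {"name": "Ritu",    "age": 18, "gender": "F", "course": "BCA"},
    {"name": "Janvi",   "age": 20, "gender": "F", "course": "BCA"},
]

def select(relation, predicate):
    """sigma: keep only the tuples that satisfy the selection condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """pi: keep only the named columns, discarding duplicate tuples."""
    seen, out = set(), []
    for row in relation:
        t = tuple(row[a] for a in attributes)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attributes, t)))
    return out

age18 = select(student, lambda r: r["age"] == 18)   # sigma age=18 (STUDENT)
names = project(student, ["name", "course"])        # pi name,course (STUDENT)
```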
  • 56. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 56 UNION It is a binary operation, denoted by the union sign from set theory. The result of R ∪ S is a relation that includes all tuples that are in R, in S, or in both R and S; duplicate tuples are eliminated. The two operand relations R and S must be "type compatible" (or UNION compatible): R and S must have the same number of attributes, and each pair of corresponding attributes must be type compatible (have the same or compatible domains). E.g., in the bank enterprise, the depositor and borrower relations have almost identical attributes and types.

Depositor's relational model:
Customer name | Id no.
RITA | 301
GITA | 302
RAM | 303

Borrower's relational model:
Customer name | Id no.
Sham | 300
Surbhi | 304
Rita | 301
Ram | 303

Output (Depositor ∪ Borrower):
Customer name | Id no.
Rita | 301
Gita | 302
Ram | 303
Sham | 300
Surbhi | 304
  • 57. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 57 DIFFERENCE SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by −. The result of R − S is a relation that includes all tuples that are in R but not in S. The attribute names in the result will be the same as the attribute names in R. The two operand relations R and S must be "type compatible".

Output (Depositor − Borrower):
Customer name | Id no.
Gita | 302

The result contains only the tuples of the first relation that do not appear in the second; here only a single tuple qualifies.
  • 58. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 58 INTERSECTION The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S.  The attribute names in the result will be the same as the attribute names in R  The two operand relations R and S must be "type compatible"
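Union, set difference and intersection over type-compatible relations behave exactly like Python's set operations. A sketch using the depositor and borrower relations from the slides:

```python
# Type-compatible relations: each tuple is (customer_name, id_no).
depositor = {("Rita", 301), ("Gita", 302), ("Ram", 303)}
borrower  = {("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)}

union        = depositor | borrower   # tuples in either; duplicates removed
difference   = depositor - borrower   # in depositor but not in borrower
intersection = depositor & borrower   # in both relations
```

Because Python sets already eliminate duplicates, the "duplicate tuples are eliminated" rule of the UNION operation comes for free.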
  • 59. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 59 CARTESIAN PRODUCT The resulting relation state has one tuple for each combination of tuples, one from R and one from S. Hence, if R has nR tuples (denoted as |R| = nR) and S has nS tuples, then R x S will have nR * nS tuples. The two operands do NOT have to be "type compatible". Example:

R:
A | 1
B | 2
D | 3
F | 4

S:
D | 3
E | 4

Output (R x S):
A | 1 | D | 3
A | 1 | E | 4
B | 2 | D | 3
B | 2 | E | 4
D | 3 | D | 3
D | 3 | E | 4
F | 4 | D | 3
F | 4 | E | 4
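The Cartesian product can be sketched with itertools.product, using the R and S from the example above; with |R| = 4 and |S| = 2, the result has 4 * 2 = 8 tuples:

```python
from itertools import product

R = [("A", 1), ("B", 2), ("D", 3), ("F", 4)]
S = [("D", 3), ("E", 4)]

# R x S: every tuple of R paired with every tuple of S -> |R| * |S| tuples,
# each result tuple being the concatenation of one tuple from each relation.
cartesian = [r + s for r, s in product(R, S)]
```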
  • 60. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 60 JOIN A join is a cross product of two relations followed by a selection.  Join allows you to evaluate a join condition between the attributes of the relations on which the join operation is undertaken.  It is used to combine related tuples from two relations.  The join condition is called theta (θ). Notation: R JOIN join-condition S. Let us take an instance:-
  • 61. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 61 NATURAL JOIN Another variation of JOIN, called NATURAL JOIN, is denoted by *. Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such joins result in two attributes in the resulting relation having exactly the same value; a natural join will remove the duplicate attribute(s).  In most systems a natural join will require that the attributes have the same name to identify the attribute(s) to be used in the join. This may require a renaming mechanism.  If you do use natural joins, make sure that the relations do not have two attributes with the same name by accident. Example: The following query results refer to this database state.
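A natural join can be sketched as: take the Cartesian product, keep the tuples whose same-named attributes are equal, and emit each shared attribute only once. A minimal Python illustration (the employee/department relations are invented for this example):

```python
def natural_join(r, s):
    """Join on all same-named attributes; equality test; shared column once."""
    common = [a for a in r[0] if a in s[0]] if r and s else []
    out = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                merged = {**x, **y}       # duplicate attributes collapse
                if merged not in out:
                    out.append(merged)
    return out

employee = [{"name": "Asha", "dept": "Sales"},
            {"name": "Ravi", "dept": "HR"}]
department = [{"dept": "Sales", "floor": 1},
              {"dept": "HR",    "floor": 2}]

# Joins on the shared 'dept' attribute, which appears once in the result.
joined = natural_join(employee, department)
```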
  • 62. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 62 A simple database:
  • 63. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 63 Example Natural Join Operations on the sample database above:
  • 64. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 64 SUMMARY OF OPERATIONS
  • 65. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 65 UNIT 4 STRUCTURED QUERY LANGUAGE (SQL) & NORMALIZATION Structured Query Language (SQL): Introduction to SQL, History of SQL, Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple Queries, Nested Queries, Normalization: Benefits of Normalization, Normal Forms- 1NF, 2NF, 3NF, BCNF, and Functional Dependency.
  • 66. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 66 INTRODUCTION TO SQL Introduction & Brief History: SQL is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS). Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language and a data manipulation language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control. SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks." Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been revised to include a larger set of features. Why SQL?  Allows users to access data in relational database management systems.  Allows users to describe the data.  Allows users to define the data in a database and manipulate that data.  Allows embedding within other languages using SQL modules, libraries & pre-compilers.  Allows users to create and drop databases and tables.  Allows users to create views, stored procedures and functions in a database.  Allows users to set permissions on tables, procedures and views. Advantages of SQL:  High Speed: SQL queries can be used to retrieve large amounts of records from a database quickly and efficiently.  Well Defined Standards Exist: SQL databases use a long-established standard, adopted by ANSI & ISO. Non-SQL databases do not adhere to any clear standard.  No Coding Required: Using standard SQL it is easier to manage database systems without having to write a substantial amount of code.
  • 67. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 67  Emergence of ORDBMS: Previously, SQL databases were synonymous with relational databases. With the emergence of object-oriented DBMSs, object storage capabilities have been extended to relational databases. Disadvantages of SQL:  Difficulty in Interfacing: Interfacing an SQL database is more complex than adding a few lines of code.  More Features Implemented in a Proprietary Way: Although SQL databases conform to ANSI & ISO standards, some vendors add proprietary extensions to standard SQL to ensure vendor lock-in.
  • 68. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 68 HISTORY OF SQL  In 1970 Edgar F. Codd, a member of the IBM Research Laboratory, published the classic paper, 'A Relational Model of Data for Large Shared Data Banks'.  Codd's paper triggered a great deal of research and experimentation, which led to the design and prototype implementation of a number of relational languages.  One such language was the Structured English Query Language (SEQUEL), defined by Donald D. Chamberlin and Raymond F. Boyce.  The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.  A revised version of SEQUEL, called SEQUEL/2 (later SQL), was released in 1976-77.  IBM developed Codd's ideas into a research prototype named System R, and later shipped commercial relational products based on it.  The first commercial relational database was released in 1979 by Relational Software, Inc., the company that later became Oracle.  In 1986 ANSI published an SQL standard called 'SQL-86', which ISO adopted in 1987.  The next versions of the standard were SQL-89 and SQL-92, followed by SQL:1999, SQL:2003, SQL:2006 and SQL:2008. Given industry trends, it is clear that the relational model and SQL will continue to hold their position for the foreseeable future.
  • 69. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 69 CONCEPT BEHIND SQL SQL Process When you execute an SQL command on any RDBMS, the system determines the best way to carry out your request, and the SQL engine figures out how to interpret the task. Various components take part in this process:  Query Dispatcher  Optimization Engines  Classic Query Engine  SQL Query Engine The classic query engine handles all non-SQL queries, but the SQL query engine won't handle logical files. SQL Architecture (diagram omitted) Types of SQL Commands The following sections discuss the basic categories of commands used in SQL to perform various functions. The main categories are:  DDL (Data Definition Language)  DML (Data Manipulation Language)  DQL (Data Query Language)  DCL (Data Control Language)  TCL (Transactional Control Language)
  • 70. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 70 DDL COMMANDS DDL (Data Definition Language) Commands of SQL allow the data definition functions like creating, altering and dropping tables. The following are the various DDL commands, along with their syntax, use and examples: #1. CREATE USE: creates a new table, a view of a table, or other objects in the database. SYNTAX: CREATE TABLE table_name( column_name1 data_type(size), column_name2 data_type(size), ... ); EXAMPLE : CREATE TABLE Persons (PersonID int, LastName varchar(255), FirstName varchar(255), Address varchar(255), City varchar(255) ); #2. ALTER USE : modifies an existing database object such as a table. SYNTAX : ALTER TABLE table_name ADD column_name datatype; or ALTER TABLE table_name DROP COLUMN column_name; or ALTER TABLE table_name MODIFY COLUMN column_name datatype; EXAMPLE : ALTER TABLE Persons ADD DateOfBirth date; or
  • 71. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 71 ALTER TABLE Persons DROP COLUMN DateOfBirth; or ALTER TABLE Persons ALTER COLUMN DateOfBirth year; #3. DROP USE : deletes an entire table, a view of a table, or another object in the database. SYNTAX : DROP TABLE table_name; EXAMPLE : DROP TABLE Persons; #4. TRUNCATE USE : removes all records from a table, including all space allocated for the records; it also resets any identity (auto-increment) counter. SYNTAX : TRUNCATE TABLE table_name; EXAMPLE : TRUNCATE TABLE Persons; #5. COMMENT USE : adds comments to the data dictionary.
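The DDL commands above can be tried end-to-end. The following sketch uses Python's built-in sqlite3 module purely as a convenient SQL host; SQLite's dialect differs slightly from the Oracle-style syntax shown above (for example, it has no TRUNCATE), and the Persons table follows the example in these notes.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new table.
cur.execute("""CREATE TABLE Persons (
    PersonID INTEGER,
    LastName VARCHAR(255),
    FirstName VARCHAR(255),
    Address VARCHAR(255),
    City VARCHAR(255)
)""")

# ALTER: add a column to the existing table.
cur.execute("ALTER TABLE Persons ADD COLUMN DateOfBirth DATE")

# Confirm the new column now appears in the table definition.
cols = [row[1] for row in cur.execute("PRAGMA table_info(Persons)")]
print(cols)  # ['PersonID', 'LastName', 'FirstName', 'Address', 'City', 'DateOfBirth']

# DROP: delete the table entirely (structure and data).
cur.execute("DROP TABLE Persons")
conn.close()
```

Note that DDL changes the schema itself, which is why dropping the table removes both the data and the definition.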
  • 72. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 72 DML COMMANDS DML (Data Manipulation Language) Commands of SQL allow the data manipulation functions like inserting, updating and deleting data values in the tables created using DDL commands. The following are the various DML commands, along with their syntax, use and examples: #1. INSERT USE : creates a record. SYNTAX : INSERT INTO table_name VALUES (value1,value2,value3,...); or INSERT INTO table_name (column1,column2,column3,...) VALUES (value1,value2,value3,...); EXAMPLE : INSERT INTO Persons VALUES (1,'Manan','07-08-1994'); #2. UPDATE USE : modifies records. SYNTAX : UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value; EXAMPLE : UPDATE Students SET Fine=0 WHERE Stu_ID=404; #3. DELETE USE : deletes records (but the table structure remains intact). SYNTAX : DELETE FROM table_name WHERE some_column=some_value; EXAMPLE : DELETE FROM Students WHERE Stu_ID=21;
  • 73. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 73 #4. CALL USE : calls a PL/SQL or Java subprogram. #5. EXPLAIN PLAN USE : explains the access path to data. SYNTAX : EXPLAIN PLAN FOR SQL_Statement; EXAMPLE : EXPLAIN PLAN FOR SELECT last_name FROM employees; #6. LOCK TABLE USE : controls concurrency. SYNTAX : LOCK TABLE table_name IN EXCLUSIVE MODE NOWAIT; This locks the table in exclusive mode but does not wait if another user has already locked the table: EXAMPLE : LOCK TABLE employees IN EXCLUSIVE MODE NOWAIT;
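The core DML commands (INSERT, UPDATE, DELETE) can be demonstrated in one short session. This sketch again uses Python's sqlite3 module as an SQL host; the Students table and the fine-waiving example follow the notes above, with illustrative data made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (Stu_ID INTEGER, Name TEXT, Fine INTEGER)")

# INSERT: create records (executemany inserts several rows at once).
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(404, "Manan", 50), (405, "Adam", 20)])

# UPDATE: modify a record (waive the fine for student 404).
cur.execute("UPDATE Students SET Fine = 0 WHERE Stu_ID = 404")

# DELETE: remove a record; the table structure remains intact.
cur.execute("DELETE FROM Students WHERE Stu_ID = 405")

rows = cur.execute("SELECT Stu_ID, Name, Fine FROM Students").fetchall()
print(rows)  # [(404, 'Manan', 0)]
conn.close()
```

Unlike DROP, the DELETE above leaves an empty but still-queryable table once all rows are removed.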
  • 74. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 74 DCL COMMANDS DCL (Data Control Language) Commands of SQL allow the data control functions like granting and revoking permissions, committing changes, rolling back, etc. (Commands such as COMMIT and ROLLBACK are often classed separately as TCL, Transactional Control Language.) The following are the various DCL commands, along with their syntax, use and examples: #1. GRANT USE : gives a privilege to user(s). SYNTAX : GRANT permission [, ...] ON [schema_name.]object_name [(column [, ...])] TO database_principal [, ...] [WITH GRANT OPTION] EXAMPLE : GRANT SELECT ON Invoices TO AnneRoberts; #2. REVOKE USE : takes back privileges/grants from users. SYNTAX : REVOKE [GRANT OPTION FOR] permission [, ...] ON [schema_name.]object_name [(column [, ...])] FROM database_principal [, ...] [CASCADE] EXAMPLE : REVOKE SELECT ON Invoices FROM AnneRoberts; #3. COMMIT USE : saves work done. SYNTAX : COMMIT;
  • 75. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 75 #4. ROLLBACK USE : restores the database to its state at the last COMMIT. SYNTAX : ROLLBACK; #5. SAVEPOINT USE : identifies a point in a transaction to which you can later roll back. SYNTAX : SAVEPOINT savepoint_name; & then, ROLLBACK TO savepoint_name; or RELEASE SAVEPOINT savepoint_name; #6. SET TRANSACTION USE : sets transaction options, such as the access mode or which rollback segment to use. SYNTAX : SET TRANSACTION [ READ WRITE | READ ONLY ];
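COMMIT, ROLLBACK and SAVEPOINT can be seen working together in a single transaction. A minimal sketch, again hosted in Python's sqlite3 (which supports SAVEPOINT); the Accounts table and amounts are invented for illustration.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode so we can
# issue BEGIN/COMMIT/ROLLBACK explicitly, as in the notes.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE Accounts (id INTEGER, balance INTEGER)")
cur.execute("INSERT INTO Accounts VALUES (1, 100)")

cur.execute("BEGIN")
cur.execute("UPDATE Accounts SET balance = balance - 40 WHERE id = 1")
cur.execute("SAVEPOINT before_bonus")            # mark a point we can return to
cur.execute("UPDATE Accounts SET balance = balance + 999 WHERE id = 1")
cur.execute("ROLLBACK TO before_bonus")          # undo only the work after the savepoint
cur.execute("COMMIT")                            # make the remaining work permanent

balance = cur.execute("SELECT balance FROM Accounts WHERE id = 1").fetchone()[0]
print(balance)  # 60: the -40 survived the COMMIT, the +999 was rolled back
conn.close()
```

The key point is that ROLLBACK TO a savepoint undoes part of a transaction without abandoning the whole transaction.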
  • 76. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 76 SIMPLE QUERIES & NESTED QUERIES A Simple Query is a query that searches using just one parameter. A simple query might use all of the fields in a table, or just the fields from which the information is required, but it will still use just one parameter (search criterion). The following are some types of queries: • A select query retrieves data from one or more of the tables in your database, or from other queries, and displays the results in a datasheet. You can also use a select query to group data, and to calculate sums, averages, counts, and other types of totals. • A parameter query is a type of select query that prompts you for input before it runs. The query then uses your input as criteria that control your results. For example, a typical parameter query asks you for starting high and low values, and only returns records that fall within those values. • A cross-tab query uses row headings and column headings so you can see your data in terms of two categories at once. • An action query alters your data or your database. For example, you can use an action query to create a new table, or add, delete, or change your data. A Nested Query (also called a subquery or inner query) is a query within a query. A subquery is usually added in the WHERE clause of an SQL statement. Most of the time, a subquery is used when you know how to search for a value using a SELECT statement, but do not know the exact value. A subquery is also called an inner query or inner select, while the statement containing it is called the outer query or outer select. A query result can be used in a condition of a WHERE clause; in such a case the query is called a subquery and the complete SELECT statement is called a nested query. A subquery can also be placed within a HAVING clause, but a subquery cannot be used in an ORDER BY clause.
Subqueries are queries nested inside other queries, marked off with parentheses, and sometimes referred to as "inner" queries within "outer" queries. Most often, you see subqueries in WHERE or HAVING clauses. A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery.
  • 77. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 77 A subquery can appear anywhere an expression can be used, if it returns a single value. Statements that include a subquery usually take one of these formats:  WHERE expression [NOT] IN (subquery)  WHERE expression comparison_operator [ANY | ALL] (subquery)  WHERE [NOT] EXISTS (subquery) Following are the TYPES of Nested Queries: Single-Row Subqueries The single-row subquery returns one row. A special case is the scalar subquery, which returns a single row with one column. Scalar subqueries are acceptable (and often very useful) in virtually any situation where you could use a literal value, a constant, or an expression. A single-row subquery can be used with any comparison operator (=, <, <=, >, >=, <>). If any of these operators is used with a subquery that returns more than one row, the query will fail. Multiple-Row Subqueries Multiple-row subqueries return sets of rows. These queries are commonly used to generate result sets that will be passed to a DML or SELECT statement for further processing. Both single-row and multiple-row subqueries are evaluated once, before the parent query is run. Since such a subquery returns multiple values, the outer query must use the set comparison operators (IN, ANY, ALL). If you use a multiple-row subquery with the equals comparison operator, the database will return an error if more than one row is returned. The operators that can be used with multiple-row subqueries are: IN - equal to any member in a list; ANY - returns rows that match any value in a list; ALL - returns rows that match all the values in a list. Multiple-Column Subqueries A subquery that compares more than one column between the parent query and the subquery is called a multiple-column subquery. In multiple-column subqueries, rows in the subquery result are evaluated in the main query in pair-wise comparison; that is, column-to-column comparison and row-to-row comparison.
  • 78. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 78 Correlated Subqueries A correlated subquery has a more complex method of execution than single- and multiple-row subqueries and is potentially much more powerful. If a subquery references columns in the parent query, then its result will be dependent on the parent query. This makes it impossible to evaluate the subquery before evaluating the parent query; instead, it is re-evaluated for each candidate row of the outer query. Some points to remember about subqueries: • Subqueries are queries nested inside other queries, marked off with parentheses. • The result of the inner query is passed to the outer query for the preparation of the final result. • An ORDER BY clause is not supported inside nested queries. • The BETWEEN operator cannot be applied to a subquery, though it can be used within the subquery itself. • A single-row subquery must return only a single value to the outer query; multiple-row subqueries must be used with set operators such as IN, ANY or ALL. • A subquery is normally placed on the right-hand side of the comparison operator. • A query can contain more than one subquery.
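The single-row and multiple-row cases above can be contrasted directly. A small sketch hosted in Python's sqlite3, reusing the Student table from the normalization example later in these notes (the data itself is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (S_id INTEGER, S_name TEXT, Subject TEXT)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
    [(401, "Adam", "Bio"), (402, "Alex", "Maths"),
     (403, "Stuart", "Maths"), (404, "Adam", "Physics")])

# Single-row subquery: the inner SELECT returns exactly one value,
# so the = comparison operator is allowed.
one = cur.execute("""SELECT S_name FROM Students
                     WHERE S_id = (SELECT MAX(S_id) FROM Students)""").fetchall()
print(one)  # [('Adam',)] -- student 404

# Multiple-row subquery: the inner SELECT returns a set of values,
# so the outer query must use IN (ANY/ALL in other dialects).
many = cur.execute("""SELECT DISTINCT S_name FROM Students
                      WHERE S_id IN (SELECT S_id FROM Students
                                     WHERE Subject = 'Maths')""").fetchall()
print(sorted(many))  # [('Alex',), ('Stuart',)]
conn.close()
```

Swapping IN for = in the second query would fail in most databases precisely because the inner query returns more than one row.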
  • 79. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 79 NORMALIZATION Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Normalization is a process in which we systematically examine relations for anomalies and, when detected, remove those anomalies by splitting up the relation into two new, related relations. Normalization is an important part of the database development process: often during normalization, the database designers get their first real look into how the data are going to interact in the database. Finding problems with the database structure at this stage is strongly preferred to finding problems further along in the development process, because at this point it is fairly easy to cycle back to the conceptual model (Entity Relationship model) and make changes. Normalization can also be thought of as a trade-off between data redundancy and performance. Normalizing a relation reduces data redundancy but introduces the need for joins when all of the data is required by an application, such as a report query.  Problems without Normalization Without normalization it becomes difficult to handle and update the database without facing data loss. Insertion, update and deletion anomalies are very frequent if the database is not normalized. To understand these anomalies, let us take the example of a Student table. S_id S_name S_address Subject_opted 401 Adam Noida Bio 402 Alex Panipat Maths 403 Stuart Jammu Maths 404 Adam Noida Physics
  • 80. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 80  Update Anomaly: To update the address of a student who occurs twice or more in the table, we have to update the S_address column in all of those rows, otherwise the data will become inconsistent.  Insertion Anomaly: Suppose, for a new admission, we have the S_id, name and address of a student, but the student has not opted for any subject yet; then we have to insert NULL there, leading to an insertion anomaly.  Deletion Anomaly: If S_id 401 has only one subject and temporarily drops it, then when we delete that row the entire student record will be deleted along with it.
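The deletion anomaly above is easy to reproduce concretely. A quick sketch hosted in Python's sqlite3, using the Student table from the previous page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Student
               (S_id INTEGER, S_name TEXT, S_address TEXT, Subject_opted TEXT)""")
cur.executemany("INSERT INTO Student VALUES (?,?,?,?)", [
    (401, "Adam",   "Noida",   "Bio"),
    (402, "Alex",   "Panipat", "Maths"),
    (403, "Stuart", "Jammu",   "Maths"),
    (404, "Adam",   "Noida",   "Physics"),
])

# Deletion anomaly: student 401 drops his only subject, and deleting that
# row removes every fact about the student, not just the enrolment.
cur.execute("DELETE FROM Student WHERE S_id = 401")
left = cur.execute("SELECT COUNT(*) FROM Student WHERE S_id = 401").fetchone()[0]
print(left)  # 0 -- the student's name and address are gone along with the subject
conn.close()
```

Splitting students and enrolments into separate tables (as 2NF does later in this unit) is exactly what removes this anomaly.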
  • 81. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 81 BENEFITS OF NORMALIZATION Normalization produces smaller tables with smaller rows:  More rows per page (less logical I/O)  More rows per I/O (more efficient)  More rows fit in cache (less physical I/O) The benefits of normalization include:  Searching, sorting, and creating indexes are faster, since tables are narrower and more rows fit on a data page.  You usually have more tables.  You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.  Index searching is often faster, since indexes tend to be narrower and shorter.  More tables allow better use of segments to control the physical placement of data.  You usually have fewer indexes per table, so data modification commands are faster.  There are fewer null values and less redundant data, making your database more compact.  Triggers execute more quickly if you are not maintaining redundant data.  Data modification anomalies are reduced.  A normalized design is conceptually cleaner and easier to maintain and change as your needs change.
  • 82. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 82 NORMAL FORMS (1NF, 2NF, 3NF, BCNF) Relations can fall into one or more categories (or classes) called Normal Forms. Normal Form: a class of relations free from a certain set of modification anomalies. Normal forms are given names such as: 1. First Normal Form 2. Second Normal Form 3. Third Normal Form 4. BCNF These forms are cumulative: a relation in Third Normal Form is also in 2NF and 1NF. The Normalization Process for a given relation consists of:  Apply the definition of each normal form (starting with 1NF).  If a relation fails to meet the definition of a normal form, change the relation (most often by splitting it into two new relations) until it meets the definition.  Re-test the modified/new relations to ensure they meet the definitions of each normal form. First Normal Form (1NF)  A relation is in first normal form if it meets the definition of a relation: 1. Each attribute (column) value must be a single value only. 2. All values for a given attribute (column) must be of the same type. 3. Each attribute (column) name must be unique. 4. The order of attributes (columns) is insignificant. 5. No two tuples (rows) in a relation can be identical. 6. The order of the tuples (rows) is insignificant. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique. The primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key. For example, consider the following table, which is not in first normal form:
  • 83. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 83 Student Age Subject Adam 15 Biology, Maths Alex 14 Maths Stuart 17 Maths In First Normal Form, a row must not have a column in which more than one value is saved (for example, separated with commas); rather, we must separate such data into multiple rows. Table in First Normal Form: Student Age Subject Adam 15 Biology Adam 15 Maths Alex 14 Maths Stuart 17 Maths Using First Normal Form, data redundancy increases, as there will be many columns with the same data in multiple rows, but each row as a whole will be unique. Second Normal Form (2NF)  A relation is in second normal form (2NF) if all of its non-key attributes are dependent on all of the key.  Another way to say this: a relation is in second normal form if it is free from partial-key dependencies.  Relations that have a single attribute for a key are automatically in 2NF.  This is one reason why we often use artificial identifiers (non-composite keys) as keys. As per second normal form, there must not be any partial dependency of any column on the primary key. For a table that has a concatenated (composite) primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, then the table fails second normal form. In the First Normal Form example there are two rows for Adam, to include the multiple subjects he has opted for. While this is searchable, and follows First Normal Form, it is an inefficient use of space. Also, in the table above in first normal form, the candidate key is {Student,
  • 84. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 84 Subject}, yet the Age of a student depends only on the Student column, which violates second normal form. To achieve second normal form, it helps to split the subjects out into an independent table, and match them up using the student names as foreign keys. The new Student table following second normal form will be: Student Age Adam 15 Alex 14 Stuart 17 In the Student table the candidate key will be the Student column, because all other columns (i.e. Age) depend on it. The new Subject table introduced for second normal form will be: Student Subject Adam Biology Adam Maths Alex Maths Stuart Maths In the Subject table the candidate key will be the {Student, Subject} column pair. Now both of the above tables qualify for second normal form and will never suffer from update anomalies. Third Normal Form (3NF)  A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.  Consider relation R containing attributes A, B and C: R(A, B, C). If A → B and B → C then A → C.  Transitive Dependency: three attributes with the above dependencies. Third normal form requires that every non-prime attribute of the table be dependent on the primary key; transitive functional dependencies should be removed from the table, and the table must be in Second Normal Form. For example, consider a Student_Detail table with the following fields:
  • 85. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 85 Student_id Student_name DOB Street City State Zip In this table Student_id is the primary key, but Street, City and State depend upon Zip. The dependency between Zip and the other fields is a transitive dependency. Hence, to apply third normal form, we need to move Street, City and State to a new table, with Zip as its primary key. New Student_Detail table: Student_id Student_name DOB Zip Address_Table: Zip Street City State The advantages of removing transitive dependencies are: 1. The amount of data duplication is reduced. 2. Data integrity is achieved. Boyce-Codd Normal Form (BCNF)  A relation is in BCNF if, and only if, every determinant is a candidate key.  The difference between 3NF and BCNF is that for a functional dependency A -> B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key. Boyce-Codd Normal Form is a stricter version of Third Normal Form; it deals with a certain type of anomaly that is not handled by 3NF. A Third Normal Form table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF. Client Interview ClientNo interviewDate InterviewTime StaffNo roomNo CR76 13/5/02 10:30 SG5 G101 CR76 13/5/02 12:00 SG5 G101 CR74 13/5/02 12:00 SG37 G102
  • 86. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 86 CR56 1/7/02 10:30 SG5 G102 1. FD1: clientNo, interviewDate -> interviewTime, staffNo, roomNo (primary key) 2. FD2: staffNo, interviewDate, interviewTime -> clientNo (candidate key) 3. FD3: roomNo, interviewDate, interviewTime -> clientNo, staffNo (candidate key) 4. FD4: staffNo, interviewDate -> roomNo (not a candidate key)  As a consequence, the ClientInterview relation may suffer from update anomalies.  For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.  To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below: 1. Interview (clientNo, interviewDate, interviewTime, staffNo) 2. StaffRoom (staffNo, interviewDate, roomNo) Interview ClientNo InterviewDate InterviewTime StaffNo CR76 13/5/02 10:30 SG5 CR76 13/5/02 12:00 SG5 CR74 13/5/02 12:00 SG37 CR56 1/7/02 10:30 SG5 StaffRoom staffNo InterviewDate RoomNo SG5 13/5/02 G101 SG37 13/5/02 G102 SG5 1/7/02 G102
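The BCNF decomposition above can be carried out mechanically in SQL. A sketch hosted in Python's sqlite3, using the ClientInterview data from these notes; CREATE TABLE ... AS SELECT is used for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE ClientInterview
               (clientNo TEXT, interviewDate TEXT, interviewTime TEXT,
                staffNo TEXT, roomNo TEXT)""")
cur.executemany("INSERT INTO ClientInterview VALUES (?,?,?,?,?)", [
    ("CR76", "13/5/02", "10:30", "SG5",  "G101"),
    ("CR76", "13/5/02", "12:00", "SG5",  "G101"),
    ("CR74", "13/5/02", "12:00", "SG37", "G102"),
    ("CR56", "1/7/02",  "10:30", "SG5",  "G102"),
])

# Decompose into the two BCNF relations named in the notes.
cur.execute("""CREATE TABLE Interview AS
               SELECT clientNo, interviewDate, interviewTime, staffNo
               FROM ClientInterview""")
cur.execute("""CREATE TABLE StaffRoom AS
               SELECT DISTINCT staffNo, interviewDate, roomNo
               FROM ClientInterview""")

# The fact staffNo, interviewDate -> roomNo is now stored once, so changing
# SG5's room on 13/5/02 would update a single row instead of two.
n = cur.execute("""SELECT COUNT(*) FROM StaffRoom
                   WHERE staffNo='SG5' AND interviewDate='13/5/02'""").fetchone()[0]
print(n)  # 1
conn.close()
```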
  • 87. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 87 FUNCTIONAL DEPENDENCY A functional dependency is a relationship that exists when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which is the same as stating "B is functionally dependent upon A". Example: If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X -> Y, which specifies that Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute. Each value of X is associated with precisely one Y value. A functional dependency in a database serves as a constraint between two sets of attributes. Defining functional dependencies is an important part of relational database design and is central to normalization. Consider an example: REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade) where:  Student# - student number  Course# - course number  CourseName - course name  IName - name of the instructor who delivered the course  Room# - room number assigned to the respective instructor  Marks - marks scored in course Course# by student Student#  Grade - grade obtained by student Student# in course Course#
  • 88. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 88  Student# and Course# together (called a composite attribute) define EXACTLY ONE value of Marks. This can be symbolically represented as Student#, Course# -> Marks. REMARK: This type of dependency is called a functional dependency. In the above example, Marks is functionally dependent on {Student#, Course#}. Other functional dependencies in the above example are: • Course# -> CourseName • Course# -> IName (assuming one course is taught by one and only one instructor) • IName -> Room# (assuming each instructor has his/her own, non-shared room) • Marks -> Grade • Formally, we can define functional dependency as: in a given relation R with attributes X and Y, attribute Y is functionally dependent on attribute X if each value of X determines exactly one value of Y. This is represented as X -> Y; note that X may be composite in nature.
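The definition "each value of X determines exactly one value of Y" can be checked mechanically against sample data. The following is a small, hypothetical Python helper (not part of any standard library) written for illustration; the REPORT rows are invented but follow the schema in these notes.

```python
# Test whether the functional dependency X -> Y holds in a set of rows:
# every X-value must map to exactly one Y-value.
def holds(rows, x, y):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        if seen.setdefault(key, val) != val:
            return False  # the same X-value determined two different Y-values
    return True

report = [
    {"Student": "S1", "Course": "C1", "CourseName": "DBMS",     "Marks": 80},
    {"Student": "S2", "Course": "C1", "CourseName": "DBMS",     "Marks": 65},
    {"Student": "S1", "Course": "C2", "CourseName": "Networks", "Marks": 70},
]

fd1 = holds(report, ["Course"], ["CourseName"])        # Course# -> CourseName
fd2 = holds(report, ["Student", "Course"], ["Marks"])  # composite key -> Marks
fd3 = holds(report, ["Student"], ["Marks"])            # fails: S1 has two marks
print(fd1, fd2, fd3)  # True True False
```

Note that such a check can only refute a dependency on given data; whether a dependency truly holds is a design decision about the real world, not a property of one sample.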
  • 89. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 89 UNIT 5 RELATIONAL DATABASE DESIGN Relational Database Design: Introduction to Relational Database Design, DBMS v/s RDBMS. Integrity rule, Concept of Concurrency Control and Database Security.
  • 90. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 90 INTRODUCTION TO RELATIONAL DATABASE DESIGN Just as a house without a foundation will fall over, a database with poorly designed tables and relationships will fail to meet the needs of its users. And hence, the need of a sound relational database design originates. The History of Relational Database Design Dr. E. F. Codd first introduced formal relational database design in 1969 while he was at IBM. Relational theory, which is based on set theory, applies to both databases and database applications. Codd developed 12 rules that determine how well an application and its data adhere to the relational model. Since Codd first conceived these 12 rules, the number of rules has expanded into the hundreds. Goals of Relational Database Design The number one goal of relational database design is to, as closely as possible, develop a database that models some real-world system. This involves breaking the real-world system into tables and fields and determining how the tables relate to each other. Although on the surface this task might appear to be trivial, it can be an extremely cumbersome process to translate a real-world system into tables and fields. A properly designed database has many benefits. The processes of adding, editing, deleting, and retrieving table data are greatly facilitated by a properly designed database. In addition, reports are easier to build. Most importantly, the database becomes easy to modify and maintain. Rules of Relational Database Design To adhere to the relational model, tables must follow certain rules. These rules determine what is stored in tables and how the tables are related. 1. The Rules of Tables Each table in a system must store data about a single entity. An entity usually represents a real-life object or event. Examples of objects are customers, employees, and inventory items. Examples of events include orders, appointments, and doctor visits. 2. 
The Rules of Uniqueness and Keys Tables are composed of rows and columns. To adhere to the relational model, each table must contain a unique identifier. Without a unique identifier, it becomes programmatically impossible to uniquely address a row. You guarantee uniqueness in a table by designating a primary key, which is a single column or a set of columns that uniquely identifies a row in a table. Each column or set of columns in a table that contains unique values is considered a candidate key. One candidate key becomes the primary key. The remaining candidate keys become
  • 91. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 91 alternate keys. A primary key made up of one column is considered a simple key. A primary key comprising multiple columns is considered a composite key. It is generally a good idea to pick a primary key that is  Minimal (has as few columns as possible)  Stable (rarely changes)  Simple (is familiar to the user) Following these rules greatly improves the performance and maintainability of your database application, particularly if you are dealing with large volumes of data. 3. The Rules of Foreign Keys and Domains A foreign key in one table is the field that relates to the primary key in a second table. For example, the CustomerID is the primary key in the Customers table. It is the foreign key in the Orders table. A domain is a pool of values from which columns are drawn. A simple example of a domain is the specific data range of employee hire dates. In the case of the Orders table, the domain of the CustomerID column is the range of values for the CustomerID in the Customers table. 4. Normalization and Normal Forms Some of the most difficult decisions that you face as a developer are what tables to create and what fields to place in each table, as well as how to relate the tables that you create. Normalization is the process of applying a series of rules to ensure that your database achieves optimal structure. Normal forms are a progression of these rules. Each successive normal form achieves a better database design than the previous form did. Although there are several levels of normal forms, it is generally sufficient to apply only the first three levels of normal forms. 5. Denormalization: Purposely Violating the Rules Although the developer's goal is normalization, often it makes sense to deviate from normal forms. We refer to this process as denormalization. The primary reason for applying denormalization is to enhance performance. If you decide to denormalize, document your decision.
Make sure that you make the necessary application adjustments to ensure that you properly maintain the denormalized fields. Finally, test to ensure that the denormalization process actually improves performance. 6. Integrity Rules Although integrity rules are not part of normal forms, they are definitely part of the database design process. Integrity rules are broken into two categories. They include overall integrity rules and database-specific integrity rules. 7. Database-Specific Rules The other set of rules applied to a database are not applicable to all databases but are, instead, dictated by business rules that apply to a specific application. Database-specific rules are as important as overall integrity rules. They ensure that only valid data is entered into a database. An example of a database-specific integrity rule is that the delivery date for an order must fall after the order date.
  • 92. Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 92 (Also, see Codd's 12 rules) Examining the Types of Relationships Three types of relationships can exist between tables in a database: one-to-many, one-to-one, and many-to-many. Setting up the proper type of relationship between two tables in your database is imperative. The right type of relationship between two tables ensures  Data integrity  Optimal performance  Ease of use in designing system objects The reasons behind these benefits are covered throughout this chapter. Before you can understand the benefits of relationships, though, you must understand the types of relationships available. One-to-Many A one-to-many relationship is by far the most common type of relationship. In a one-to-many relationship, a record in one table can have many related records in another table. A common example is a relationship set up between a Customers table and an Orders table. For each customer in the Customers table, you can have more than one order in the Orders table. On the other hand, each order in the Orders table can belong to only one customer. The Customers table is on the one side of the relationship, and the Orders table is on the many side. For you to implement this relationship, the field joining the two tables on the one side of the relationship must be unique. One-to-One In a one-to-one relationship, each record in one table can have at most one matching record in the other table. This relationship is not common and is used only in special circumstances. Usually, if you have set up a one-to-one relationship, you should instead have combined the fields from both tables into one table. Many-to-Many In a many-to-many relationship, records in both tables can have multiple matching records in the other table. An example is an Orders table and a Products table.
Each order probably will contain multiple products, and each product is found on many different orders. The solution is to create a third table called OrderDetails. You relate the OrderDetails table to the Orders table in a one-to-many relationship based on the OrderID field. You relate it to the Products table in a one-to-many relationship based on the ProductID field.
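The junction-table pattern above can be sketched in SQLite; the table and column names follow the text, and the sample data is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
    );
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
    -- junction table: resolves the many-to-many into two one-to-many links
    CREATE TABLE OrderDetails (
        OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID),
        ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
        Quantity  INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ProductID)
    );
""")
# one customer with two orders; one product appears on both orders
conn.execute("INSERT INTO Customers VALUES (1, 'Asha')")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1)])
conn.executemany("INSERT INTO Products VALUES (?, ?)", [(100, 'Pen'), (101, 'Ink')])
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?)",
                 [(10, 100, 2), (10, 101, 1), (11, 100, 5)])
orders_with_pen = conn.execute(
    "SELECT COUNT(*) FROM OrderDetails WHERE ProductID = 100").fetchone()[0]
print(orders_with_pen)  # 2 - the same product found on many different orders
```

Each OrderDetails row belongs to exactly one order and one product, so both of its relationships are one-to-many, as the text describes.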
DBMS VS RDBMS

History of DBMS and RDBMS
Database management systems first appeared on the scene in the 1960s as computers began to grow in power and speed. By the mid-1960s there were several commercial applications in the market capable of producing "navigational" databases. These navigational databases maintained records that could only be processed sequentially, which required a lot of computer resources and time. Relational database management systems were first suggested by Edgar Codd in the 1970s. Because navigational databases could not be "searched", Edgar Codd suggested another model that could be followed to construct a database. This was the relational model, which allowed users to "search" it for data; it integrated ideas from the navigational model along with tabular and hierarchical models.

Difference between DBMS and RDBMS:
DBMS: A DBMS is a storage area that persists the data in files. To perform database operations, the file must be in use. A relationship can be established between two files, and there are limits on how many records a single database file can store, depending on the database manager used. Data is stored in flat files with metadata. A DBMS does not support client/server architecture and does not follow normalization. Only a single user can access the data at a time. A DBMS does not impose integrity constraints; the ACID properties of the database must be implemented by the user or the developer. A DBMS is used for simpler applications and can manage only small sets of data.

RDBMS: An RDBMS stores the data in tabular form, with additional conditions supporting the tabular structure and enforcing relationships among tables. An RDBMS supports client/server architecture, follows normalization, allows simultaneous access of multiple users to data tables, and imposes integrity constraints.
The ACID properties of the database are defined in the integrity constraints. An RDBMS is used for more complex applications, and large sets of data require an RDBMS solution.
INTEGRITY RULE

Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle, and it is a critical aspect of the design, implementation and usage of any system which stores, processes, or retrieves data. Data integrity is the opposite of data corruption, which is a form of data loss. The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a database correctly rejecting mutually exclusive possibilities) and, upon later retrieval, ensure the data is the same as when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties. Any unintended change to data as the result of a storage, retrieval or processing operation, including malicious intent, unexpected hardware failure, and human error, is a failure of data integrity. If the change is the result of unauthorized access, it may also be a failure of data security.

TYPES OF INTEGRITY RULES/CONSTRAINTS
Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity.
Entity integrity concerns the concept of a primary key. It is the rule that every table must have a primary key and that the column or columns chosen as the primary key must be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can be in only one of two states. The usual state of affairs is that the foreign-key value refers to a primary-key value of some table in the database.
Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be null. In this case we are explicitly saying either that there is no relationship between the objects represented in the database or that this relationship is unknown.
Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item; such data items are said to be non-decomposable, or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which the actual values appearing in the columns of a table are drawn.
User-defined integrity refers to a set of rules specified by a user which do not belong to the entity, domain and referential integrity categories.
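All of these rule types can be declared directly in a table definition, and the DBMS then rejects violating rows automatically. A minimal SQLite sketch (the `dept`/`emp` schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dept (
        dept_id INTEGER PRIMARY KEY                 -- entity integrity: unique, not null
    );
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES dept(dept_id),   -- referential integrity; NULL allowed
        age     INTEGER CHECK (age BETWEEN 18 AND 70)  -- a domain/user-defined rule
    );
""")
conn.execute("INSERT INTO dept VALUES (1)")
conn.execute("INSERT INTO emp VALUES (1, 1, 30)")     # ok: refers to dept 1
conn.execute("INSERT INTO emp VALUES (2, NULL, 25)")  # ok: the FK may be null

try:
    conn.execute("INSERT INTO emp VALUES (3, 99, 25)")  # no such department
except sqlite3.IntegrityError:
    print("referential integrity violation rejected")

try:
    conn.execute("INSERT INTO emp VALUES (4, 1, 5)")    # outside the age domain
except sqlite3.IntegrityError:
    print("domain violation rejected")
```

The null foreign key in the second row is exactly the "relationship unknown / no relationship" case described above.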
If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity, while the database supports the consistency model for data storage and retrieval. Having a single, well-controlled, and well-defined data-integrity system increases:
 stability (one centralized system performs all data integrity operations)
 performance (all data integrity operations are performed in the same tier as the consistency model)
 re-usability (all applications benefit from a single centralized data integrity system)
 maintainability (one centralized system for all data integrity administration)
Many companies, and indeed many database systems themselves, offer products and services to migrate outdated legacy systems to modern databases that provide these data-integrity features. This offers organizations substantial savings in time, money, and resources because they do not have to develop per-application data-integrity systems that must be re-factored each time business requirements change.

Example
An example of a data-integrity mechanism is the parent-and-child relationship of related records. If a parent record owns one or more related child records, all of the referential integrity processing is handled by the database itself, which automatically ensures the accuracy and integrity of the data: no child record can exist without a parent (also called being orphaned), and no parent loses its child records. It also ensures that no parent record can be deleted while it still owns any child records. All of this is handled at the database level and does not require coding integrity checks into each application.
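The parent-and-child behaviour described above can be seen with a foreign key: the database refuses both to orphan a child and to delete a parent that still owns children. A small SQLite sketch (the `parent`/`child` names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL
                  REFERENCES parent(id) ON DELETE RESTRICT
    );
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (10, 1);
""")

try:
    conn.execute("DELETE FROM parent WHERE id = 1")  # parent still owns a child
except sqlite3.IntegrityError:
    print("delete rejected: parent still owns child records")

try:
    conn.execute("INSERT INTO child VALUES (11, 999)")  # would be an orphan
except sqlite3.IntegrityError:
    print("insert rejected: no such parent")
```

No integrity-checking code exists in the application; the two rejections come entirely from the database engine.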
CONCEPT OF CONCURRENCY CONTROL

Definition
Concurrency control is a database management system (DBMS) concept used to address the conflicts that can arise from simultaneous accessing or altering of data in a multi-user system. Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous transactions while preserving data integrity. In short, concurrency control is about controlling multi-user access to the database.

Illustrative Example
To illustrate the concept of concurrency control, consider two travelers who go to electronic kiosks at the same time to purchase a train ticket to the same destination on the same train. There's only one seat left in the coach, but without concurrency control, it's possible that both travelers will end up purchasing a ticket for that one seat. With concurrency control, the database wouldn't allow this to happen: both travelers would still be able to access the train seating database, but concurrency control would allow only one traveler to purchase the seat, preserving data accuracy. This example also illustrates the importance of addressing the issue in a multi-user database; one could quickly run into problems with the inaccurate data that results from several transactions occurring simultaneously and writing over each other. The following sections provide strategies for implementing concurrency control.

Database transactions and the ACID rules
The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well-understood database system behavior in a faulty environment, where crashes can happen at any time, and recovery from a crash to a well-understood database state.
A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock), an abstraction supported in database and other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules, by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs:
 Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted, respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not affect the database at all, as if it never happened.
 Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform the database from one consistent state to another. (It is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform from the application's point of view, while the predefined integrity rules are enforced by the DBMS.) Thus, since a database can normally be changed only by transactions, all the database's states are consistent.
 Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on the concurrency control method) the effects of an incomplete transaction are not even visible to another transaction. Providing isolation is the main goal of concurrency control.
 Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in non-volatile memory).

Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, unexpected, undesirable results may occur, such as:
1. The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results.
2. The dirty read problem: Transactions read a value written by a transaction that is later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results.
3. The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result depending on the timing of the updates and on whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control, such systems can neither provide correct results nor keep their databases consistent.

Concurrency Control Locking Strategies
Pessimistic Locking: This concurrency control strategy involves keeping an entity in a database locked the entire time it exists in the database's memory. This limits or prevents users from altering the data entity that is locked. There are two types of locks that fall under the category of pessimistic locking: write locks and read locks. With a write lock, everyone but the holder of the lock is prevented from reading, updating, or deleting the entity. With a read lock, other users can read the entity, but no one except the lock holder can update or delete it.
Optimistic Locking: This strategy can be used when instances of simultaneous transactions, or collisions, are expected to be infrequent. In contrast with pessimistic locking, optimistic locking doesn't try to prevent collisions from occurring. Instead, it aims to detect collisions and resolve them on the occasions when they occur. Pessimistic locking provides a guarantee that database changes are made safely. However, it becomes less viable as the number of simultaneous users, or the number of entities involved in a transaction, increases, because the potential for having to wait for a lock to be released increases. Optimistic locking can alleviate the problem of waiting for locks to be released, but then users have the potential to experience collisions when attempting to update the database.

Lock Problems:
Deadlock: When dealing with locks, two problems can arise, the first of which is deadlock. Deadlock refers to a situation where two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain. Deadlock is a common problem in multiprocessing, where many processes share a specific type of mutually exclusive resource. Some computers, usually those intended for the time-sharing and/or real-time markets, are equipped with a hardware lock, or hard lock, which guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly disconcerting because there is no general solution to avoid them.
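Optimistic locking is commonly implemented with a version column: an update succeeds only if the row still carries the version number the writer originally read, which is how a collision is detected. A minimal SQLite sketch of the kiosk example (the `seat` schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER)")
conn.execute("INSERT INTO seat VALUES (1, NULL, 0)")

def book(name, seen_version):
    # compare-and-swap: succeeds only if no one updated the row in between
    cur = conn.execute(
        "UPDATE seat SET owner = ?, version = version + 1 "
        "WHERE id = 1 AND version = ?", (name, seen_version))
    return cur.rowcount == 1  # 0 rows updated means a collision was detected

# both travelers read version 0, then try to book the last seat
print(book("traveler A", 0))  # True  - booking succeeds, version becomes 1
print(book("traveler B", 0))  # False - collision detected; B must re-read and retry
```

No lock is held between the read and the write, so no one waits; the second writer simply learns the data changed under it and retries.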
A fitting analogy for the deadlock problem is a situation where you go to unlock your car door and your passenger pulls the handle at exactly the same time, leaving the door still locked. If the passenger is impatient and keeps trying to open the door, it can be very frustrating: you can get stuck in an endless cycle, and since both actions can never be satisfied at once, deadlock occurs. Livelock: Livelock is a special case of resource starvation. A livelock is similar to a deadlock, except that the states of the processes involved constantly change with regard to one another while never progressing. The general definition only states that a specific process is not progressing. For example, the system keeps selecting the same transaction for rollback, causing the transaction to never finish executing. Another livelock situation can come about when the system is deciding which transaction gets a lock and which waits in a conflict situation.
An illustration of livelock occurs when numerous people arrive at a four-way stop and are not quite sure who should proceed next. If no one makes a solid decision to go, and all the cars just keep creeping into the intersection afraid that someone else will possibly hit them, then a kind of livelock can happen.

Basic Timestamping: Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method doesn't use locks to control concurrency, so it is impossible for deadlock to occur. Under this method a unique timestamp is assigned to each transaction, usually showing when it was started. This effectively allows an age, and thus an order, to be assigned to transactions. Data items have both a read-timestamp and a write-timestamp, which are updated each time the data item is read or updated, respectively. Problems arise in this system when a transaction tries to read a data item which has been written by a younger transaction. This is called a late read: the data item has changed since the transaction's start time, and the solution is to roll the transaction back and have it acquire a new timestamp. Another problem occurs when a transaction tries to write a data item which has been read by a younger transaction. This is called a late write: the data item has been read by another transaction since the start time of the transaction that is altering it. The solution is the same as for the late read problem: the transaction must be rolled back and a new timestamp acquired. Adhering to the rules of the basic timestamping process allows the transactions to be serialized, and a chronological schedule of transactions can then be created. Timestamping may not be practical in the case of larger databases with high levels of transactions.
A large amount of storage space would have to be dedicated to storing the timestamps in these cases.
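The late-read and late-write rules can be sketched as a tiny timestamp-ordering checker. The `DataItem` class and the `read`/`write` helpers below are hypothetical simplifications for illustration, not a full scheduler:

```python
class DataItem:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest transaction that read it
        self.write_ts = 0  # timestamp of the youngest transaction that wrote it

def read(item, ts):
    """Late read: a younger transaction already wrote the item, so the
    reader must be rolled back (returns False)."""
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """Late write: a younger transaction already read or wrote the item,
    so the writer must be rolled back (returns False)."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True

x = DataItem()
print(write(x, 5))  # True  - first write; write-timestamp becomes 5
print(read(x, 3))   # False - late read: the item was written by younger T5
print(read(x, 7))   # True  - T7 may read; read-timestamp becomes 7
print(write(x, 6))  # False - late write: younger T7 already read the item
```

Because transactions only ever roll back and restart with a fresh timestamp, no one waits on anyone, which is why deadlock cannot occur.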
DATABASE SECURITY

"Secret passwords, iron bolts, gated driveways, access cards, etc. - layers of physical security in the real world are found in the database world as well. Creating and enforcing security procedures helps to protect what is rapidly becoming the most important corporate asset: DATA."

Database security concerns the use of a broad range of information security controls to protect databases (potentially including the data, the database applications or stored functions, the database systems, the database servers and the associated network links) against compromises of their confidentiality, integrity and availability. The three main objectives of database security are:
1. Secrecy / confidentiality: Information is not disclosed to unauthorized users. Private data remains private.
2. Integrity: Ensuring data are accurate; data must be protected from unauthorized modification or destruction (only authorized users can modify data).
3. Availability: Ensuring data is accessible whenever needed by the organization (authorized users should not be denied access).
In order to achieve these objectives, the following are employed:
1. A clear and consistent security policy (about the security measures to be enforced: what data is to be protected, and which users get access to which portion of the data).
2. Security mechanisms of the underlying DBMS and OS, along with external mechanisms such as securing access to buildings; i.e., security measures at various levels must be taken to ensure proper security.
Authorization and authentication are the two A's of security that every secure system must be good at.
The sources of external security threats are:
1. Physical threats: This includes physical threats to the hardware of the database system. They may arise from danger in buildings or the network, or from human errors (e.g. privileged accounts left logged in).
2.
Hackers & Crackers: White hat hackers are the "good guys", hired to fix and test systems; they don't release information about system vulnerabilities to the public until they are fixed. Script kiddies are hacker "wannabes" with little programming skill who rely on tools written by others. Black hat hackers are motivated by greed or a desire to cause harm; they are the most dangerous, being very knowledgeable, and their activities are often undetectable.
Cyber-terrorists: hackers motivated by a political, religious or philosophical agenda. They may try to deface websites that support opposing positions. In the current global climate there are fears that they may even attempt to disable networks that handle utilities such as nuclear plants and water systems.
3. Types of attacks:
Denial of Service (DoS) attack: A denial-of-service (DoS) or distributed denial-of-service (DDoS) attack is an attempt to make a machine or network resource unavailable to its intended users. Although the means to carry out a DoS attack, the motives for it, and its targets vary, it generally consists of efforts to temporarily or indefinitely interrupt or suspend the services of a host connected to the Internet. As clarification, distributed denial-of-service attacks are sent by two or more persons or bots, while denial-of-service attacks are sent by one person or system.
Buffer overflow: Attacks in this class exploit a loophole left by a programming error in the system (another popular attack that exploits coding errors is SQL injection). A buffer overflow occurs when data written to a buffer also corrupts data values in memory addresses adjacent to the destination buffer due to insufficient bounds checking. This can occur when copying data from one buffer to another without first checking that the data fits within the destination buffer.
Malware: Malware, short for malicious software, is any software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems. It can appear in the form of executable code, scripts, active content, and other software. 'Malware' is a general term used to refer to a variety of forms of hostile or intrusive software.
Social engineering: The psychological manipulation of people into performing actions or divulging confidential information.
A type of confidence trick for the purpose of information gathering, fraud, or system access, it differs from a traditional "con" in that it is often one of many steps in a more complex fraud scheme.
Brute force: A cryptanalytic attack that can, in theory, be used against any encrypted data (except data encrypted in an information-theoretically secure manner). Such an attack might be used when it is not possible to take advantage of other weaknesses in an encryption system (if any exist) that would make the task easier. It consists of systematically checking all possible keys or passwords until the correct one is found; in the worst case, this involves traversing the entire search space.
Now that we have seen the sources of external security threats, let us study the sources of internal security threats. There may be employee threats, either intentional or accidental.
Intentional employee threats:
 personnel who employ hacking techniques to upgrade their legitimate access to root or administrator.
 personnel who take advantage of legitimate access to divulge trade secrets or steal money, for personal or political gain.
 family members of employees who are visiting the office and have been given access.
 personnel who break into a secure machine room to gain physical access to mainframe and other large-system consoles.
 former employees seeking revenge.
Unintentional / accidental employee threats:
 becoming victim to a social engineering attack (unknowingly helping a hacker)
 unknowingly revealing confidential information
 accidental physical damage leading to data loss
 inaccurate / improper usage
Other threats:
 electrical power fluctuations
 hardware failures
 natural disasters: fires, floods
Knowing the sources of both external and internal security threats, let us move to the solutions. They, too, are both external and internal.
Some external solutions to the security issues are:
1. Securing the perimeter: firewalls
2. Handling malware
3. Fixing buffer overflows
4. Physical server security: security cameras; smart locks; removal of signs from machine/server rooms and hallways (so that no one can easily locate sensitive hardware rooms); privileged accounts must never be left logged in.
5. User authentication: positive user identification requires three things:
a) something the user knows: user IDs and passwords
b) something the user has: physical login devices, e.g. for $5, PayPal sends a small device that generates a one-time password
c) something the user is: biometrics
6. VPNs: provide encryption for data transmissions over the Internet, using the IPSec protocol.
7. Combating social engineering
8. Handling other employee threats: policies; employee training sessions; when an employee is fired, his or her account is properly erased; etc.
Some internal solutions to the security threats are:
1. Internal database user-IDs and passwords.
2. Control of access rights to tables, views and their components. The typical SQL-based DBMS provides six types of access rights: SELECT (to retrieve), INSERT, UPDATE, DELETE, REFERENCES (to reference a table via a foreign key), and ALL PRIVILEGES.
3. Using an authorization matrix: a set of roles required for a business user. It is a normal spreadsheet document with a list of roles, together with the list of transactions in every role. When a new user joins the organization, the roles for which access is required can be found in the authorization matrix, based on the FUG (Functional User Group).
4. Database implementations (data dictionary): a data dictionary is one tool organizations can use to help ensure data accuracy.

GRANTING & REVOKING ACCESS RIGHTS:
Granting and revoking access rights is one of the most visible security features of a DBMS. Using the corresponding commands, permissions on various objects of the database can be granted or revoked. The following SQL commands can be used to grant and revoke access rights on a table or a view to user(s).
Granting Rights:
Syntax: GRANT type_of_rights ON table_or_view_name TO user_id
Examples:
 GRANT SELECT ON order_summary TO acctg_mgr
 GRANT SELECT ON order_summary TO acctg_mgr WITH GRANT OPTION (now the user can also grant / pass the rights on to others)
 GRANT SELECT, UPDATE (retail_price, distributor_name) ON item TO intern1, intern2, intern3
 GRANT SELECT ON order_summary TO PUBLIC
Revoking Rights:
Syntax: REVOKE type_of_rights ON table_or_view_name FROM user_id
Examples: the examples are similar to those for granting rights. If the rights have been passed on by the user, i.e. the user has already granted the rights to others, then:
 REVOKE SELECT ON order_summary FROM acctg_mgr RESTRICT (if the rights have been passed on, the revoke is refused)
 REVOKE SELECT ON order_summary FROM acctg_mgr CASCADE (if the rights have been passed on, they are also revoked from all users to whom they were passed)
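The RESTRICT/CASCADE semantics above can be sketched as a small in-memory grant graph. This is an illustrative model of the behaviour, not a DBMS implementation; the user names follow the examples in the text:

```python
# granted_by[user] = the grantor who passed the right on; None marks the owner
granted_by = {"dba": None, "acctg_mgr": "dba", "intern1": "acctg_mgr"}

def revoke(user, cascade):
    dependents = [u for u, g in granted_by.items() if g == user]
    if dependents and not cascade:
        raise RuntimeError("RESTRICT: rights were passed on; revoke refused")
    for d in dependents:          # CASCADE: revoke from everyone downstream
        revoke(d, cascade=True)
    del granted_by[user]

try:
    revoke("acctg_mgr", cascade=False)  # RESTRICT fails: intern1 holds a passed-on right
except RuntimeError as e:
    print(e)

revoke("acctg_mgr", cascade=True)       # CASCADE removes acctg_mgr and intern1
print(sorted(granted_by))               # ['dba']
```

RESTRICT protects against silently cutting off downstream users, while CASCADE walks the grant chain and revokes everything that depended on the revoked right.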
End.