DB-Lec 2: Rational Model (1)
DB-Lecture 2: Relational Model(1)
Definition:
-
The relational model is very simple and elegant.
-
A relational database is a collection of one or more relations, which are based on the relational model.
-
A relation is a table with rows and columns.
-
The major advantages of the relational model are its straightforward data representation and the ease with which even complex queries can be expressed.
-
Owing to the great language SQL, the most widely used language for creating, manipulating, and querying relational database.
I.Structure of Relational Databases
1.1.Basic Structure
Formally, given sets , a relation is a subset of — a Cartesian product of a list of domain .
Thus, a relation is a set of n-tuples , where each
1.2.Attribute Types
-
Each attribute of a relation has a name
-
The set of allowed valued for each attribute is called the domain of the attribute.
-
Attribute values are (normally) required to be atomic, i.e., indivisible:
-
Multivalued attribute values are not atomic.
-
Composite attribute values are not atomic.
-
The special value null is a member of every domain.
-
The null value causes complications in the definition of many operations.
1.3.Concepts about Relation
A relation is concerned with two concepts: relation schema and relation instance.
The relation schema describes the structure of the relation.
The relation instance corresponds to the snapshot of the data in the relation at a given instant in time.
Relation Schema
Assume are attributes. Formally expressed:
is a relation schema.
is a relation on the relation schema
Relation Instance
The current values (i.e., relation instance) of a relation are specified by a table.
An element t of r is a tuple, represented by a row in a table.
Let a tuple variable t be a tuple, then t[name] denotes the value of t on the name attribute.
The Properties of Relation
-
The order of tuples is irrelevant (i.e., tuples may be stored in an arbitrary).
-
No duplicated tuples in a relation.
-
Attribute values are atomic.
Database
A database consists of multiple relations
Information about an enterprise (e.g., university) is broken up into parts: Instructor, Student, Advisor.
Bad design: univ (instructor -ID, name, dept_name, salary, student_Id, …)
results in :
-
Repetition of information (e.g., two students have the same instructor)
-
The need for null values (e.g., represent an student with no advisor)
Normalization theory (Chapter 7) deals with how to design “good” relational schemas
Key
Let
is a superkey of is values for are sufficient to identify a unique tuple of each possible relation :
- E.g., both {ID} and {ID, name} are superkeys of the relation instructor.
is a candidate ket is is minimal superkey.
- E.g., {ID} is a candidate key for the relation instructor, since it is a superkey and no any subset.
is a primary key, if is a candidate key and is defined by user explicity.
- Primary key is usually marked by underline.
Primary key and candidate key can not be Null.
Foreign Key
Assume there exists relations and : , we can say that attribute B in relation r is foreign key referencing s, and r is a referenced relation, and s is a referenced relation.
参照关系中的外码的值必须在被参照关系中实际存在,或为null.
Primary key and foreign key are integrated constraints.
-
Primary key constraints
-
Foreign key constraint
-
Referential integrity constraint
Query Languages
Language in which user requests information from the database.
Pure query languages: refer to query languages that adhere to the principles of purity, meaning they are free from side effects and rely solely on input data to produce output.
-
Relational Algebra — the basis of SQL
-
Tuple Relational Calculus
-
Domain Relational Calculus —QBE
Pure query languages form underlying basis of query languages that people use.
II.Relational-Algebra
2.1.Fundamental Relational-Algebra Operations
-
Six basic operators:
-
Select
The select operation selects tuples that satisfy a given predicate.
Notation:
p is called the selection predicate
can be defined as:
In the selection predicate p, we allow comparisons using
The select predicate may include comparisons between two attributes.
General form of a single p
We can combine several predicates into a larger predicate by using the connectives
Note that, the selection conditions need to aim at the attribute values of the same tuple, when we conduct section operation.
- Project
Project operation is a unary operation that returns its argument relation, with certain attributes left out.
Notation:
where are attribute names and is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
Duplicate rows removed from result, since relations are sets.
- Union
The union operation allows us to combine two relations.
Notation:
Defined as:
For to be valid:
-
r and s must have the same arity (i.e., the same number of attributes)
-
The attribute domains must be compatible
-
set difference
The set-difference operation allows us to find tuples that are in one relation but are not in another.
Notation:
Defined as:
Set differences must be taken between compatible relations:
-
r and s must have the same arity.
-
Attribute domains of r and s must be compatible.
-
Cartesian product
The Cartesian-product operation (denoted by ) allows us to combine information from any two relations.
Notation:
Defined as:
Assume that attributes of r® and s(S) are disjoint (i.e., ).
If attributes of r® and s(S) are not disjoint, then renaming for attributes must be used.
- attaching to the attribute the name of the relation from which the attribute originally came.
Composition of Operations:
-
The result of a relational-algebra operation is relation
-
Relational-algebra operations can be composed together into a relational-algebra expression
-
Consider the query – Find the names of all instructors in the Physics department.
-
Instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
-
Rename
Allows us to name, and therefore to refer to, the results of relational-algebra expressions. (procedural)
Allows us to refer to a relation by more than one name.
The expression: returns the expression E under the name X
If a relational-algebra expression E has arity n, then
returns the result of expression E
-
The operators take one or two relations as inputs, and return a new relation as a result.
-
one relation: unary operation
-
two relations: binary operations
Example Queries:
-
Example 1: Find all loans of over $1200
-
Example 2: Find the loan number for each loan of an amount greater than $1200.
-
Example 3: Find the names of all customers who have a loan, or an account, or both, from the bank.
-
Example 4: Find the names of all customers who at least have a loan and an account at bank.