(1)Entities
(2)Attributes属性
(3)Entity set
A mathematical definition: –if A, B are sets, then a relation R is a subset of A × B
The degree of relationship
Relationship Attribute
Multiway Relationships
左:只有一个关系:选课(课程,教师,学生)
右:有三个关系:教授(课程,教师);上课(课程,学生);指导(教师,学生)
Arrows in Multiway Relationships:—>means can be determined by others
Express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
Some elements in A and B may not be mapped to any elements in the other set
need an entity set twice in one relationship
Entity sets of a relationship need not be distinct
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify semantics of the relationship
A constraint = an assertion about the database that must be true at all times
Part of the database schema
Very important in database design
Finding constraints is part of the modeling process.
Commonly used constraints:
*Keys: resident identification card number uniquely identifies a person.
Single-value constraints: a person can have only one father.
*Referential integrity constraints: if you work for a company, it must exist in the database.
*Domain constraints: peoples’ ages are between 0 and 150.
General constraints: all others (at most 50 students enroll in a class)
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
Every entity set must have a key
So that the entity can be identified
A key can consist of more than one attribute
There can be more than one key for an entity set
one key will be designated as primary key
Design Techniques
Avoiding Redundancy
Entity Sets Versus Attributes
Represent the abstract information structure of business domain
Unambiguous, formalized definition on the model
Implemented in DBMS, adjusted to achieve efficiency on computing system.
Three elements of logical data model:
物理模型[–>文件模型–>层次模型–>网状模型–>关系模型(most popular LDM)–>XML模型/面向对象模型–>]概念模型
Database that adopts relational model
Ideas
->Database Model (E/R, ODL)( Diagrams (E/R))
->Relational Schema(Tables: column names: attributes rows: tuples)
->Physical storage(Complex file organization and index structures.)
Both are used to model data
ER model has many concepts
Relational model
Analogy with programming languages:
Important distinction:
The Schema of a Relation:
The Schema of a Database
Relational schema = R(A1,…,Ak):
Database schema = R1(…), R2(…), …, Rn(…)
The database maintains a current database state.
Updates to the data:
? 1) add a tuple
? 2) delete a tuple
? 3) modify an attribute in a tuple
Updates to the data happen very frequently.
D1XD2XD3 = {(a, c, e), (a, c, f), (a, c, g), (a, d, e), (a, d, f), (a, d, g), (b, c, e), (b, c, f), (b, c, g), (b, d, e), (b, d, f), (b, d, g)}
由定义可以看出,笛卡尔积也是一个集合,其中:
(1,2,3)≠(2,3,1)≠(1,3,2);
而集合中的元素是没有排序次序的,如(1,2,3)=(2,3,1)=(1,3,2)。
Relation is the subset of Cartesian product
Relation keeps the meaningful tuple in the Cartesian product
Relation has FINITE tuples(关系有有限元组)
Relation as table
Rows = tuples
Columns = components
Names of columns = attributes
Relation name + set of attribute names= schema
REL (A 1,A 2,…,An)
Set theoretic
Domain — set of values: like a data type
Cartesian product: D1 X D2 X … X Dn
n-tuples (V1,V2,…,Vn)
Relation=subset of Cartesian product of one or more domains
FINITE only; empty set allowed
Tuples(元组) = members of a relation inst.
Arity (元)= number of domains
Components (分量)= values in a tuple
Domains(域) — corresp. with attributes
Cardinality(基数)= max number of tuples#### Database
A database consists of multiple relations
Information about an enterprise is broken up into parts, with each relation storing one part of the information
account : stores information about accounts
depositor : stores information about which customer owns which account
customer : stores information about customers
Tuples with same value are not allowed
Relations are Unordered
Attribute values must comply with domain constraint
Each attribute must has a different name
Different attributes can use a same domain
Attribute values are required to be atomic; that is, indivisible(不可再分的)
Basic cases
Special cases
Purpose: prevent semantic inconsistencies in data(防止数据中的语义不一致)
predicates on the database
must always be true (that is, checked whenever database gets updated)
Domain Integrity Constraints(1 table)
Entity Integrity Constraints(1 table)
Referential Integrity constraints ( 2 tables)
A unique key can uniquely identify each row in a table. That is, no two distinct rows in a table can have the same value (or combination of values) in those columns constrained by unique key.
A unique key can allow null values. (You can define a unique key on columns that allow nulls.)
Each table can have multiple unique keys
Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute in a different (or the same) relation (table).
For referential integrity to hold in a relational database, any field in a table that is declared a foreign key can contain only values from a parent table’s primary key or a candidate key
Goal: specify what we want from our database
Could write in C++/Java, but bad idea
Instead use high-level query languages:
Theoretical: Relational Algebra, Datalog
Practical: SQL
Relational algebra: a basic set of operations on relations that provide the basic principles.
To use the “stack” data structure in my program, I need to know
Next, I look for an implementation of stack
LEDA already implement PUSH and POP
It also gives me a simple language L, in which to define a stack and call PUSH and POP
Can also define an expression of operations on stacks
To summarize, I know
Mathematical system consisting of:
An algebra whose operands are relations or variables that represent relations.
Operators are designed to do the most common things that we need to do with relations in a database.
Operators: relations as input, new relation as output
Five basic RA operations:
Derived operations:
Binary operations
Unary operations
Cartesian Product
Union: all tuples in R1 or R2
No duplicate tuples
Notation: R1 U R2
R1, R2 must have the same schema
R1 U R2 has the same schema as R1, R2
Example:
##### Set Operations: Difference
Difference: all tuples in R1 and not in R2
Notation: R1 – R2
R1, R2 must have the same schema
R1 - R2 has the same schema as R1, R2
Example:
Returns all tuples which satisfy a condition
Notation: σc(R)\sigma_{c}(R)σc?(R)
c is a condition: =, <, >, and, or, not
Output schema: same as input schema
##### Projection投影运算
Unary operation: returns certain columns
Notation: ΠA1,…,An(R)\Pi_{A1,…,An} (R)ΠA1,…,An?(R)
Input schema R(B1,…,Bm)
Condition: {A1, …, An}包含于{B1, …, Bm}
Output schema S(A1,…,An)
no duplicate elimination
Each tuple in R1 with each tuple in R2
Notation: R1 x R2
Input schemas R1(A1,…,An), R2(B1,…,Bm)
Output schema is S(A1, …, An, B1, …, Bm*)*
Notation: R1 x R2
1) Intersection
2) Most importantly: Join
3) Division
Difference: all tuples both in R1 and in R2
Notation: R1∩R2
R1, R2 must have the same schema
R1∩R2 has the same schema as R1, R2
Intersection is derived: R1∩R2 = R1 – (R1 – R2)
Theta join
A join that involves a predicate
Notation: R1|X|cR2 where c is a condition
Input schemas: R1(A1,…,An), R2(B1,…,Bm)
Output schema: S(A1,…,An,B1,…,Bm)
Derived operator:R1∣X∣cR2=σc(R1xR2)R1|X|_cR2 = \sigma_c(R1 x R2)R1∣X∣c?R2=σc?(R1xR2)
Natural join
A frequent type of join connects two relations by:
Equating attributes of the same name, and
Projecting out one copy of each pair of equated attributes.
Called natural join.
Denoted R3 := R1 JOIN R2.
Notation: R1|X|R2
Input Schema: R1(A1, …, An), R2(B1, …, Bm)
Output Schema: S(C1,…,Cp)
Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm}
Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes:
{A1,…,An}|X|{B1,…, Bm} (called the join attributes)
Equivalent to a cross product followed by selection
Equi-join
Most frequently used in practice: R1|X|A=B_{A=B}A=B?R2
Natural join is a particular case of equi-join
A lot of research on how to do it efficiently
Outer join
An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.
Uses null values to signify the unmatched values
Subtypes: left outer joins, right outer joins, and full outer joins
etc.
R(X,Y)是被除关系
S(Y)是除关系
商关系由R中某些X属性值构成,其中的任一X值所对应的一组Y值都包含除关系S。
除运算常用于包含语义“至少…”的语义的查询或运算
Data Management Technology(2) -- Data Model
原文:https://www.cnblogs.com/wojiaobuzhidao/p/11071393.html