Skip to content

Latest commit

 

History

History
684 lines (514 loc) · 18.3 KB

File metadata and controls

684 lines (514 loc) · 18.3 KB

02 Data Model

Table of Contents

  1. Overview
  2. Extended Entity-Relationship Model
  3. E-R Diagram Notation
  4. Relational Model
  5. Relation Mapping
  6. References

Overview

Data modeling is the process of creating a conceptual representation of data structures and their relationships. The Extended Entity-Relationship (E-R) Model is a high-level conceptual model used during database design, while the Relational Model is the implementation model used in actual database systems.

This chapter covers both models and the transformation (mapping) between them.

Extended Entity-Relationship Model

The E-R model provides a graphical representation of the logical structure of a database. It consists of entities, attributes, and relationships.

Entity

Entities represent real-world objects, concepts, or things that are important to the database. They are typically nouns and have attributes that describe their properties.

Characteristics:

  • Represent distinct objects or concepts
  • Have unique identity
  • Possess attributes
  • Participate in relationships

Examples:

  • Student
  • Course
  • Employee
  • Department
  • Product

Attributes

Attributes are characteristics or properties of an entity that provide additional information.

Attribute Types

Type Description Notation
Single-Valued Has only one value Single ellipse
Multi-Valued Can have multiple values Double ellipse
Composite Combines two or more properties Ellipse with sub-ellipses
Derived Calculated from other attributes Dashed ellipse
Key Attribute Uniquely identifies entity Underlined

Examples:

Single-Valued:

  • StudentID (one ID per student)
  • Name
  • DateOfBirth

Multi-Valued:

  • PhoneNumbers (a person may have multiple phones)
  • EmailAddresses
  • Skills

Composite:

  • Address (Street, City, State, ZipCode)
  • Name (FirstName, MiddleName, LastName)

Derived:

  • Age (derived from DateOfBirth)
  • TotalPrice (derived from Quantity × UnitPrice)

Union Entity

Union types represent attributes that can have multiple types. They are useful when an attribute's data type varies depending on context.

Example:

Payment (entity)
  PaymentMethod (union type)
    - CreditCard
    - BankAccount
    - DigitalWallet

Relationships

Relationships represent associations or connections between entities, describing how entities interact with each other.

Cardinality

Cardinality defines the number of instances of one entity that can be related to instances of another entity.

Cardinality Description Notation Example
One-to-One (1:1) Each instance relates to exactly one instance 1 ---- 1 Person ↔ Passport
One-to-Many (1:N) One instance relates to many instances 1 ---- N Department ↔ Employees
Many-to-Many (M:N) Multiple instances relate to multiple instances M ---- N Students ↔ Courses

Relationship Types

Binary Relationship: Connects two entities.

Student --- Enrolls --- Course

N-ary (Ternary) Relationship: The relationship is defined between multiple entities simultaneously.

Example:

Student --- Enrolls --- Course
            |
          Professor

A student enrolls in a course taught by a specific professor.

Identifying Relationship: The existence of a relationship depends on the existence of certain entities.

Characteristics:

  • Primary key of child entity includes the primary key of parent entity
  • Child entity is a weak entity type
  • Represented with double diamond

Example:

Building (strong) --- Contains --- Room (weak)
Room's key: (BuildingID, RoomNumber)

Recursive Relationship: An entity is related to itself through a relationship.

Examples:

  • Employee --- Supervises --- Employee
  • Course --- Prerequisite --- Course
  • Person --- MarriedTo --- Person

Constraints

Constraints ensure data integrity and enforce business rules.

Entity Integrity Constraint

The primary key attribute of an entity:

  • Must have a unique value
  • Cannot be null
  • Uniquely identifies each instance

Referential Integrity Constraint

Foreign key values in a relationship must:

  • Match the primary key values of the related entity, OR
  • Be null (if allowed)

Example:

-- Orders table has CustomerID as foreign key
-- Every CustomerID in Orders must exist in Customers table

Domain Constraint

Values stored in an attribute must:

  • Adhere to specific data types
  • Follow format restrictions
  • Fall within value restrictions

Examples:

  • Age: INTEGER between 0 and 150
  • Email: VARCHAR(255) matching email format
  • Status: ENUM('active', 'inactive', 'suspended')

Cardinality Constraint

Specify the minimum and maximum number of occurrences allowed in a relationship.

Notation: (min, max)

Examples:

  • (1, 1): Exactly one (mandatory, single)
  • (0, 1): Zero or one (optional, single)
  • (1, N): One or more (mandatory, multiple)
  • (0, N): Zero or more (optional, multiple)
Employee (1,1) --- WorksIn --- (1,1) Department
Every employee works in exactly one department
Every department has at least one employee

Nullability Constraint

Determines whether an attribute can have null values or must have non-null values.

  • NOT NULL: Attribute must always have a value
  • NULL allowed: Attribute can be empty

Inheritance

Inheritance allows entities to be organized in a hierarchy, where specialized entities (subtypes) inherit properties from general entities (supertypes).

Terminology

Term Definition
Supertype General entity with common attributes
Subtype Specialized entity inheriting from supertype
Specialization Top-down process: defining subtypes from a supertype
Generalization Bottom-up process: creating a supertype from existing subtypes

Example

Person (supertype)
  |
  +--- Student (subtype)
  |      - Major
  |      - GPA
  |
  +--- Employee (subtype)
         - Salary
         - HireDate

Inheritance Constraints

Disjointness:

Type Description Example
Disjoint Subtypes are mutually exclusive Person is Student XOR Employee
Overlapping Subtypes can overlap Person can be both Student AND Employee

Completeness:

Type Description Example
Total Every supertype instance must be a subtype Every Vehicle is Car OR Truck OR Motorcycle
Partial Supertype instances may not belong to any subtype Person may or may not be Student/Employee

E-R Diagram Notation

E-R diagrams provide a visual representation of the database structure using standardized symbols.

Basic Symbols

Component Symbol Description
Entity Rectangle Represents an entity type
Weak Entity Double rectangle Entity dependent on another entity
Attribute Ellipse Property of an entity
Multi-valued Attribute Double ellipse Attribute with multiple values
Derived Attribute Dashed ellipse Attribute calculated from others
Key Attribute Underlined text Primary key attribute
Relationship Diamond Association between entities
Identifying Relationship Double diamond Relationship identifying weak entity

Cardinality Notations

Different notations exist for expressing cardinality:

Chen Notation:

Entity1 ---- (1,N) ---- Relationship ---- (1,1) ---- Entity2

Crow's Foot Notation:

Entity1 ----<---- Relationship ----||---- Entity2
         many                      one

Min-Max Notation:

Entity1 --(1,N)-- Relationship --(0,1)-- Entity2

Inheritance Symbols

Symbol Description
Solid line with triangle Specialization hierarchy (top-down)
Dashed line with triangle Generalization hierarchy (bottom-up)
d inside triangle Disjoint constraint
o inside triangle Overlapping constraint

Additional Notations

Ternary Relationship Notation: Represented by a diamond with lines connecting to three (or more) participating entities, with labels describing the nature of each connection.

Union Type Notation: Represented by a circle or ellipse split into sections, each section corresponding to a possible type of the attribute.

Attribute Inheritance Notation: A dashed line is drawn from the supertype's attribute to the subtype's attribute to show inheritance.

Relational Model

The Relational Model organizes data into tables (relations) with rows (tuples) and columns (attributes). It provides a mathematical foundation for database operations.

Core Components

Component Description Also Known As
Relation A table with rows and columns Table
Tuple A single row in a relation Row, Record
Attribute A column in a relation Column, Field
Domain Set of allowed values for an attribute Data Type
Degree Number of attributes in a relation Arity
Cardinality Number of tuples in a relation Row Count

Properties of Relations

  1. No duplicate tuples: Each tuple is unique
  2. Unordered tuples: No inherent ordering of rows
  3. Unordered attributes: Column order doesn't matter (though fixed in practice)
  4. Atomic values: Each attribute contains a single, indivisible value
  5. Same domain: All values in a column are from the same domain

Keys

Keys uniquely identify tuples in a relation.

Key Type Definition
Superkey Any set of attributes that uniquely identifies tuples
Candidate Key Minimal superkey (no proper subset is a superkey)
Primary Key Chosen candidate key for tuple identification
Foreign Key Attribute(s) referencing primary key in another relation
Composite Key Primary key consisting of multiple attributes

Example:

Students(StudentID, Email, Name, Major)
- StudentID: Primary key
- Email: Candidate key (also unique)
- (Name, Major): Neither is a key (not unique)

Enrollments(StudentID, CourseID, Grade)
- (StudentID, CourseID): Composite primary key
- StudentID: Foreign key referencing Students
- CourseID: Foreign key referencing Courses

Relationships in Relational Model

Relationships are established through the use of foreign keys that reference primary keys.

Relation Mapping

Relation Mapping is the process of transforming an E-R model into a relational schema. This systematic process ensures that all entities, attributes, and relationships are properly represented.

Entity Types Mapping

Basic Entity Mapping

Each entity type in the E-R model corresponds to a relation in the relational model.

E-R Model:

Student(StudentID, Name, Email, Major)

Relational Model:

CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE,
    Major VARCHAR(50)
);

Rules:

  1. Relation has the same name as entity type
  2. If entity has an identifying property, it becomes the primary key
  3. Other properties become attributes

Composite Property Types

If an entity type has a composite property type composed of multiple sub-properties, the sub-properties become separate attributes in the relation.

E-R Model:

Employee(EmployeeID, Name, Address)
Address: (Street, City, State, ZipCode)

Relational Model:

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    Street VARCHAR(200),
    City VARCHAR(100),
    State CHAR(2),
    ZipCode VARCHAR(10)
);

Note: The composite property itself (Address) is lost; only sub-properties remain.

Multi-Valued Property Types

The relational model does not directly support multi-valued attributes. A separate relation must be created.

E-R Model:

Employee(EmployeeID, Name, Skills)
Skills is multi-valued

Relational Model:

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE EmployeeSkills (
    EmployeeID INT,
    Skill VARCHAR(100),
    PRIMARY KEY (EmployeeID, Skill),
    FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID)
);

Relationship Mapping

One-to-One Relationships (1:1)

For a 1:1 relationship between entity types ET1 and ET2, a foreign key can be added in either ET1 or ET2.

Strategy: Place foreign key in the relation representing the entity with total participation (if one exists), or choose arbitrarily.

Example:

Employee (1,1) --- Manages --- (0,1) Department

Relational Model:

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Department (
    DepartmentID INT PRIMARY KEY,
    Name VARCHAR(100),
    ManagerID INT UNIQUE,  -- Foreign key to Employee
    FOREIGN KEY (ManagerID) REFERENCES Employee(EmployeeID)
);

One-to-Many Relationships (1:N)

In a 1:N relationship where one instance of ET1 relates to many instances of ET2, the foreign key should be added in ET2 (the "many" side).

Example:

Department (1,N) --- Has --- (1,1) Employee

Relational Model:

CREATE TABLE Department (
    DepartmentID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    DepartmentID INT NOT NULL,  -- Foreign key
    FOREIGN KEY (DepartmentID) REFERENCES Department(DepartmentID)
);

Many-to-Many Relationships (M:N)

In an M:N relationship, a separate relation R is created to represent the relationship. R contains foreign keys referencing the primary keys of both ET1 and ET2.

Example:

Student (M,N) --- Enrolls --- (M,N) Course

Relational Model:

CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Course (
    CourseID INT PRIMARY KEY,
    Title VARCHAR(200)
);

CREATE TABLE Enrollment (
    StudentID INT,
    CourseID INT,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);

Inheritance Mapping

Different strategies exist for mapping inheritance hierarchies, depending on the constraints.

Strategy 1: Mandatory Disjoint

Constraints: Total participation, disjoint subtypes

Approach: Create separate relations for each subtype, each inheriting the supertype's primary key and attributes.

E-R Model:

Person(PersonID, Name)
  |
  +--- Student(GPA, Major)
  +--- Employee(Salary, HireDate)

Relational Model:

-- No Person table created

CREATE TABLE Student (
    PersonID INT PRIMARY KEY,  -- Inherited
    Name VARCHAR(100),         -- Inherited
    GPA DECIMAL(3,2),
    Major VARCHAR(50)
);

CREATE TABLE Employee (
    PersonID INT PRIMARY KEY,  -- Inherited
    Name VARCHAR(100),         -- Inherited
    Salary DECIMAL(10,2),
    HireDate DATE
);

Advantages: No redundant data, simple queries for subtypes Disadvantages: Queries involving all persons require UNION

Strategy 2: Mandatory, Overlap Allowed

Constraints: Total participation, overlapping subtypes

Approach: Create a single relation for the supertype with all subtype attributes. Include a type indicator.

E-R Model:

Person(PersonID, Name)
  |
  +--- Student(GPA, Major)
  +--- Employee(Salary, HireDate)

Relational Model:

CREATE TABLE Person (
    PersonID INT PRIMARY KEY,
    Name VARCHAR(100),
    Type VARCHAR(20),  -- 'Student', 'Employee', 'Both'
    GPA DECIMAL(3,2),
    Major VARCHAR(50),
    Salary DECIMAL(10,2),
    HireDate DATE
);

Advantages: Single table for all persons, allows overlap Disadvantages: Many NULL values, potential data integrity issues

Strategy 3: Non-Mandatory, Overlap Allowed

Constraints: Partial participation, overlapping subtypes

Approach: Create relation for supertype and separate relations for each subtype.

Relational Model:

CREATE TABLE Person (
    PersonID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Student (
    PersonID INT PRIMARY KEY,
    GPA DECIMAL(3,2),
    Major VARCHAR(50),
    FOREIGN KEY (PersonID) REFERENCES Person(PersonID)
);

CREATE TABLE Employee (
    PersonID INT PRIMARY KEY,
    Salary DECIMAL(10,2),
    HireDate DATE,
    FOREIGN KEY (PersonID) REFERENCES Person(PersonID)
);

Advantages: Flexible, supports partial participation and overlap Disadvantages: Requires joins to get complete information

Strategy 4: Non-Mandatory, Disjoint

Constraints: Partial participation, disjoint subtypes

Approach: Create separate relations for supertype and each subtype. Subtypes reference supertype.

Relational Model:

CREATE TABLE Person (
    PersonID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Student (
    PersonID INT PRIMARY KEY,
    GPA DECIMAL(3,2),
    Major VARCHAR(50),
    FOREIGN KEY (PersonID) REFERENCES Person(PersonID) ON DELETE CASCADE
);

CREATE TABLE Employee (
    PersonID INT PRIMARY KEY,
    Salary DECIMAL(10,2),
    HireDate DATE,
    FOREIGN KEY (PersonID) REFERENCES Person(PersonID) ON DELETE CASCADE
);

Advantages: Clear separation, supports partial participation Disadvantages: Complex queries requiring multiple joins

Union Type Mapping

For union types, create a relation for the union entity with an artificial identifier.

E-R Model:

Vehicle (union of Car, Truck)
  VehicleID (artificial identifier)
  LicensePlate

Relational Model:

CREATE TABLE Car (
    CarID INT PRIMARY KEY,
    Model VARCHAR(100)
);

CREATE TABLE Truck (
    TruckID INT PRIMARY KEY,
    Capacity INT
);

CREATE TABLE Vehicle (
    VehicleID INT PRIMARY KEY,
    LicensePlate VARCHAR(20),
    SourceType VARCHAR(10),  -- 'Car' or 'Truck'
    SourceID INT  -- References either CarID or TruckID
);

Note: This approach has limitations in enforcing referential integrity. Modern databases may use table inheritance or other advanced features.

References

Course Materials:

  • CS 6400: Database Systems Concepts and Design - Georgia Tech OMSCS