04 SQL

Overview
SQL Language Categories
Aggregate Functions
String Operations
Output Control
Nested Queries
Window Functions
Common Table Expressions
References

Overview

SQL (Structured Query Language) is the standard language for interacting with relational database systems. It provides a declarative approach to data manipulation, allowing users to specify what data they want without detailing how to retrieve it.

Key Characteristics

Declarative: Specify the desired result, not the procedure
Set-Based: Operates on sets of rows rather than individual records
Standardized: ANSI/ISO standard with vendor-specific extensions
Comprehensive: Covers data definition, manipulation, and control

SQL Standards

Standard	Year	Key Features
SQL-86	1986	Initial standard
SQL-89	1989	Integrity constraints
SQL-92	1992	Joins, schema manipulation
SQL:1999	1999	Triggers, recursive queries, procedural extensions
SQL:2003	2003	Window functions, XML
SQL:2016	2016	JSON support, pattern matching

SQL Language Categories

SQL is divided into distinct categories based on functionality:

Data Manipulation Language (DML)

Purpose: Query and modify data

Statement	Operation	Description
SELECT	Query	Retrieve data from tables
INSERT	Create	Add new rows to a table
UPDATE	Modify	Change existing data
DELETE	Remove	Delete rows from a table

Example:

-- Query data
SELECT name, salary FROM employees WHERE department = 'Engineering';

-- Insert data
INSERT INTO employees (id, name, department, salary)
VALUES (101, 'Alice', 'Engineering', 75000);

-- Update data
UPDATE employees SET salary = 80000 WHERE id = 101;

-- Delete data
DELETE FROM employees WHERE id = 101;

Data Definition Language (DDL)

Purpose: Define and modify database structure

Statement	Operation	Description
CREATE	Define	Create databases, tables, indexes, views
ALTER	Modify	Change structure of existing objects
DROP	Remove	Delete database objects
TRUNCATE	Clear	Remove all rows from a table (faster than DELETE)

Example:

-- Create table
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

-- Alter table
ALTER TABLE employees ADD COLUMN hire_date DATE;

-- Create index
CREATE INDEX idx_department ON employees(department);

-- Create view
CREATE VIEW engineering_staff AS
SELECT id, name, salary FROM employees WHERE department = 'Engineering';

-- Drop table
DROP TABLE employees;

Data Control Language (DCL)

Purpose: Control access to data

Statement	Operation	Description
GRANT	Allow	Give privileges to users
REVOKE	Deny	Remove privileges from users

Example:

-- Grant privileges
GRANT SELECT, INSERT ON employees TO user_alice;

-- Revoke privileges
REVOKE INSERT ON employees FROM user_alice;

-- Grant all privileges
GRANT ALL PRIVILEGES ON DATABASE company TO admin_user;

Transaction Control Language (TCL)

Purpose: Manage transactions

Statement	Operation	Description
BEGIN/START	Start	Begin a transaction
COMMIT	Finalize	Save changes permanently
ROLLBACK	Undo	Revert changes to last commit
SAVEPOINT	Checkpoint	Set a point to rollback to

Example:

-- Transaction example
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;  -- If successful

-- Or ROLLBACK if error occurred

Aggregate Functions

Aggregate functions compute a single result from a set of input values.

Common Aggregate Functions

Function	Description	Handles NULL
COUNT(col)	Number of non-NULL values	Ignores NULL
*COUNT()**	Number of rows	Counts all rows
SUM(col)	Sum of values	Ignores NULL
AVG(col)	Average of values	Ignores NULL
MIN(col)	Minimum value	Ignores NULL
MAX(col)	Maximum value	Ignores NULL

Basic Usage

-- Count employees
SELECT COUNT(*) AS total_employees FROM employees;

-- Average salary
SELECT AVG(salary) AS avg_salary FROM employees;

-- Min and max salary
SELECT MIN(salary) AS min_sal, MAX(salary) AS max_sal FROM employees;

-- Sum of all salaries
SELECT SUM(salary) AS total_payroll FROM employees;

DISTINCT Keyword

Purpose: Select only unique values

-- Count unique departments
SELECT COUNT(DISTINCT department) AS num_departments FROM employees;

-- Get unique departments
SELECT DISTINCT department FROM employees;

-- Multiple columns (unique combinations)
SELECT DISTINCT department, job_title FROM employees;

GROUP BY Clause

Purpose: Group rows with the same values for aggregation

Rule: Non-aggregated columns in SELECT must appear in GROUP BY

-- Employees per department
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;

-- Average salary per department and job title
SELECT department, job_title, AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title;

-- Total sales per year and quarter
SELECT YEAR(sale_date) AS year,
       QUARTER(sale_date) AS quarter,
       SUM(amount) AS total_sales
FROM sales
GROUP BY YEAR(sale_date), QUARTER(sale_date);

HAVING Clause

Purpose: Filter groups based on aggregate conditions

Difference from WHERE:

WHERE filters rows before grouping
HAVING filters groups after aggregation

-- Departments with more than 10 employees
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

-- Departments with average salary above 70000
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;

-- Combine WHERE and HAVING
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'  -- Filter rows first
GROUP BY department
HAVING AVG(salary) > 60000;      -- Filter groups second

Complete Query Structure

SELECT column1, AGG_FUNC(column2) AS alias
FROM table
WHERE condition                    -- Filter rows before grouping
GROUP BY column1                   -- Group rows
HAVING AGG_FUNC(column2) > value  -- Filter groups after aggregation
ORDER BY column1                   -- Sort results
LIMIT n;                          -- Limit number of results

String Operations

SQL provides rich string manipulation capabilities.

String Standards

SQL Standard:

Strings are case sensitive
Use single quotes only: 'text'
Double quotes for identifiers (column/table names)

Vendor Variations:

MySQL: Case insensitive by default, allows double quotes for strings
PostgreSQL: Case sensitive, follows standard strictly
SQL Server: Case sensitivity depends on collation

Pattern Matching with LIKE

LIKE operator matches patterns in strings.

Wildcards:

% : Matches any substring (including empty string)
_ : Matches exactly one character

-- Names starting with 'A'
SELECT name FROM employees WHERE name LIKE 'A%';

-- Names ending with 'son'
SELECT name FROM employees WHERE name LIKE '%son';

-- Names containing 'ann'
SELECT name FROM employees WHERE name LIKE '%ann%';

-- Exactly 5 characters
SELECT name FROM employees WHERE name LIKE '_____';

-- Second letter is 'a'
SELECT name FROM employees WHERE name LIKE '_a%';

-- Pattern with both wildcards
SELECT email FROM users WHERE email LIKE '%@gmail.%';

-- Case insensitive (PostgreSQL)
SELECT name FROM employees WHERE name ILIKE 'alice%';

String Concatenation

Operator: || (SQL standard)

Vendor Alternatives:

MySQL: CONCAT() function
SQL Server: + operator

-- Standard concatenation
SELECT first_name || ' ' || last_name AS full_name FROM employees;

-- With NULL handling (COALESCE)
SELECT first_name || ' ' || COALESCE(middle_name || ' ', '') || last_name
AS full_name FROM employees;

-- MySQL
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;

-- SQL Server
SELECT first_name + ' ' + last_name AS full_name FROM employees;

String Functions

Function	Description	Example
SUBSTRING(S, B, E)	Extract substring from position B, length E	`SUBSTRING('Hello', 1, 3)` → 'Hel'
UPPER(S)	Convert to uppercase	`UPPER('hello')` → 'HELLO'
LOWER(S)	Convert to lowercase	`LOWER('HELLO')` → 'hello'
LENGTH(S)	String length	`LENGTH('hello')` → 5
TRIM(S)	Remove leading/trailing spaces	`TRIM(' hello ')` → 'hello'
REPLACE(S, F, R)	Replace F with R in S	`REPLACE('hello', 'l', 'x')` → 'hexxo'

Examples:

-- Extract domain from email
SELECT SUBSTRING(email, POSITION('@' IN email) + 1) AS domain
FROM users;

-- Standardize names to title case
SELECT UPPER(SUBSTRING(name, 1, 1)) || LOWER(SUBSTRING(name, 2)) AS proper_name
FROM employees;

-- Find string length
SELECT name, LENGTH(name) AS name_length FROM employees;

-- Remove extra spaces
SELECT TRIM(address) AS clean_address FROM customers;

Output Control

ORDER BY Clause

Purpose: Sort query results

Syntax:

ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...

ASC: Ascending order (default)
DESC: Descending order

Examples:

-- Sort by salary ascending
SELECT name, salary FROM employees ORDER BY salary;

-- Sort by salary descending
SELECT name, salary FROM employees ORDER BY salary DESC;

-- Multiple columns
SELECT department, name, salary
FROM employees
ORDER BY department ASC, salary DESC;

-- Order by expression
SELECT name, salary, salary * 1.1 AS projected_salary
FROM employees
ORDER BY salary * 1.1 DESC;

-- Order by column position (not recommended)
SELECT name, salary FROM employees ORDER BY 2 DESC;

-- NULL handling
SELECT name, bonus FROM employees ORDER BY bonus NULLS LAST;

LIMIT and OFFSET

Purpose: Restrict the number of rows returned

Syntax:

LIMIT n [OFFSET m]

LIMIT n: Return at most n rows
OFFSET m: Skip first m rows

Examples:

-- Top 5 highest paid employees
SELECT name, salary FROM employees
ORDER BY salary DESC
LIMIT 5;

-- Pagination: rows 11-20
SELECT name, salary FROM employees
ORDER BY name
LIMIT 10 OFFSET 10;

-- Second page of 25 records
SELECT * FROM products
ORDER BY product_id
LIMIT 25 OFFSET 25;

Vendor Variations:

-- SQL Server / Access
SELECT TOP 5 name, salary FROM employees ORDER BY salary DESC;

-- Oracle
SELECT * FROM (
    SELECT name, salary FROM employees ORDER BY salary DESC
) WHERE ROWNUM <= 5;

-- SQL Server (modern)
SELECT name, salary FROM employees
ORDER BY salary DESC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

Nested Queries

Nested queries (subqueries) are queries within queries, enabling complex data retrieval.

Subquery Operators

Operator	Description	Returns True When
IN	Equal to any value	Value matches any in subquery (equivalent to = ANY)
NOT IN	Not equal to any value	Value doesn't match any in subquery
EXISTS	Subquery returns rows	At least one row returned
NOT EXISTS	Subquery returns no rows	No rows returned
ANY	Compare to any value	Condition true for at least one row
ALL	Compare to all values	Condition true for all rows

IN Operator

-- Employees in specific departments
SELECT name, department FROM employees
WHERE department IN ('Engineering', 'Sales', 'Marketing');

-- Employees in departments with more than 50 people (subquery)
SELECT name, department FROM employees
WHERE department IN (
    SELECT department FROM employees
    GROUP BY department
    HAVING COUNT(*) > 50
);

-- NOT IN
SELECT name FROM employees
WHERE department NOT IN ('HR', 'Admin');

EXISTS Operator

Characteristics:

Returns TRUE/FALSE (not data)
Stops as soon as match is found (efficient)
Better performance than IN for large datasets

-- Departments that have employees
SELECT department_name FROM departments d
WHERE EXISTS (
    SELECT 1 FROM employees e
    WHERE e.department_id = d.id
);

-- Employees who have made sales
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM sales s
    WHERE s.employee_id = e.id
);

-- NOT EXISTS (employees with no sales)
SELECT name FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM sales s
    WHERE s.employee_id = e.id
);

ANY Operator

Syntax: value operator ANY (subquery)

Operators: =, <>, <, <=, >, >=

-- Salary greater than any salary in HR department
SELECT name, salary FROM employees
WHERE salary > ANY (
    SELECT salary FROM employees
    WHERE department = 'HR'
);

-- Equal to any (equivalent to IN)
SELECT name FROM employees
WHERE department = ANY (
    SELECT department FROM departments WHERE location = 'New York'
);

ALL Operator

Syntax: value operator ALL (subquery)

-- Salary greater than all salaries in HR department
SELECT name, salary FROM employees
WHERE salary > ALL (
    SELECT salary FROM employees
    WHERE department = 'HR'
);

-- Not equal to all (equivalent to NOT IN)
SELECT name FROM employees
WHERE department <> ALL ('HR', 'Admin');

Scalar Subqueries

Return single value (one row, one column)

-- Employees earning more than average
SELECT name, salary FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Salary difference from average
SELECT name,
       salary,
       salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

Correlated Subqueries

Subquery references outer query (executed once per row)

-- Employees earning more than their department average
SELECT e1.name, e1.department, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

-- Products with above-average price in their category
SELECT p1.product_name, p1.category, p1.price
FROM products p1
WHERE p1.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category = p1.category
);

Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY.

Key Concepts

Window: Set of rows related to current row
Partition: Divide rows into groups
Order: Define sequence within partition
Frame: Subset of partition relative to current row

Ranking Functions

Function	Description	Handling Ties
ROW_NUMBER()	Sequential number	Different numbers for ties
RANK()	Rank with gaps	Same rank, skip next numbers
DENSE_RANK()	Rank without gaps	Same rank, continue sequence
NTILE(n)	Divide into n buckets	Distribute as evenly as possible

Examples:

-- Row number (always unique)
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

-- Rank with gaps
SELECT name, salary,
       RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;
-- Salary 100k, 100k, 90k → Ranks: 1, 1, 3

-- Dense rank without gaps
SELECT name, salary,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
-- Salary 100k, 100k, 90k → Ranks: 1, 1, 2

-- Quartiles
SELECT name, salary,
       NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

PARTITION BY

Purpose: Divide result set into partitions for separate calculations

-- Rank within each department
SELECT name, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Row number per department
SELECT name, department, hire_date,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY hire_date) AS hire_order
FROM employees;

-- Highest paid in each department
WITH ranked AS (
    SELECT name, department, salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rank = 1;

Aggregate Window Functions

Difference from GROUP BY:

Window functions keep all rows
GROUP BY collapses rows

-- Running total
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales;

-- Average by department (without collapsing rows)
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;

-- Difference from department average
SELECT name, department, salary,
       salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;

Offset Functions

Function	Description
LAG(col, n)	Value from n rows before
LEAD(col, n)	Value from n rows after
FIRST_VALUE(col)	First value in window
LAST_VALUE(col)	Last value in window

-- Previous and next salary
SELECT name, salary,
       LAG(salary, 1) OVER (ORDER BY salary) AS prev_salary,
       LEAD(salary, 1) OVER (ORDER BY salary) AS next_salary
FROM employees;

-- Month-over-month change
SELECT month, revenue,
       revenue - LAG(revenue, 1) OVER (ORDER BY month) AS monthly_change
FROM monthly_revenue;

-- First and last in partition
SELECT name, department, salary,
       FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_paid,
       LAST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS lowest_paid
FROM employees;

Window Frame Specification

Syntax:

OVER (
    PARTITION BY column
    ORDER BY column
    ROWS|RANGE BETWEEN frame_start AND frame_end
)

Frame Bounds:

UNBOUNDED PRECEDING: Start of partition
n PRECEDING: n rows before current
CURRENT ROW: Current row
n FOLLOWING: n rows after current
UNBOUNDED FOLLOWING: End of partition

-- Moving average (3-row window)
SELECT date, amount,
       AVG(amount) OVER (
           ORDER BY date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3
FROM sales;

-- Cumulative sum
SELECT date, amount,
       SUM(amount) OVER (
           ORDER BY date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_sum
FROM sales;

Common Table Expressions

Common Table Expressions (CTEs) create temporary named result sets that exist only during query execution.

Basic CTE Syntax

WITH cte_name AS (
    SELECT ...
)
SELECT * FROM cte_name;

Simple CTE Examples

-- Department averages
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.department, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.department = d.department
WHERE e.salary > d.avg_salary;

-- Multiple CTEs
WITH
sales_summary AS (
    SELECT product_id, SUM(quantity) AS total_sold
    FROM sales
    GROUP BY product_id
),
inventory_summary AS (
    SELECT product_id, SUM(quantity) AS total_stock
    FROM inventory
    GROUP BY product_id
)
SELECT p.product_name,
       s.total_sold,
       i.total_stock,
       i.total_stock - s.total_sold AS remaining
FROM products p
JOIN sales_summary s ON p.id = s.product_id
JOIN inventory_summary i ON p.id = i.product_id;

Advantages of CTEs

Advantage	Description
Readability	Break complex queries into logical steps
Reusability	Reference CTE multiple times in same query
Recursion	Enable recursive queries
Maintenance	Easier to debug and modify

Recursive CTEs

Purpose: Query hierarchical or tree-structured data

Syntax:

WITH RECURSIVE cte_name AS (
    -- Base case (anchor member)
    SELECT ...

    UNION ALL

    -- Recursive case (recursive member)
    SELECT ... FROM cte_name WHERE ...
)
SELECT * FROM cte_name;

Examples:

Organization Hierarchy:

-- Employee hierarchy
WITH RECURSIVE employee_hierarchy AS (
    -- Base: Top-level employees (no manager)
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: Employees managed by previous level
    SELECT e.id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;

Number Series:

-- Generate numbers 1 to 10
WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT * FROM numbers;

Path Finding:

-- Find all paths in a graph
WITH RECURSIVE paths AS (
    -- Base: Direct connections from starting node
    SELECT source, target, ARRAY[source, target] AS path
    FROM edges
    WHERE source = 'A'

    UNION ALL

    -- Recursive: Extend paths
    SELECT p.source, e.target, p.path || e.target
    FROM paths p
    JOIN edges e ON p.target = e.source
    WHERE NOT e.target = ANY(p.path)  -- Avoid cycles
)
SELECT * FROM paths;

Date Series:

-- Generate date range
WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' AS date
    UNION ALL
    SELECT date + INTERVAL '1 day'
    FROM date_series
    WHERE date < DATE '2024-01-31'
)
SELECT * FROM date_series;

References

Course Materials:

CS 6400: Database Systems Concepts and Design - Georgia Tech OMSCS

FilesExpand file tree

04-SQL.md

Latest commit

History

04-SQL.md

File metadata and controls

04 SQL

Table of Contents

Overview

Key Characteristics

SQL Standards

SQL Language Categories

Data Manipulation Language (DML)

Data Definition Language (DDL)

Data Control Language (DCL)

Transaction Control Language (TCL)

Aggregate Functions

Common Aggregate Functions

Basic Usage

DISTINCT Keyword

GROUP BY Clause

HAVING Clause

Complete Query Structure

String Operations

String Standards

Pattern Matching with LIKE

String Concatenation

String Functions

Output Control

ORDER BY Clause

LIMIT and OFFSET

Nested Queries

Subquery Operators

IN Operator

EXISTS Operator

ANY Operator

ALL Operator

Scalar Subqueries

Correlated Subqueries

Window Functions

Key Concepts

Ranking Functions

PARTITION BY

Aggregate Window Functions

Offset Functions

Window Frame Specification

Common Table Expressions

Basic CTE Syntax

Simple CTE Examples

Advantages of CTEs

Recursive CTEs

References