- Overview
- SQL Language Categories
- Aggregate Functions
- String Operations
- Output Control
- Nested Queries
- Window Functions
- Common Table Expressions
- References
SQL (Structured Query Language) is the standard language for interacting with relational database systems. It provides a declarative approach to data manipulation, allowing users to specify what data they want without detailing how to retrieve it.
- Declarative: Specify the desired result, not the procedure
- Set-Based: Operates on sets of rows rather than individual records
- Standardized: ANSI/ISO standard with vendor-specific extensions
- Comprehensive: Covers data definition, manipulation, and control
| Standard | Year | Key Features |
|---|---|---|
| SQL-86 | 1986 | Initial standard |
| SQL-89 | 1989 | Integrity constraints |
| SQL-92 | 1992 | Joins, schema manipulation |
| SQL:1999 | 1999 | Triggers, recursive queries, procedural extensions |
| SQL:2003 | 2003 | Window functions, XML |
| SQL:2016 | 2016 | JSON support, pattern matching |
SQL is divided into distinct categories based on functionality:
Purpose: Query and modify data
| Statement | Operation | Description |
|---|---|---|
| SELECT | Query | Retrieve data from tables |
| INSERT | Create | Add new rows to a table |
| UPDATE | Modify | Change existing data |
| DELETE | Remove | Delete rows from a table |
Example:
-- Query data
SELECT name, salary FROM employees WHERE department = 'Engineering';
-- Insert data
INSERT INTO employees (id, name, department, salary)
VALUES (101, 'Alice', 'Engineering', 75000);
-- Update data
UPDATE employees SET salary = 80000 WHERE id = 101;
-- Delete data
DELETE FROM employees WHERE id = 101;Purpose: Define and modify database structure
| Statement | Operation | Description |
|---|---|---|
| CREATE | Define | Create databases, tables, indexes, views |
| ALTER | Modify | Change structure of existing objects |
| DROP | Remove | Delete database objects |
| TRUNCATE | Clear | Remove all rows from a table (faster than DELETE) |
Example:
-- Create table
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department VARCHAR(50),
salary DECIMAL(10, 2)
);
-- Alter table
ALTER TABLE employees ADD COLUMN hire_date DATE;
-- Create index
CREATE INDEX idx_department ON employees(department);
-- Create view
CREATE VIEW engineering_staff AS
SELECT id, name, salary FROM employees WHERE department = 'Engineering';
-- Drop table
DROP TABLE employees;Purpose: Control access to data
| Statement | Operation | Description |
|---|---|---|
| GRANT | Allow | Give privileges to users |
| REVOKE | Deny | Remove privileges from users |
Example:
-- Grant privileges
GRANT SELECT, INSERT ON employees TO user_alice;
-- Revoke privileges
REVOKE INSERT ON employees FROM user_alice;
-- Grant all privileges
GRANT ALL PRIVILEGES ON DATABASE company TO admin_user;Purpose: Manage transactions
| Statement | Operation | Description |
|---|---|---|
| BEGIN/START | Start | Begin a transaction |
| COMMIT | Finalize | Save changes permanently |
| ROLLBACK | Undo | Revert changes to last commit |
| SAVEPOINT | Checkpoint | Set a point to rollback to |
Example:
-- Transaction example
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT; -- If successful
-- Or ROLLBACK if error occurredAggregate functions compute a single result from a set of input values.
| Function | Description | Handles NULL |
|---|---|---|
| COUNT(col) | Number of non-NULL values | Ignores NULL |
| COUNT(*) | Number of rows | Counts all rows |
| SUM(col) | Sum of values | Ignores NULL |
| AVG(col) | Average of values | Ignores NULL |
| MIN(col) | Minimum value | Ignores NULL |
| MAX(col) | Maximum value | Ignores NULL |
-- Count employees
SELECT COUNT(*) AS total_employees FROM employees;
-- Average salary
SELECT AVG(salary) AS avg_salary FROM employees;
-- Min and max salary
SELECT MIN(salary) AS min_sal, MAX(salary) AS max_sal FROM employees;
-- Sum of all salaries
SELECT SUM(salary) AS total_payroll FROM employees;Purpose: Select only unique values
-- Count unique departments
SELECT COUNT(DISTINCT department) AS num_departments FROM employees;
-- Get unique departments
SELECT DISTINCT department FROM employees;
-- Multiple columns (unique combinations)
SELECT DISTINCT department, job_title FROM employees;Purpose: Group rows with the same values for aggregation
Rule: Non-aggregated columns in SELECT must appear in GROUP BY
-- Employees per department
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;
-- Average salary per department and job title
SELECT department, job_title, AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title;
-- Total sales per year and quarter
SELECT YEAR(sale_date) AS year,
QUARTER(sale_date) AS quarter,
SUM(amount) AS total_sales
FROM sales
GROUP BY YEAR(sale_date), QUARTER(sale_date);Purpose: Filter groups based on aggregate conditions
Difference from WHERE:
- WHERE filters rows before grouping
- HAVING filters groups after aggregation
-- Departments with more than 10 employees
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
-- Departments with average salary above 70000
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;
-- Combine WHERE and HAVING
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01' -- Filter rows first
GROUP BY department
HAVING AVG(salary) > 60000; -- Filter groups secondSELECT column1, AGG_FUNC(column2) AS alias
FROM table
WHERE condition -- Filter rows before grouping
GROUP BY column1 -- Group rows
HAVING AGG_FUNC(column2) > value -- Filter groups after aggregation
ORDER BY column1 -- Sort results
LIMIT n; -- Limit number of resultsSQL provides rich string manipulation capabilities.
SQL Standard:
- Strings are case sensitive
- Use single quotes only:
'text' - Double quotes for identifiers (column/table names)
Vendor Variations:
- MySQL: Case insensitive by default, allows double quotes for strings
- PostgreSQL: Case sensitive, follows standard strictly
- SQL Server: Case sensitivity depends on collation
LIKE operator matches patterns in strings.
Wildcards:
- % : Matches any substring (including empty string)
- _ : Matches exactly one character
-- Names starting with 'A'
SELECT name FROM employees WHERE name LIKE 'A%';
-- Names ending with 'son'
SELECT name FROM employees WHERE name LIKE '%son';
-- Names containing 'ann'
SELECT name FROM employees WHERE name LIKE '%ann%';
-- Exactly 5 characters
SELECT name FROM employees WHERE name LIKE '_____';
-- Second letter is 'a'
SELECT name FROM employees WHERE name LIKE '_a%';
-- Pattern with both wildcards
SELECT email FROM users WHERE email LIKE '%@gmail.%';
-- Case insensitive (PostgreSQL)
SELECT name FROM employees WHERE name ILIKE 'alice%';Operator: || (SQL standard)
Vendor Alternatives:
- MySQL:
CONCAT()function - SQL Server:
+operator
-- Standard concatenation
SELECT first_name || ' ' || last_name AS full_name FROM employees;
-- With NULL handling (COALESCE)
SELECT first_name || ' ' || COALESCE(middle_name || ' ', '') || last_name
AS full_name FROM employees;
-- MySQL
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
-- SQL Server
SELECT first_name + ' ' + last_name AS full_name FROM employees;| Function | Description | Example |
|---|---|---|
| SUBSTRING(S, B, E) | Extract substring from position B, length E | SUBSTRING('Hello', 1, 3) → 'Hel' |
| UPPER(S) | Convert to uppercase | UPPER('hello') → 'HELLO' |
| LOWER(S) | Convert to lowercase | LOWER('HELLO') → 'hello' |
| LENGTH(S) | String length | LENGTH('hello') → 5 |
| TRIM(S) | Remove leading/trailing spaces | TRIM(' hello ') → 'hello' |
| REPLACE(S, F, R) | Replace F with R in S | REPLACE('hello', 'l', 'x') → 'hexxo' |
Examples:
-- Extract domain from email
SELECT SUBSTRING(email, POSITION('@' IN email) + 1) AS domain
FROM users;
-- Standardize names to title case
SELECT UPPER(SUBSTRING(name, 1, 1)) || LOWER(SUBSTRING(name, 2)) AS proper_name
FROM employees;
-- Find string length
SELECT name, LENGTH(name) AS name_length FROM employees;
-- Remove extra spaces
SELECT TRIM(address) AS clean_address FROM customers;Purpose: Sort query results
Syntax:
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...- ASC: Ascending order (default)
- DESC: Descending order
Examples:
-- Sort by salary ascending
SELECT name, salary FROM employees ORDER BY salary;
-- Sort by salary descending
SELECT name, salary FROM employees ORDER BY salary DESC;
-- Multiple columns
SELECT department, name, salary
FROM employees
ORDER BY department ASC, salary DESC;
-- Order by expression
SELECT name, salary, salary * 1.1 AS projected_salary
FROM employees
ORDER BY salary * 1.1 DESC;
-- Order by column position (not recommended)
SELECT name, salary FROM employees ORDER BY 2 DESC;
-- NULL handling
SELECT name, bonus FROM employees ORDER BY bonus NULLS LAST;Purpose: Restrict the number of rows returned
Syntax:
LIMIT n [OFFSET m]- LIMIT n: Return at most n rows
- OFFSET m: Skip first m rows
Examples:
-- Top 5 highest paid employees
SELECT name, salary FROM employees
ORDER BY salary DESC
LIMIT 5;
-- Pagination: rows 11-20
SELECT name, salary FROM employees
ORDER BY name
LIMIT 10 OFFSET 10;
-- Second page of 25 records
SELECT * FROM products
ORDER BY product_id
LIMIT 25 OFFSET 25;Vendor Variations:
-- SQL Server / Access
SELECT TOP 5 name, salary FROM employees ORDER BY salary DESC;
-- Oracle
SELECT * FROM (
SELECT name, salary FROM employees ORDER BY salary DESC
) WHERE ROWNUM <= 5;
-- SQL Server (modern)
SELECT name, salary FROM employees
ORDER BY salary DESC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;Nested queries (subqueries) are queries within queries, enabling complex data retrieval.
| Operator | Description | Returns True When |
|---|---|---|
| IN | Equal to any value | Value matches any in subquery (equivalent to = ANY) |
| NOT IN | Not equal to any value | Value doesn't match any in subquery |
| EXISTS | Subquery returns rows | At least one row returned |
| NOT EXISTS | Subquery returns no rows | No rows returned |
| ANY | Compare to any value | Condition true for at least one row |
| ALL | Compare to all values | Condition true for all rows |
-- Employees in specific departments
SELECT name, department FROM employees
WHERE department IN ('Engineering', 'Sales', 'Marketing');
-- Employees in departments with more than 50 people (subquery)
SELECT name, department FROM employees
WHERE department IN (
SELECT department FROM employees
GROUP BY department
HAVING COUNT(*) > 50
);
-- NOT IN
SELECT name FROM employees
WHERE department NOT IN ('HR', 'Admin');Characteristics:
- Returns TRUE/FALSE (not data)
- Stops as soon as match is found (efficient)
- Better performance than IN for large datasets
-- Departments that have employees
SELECT department_name FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e
WHERE e.department_id = d.id
);
-- Employees who have made sales
SELECT name FROM employees e
WHERE EXISTS (
SELECT 1 FROM sales s
WHERE s.employee_id = e.id
);
-- NOT EXISTS (employees with no sales)
SELECT name FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM sales s
WHERE s.employee_id = e.id
);Syntax: value operator ANY (subquery)
Operators: =, <>, <, <=, >, >=
-- Salary greater than any salary in HR department
SELECT name, salary FROM employees
WHERE salary > ANY (
SELECT salary FROM employees
WHERE department = 'HR'
);
-- Equal to any (equivalent to IN)
SELECT name FROM employees
WHERE department = ANY (
SELECT department FROM departments WHERE location = 'New York'
);Syntax: value operator ALL (subquery)
-- Salary greater than all salaries in HR department
SELECT name, salary FROM employees
WHERE salary > ALL (
SELECT salary FROM employees
WHERE department = 'HR'
);
-- Not equal to all (equivalent to NOT IN)
SELECT name FROM employees
WHERE department <> ALL ('HR', 'Admin');Return single value (one row, one column)
-- Employees earning more than average
SELECT name, salary FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Salary difference from average
SELECT name,
salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;Subquery references outer query (executed once per row)
-- Employees earning more than their department average
SELECT e1.name, e1.department, e1.salary
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department
);
-- Products with above-average price in their category
SELECT p1.product_name, p1.category, p1.price
FROM products p1
WHERE p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category = p1.category
);Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY.
- Window: Set of rows related to current row
- Partition: Divide rows into groups
- Order: Define sequence within partition
- Frame: Subset of partition relative to current row
| Function | Description | Handling Ties |
|---|---|---|
| ROW_NUMBER() | Sequential number | Different numbers for ties |
| RANK() | Rank with gaps | Same rank, skip next numbers |
| DENSE_RANK() | Rank without gaps | Same rank, continue sequence |
| NTILE(n) | Divide into n buckets | Distribute as evenly as possible |
Examples:
-- Row number (always unique)
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;
-- Rank with gaps
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;
-- Salary 100k, 100k, 90k → Ranks: 1, 1, 3
-- Dense rank without gaps
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
-- Salary 100k, 100k, 90k → Ranks: 1, 1, 2
-- Quartiles
SELECT name, salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;Purpose: Divide result set into partitions for separate calculations
-- Rank within each department
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
-- Row number per department
SELECT name, department, hire_date,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY hire_date) AS hire_order
FROM employees;
-- Highest paid in each department
WITH ranked AS (
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rank = 1;Difference from GROUP BY:
- Window functions keep all rows
- GROUP BY collapses rows
-- Running total
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales;
-- Average by department (without collapsing rows)
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;
-- Difference from department average
SELECT name, department, salary,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;| Function | Description |
|---|---|
| LAG(col, n) | Value from n rows before |
| LEAD(col, n) | Value from n rows after |
| FIRST_VALUE(col) | First value in window |
| LAST_VALUE(col) | Last value in window |
-- Previous and next salary
SELECT name, salary,
LAG(salary, 1) OVER (ORDER BY salary) AS prev_salary,
LEAD(salary, 1) OVER (ORDER BY salary) AS next_salary
FROM employees;
-- Month-over-month change
SELECT month, revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS monthly_change
FROM monthly_revenue;
-- First and last in partition
SELECT name, department, salary,
FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_paid,
LAST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS lowest_paid
FROM employees;Syntax:
OVER (
PARTITION BY column
ORDER BY column
ROWS|RANGE BETWEEN frame_start AND frame_end
)Frame Bounds:
UNBOUNDED PRECEDING: Start of partitionn PRECEDING: n rows before currentCURRENT ROW: Current rown FOLLOWING: n rows after currentUNBOUNDED FOLLOWING: End of partition
-- Moving average (3-row window)
SELECT date, amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM sales;
-- Cumulative sum
SELECT date, amount,
SUM(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sum
FROM sales;Common Table Expressions (CTEs) create temporary named result sets that exist only during query execution.
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;-- Department averages
WITH dept_avg AS (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
)
SELECT e.name, e.department, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.department = d.department
WHERE e.salary > d.avg_salary;
-- Multiple CTEs
WITH
sales_summary AS (
SELECT product_id, SUM(quantity) AS total_sold
FROM sales
GROUP BY product_id
),
inventory_summary AS (
SELECT product_id, SUM(quantity) AS total_stock
FROM inventory
GROUP BY product_id
)
SELECT p.product_name,
s.total_sold,
i.total_stock,
i.total_stock - s.total_sold AS remaining
FROM products p
JOIN sales_summary s ON p.id = s.product_id
JOIN inventory_summary i ON p.id = i.product_id;| Advantage | Description |
|---|---|
| Readability | Break complex queries into logical steps |
| Reusability | Reference CTE multiple times in same query |
| Recursion | Enable recursive queries |
| Maintenance | Easier to debug and modify |
Purpose: Query hierarchical or tree-structured data
Syntax:
WITH RECURSIVE cte_name AS (
-- Base case (anchor member)
SELECT ...
UNION ALL
-- Recursive case (recursive member)
SELECT ... FROM cte_name WHERE ...
)
SELECT * FROM cte_name;Examples:
Organization Hierarchy:
-- Employee hierarchy
WITH RECURSIVE employee_hierarchy AS (
-- Base: Top-level employees (no manager)
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Employees managed by previous level
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;Number Series:
-- Generate numbers 1 to 10
WITH RECURSIVE numbers AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT * FROM numbers;Path Finding:
-- Find all paths in a graph
WITH RECURSIVE paths AS (
-- Base: Direct connections from starting node
SELECT source, target, ARRAY[source, target] AS path
FROM edges
WHERE source = 'A'
UNION ALL
-- Recursive: Extend paths
SELECT p.source, e.target, p.path || e.target
FROM paths p
JOIN edges e ON p.target = e.source
WHERE NOT e.target = ANY(p.path) -- Avoid cycles
)
SELECT * FROM paths;Date Series:
-- Generate date range
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' AS date
UNION ALL
SELECT date + INTERVAL '1 day'
FROM date_series
WHERE date < DATE '2024-01-31'
)
SELECT * FROM date_series;Course Materials:
- CS 6400: Database Systems Concepts and Design - Georgia Tech OMSCS