Database Basics #3 - SQL Statements Cont'd

In the previous article, we covered fundamental SQL statements to perform CRUD operations on a single table. However, SQL allows us to perform more complex operations involving multiple tables. Therefore, in this article, we will continue the discussion on SQL statements to perform those complex operations.

Aggregate Functions

Aggregate functions are functions that perform calculations on a set of tuples and return a single value. These include MIN(), MAX(), COUNT(), SUM(), and AVG(). They can be used with SELECT statements, such as SELECT COUNT(*) FROM Product; to count the number of tuples, and SELECT AVG(price) FROM Product; to compute the average price of the products. In PostgreSQL, default attribute names are not always descriptive, like count and avg; therefore, we often use aliases, such as SELECT COUNT(*) AS product_count ... and SELECT AVG(price) AS avg_price ....

We can also specify the column to obtain aggregated results for using GROUP BY. For example, if we have product categories (electronics, books, etc.), we can obtain the average price for each category using SELECT category, AVG(price) FROM Product GROUP BY category;. Similarly, we can obtain the number of products added each year using SELECT year, COUNT(*) FROM Product GROUP BY year;. We cannot use WHERE to specify conditions with aggregate functions. Instead, we use HAVING. For example, SELECT year, COUNT(*) From Product GROUP BY year HAVING category ='Books' AND COUNT(*) > 5 ORDER BY COUNT(*) DESC; counts the number of products in the 'Books' category added each year and displays only the years with a product count exceeding 5, sorted in descending order by count.

Foreign Keys

Foreign key constraints allow us to express relationships between different relations by restricting an attribute of a relation to be chosen from (either the primary key or a unique attribute) of another relation. For example, we can express a one-to-many relationship between the Customer and Order relations (a customer can have multiple orders, while each order has only one customer) by creating a customer_id attribute with a foreign key constraint in the Order table, limiting its values to those present in the id attribute of the Customer table. In PostgreSQL, this can be achieved with CREATE TABLE Order(..., customer_id INT REFERENCES Customer (id));.

Due to this constraint, inserting a new tuple with a customer_id not present in the id attribute values of the Customer table becomes invalid. We can also express a many-to-many relationship between the Product and Order relations (an order can have many products, and a product can be in many orders) by creating a new table, OrderItems, with two attributes, order_id and product_id, each with a foreign key constraint using CREATE TABLE OrderItems(id SERIAL NOT NULL PRIMARY KEY, order_id INT REFERENCES Order (id), product_id REFERENCES Product (id));.

By default, a customer cannot be deleted unless there are no orders associated with that customer's ID. However, we can configure all orders associated with a customer's ID to be deleted when the customer is deleted by adding ON DELETE CASCADE in the foreign key definition. It is also appropriate to set ON DELETE CASCADE for order_id in the OrderItems table, since Order and OrderItems have a "has-a" relationship. However, it is not appropriate to delete all orders associated with a product when deleting that product, since Product and OrderItems have an "is-in" relationship. In such cases, ON DELETE SET NULL or ON DELETE SET DEFAULT are more appropriate, as they retain the rows by setting the values to either null or a default value, respectively.

Joins

When referencing tuples in another table using a foreign key and wanting to retrieve the attributes of those tuples, we can join the tables based on the foreign key. For example, we might want to know a customer's name from the customer ID in an order to create a proper receipt. In this case, we can join the Order and Customer tables to create a new table with relevant attributes (order ID and customer's name) like SELECT Order.id, Customer.name FROM Order JOIN Customer ON Order.customer_id = Customer.id;. Then, we can use WHERE to specify the order ID to create the receipt for.

We do not necessarily have to use foreign keys to join tables, although it is rare to see such joins. The default join is an inner join, which removes all rows with non-matching attribute values, a situation that can occur when joining tables based on an attribute that is not a foreign key. Alternatively, we can use a left join, which leaves tuples with non-matching attribute values from the left table (the Order table in the above example, potentially making some customer names null), a right join, which leaves tuples with non-matching attribute values from the right table (the Customer table in the above example, potentially making some order IDs null), or a full join, which leaves all tuples with non-matching attribute values.

Subqueries

When we want to display an aggregated value as a new column in a complete table, we can do so with subqueries, which involve including a query inside another query. For example, we can display the price difference from the average for each product with SELECT name, price - (SELECT AVG(price) FROM Product) FROM Product;. We can also use a subquery to filter out products whose prices are lower than the average, like SELECT name, price FROM Product WHERE price >= (SELECT AVG(price) FROM Product);.

The use cases of subqueries aren't limited to aggregated values. For example, we can obtain a list of products in an order using a subquery, like SELECT name, price FROM Product WHERE id IN (SELECT product_id FROM OrderItems WHERE order_id = 1);. We could display the total in the last row by adding another query with UNION, such as ...UNION SELECT "Total", SUM(price) FROM Product WHERE id IN (SELECT product_id FROM OrderItems WHERE order_id = 1);. In this example, however, we can achieve the same result using a join, which is generally the preferred approach. However, there are cases where a subquery is unavoidable for complex queries, such as SELECT p.name, p.price, Order.date FROM (SELECT OrderItems.order_id, name, price FROM Product JOIN OrderItems ON Product.id = OrderItems.product_id) AS p JOIN Order ON p.order_id = Order.id;.

Window Functions

We used subqueries to display aggregate values as a column previously, but we can achieve the same thing with window functions as well. For example, the previous query for displaying the price difference can be refactored as SELECT name, price - AVG(price) OVER () FROM Product;. Window functions also allow for much more flexibility in how we apply aggregate functions. For example, we can compute aggregated values for each category, similar to GROUP BY, and display them as a new column using PARTITION BY like SELECT name, AVG(price) OVER (PARTITION BY category) FROM Product;, which displays the average prices of each category and displays them for each product.

We can also compute running aggregated values by using ORDER BY such as SELECT name, AVG(price) OVER (PARTITION BY category ORDER BY price) FROM Product;, which computes the running average price of products in each category in ascending order (we can use DESC to switch it to descending order). To share the same windows across different columns, we can use the WINDOW clause like SELECT name, AVG(price) OVER w, AVG(quantity) OVER w FROM Product WINDOW w AS (PARTITION BY category);.

Conclusion

This article has introduced aggregate functions, foreign key constraints, table joins, subqueries, and window functions (along with other clauses and statements) in PostgreSQL, which enable us to perform complex operations, primarily queries, on multiple related tables. For more detailed descriptions of the available statements for a DBMS, I recommend checking the corresponding official documentation.

Resources

PostgreSQL. n.d. PostgreSQL 17.4 Documentation. PostgreSQL.