This blog post continues the discussions on SQL statements.

In the previous article, we covered fundamental SQL statements to perform CRUD operations on a single table. However, SQL allows us to perform more complex operations involving multiple tables. Therefore, in this article, we will continue the discussion on SQL statements to perform those complex operations.
Aggregate Functions
Aggregate functions are functions that perform calculations on a set of tuples and return a single value.
These include MIN()
, MAX()
, COUNT()
, SUM()
, and AVG()
. They can be used with SELECT
statements,
such as SELECT COUNT(*) FROM Product;
to count the number of tuples, and SELECT AVG(price) FROM Product;
to compute the average price of the products. In PostgreSQL, default attribute names are not always descriptive,
like count
and avg
; therefore, we often use aliases, such as SELECT COUNT(*) AS product_count ...
and
SELECT AVG(price) AS avg_price ...
.
We can also specify the column to obtain aggregated results for using GROUP BY
. For example,
if we have product categories (electronics, books, etc.), we can obtain the average price for
each category using SELECT category, AVG(price) FROM Product GROUP BY category;
. Similarly,
we can obtain the number of products added each year using SELECT year, COUNT(*) FROM Product GROUP BY year;
.
We cannot use WHERE
to specify conditions with aggregate functions. Instead, we use HAVING
.
For example, SELECT year, COUNT(*) From Product GROUP BY year HAVING category ='Books' AND COUNT(*) > 5 ORDER BY COUNT(*) DESC;
counts the number of products in the 'Books' category added each year and displays only the years
with a product count exceeding 5, sorted in descending order by count.
Foreign Keys
Foreign key constraints allow us to express relationships between different relations by restricting
an attribute of a relation to be chosen from (either the primary key or a unique attribute) of another relation.
For example, we can express a one-to-many relationship between the Customer
and Order
relations
(a customer can have multiple orders, while each order has only one customer) by creating a customer_id
attribute with a foreign key constraint in the Order
table, limiting its values to those present in the id
attribute of the Customer
table. In PostgreSQL, this can be achieved with
CREATE TABLE Order(..., customer_id INT REFERENCES Customer (id));
.

Due to this constraint, inserting a new tuple with a customer_id
not present in the id
attribute values
of the Customer
table becomes invalid. We can also express a many-to-many relationship between
the Product
and Order
relations (an order can have many products, and a product can be in many orders)
by creating a new table, OrderItems
, with two attributes, order_id
and product_id
, each with
a foreign key constraint using CREATE TABLE OrderItems(id SERIAL NOT NULL PRIMARY KEY, order_id INT REFERENCES Order (id), product_id REFERENCES Product (id));
.
By default, a customer cannot be deleted unless there are no orders associated with that customer's ID.
However, we can configure all orders associated with a customer's ID to be deleted when the customer is deleted
by adding ON DELETE CASCADE
in the foreign key definition. It is also appropriate to set ON DELETE CASCADE
for order_id
in the OrderItems
table, since Order
and OrderItems
have a "has-a" relationship.
However, it is not appropriate to delete all orders associated with a product when deleting that product,
since Product
and OrderItems
have an "is-in" relationship. In such cases, ON DELETE SET NULL
or
ON DELETE SET DEFAULT
are more appropriate, as they retain the rows by setting the values to
either null or a default value, respectively.
Joins
When referencing tuples in another table using a foreign key and wanting to retrieve the attributes of those tuples,
we can join the tables based on the foreign key. For example, we might want to know a customer's name from
the customer ID in an order to create a proper receipt. In this case, we can join the Order
and Customer
tables
to create a new table with relevant attributes (order ID and customer's name) like
SELECT Order.id, Customer.name FROM Order JOIN Customer ON Order.customer_id = Customer.id;
.
Then, we can use WHERE
to specify the order ID to create the receipt for.
We do not necessarily have to use foreign keys to join tables, although it is rare to see such joins.
The default join is an inner join, which removes all rows with non-matching attribute values,
a situation that can occur when joining tables based on an attribute that is not a foreign key.
Alternatively, we can use a left join, which leaves tuples with non-matching attribute values
from the left table (the Order
table in the above example, potentially making some customer names null),
a right join, which leaves tuples with non-matching attribute values from the right table
(the Customer
table in the above example, potentially making some order IDs null), or a full join,
which leaves all tuples with non-matching attribute values.
Subqueries
When we want to display an aggregated value as a new column in a complete table, we can do so with subqueries,
which involve including a query inside another query. For example, we can display the price difference from
the average for each product with SELECT name, price - (SELECT AVG(price) FROM Product) FROM Product;
.
We can also use a subquery to filter out products whose prices are lower than the average,
like SELECT name, price FROM Product WHERE price >= (SELECT AVG(price) FROM Product);
.
The use cases of subqueries aren't limited to aggregated values. For example,
we can obtain a list of products in an order using a subquery,
like SELECT name, price FROM Product WHERE id IN (SELECT product_id FROM OrderItems WHERE order_id = 1);
.
We could display the total in the last row by adding another query with UNION
,
such as ...UNION SELECT "Total", SUM(price) FROM Product WHERE id IN (SELECT product_id FROM OrderItems WHERE order_id = 1);
.
In this example, however, we can achieve the same result using a join, which is generally the preferred approach.
However, there are cases where a subquery is unavoidable for complex queries,
such as SELECT p.name, p.price, Order.date FROM (SELECT OrderItems.order_id, name, price FROM Product JOIN OrderItems ON Product.id = OrderItems.product_id) AS p JOIN Order ON p.order_id = Order.id;
.
Window Functions
We used subqueries to display aggregate values as a column previously, but we can achieve the same thing
with window functions as well. For example, the previous query for displaying the price difference can be
refactored as SELECT name, price - AVG(price) OVER () FROM Product;
. Window functions also allow for much
more flexibility in how we apply aggregate functions. For example, we can compute aggregated values for each category,
similar to GROUP BY
, and display them as a new column using PARTITION BY
like
SELECT name, AVG(price) OVER (PARTITION BY category) FROM Product;
, which displays the average prices of each category
and displays them for each product.
We can also compute running aggregated values by using ORDER BY
such as SELECT name, AVG(price) OVER (PARTITION BY category ORDER BY price) FROM Product;
,
which computes the running average price of products in each category in ascending order (we can use DESC
to switch it to descending order). To share the same windows across different columns, we can use the WINDOW
clause
like SELECT name, AVG(price) OVER w, AVG(quantity) OVER w FROM Product WINDOW w AS (PARTITION BY category);
.
Conclusion
This article has introduced aggregate functions, foreign key constraints, table joins, subqueries, and window functions (along with other clauses and statements) in PostgreSQL, which enable us to perform complex operations, primarily queries, on multiple related tables. For more detailed descriptions of the available statements for a DBMS, I recommend checking the corresponding official documentation.
Resources
- PostgreSQL. n.d. PostgreSQL 17.4 Documentation. PostgreSQL.