Database Basics #2 - SQL Statements

In the last article, we briefly discussed DBMS, the relational data model, and SQL, and set up an environment to test PostgreSQL with Docker. In this article, we will use that environment to learn the basic SQL statements for interacting with the database and performing CRUD (Create, Read, Update, and Delete) operations.

Creating Databases & Tables

When accessing PostgreSQL with psql -U postgres, we are accessing the database as the postgres user, which is a default superuser. We can check the users with SELECT * FROM pg_user; and confirm that the postgres user has permissions for all actions by default. (We will discuss users and roles in a future article.) By copying the predefined template databases (initialized to be empty), we can create a new database with CREATE DATABASE testdb;. We write command keywords in capital letters by convention for clarity.

After creating the database, we can use the \l psql command to list all the databases and confirm that testdb is created below the template databases. By default, it copies the template1 database to create a new database, but we can specify the database to copy from with TEMPLATE <db_name>. It is defaulted to use the postgres database, so we can switch the database with the \c testdb psql command. Inside the database, we can define a table (or relation) with a schema using the CREATE TABLE <table_name> (<column1> <type>, <column2> <type>, ...); statement. (Certain names, like user, are not allowed in some DBMS, including PostgreSQL.)

As a practice, we can create a t1 table with a primary key that automatically gets incremented, a name with a varchar(50) type, and an age with an int type using CREATE TABLE t1 (id SERIAL NOT NULL PRIMARY KEY, name VARCHAR(50), age INT);. (The primary key with auto-increment can be implemented using the SERIAL data type with PRIMARY KEY in PostgreSQL. We can alternatively place PRIMARY KEY (id) at the end, and other DBMS use that with an AUTO_INCREMENT keyword like id INT NOT NULL AUTO_INCREMENT.) Using the \dt psql command (or \dt+ for showing the size) and the \d t1 command, we can see all the tables in the database and confirm the creation of t1 and observe the structure of the table, respectively.

Constraints & Tuple Insertions

The NOT NULL keyword used above is one type of constraint that restricts id from being NULL or missing. There are other useful constraints, such as DEFAULT, UNIQUE, and CHECK. The DEFAULT constraint sets a default value for the attribute, which is particularly useful for creating a timestamp like createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP. The UNIQUE constraint ensures that there are no duplicates, which is useful for checking duplicate emails like email VARCHAR(50) UNIQUE. The CHECK constraint is useful for checking the validity of attribute values, such as CHECK (age >= 18 AND char_length(password) >= 8). There is another useful constraint, FOREIGN KEY, which will be covered in the next article.

After creating a table (relation) based on a schema defining constraints, we can start inserting rows (tuples) into it with INSERT INTO <table_name> (<col1_name>, <col2_name>, ...) VALUES (<col1_val>, <col2_val>, ...), (<col1_val>, <col2_val>, ...), ...;. We can use this statement to insert a single row or multiple rows at the same time. The auto-incremented primary key and defaulted values do not need to be specified when inserting rows, meaning we can insert rows like INSERT INTO t1 (name, age) VALUES ('A', 18), ('B', 20); in t1. Here, it's important to note that double quotes are for table or column names and single quotes are for string values in the SQL standard.

Reading / Querying

Once we've created a table with some rows, we can start reading or querying it. To query all the rows in a table, we use SELECT * FROM <table-name>;, where * represents all columns. This displays a table in the terminal. We can use clauses like LIMIT <number> and OFFSET <number> to query a limited number of rows, starting from a specific offset. For example, SELECT * FROM t1 LIMIT 3 OFFSET 1; queries only 3 rows, starting from the second row of the t1 table.

postgres=# SELECT * FROM t1; 
 id | name | age
----+------+-----
  1 | A    |  18
  2 | B    |  20
  3 | C    |  27
  4 | D    |  25
  5 | E    |  19
(5 rows)
 
postgres=# SELECT id, name FROM t1 ORDER BY id DESC LIMIT 3 OFFSET 1;
 id | name 
----+------
  4 | D
  3 | C
  2 | B
(3 rows)
 
postgres=# SELECT id, name, age FROM t1 ORDER BY age DESC WHERE age >= 20;
 id | name | age
----+------+-----
  3 | C    |  27
  4 | D    |  25
  2 | B    |  20
(3 rows)

We can also specify the column used to determine the order in which the rows are presented using ORDER BY <col_name>, like SELECT * FROM t1 ORDER BY id;. By default, the ordering is ascending, but we can display rows in descending order by adding DESC to the clause. Rows can be filtered using WHERE <condition>. These conditions can be defined with operators, including comparison operators like =, !=, and >, and logical operators like AND, OR, NOT, IS, and so on. (For example, we can express conditions like age >= 20 AND age <= 25 or BETWEEN 20 AND 25, IS NOT NULL, etc.). The above examples demonstrate the queries we're discussing.

We can also filter rows based on string attributes using wildcard characters like % and _. The % represents any sequence of characters of any length, and _ represents any single character. For example, we can search for rows of books containing "database" in any part of their titles using WHERE title LIKE '%database%', or search for books published in January using WHERE CAST(publishedAt AS VARCHAR) LIKE '____-01-%', which are both realistic queries from an application (although there might be other ways to achieve the same result).

Updation & Deletion

We can update the attributes of all rows in a table using UPDATE <table-name> SET <col_name>=<col_value>, ...;. We can further specify the rows to update using primary keys and other conditions, like UPDATE t1 SET age=26 WHERE name='D', which updates D's age from 25 to 26. We can also use arithmetic operators, like SET age=age+1, for more reusable statements. We can also manipulate columns using ALTER TABLE. For example, we can add a new column like ALTER TABLE t1 ADD email VARCHAR(255); and modify the type of a column like ALTER TABLE t1 ALTER COLUMN age TYPE VARCHAR(3); or ALTER COLUMN name TYPE VARCHAR(100);.

Using DELETE FROM <table-name>;, we can delete all rows in the table. We can also delete an entire column with ALTER TABLE <table-name> DROP COLUMN <col_name>;. To delete specific rows, we can add conditions like WHERE name='D' or WHERE age <= 18. To drop an entire table and database, we use DROP TABLE <table-name> and DROP DATABASE <database_name>, respectively. However, since template databases are used to create new databases, we cannot drop them using the DROP DATABASE statement.

Conclusion

In this article, we covered the basic SQL statements for performing CRUD operations on a table and some commands in PostgreSQL. While the commands, syntax of statements, and their corresponding behaviors are similar across different relational DBMS, it's important to be mindful of the differences. In the next article, we will dive deeper into SQL and cover more complex queries.

Resources

Neon. n.d. PostgreSQL Administration. PostgreSQL Tutorial.
PostgreSQL. n.d. PostgreSQL 17.4 Documentation. PostgreSQL.