Database Basics #4 - Reliability & Security

This blog post introduces functionalities for ensuring reliability and security of database systems.

In the previous two articles, we have covered fundamental SQL statements for performing CRUD operations. DBMSs implement more functionalities for users to perform operations while maintaining reliability and security. Therefore, in this article, we will introduce some of those functionalities and their usage.

Transactions

In general, there are operations (usually writes) that must happen together. For example, if a user transfers money to another user, both users' balances need to be updated simultaneously. If the server crashes during the process, we could end up with money vanishing from the sender's account or being generated out of thin air, depending on the order, which are both critical issues. To prevent these issues, which carry serious financial and legal risks, DBMSs offer transactions that allow us to bundle these operations together into one atomic unit.

BEGIN;
UPDATE users SET balance = balance - 250.00 WHERE name = 'A';
UPDATE users SET balance = balance + 250.00 WHERE name = 'B';
COMMIT;

The above is an example transaction in PostgreSQL that bundles write operations for a money transfer of $250 from A to B. The BEGIN statement initiates the transaction, preventing the following statements from being immediately processed until a COMMIT statement appears, effectively bundling the statements between them. If a failure occurs during the processing of statements within a transaction, the DBMS rolls back all changes made, ensuring atomicity and consistency.

DBMSs implement transactions in a way that makes them invisible to other transactions, behaving as if the concurrent commits were made sequentially, enforcing isolation. Additionally, DBMSs ensure that transactions are recorded in non-volatile memory once they are committed, guaranteeing durability. The properties of atomicity, consistency, isolation, and durability are collectively known as the ACID property, and it is desirable for database systems to possess this property when prioritizing reliability (some systems might prioritize availability over consistency).

Triggers

In some cases, a column's value depends on other columns. For example, revenue per product, as a column, depends on a product's price and quantity sold, as revenue is calculated by multiplying them. Instead of manually multiplying these values to keep track of revenue for every update, which is error-prone, we can use a trigger that automatically computes revenue before each update, ensuring data integrity. We can set up such a trigger like CREATE TRIGGER revenue_update BEFORE UPDATE ON Product FOR EACH ROW SET NEW.revenue = price * quantity;.

We can also automatically compute revenue when inserting new rows by setting up a trigger like CREATE TRIGGER revenue_insert BEFORE INSERT ON Product FOR EACH ROW SET NEW.revenue = NEW.price * NEW.quantity;. Here, we can insert a NULL value for the revenue column, as the trigger automatically handles it. When tracking a company's total revenue, we can use a trigger like CREATE TRIGGER revenue_total_update AFTER INSERT ON Product FOR EACH ROW UPDATE Company SET revenue_total = revenue_total + (NEW.revenue - OLD.revenue) WHERE id == NEW.id;.

SQL Injections

Security is a major concern with database systems, and SQL injection is a significant vulnerability. A common example is when a malicious attacker enters ' OR '1'='1 in a password input field, resulting in a SQL statement like SELECT * FROM Customer WHERE name='admin' AND password='' OR '1'='1';, which always evaluates to true and exposes data. An attacker could also enter admin' -- to comment out the password check from the query.

These examples are relatively simple to prevent (through input validation and sanitization, such as checking for = and --), but more complex SQL injections exist that are difficult to address manually. Therefore, it's important to implement multiple measures to guard against a wide range of SQL injection attacks. Aside from input validations and sanitizations using regular expressions, we can use parameterized queries, which clearly separate SQL commands and user inputs. This can be implemented with stored procedures and prepared statements.

Stored Procedures

When setting up a database system, we often have a good idea of the queries that will be made frequently. For example, we can expect many queries like SELECT id, name, price FROM Product; for displaying product offerings on an e-commerce website. We can also anticipate clients making sets of queries like SELECT id, name, price, description, (AVG(score) FROM Review WHERE Review.product_id=<id>) FROM Product WHERE Product.id=<id>; and SELECT Customer.name, Review.score, Review.content FROM Review JOIN Customer ON Review.customer_id = Customer.id WHERE Review.product_id=<id>; for product details and reviews.

CREATE PROCEDURE product_details(pid integer) 
LANGUAGE SQL -- can be sql, c, internal, and others
BEGIN ATOMIC
    SELECT id, name, price, description, (AVG(score) FROM Review WHERE Review.product_id=pid) FROM Product WHERE Product.id=pid; -- product details and average score (could have been automatically computed with triggers)
    SELECT Customer.name, Review.score, Review.content FROM Review JOIN Customer ON Review.customer_id = Cutomer.id WHERE Review.product_id=pid; -- reviews
END;

In such cases, we can precompile these queries and store them on the server, allowing multiple clients to simply call the stored procedure rather than writing their own queries manually each time. This reduces the potential for errors. We can use CREATE PROCEDURE as shown above for complex queries for obtaining product details and reviews. Clients only need to use CALL product_details(<id>); to execute the complex query. Stored procedures can also be used to define parameterized queries, guarding against potential SQL injections (although this may not be as relevant in this specific scenario). They can also limit which clients are allowed to call them, enhancing security. (This assumes a client-server database model, so it's often not supported for DBMSs with other models like SQLite, which is common for embedded databases.)

Prepared Statements

Prepared statements allow clients to set up parameterized queries to guard against SQL injections. For example, when a client authenticates, they can set up a prepared statement like PREPARE authenticate(text, text) AS SELECT * FROM Customer WHERE name=$1 AND password=$2;, which they can then execute with EXECUTE authenticate(<name>, <password>);. Once defined by the client in a database session, the same prepared statement can be reused and executed with different sets of parameters by the same client within that session, potentially improving query performance depending on the query complexity and the number of queries.

Prepared statements also contribute to reliability due to their reusability. While similar to stored procedures, prepared statements are created by the clients, not the server, and exist only for a single database session. Also, they typically cannot bundle multiple queries and are available for embedded databases like SQLite, unlike stored procedures. By combining prepared statements, stored procedures, and user input validations and sanitizations, we can improve security against SQL injection attacks.

Conclusion

This article covered transactions, triggers, stored procedures, and prepared statements, all of which contribute to the reliability and security of operations on SQL databases, which are critical aspects in almost all use cases. For more details regarding these functionalities, we recommend checking the official documentation of the respective DBMS (cited below for PostgreSQL).

Resources

PostgreSQL. n.d. PostgreSQL 17.4 Documentation. PostgreSQL.