In this article, we aim to cover potential concepts an interviewee may encounter during an interview. This article will cover some general SQL concepts, as well as some specific questions for SQLite and PostgreSQL.
General SQL Questions
What is SQL?
SQL stands for "Structured Query Language." It is the language standard used for database systems that store data in the form of rows and tables (otherwise known as relational databases).
What are the origins of SQL?
SQL was first developed by Donald Chamberlin and Raymond Boyce in 1974. Over time, their work evolved into a set of database management systems and language standards, which are updated every few years.
Explain how SQL language standards work.
With every SQL language standard, SQL continues to add more desired features, such as regular expression matching in SQL:1999 and XML features in SQL:2003. After every release of the standard, database management systems (i.e. PostgreSQL and SQLite) would need to evaluate ways to implement these desired features (or ignore them, depending on their limitations).
What are some differences between SQL and NoSQL databases?
SQL databases usually store data in the form of tables and rows - otherwise known as relations. By contrast, NoSQL databases may store data in the form of documents (i.e. MongoDB stores data in JSON-formatted documents). Unlike NoSQL databases, SQL databases follow a set language standard (SQL), though each SQL database's implementation of SQL may differ.
What are prepared statements?
Prepared statements are SQL statements that are compiled before they are executed. These kinds of statements are usually offered by most relational database management systems, though the syntax for writing prepared statements may differ from system to system.
What is ACID / ACID-compliant?
ACID stands for "Atomicity, Consistency, Isolation, and Durability." It is a standard by which database operations are judged for data integrity.
A database is ACID-compliant if it fulfills:
Atomicity is the principle that either all or none of a set of operations will take place - in other words, SQL does not allow "partial executions." Suppose multiple database operations were executed to create a table, then alter one of its column properties. A database that is ACID-compliant will guarantee that, if the operations were to be interrupted, then none of the statements would execute.
Consistency is the principle that any database operation should follow any constraints defined by the database system. Suppose, for instance, a table had a column with a UNIQUE constraint. An ACID-compliant database will reject the operation if an operation were to insert a duplicate record into the table.
Isolation is the principle that any database operation taking place will take place with a valid state or copy of the database. This helps, notably, to prevent write-write conflicts, where a conflict occurs between two processes trying to change the same row of data.
Durability is the principle that a database operation that finishes successfuly will be preserved even in the event of an unforseen event (i.e. power failure, etc).
What are some common SQL operators?
Common SQL operators include:
- Arithmetic Operators (+, -, *, /, %)
- Comparison Operators (=, <>, <, <=, >, >=)
- Logical Operators (&, |, ^)
What are SQL injection Attacks?
SQL injection attacks occur when an attacker sends user input that manipulates a SQL query. Usually, this happens when a query is dependent on user input, such as in the following Python code:
import sqlite3 from sqlite3 import connect from contextlib import closing from pathlib import Path path = Path("sample.db") # delete the existing database for sake of example if path.is_file(): path.unlink() # Set up the database stmt = """ CREATE TABLE users (name TEXT NOT NULL, password TEXT NOT NULL); """ with connect("sample.db") as conn: with closing(conn.cursor()) as cursor: cursor.execute(stmt) stmt = 'INSERT INTO users VALUES ("Ryan", "password");' cursor.execute(stmt) stmt = 'INSERT INTO users VALUES ("Sandra", "password2");' cursor.execute(stmt) name = input("Type in name: ") # assume there is a sample SQLITE3 database # called "sample.db" with a table "users" # and columns "name" (TEXT) and "password" (TEXT) with connect("sample.db") as conn: conn.row_factory = sqlite3.Row with closing(conn.cursor()) as cursor: cursor.execute("SELECT * FROM users WHERE name = '" + name + "';") rows = cursor.fetchall() for row in rows: print(row['name'], row['password'])
A malicious user may provide the following input, bypassing the WHERE clause:
John' OR '1'='1
This, in turn, would simply return all rows and all data from the table users, exposing sensitive and unauthorized information to the malicious user.
How can applications protect against SQL injection attacks?
One of the best ways to protect against SQL injection attacks is to make use of prepared statements. Prepared statements essentially "compile" SQL statements, checking whether the data provided matches the parameters desired by a SQL statements.
For instance, in the previous example, the execution of the SELECT statement can be modified as follows:
cursor.execute("SELECT * FROM users WHERE name = ?;", [name])
The same input that worked before now displays nothing, indicating that the SQL injection attack was successfully thwarted.
Provide some examples of database systems that support SQL querying.
Some examples include SQLite, PostgreSQL, MySQL, and MSSQL (Microsoft SQL).
What are indexes in SQL databases?
In SQL database systems, indexes are references to data in tables. Indexes can significantly optimize reading operations (i.e. SELECT with WHERE clause) at the cost of overhead and performance (i.e. during many INSERT operations). Different database systems will offer different support for database indexing.
What is SQLite?
SQLite is a lightweight, open-source relational database management system that manages databases as files. According to its documentation, it is more comparable to opening and modifying a file (akin to open() in Python).
Give a brief overview of the history of SQLite.
SQLite was first started on May 9th, 2000. Since then, SQLite has featured hundreds upon hundreds of commits, and has been designated as a recommended storage method by the Library of Congress.
What features does SQLite have?
- Embedded databases
- Lightweight library
- Serverless databases
To what extent does SQLite conform to the SQL standard?
While SQLite does not specify which features are compliant to the SQL standard, it does note that SQLite has traded some advanced features for portability and compactness. For instance, RIGHT OUTER JOIN, a feature supported by most other database systems, is not supported in SQLite.
What are the data types SQLite supports?
The SQLite Relational Database Management System comes with five storage classes:
- INTEGER - numeric data storing up to 8 bytes (64 bit integers)
- TEXT - equivalent to a "string" class in Python or Java
- BLOB - binary data
- REAL - floating-point number
- NULL - an absence of value
What are some operators offered by SQLite?
SQLite provides a host of operators, including:
- Comparison operators (<, <=, =, >=, >)
- Mathematical operators (+, -, *, /, %)
When should SQLite be used?
SQLite can be used in many contexts, most notably:
- Embedded applications
Because SQLite is lightweight and self-contained, it is ideal for storing application data. This may particularly come in handy with mobile phones, especially those with limited hardware resources. SQLite's website also states that the technology works well even in devices with low memory.
- Internal or Temporary Databases
According to the SQLite documentation, it may sometimes be more efficient to load data into an in-memory database and query said data through use of SQL operations. Doing so can prevent unnecessary complications with writing manual code that parses data, reads data, queries data, etc.
- Training purposes
Any beginner with little to no experience in database programming may find SQLite to be a good introduction to SQL relational databases. This, in turn, can ease the transition from SQLite to other database management systems, such as MySQL or PostgreSQL.
When should SQLite not be used?
- High-Traffic Websites
High-traffic websites involving databases may require may write or read operations, which may require optimization by performing the database operations over several servers. Therefore, it may be salient for users to switch to client-server SQL systems, such as PostgreSQL.
- Large-volume databases
SQLite notes that a single database file is limited to 281 terabytes (and also notes that some operating systems place the limit below that). Databases that must rely on more data may have to find alternatives to SQLite.
- Concurrent Users
SQLite only allows one connection to the database at a time, placing any incoming connections into a waiting queue. Smaller applications do not necessarily need to worry about concurrency since the wait time is usually less than 100 milliseconds, but applications with many, many users may need the performance boost offered by supporting more concurrency.
How do you write a prepared statement in SQLite?
Prepared statements in SQLite are usually made with another programming language such as Python.
First, a programmer can define a statement, such as this:
SELECT * FROM users WHERE id = ?;
The question mark serves as a placeholder for any incoming values (other placeholders do exist, such as ?NNN, where "NNN" is a number between 1 and the maximum number of variables SQLite can support). After this, a programmer can "pass in" parameters into the statement, usually in the form of some kind of collection (i.e. a list of parameters in Python).
What kind of indexing does SQLite use?
SQLite uses b-trees for indexing rows in a database table.
What are partial indexes in SQLite?
Partial indexes are indexes that point to a subset in the data, usually defined by a certain condition (i.e. a partial index may point to all financial bank accounts in a table accounts that have at least $1000). Creating such partial indexes can help in optimizing read operations, especially when adding the WHERE clause to a SELECT statement.
How do you create a partial index in SQLite?
Here is an example of a partial index in SQLite, for a table people with columns name (TEXT) and age (INTEGER):
CREATE INDEX age_index ON people(age) WHERE age > 18;
As can be seen, a basic partial index requires a name ("age_index"), the table ("people"), a column to index ("age"), and a condition ("WHERE age > 18").
What are constraints in SQLite? What are some examples?
Constraints are defined rules that determine what kind of data can be inserted into a database table. Some examples of constraints in SQLite include:
- UNIQUE -- a constraint that specifies a certain attribute as non-duplicable (i.e. a unique username, a unique name, etc)
- NOT NULL -- a constraint that states a certain value must not be null (i.e. make sure people's first names are not null)
- DEFAULT -- a constraint that specifies a default value for a column (i.e. the default amount of money put into a bank account is $0)
- PRIMARY KEY -- a constraint that specifies a table column is a primary key, making it both UNIQUE and NOT NULL.
- CHECK -- a general, user-defined constraint that enforces a certain condition on incoming data (i.e. check that the age of the user is 13 or above)
- FOREIGN KEY -- constraint used to ensure that a row inserted into a table has an existing relationship with another row in another table (explained in next question).
What are foreign keys in SQLite?
Foreign keys are constraints that ensure that any incoming data in a table can only be committed if there exists an existing relationship between the data and another table. Essentially, foreign keys ensure that any related data inserted at the same time have an existing "link," so to say. See next question for example.
How do you define foreign keys in SQLite?
Suppose we wanted to create two tables like so:
CREATE TABLE customers ( customerid INTEGER PRIMARY KEY, customername TEXT NOT NULL ); CREATE TABLE transactions ( transactionid INTEGER PRIMARY KEY, id_customer INTEGER NOT NULL, amount INTEGER NOT NULL, FOREIGN KEY(id_customer) REFERENCES customers(customerid) );
Thus, suppose we wanted to insert a new transaction:
INSERT INTO transactions VALUES (44, 22, 1000);
The data can only be inserted if there already exists a customer with id 22 in the customers table. In other words, the foreign key id_customer and primary key customerid are existentially linked.
Explain functions in SQLite.
Functions in SQLite are lines of SQL code that (in most cases) take in a parameter (or parameters) and return some value. In SQLite, there are two types of functions:
- Deterministic - a function that returns the same value given the same input (i.e. abs(X) will always return the same value if the same input is given)
- Nondeterministic - a function that returns a different answer every time it is called (i.e. random())
What are some examples of functions in SQLite?
Some examples of functions include:
- abs(X) - absolute value of some numerical expression
- random() - returns a pseudo-random integer
- lower(X) - returns a lower-case ASCII-character string
What is PRAGMA in SQLite?
PRAGMA is a SQLite-specific extension that can define or modify SQLite environment properties. There are built-in PRAGMA in SQLite, as well as potential user-defined ones as well.
What are some examples of PRAGMA in SQLite?
Some examples of PRAGMA include:
- encoding - defines the encoding for a given database (i.e. UTF-8)
- index_list - returns all indices associated with a table
What are aggregate functions?
In SQLite, aggregate functions are functions that take in a single parameter representing some subset of rows and perform operations over said rows.
What are some examples of aggregate functions?
Some examples of aggregate functions include:
- avg(X) - taking the average of a group
- count(X) - count the number of times X occurs in a group
- max(X) - the maximum value of all items in a group
How do you load a database in-memory?
To load a database in memory (in this case, Python), perform the following code:
import sqlite3 from sqlite3 import connect from contextlib import closing # create a database connection in memory with connect(":memory:") as conn: conn.row_factory = sqlite3.Row with closing(conn.cursor()) as cursor: # Set up the database stmt = """ CREATE TABLE people (name TEXT NOT NULL, age INTEGER NOT NULL); """ cursor.execute(stmt) stmt = 'INSERT INTO people VALUES ("Ryan", 18);' cursor.execute(stmt) stmt = 'INSERT INTO people VALUES ("Sandra", 14);' cursor.execute(stmt) stmt = 'INSERT INTO people VALUES ("Sally", 19);' cursor.execute(stmt) cursor.execute("SELECT * FROM people WHERE age >= 18;") rows = cursor.fetchall() for row in rows: print(row['name'], row['age'])
What is PostgreSQL?
PostgreSQL is an open-source, client-server database solution that contains many unique features, such as extensibility.
Give a brief overview of the history of PostgreSQL.
PostgreSQL was first started as the POSTGRES project in 1986 at the University of California, Berkeley. It was first spearheaded by Professor Michael Stonebraker, and since then POSTGRES evolved into PostgreSQL.
What features does PostgreSQL have?
PostgreSQL boasts powerful features, including (but certainly not limited to):
- Support for complex data types, including JSON and XML
- Extensibility (i.e. defining user-defined types and operators)
- Concurrency performance through MVCC and various types of indexes
To what extent does PostgreSQL conform to the SQL standard?
As of the SQL:2016 standard, PostgreSQL notes that it supports 170 out of 177 mandatory SQL features.
What are some data types PostgreSQL supports?
PostgreSQL supports many common SQL data types, such as INTEGER, TEXT, and BOOLEAN. However, PostgreSQL also provides support for other kinds of complex data types, including:
In addition, PostgreSQL gives users the ability to define their own data types.
What are some operators offered by PostgreSQL?
Some operators offered by PostgreSQL include:
- Mathematical operators (+, -, /, *)
- Comparison operators (<, <=, =, >=, >)
- Complex operators (@, ~, ||, |/)
What is the maximum number of databases for PostgreSQL?
The maximum number of databases in PostgreSQL is 4,294,950,911.
How do you create a comment in PostgreSQL?
Single line comments can be created like so:
-- this is a single lined comment
Mutli-line comments can be created like so:
/* * comment starts on this line * and resumes on this line */
What is Multiversion Concurrency Control (MVCC)?
Multiversion Concurrency Control is PostgreSQL's solution to handling concurrent users. In MVCC, any database operation taking place works with a previous snapshot, or version, of the database. By doing so, MVCC provides what is known as transaction isolation for each operation.
Explain the advantage(s) of Multiversion Concurrency Control.
PostgreSQL's MVCC ensures that locks that affect querying data do not affect locks that affect writing data into the database. This means that the entire database does not need to be locked, as in other database systems.
What is table inheritance in PostgreSQL?
PostgreSQL offers a feature called table inheritance. In table inheritance, a user can create a table such that it inherits the properties of another table (i.e. columns and their types, constraints, etc).
How do you make a child table from a parent table (how does a table inherit properties)?
Suppose we wanted to create two tables: people and employees. We want both tables to have
properties such as id (UUID), name (TEXT), phone (TEXT), and email (TEXT). The employees table
will have an extra property called salary (INTEGER).
In PostgreSQL, we can do the following:
CREATE TABLE people ( id UUID PRIMARY KEY NOT NULL, name TEXT NOT NULL, phone TEXT NOT NULL, email TEXT NOT NULL ); CREATE TABLE employees ( salary INTEGER NOT NULL ) INHERITS (people);
What kinds of indexing does PostgreSQL support?
PostgreSQL offers the following index types:
- B-tree - the default indexing in PostgreSQL database tables
In addition, users can also define their own indices, though for most users the above indexes are sufficient for indexing data.
How do you create an index in PostgreSQL?
Suppose we had a table called cars, with a column called name that we would like to index.
To create a basic index (by default, b-tree), one can write:
CREATE INDEX name_index ON cars (name);
What is a partial index in PostgreSQL?
Like in SQLite, a partial index in PostgreSQL points to a subset of data in a table, usually defined by some condition (i.e. a partial index may point to all financial transactions in a table sold that involve at least $1000).
How do you define partial indexes in PostgreSQL?
Suppose we wanted to create a table of financial transactions, each with an integer id and an integer amount (of money). Suppose we also wanted to create a partial index covering transactions at least $1000. Then we do the following:
CREATE TABLE transactions ( id INTEGER PRIMARY KEY, amount INTEGER NOT NULL, ); CREATE INDEX transactions_index ON transactions (amount) WHERE (amount >= 1000);
What is partitioning in PostgreSQL?
Partitioning is the practice of taking a very large table and dividing (aka partitioning) it into smaller pieces, which are then stored physically on the disk. PostgreSQL offers three types of partitioning:
- Range Partitioning - a partition based on numerical (or some other continuous) range imposed on a column
- List Partitioning - a partition based on what keys are directly chosen for a given partition
- Hash Partitioning - a partition based on applying a hash function on each key-value pair
How do you partition a table in PostgreSQL?
Suppose we had a large table containing financial transactions, with two columns: transaction id (UUID) and transaction amount (INTEGER). Suppose we also knew (for the sake of example) that the maximum possible transaction value is $1000.
To create a partition by hundreds (i.e. $0-$100, $100-$200, ..., $900-$1000), we could write the following syntax:
-- creates a partitioned table CREATE TABLE transactions ( transaction_id UUID PRIMARY KEY NOT NULL, transaction_amount INTEGER NOT NULL ) PARTITION BY RANGE (transaction_amount); -- makes the actual partitions CREATE TABLE transactions_0100 PARTITION OF transactions FOR VALUES FROM (0) TO (100); ... CREATE TABLE transactions_9901000 PARTITION OF transactions FOR VALUES FROM (900) TO (1000);
When should you use partitioning vs. partial indexes in PostgreSQL?
While it is possible to create indexes that behave like partitions (i.e. specify partial indexes that cover different ranges of data), doing so will result in performance slowdowns. According to the documentation, PostgreSQL needs to validate each index. Of course, the most benefit for partinioning comes when dealing with extremely large tables.
What are constraints in PostgreSQL? What are some examples?
Constraints in PostgreSQL define strict guidelines or limitations for any incoming data. Similar to SQLite, some examples include:
- Primary key
- Not null
- Foreign Keys (discussed in the next two questions)
What are foreign keys in PostgreSQL?
In PostgreSQL, foreign keys serve to enforce relationships between primary keys in different tables. Specifically, any operations that deal with inserting or updating data into a table should not occur without an existing, matching record in another table.
How do you define foreign keys in PostgreSQL?
Suppose we wanted to define two tables: accounts (with columns id INTEGER, name TEXT, and type TEXT) and balances (id INTEGER, amount INTEGER). To create a foreign key for a column, we must call upon the REFERENCES keyword:
-- bank accounts CREATE TABLE accounts ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT NOT NULL ); -- bank balances associated with accounts, -- linked by foreign key id CREATE TABLE balances ( id INTEGER REFERENCES accounts(id), -- creates a foreign key with relation to id amount INTEGER NOT NULL );
When should PostgreSQL be used?
Due to MVCC, PostgreSQL is ideal for dealing with concurrent connections, as unnecessary locking does not take place. This significantly optimizes read and write operations for concurrent connections compared to, for instance, SQLite.
Sharding a database (or splitting a database into multiple servers) is a common technique that helps in reducing the load of read/write requests on database servers. Companies such as Instagram use PostgreSQL for such kinds of cases.
Some contexts (i.e. scientific or mathematical research) may require more esoteric data types not commonly supported by other database solutions (i.e. complex numbers, etc). In such cases, the ability to make user-defined types may prove to be useful.
When shouldn't PostgreSQL be used?
- Embedded Applications
While it is possible to write a REST API or a client-server program to retrieve data from a database server, application developers may find it simpler to use a lightweight solution for storing data. In such a case, SQLite may prove to be a much simpler solution, especially since it is a very compact library.
- Configuration and Installation Difficulties
PostgreSQL requires setup with servers, bringing with it costs related to maintaining database servers. If this cost proves to be too much, then a different system may be preferable.
What is write-ahead logging (WAL)?
Write-ahead logging is a common method of keeping data uncorrupted. In WAL, any changes to tables or rows can only take place after those changes are logged. This means that, in the event of an uncommitted change, the change in question can be reapplied by querying the logs.
How do you write a prepared statement in PostgreSQL?
According to the documentation, writing a prepared statement usually takes on the following format:
PREPARE name [ ( data_type [, ...] ) ] AS statement
For instance, if one may want to perform a search on a users table with the following prepared statement:
PREPARE find_user(INTEGER) AS SELECT * FROM users WHERE id = $1;
Such a prepared statement would (in the same database connection) be executed by calling:
Explain functions in PostgreSQL.
In PostgreSQL, functions are callable objects that execute some kind of code, given some data (aka parameters). PostgreSQL features many built-in functions, such as ln() (the ln() function in math), and allows users to define their own functions with Procedural Languages.
Explain Procedural Languages (PL) in PostgreSQL.
PostgreSQL allows users to define functions using procedural languages, which are executable languages other than SQL or C (which PostgreSQL is built on).
What Procedural Languages (PLs) does PostgreSQL support?
How can a user define a custom function with PL/pgSQL? Give an example.
Suppose we wanted to create a function that calculated a 20% off discount on a given product's price. We could define the function as follows:
CREATE FUNCTION discount(price real) RETURNS real AS $$ BEGIN RETURN price * 0.8; END; $$ LANGUAGE plpgsql;
What are triggers in PostgreSQL?
How can a user define a custom data type?
Suppose we wanted to define a type called TRANSACTION. We can specify the structure as follows:
CREATE TYPE transaction AS ( id INTEGER, amount INTEGER );
How can a user define a custom operator?
Suppose, using the previous example, we wanted to add two transactions.
We would need to:
- Define the TRANSACTION type
- Define a function that adds two objects of type TRANSACTION
- Create the operator that represents the addition of two TRANSACTION objects
In the following code, we perform the above three steps in order, as well as create a table of transactions and perform our operator.
-- create a transaction data type CREATE TYPE transaction AS ( id INTEGER, amount INTEGER ); -- create a function that adds the amount of two transactions CREATE FUNCTION transaction_add(t1 TRANSACTION, t2 TRANSACTION) RETURNS INTEGER AS $sum$ BEGIN RETURN (t1).amount + (t2).amount; END $sum$ LANGUAGE plpgsql; -- create an addition operator for the transaction type CREATE OPERATOR + ( leftarg = TRANSACTION, rightarg = TRANSACTION, function = transaction_add, commutator = + ); -- create a table with columns id (INTEGER), -- t1 (TRANSACTION), and t2 (TRANSACTION) CREATE TABLE transactions ( id INTEGER PRIMARY KEY NOT NULL, t1 TRANSACTION NOT NULL, t2 TRANSACTION NOT NULL ); -- insert two rows, each containing a unique id and two unique transactions INSERT INTO transactions VALUES (1, (4, 4000), (5, 5000)); INSERT INTO transactions VALUES (2, (2, 2000), (10, 2000)); SELECT t1 + t2 AS sum FROM transactions;
Explain type conversion in PostgreSQL.
Type conversion occurs in PostgreSQL when an object of a certain data type is automatically converted to match an operation with another object of a different data type (i.e. converting INTEGER to DOUBLE). In PostgreSQL, type conversions usually take place with function calls, operators, INSERT and UPDATE operations, and other operations that may involve two or more data types.
Explain full-text search in PostgreSQL.
In PostgreSQL, full-text search is a process wherein a database finds all documents (i.e. texts, etc) that contain a certain query term or token. Usually, full-text search occurs within tables that contain columns of textual data. PostgreSQL provides the ability for users to define queries with certain operators.
What are some tools that can be used to monitor databases?
PostgreSQL offers many different types of monitoring functions:
- Statistical functions - provides statistics about database operations
- pg_locks - a system table that allows a user to view the locks in a database
- Progress analysis - allows users to monitor the current progress of certain database commands (i.e. CREATE INDEX)
Explain parallel querying in PostgreSQL.
In parallel querying, PostgreSQL can execute certain queries using multiple CPU cores, rather than one at a time (hence parallel). Not all queries benefit from parallel querying, but the queries that do often report a 2-3 times boost in speed.