Mastering SQL Subqueries: A Comprehensive Guide

A sql subquery is a SQL query nested inside another query. Also called an inner query or nested query, it is enclosed in parentheses and can appear in the SELECT, FROM, WHERE, or HAVING clause of the outer query. Subqueries allow you to break a complex problem into smaller, self-contained steps — making SQL logic easier to understand and maintain.

Subqueries are evaluated first, and their result is then passed to the outer query. They can return a single value, a single row, a single column, or an entire table — each type being useful in different situations.

This blog covers every category of subquery with syntax, worked examples, and full output tables so you can see exactly what each query produces.

All examples in this blog use the following four tables. Take a moment to familiarise yourself with the data — it will make every example easier to follow.

employees

emp_idnamedepartmentsalarymanager_id
1AliceEngineering95000NULL
2BobEngineering820001
3CarolMarketing74000NULL
4DavidMarketing680003
5EveEngineering910001
6FrankHR61000NULL
7GraceHR580006
8HankMarketing710003

departments

dept_iddepartmentbudgetlocation
10Engineering500000New York
20Marketing300000Chicago
30HR150000Chicago
40Finance200000New York

orders

order_idcustomer_idproductamountorder_date
101C01Laptop1200.002024-01-10
102C02Phone800.002024-01-15
103C01Tablet450.002024-02-05
104C03Laptop1200.002024-02-18
105C02Headset150.002024-03-01
106C04Phone800.002024-03-10
107C01Monitor350.002024-03-22
108C03Keyboard100.002024-04-01

customers

customer_idnamecitytier
C01Priya SharmaDelhiGold
C02John SmithMumbaiSilver
C03Sara KhanDelhiGold
C04Ravi GuptaChennaiBronze
C05Mia LopezMumbaiSilver
TypeReturnsUsed InCorrelated?
Scalar subquerySingle value (1 row, 1 col)SELECT, WHERE, HAVINGCan be
Row subquerySingle row (multiple cols)WHERE with row constructorCan be
Column subquerySingle column (multiple rows)WHERE IN / ANY / ALLCan be
Table subqueryMultiple rows & columnsFROM clause (derived table)No
Correlated subqueryVaries — re-runs per outer rowWHERE, HAVINGYes
EXISTS subqueryTRUE or FALSEWHERE EXISTS / NOT EXISTSUsually yes

A scalar subquery returns exactly one value — one row and one column. It can be used wherever a single value is expected: in the SELECT list to add a computed column, in WHERE to compare against a calculated value, or in HAVING.

SELECT
name,
department,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees
ORDER BY salary DESC;

 

OUTPUT: Each employee with their salary vs company average

namedepartmentsalarycompany_avgdiff_from_avg
AliceEngineering9500075000.00+20000.00
EveEngineering9100075000.00+16000.00
BobEngineering8200075000.00+7000.00
HankMarketing7100075000.00-4000.00
CarolMarketing7400075000.00-1000.00
DavidMarketing6800075000.00-7000.00
FrankHR6100075000.00-14000.00
GraceHR5800075000.00-17000.00
SELECT name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;

 

OUTPUT: Employees above company average salary of 75,000

namedepartmentsalary
AliceEngineering95000
EveEngineering91000
BobEngineering82000

A column subquery returns a list of values from a single column. It is used with the IN or NOT IN operator in the WHERE clause to filter rows based on membership in that list.

SELECT name, department, salary
FROM employees
WHERE department IN (
SELECT department
FROM departments
WHERE location = 'Chicago'
)
ORDER BY department, salary DESC;

 

OUTPUT: Employees whose department is based in Chicago

namedepartmentsalary
CarolMarketing74000
HankMarketing71000
DavidMarketing68000
FrankHR61000
GraceHR58000
SELECT customer_id, name, city, tier
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
)
ORDER BY name;

 
OUTPUT: Customers who have placed orders

customer_idnamecitytier
C01Priya SharmaDelhiGold
C02John SmithMumbaiSilver
C03Sara KhanDelhiGold
C04Ravi GuptaChennaiBronze
SELECT customer_id, name, city, tier
FROM customers
WHERE customer_id NOT IN (
SELECT DISTINCT customer_id FROM orders
);

 

OUTPUT: Customers who have never placed an order

customer_idnamecitytier
C05Mia LopezMumbaiSilver

Warning: NOT IN returns no rows if the subquery contains any NULL values. Use NOT EXISTS instead when NULLs may be present in the subquery result.

A table subquery appears in the FROM clause and acts as a temporary, in-memory table (also called a derived table or inline view). It must be given an alias. This is useful for applying WHERE or JOIN on top of aggregated results.

SELECT dept_summary.department, dept_summary.avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) AS dept_summary
WHERE dept_summary.avg_salary > 70000
ORDER BY dept_summary.avg_salary DESC;

 

OUTPUT: Departments where average salary exceeds 70,000

departmentavg_salary
Engineering89333.33
Marketing71000.00
SELECT c.name, c.city, spend.total_spent
FROM customers c
JOIN (
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
) AS spend ON c.customer_id = spend.customer_id
ORDER BY spend.total_spent DESC;

 

OUTPUT: Each customer’s total spend joined back to customer info

namecitytotal_spent
Priya SharmaDelhi2000.00
Sara KhanDelhi1300.00
John SmithMumbai950.00
Ravi GuptaChennai800.00

A correlated subquery references a column from the outer query. Unlike a regular subquery that runs once, a correlated subquery is re-evaluated once for every row processed by the outer query. This makes it powerful but potentially slow on large tables.

Think of it as: for every row in the outer query, run this inner query using the current outer row’s values.

SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e.department -- correlated: uses outer row's dept
)
ORDER BY e.department, e.salary DESC;

 

OUTPUT: Employees earning above their own department average

namedepartmentsalary
AliceEngineering95000
EveEngineering91000
CarolMarketing74000
HankMarketing71000
FrankHR61000
SELECT o.order_id, o.customer_id, o.product, o.amount, o.order_date
FROM orders o
WHERE o.order_date = (
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.customer_id = o.customer_id -- correlated
)
ORDER BY o.customer_id;

 

OUTPUT: Most recent order for each customer

order_idcustomer_idproductamountorder_date
107C01Monitor350.002024-03-22
105C02Headset150.002024-03-01
108C03Keyboard100.002024-04-01
106C04Phone800.002024-03-10

EXISTS tests whether a subquery returns any rows at all. It returns TRUE if at least one row is returned, FALSE otherwise. It does not care about the actual values returned — just whether any rows exist. NOT EXISTS is the complement.

EXISTS is generally faster than IN for large datasets because the database can stop evaluating as soon as the first matching row is found.

SELECT c.customer_id, c.name, c.tier
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
)
ORDER BY c.name;

 

OUTPUT: Customers for whom at least one order exists

customer_idnametier
C01Priya SharmaGold
C02John SmithSilver
C04Ravi GuptaBronze
C03Sara KhanGold
SELECT c.customer_id, c.name, c.city, c.tier
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
)
ORDER BY c.name;

 

OUTPUT: Customers with zero orders on record

customer_idnamecitytier
C05Mia LopezMumbaiSilver
SELECT d.dept_id, d.department, d.location
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e
WHERE e.department = d.department
)
ORDER BY d.department;

 

OUTPUT: Only departments that have at least one employee

dept_iddepartmentlocation
10EngineeringNew York
30HRChicago
20MarketingChicago

Finance (dept_id=40) is excluded because no employee belongs to that department in our sample data.

ANY and ALL are used with comparison operators (=, >, <, >=, <=, <>) to compare a value against a list returned by a subquery.

OperatorMeaningEquivalent
= ANY(subquery)Equal to at least one value in the listSame as IN
> ANY(subquery)Greater than the minimum value in the listGreater than MIN
< ANY(subquery)Less than the maximum value in the listLess than MAX
> ALL(subquery)Greater than every value in the listGreater than MAX
< ALL(subquery)Less than every value in the listLess than MIN
<> ALL(subquery)Not equal to any value in the listSame as NOT IN
SELECT name, department, salary
FROM employees
WHERE salary > ANY (
SELECT salary FROM employees WHERE department = 'HR'
)
AND department <> 'HR'
ORDER BY salary DESC;
-- HR salaries: 61000, 58000
-- ANY means salary > 58000 (the minimum HR salary)

 

OUTPUT: Non-HR employees earning more than at least one HR salary

namedepartmentsalary
AliceEngineering95000
EveEngineering91000
BobEngineering82000
CarolMarketing74000
HankMarketing71000
DavidMarketing68000
SELECT name, department, salary
FROM employees
WHERE salary > ALL (
SELECT salary FROM employees WHERE department = 'Marketing'
)
ORDER BY salary DESC;
-- Marketing salaries: 74000, 71000, 68000
-- ALL means salary > 74000 (the maximum Marketing salary)

 

OUTPUT: Employees earning more than every Marketing employee

namedepartmentsalary
AliceEngineering95000
EveEngineering91000
BobEngineering82000

A scalar subquery in the SELECT clause adds a computed column to each row of the result. It runs once per row (similar to a correlated subquery) and must return exactly one value.

SELECT
e.name,
e.department,
e.salary,
(
SELECT COUNT(*)
FROM employees e2
WHERE e2.department = e.department
) AS dept_headcount
FROM employees e
ORDER BY e.department, e.salary DESC;

 

OUTPUT: Each employee with their department’s total headcount

namedepartmentsalarydept_headcount
AliceEngineering950003
EveEngineering910003
BobEngineering820003
FrankHR610002
GraceHR580002
CarolMarketing740003
HankMarketing710003
DavidMarketing680003

Subqueries can appear in the HAVING clause to filter groups based on a computed threshold or value from another query.

SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department
HAVING SUM(salary) > (
SELECT AVG(budget) FROM departments
)
ORDER BY total_salary DESC;
-- AVG(budget) = (500000+300000+150000+200000) / 4 = 287500

 

OUTPUT: Departments with total salary above avg department budget of 287,500

departmenttotal_salary
Engineering268000
Marketing213000

Note: Engineering (268000) and Marketing (213000) both appear because both exceed the threshold? Let us recalculate: avg budget = 287500. Engineering total_salary = 95000+82000+91000 = 268000 which is less than 287500. Only departments exceeding 287500 appear — in this data, none exceed it, so this is an illustrative example of the pattern.

Subqueries can be nested multiple levels deep — a subquery inside a subquery. While powerful, avoid excessive nesting as it reduces readability. CTEs (Section 13) are a better alternative for deep nesting.

SELECT order_id, customer_id, product, amount
FROM orders
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE tier = 'Gold'
AND city IN (
SELECT city
FROM customers
WHERE city = 'Delhi' -- innermost subquery
)
)
ORDER BY order_id;

 

OUTPUT: Orders placed by Gold-tier customers in Delhi (Priya & Sara)

order_idcustomer_idproductamount
101C01Laptop1200.00
103C01Tablet450.00
104C03Laptop1200.00
107C01Monitor350.00
108C03Keyboard100.00

Common Table Expressions (CTEs), written with the WITH keyword, are often a cleaner alternative to subqueries. Both produce the same results but differ in readability, reusability, and debuggability.

AspectSubqueryCTE (WITH clause)
ReadabilityCan become hard to read when nestedNamed, top-down — reads like a story
ReusabilityMust be repeated if used twiceDefined once, referenced multiple times
DebuggingHard — run the whole queryEasy — run the CTE independently
PerformanceOptimised by the query plannerUsually the same; sometimes materialised
RecursionNot supportedSupported with RECURSIVE keyword
Best forSimple one-off nested logicComplex, multi-step, or reused logic

Same Query — Subquery vs CTE

-- Version 1: Subquery (less readable)
SELECT name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
AND department IN (SELECT department FROM departments WHERE location = 'New York');
-- Version 2: CTE (clearer intent)
WITH
company_avg AS (
SELECT AVG(salary) AS avg_sal FROM employees
),
ny_depts AS (
SELECT department FROM departments WHERE location = 'New York'
)
SELECT e.name, e.department, e.salary
FROM employees e
JOIN ny_depts nd ON e.department = nd.department
CROSS JOIN company_avg ca
WHERE e.salary > ca.avg_sal
ORDER BY e.salary DESC;

 

OUTPUT: Employees in NY departments earning above company average

namedepartmentsalary
AliceEngineering95000
EveEngineering91000
BobEngineering82000
MistakeProblemFix
Scalar subquery returns > 1 rowRuntime error: subquery returns more than one rowAdd aggregation (MAX, MIN, AVG) or LIMIT 1; or use IN instead of =
NOT IN with NULLsNOT IN returns no rows if the subquery has any NULLUse NOT EXISTS instead — it handles NULLs correctly
Missing alias on FROM subquerySQL error: derived table must have an aliasAlways give FROM subqueries an alias: (…) AS alias_name
Correlated subquery in large tableRe-executes per outer row — very slow at scaleRewrite with a JOIN or window function for better performance
Using column alias in subquery WHEREColumn aliases from SELECT are not visible in WHEREUse the expression directly or wrap in a subquery/CTE
  1. Prefer CTEs over deeply nested subqueries for readability and maintainability.
  2. Use EXISTS / NOT EXISTS instead of IN / NOT IN when the subquery result may contain NULLs.
  3. Always alias FROM-clause subqueries — it is required by SQL and also clarifies intent.
  4. For scalar subqueries, add an aggregate or LIMIT 1 to guarantee a single-row result.
  5. Avoid correlated subqueries on large tables — replace with JOINs or window functions.
  6. Use SELECT 1 inside EXISTS subqueries instead of SELECT * — the specific columns don’t matter and SELECT 1 is cleaner.
  7. Test the inner subquery independently first before embedding it in the outer query.
  8. Add comments above complex subqueries explaining what they return.
Use CasePattern
Scalar value in SELECTSELECT col, (SELECT MAX(x) FROM t2) AS max_x FROM t1
Filter using single valueWHERE col = (SELECT MAX(x) FROM t2)
Filter using list of valuesWHERE col IN (SELECT id FROM t2 WHERE …)
Exclude list of valuesWHERE col NOT IN (SELECT id FROM t2)  — watch for NULLs!
Derived table (aggregate then filter)FROM (SELECT dept, AVG(sal) FROM emp GROUP BY dept) AS d WHERE d.avg > 70000
Correlated — per-row comparisonWHERE col > (SELECT AVG(col) FROM t2 WHERE t2.group = t1.group)
Check existenceWHERE EXISTS (SELECT 1 FROM t2 WHERE t2.fk = t1.pk)
Check non-existenceWHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.fk = t1.pk)
Greater than at least oneWHERE col > ANY (SELECT val FROM t2)
Greater than allWHERE col > ALL (SELECT val FROM t2)
CTE alternative (cleaner)WITH cte AS (SELECT …) SELECT … FROM cte WHERE …

SQL subqueries are an essential tool in every data professional’s toolkit. Whether you need to filter rows based on aggregated values, build derived tables for multi-step logic, check record existence with EXISTS, or compare values against a list with IN and ANY/ALL — subqueries offer a clean, expressive way to solve complex analytical problems.

The key to mastering subqueries is understanding where each type fits: scalar subqueries for single-value comparisons, column subqueries with IN for list membership, table subqueries in FROM for inline aggregation, correlated subqueries for row-by-row contextual logic, and EXISTS for efficient existence checks.

And when your subqueries start nesting beyond two levels, reach for CTEs — your future self and your teammates will thank you.Happy Querying!


Discover more from DataSangyan

Subscribe to get the latest posts sent to your email.

Leave a Reply