SQL Queries Asked in Data Engineer Interviews
2025-08-22
Data engineer interviews often dive deep into SQL because it tests how you think about data, not just how you code. The same patterns appear across companies — aggregations, joins, window functions, and edge case handling. Below are the most common SQL queries you’ll see, with real examples you can adapt quickly.
1) Find the Second Highest Salary
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
Pattern: subquery to exclude the max.
Variant: use DENSE_RANK()
to handle ties.
2) Nth Highest Salary
SELECT salary
FROM (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk = 3;
Pattern: window functions are safer than LIMIT … OFFSET
when duplicates exist.
3) Employees Earning More Than Their Manager
SELECT e.name AS employee
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
Pattern: self-join for hierarchical data.
4) Department With Highest Average Salary
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY avg_salary DESC
LIMIT 1;
Pattern: aggregation + ordering.
5) Customers Who Never Ordered
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.customer_id IS NULL;
Pattern: left join anti join (WHERE … IS NULL
).
6) Running Total of Sales
SELECT order_date,
SUM(amount) OVER (ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders;
Pattern: cumulative aggregates with window functions.
7) Top N Per Group (e.g., Top 3 Products Per Category)
SELECT category, product, revenue
FROM (
SELECT category, product, revenue,
DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
FROM product_sales
) t
WHERE rnk <= 3;
Pattern: partition + rank filter.
8) Consecutive Days (Attendance or Activity Streaks)
SELECT user_id,
MIN(activity_date) AS start_date,
MAX(activity_date) AS end_date,
COUNT(*) AS streak_length
FROM (
SELECT user_id,
activity_date,
activity_date - INTERVAL ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY activity_date
) DAY AS grp
FROM logins
) t
GROUP BY user_id, grp;
Pattern: subtract row number from date to group streaks.
9) Percentage of Orders Shipped Late
SELECT ROUND(
100.0 * SUM(CASE WHEN shipped_date > required_date THEN 1 ELSE 0 END) / COUNT(*),
2
) AS pct_late
FROM orders;
Pattern: conditional aggregation.
10) Duplicate Records
SELECT name, email, COUNT(*) AS cnt
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;
Pattern: HAVING
filters groups, not rows.
11) Find the Median Salary
SELECT AVG(salary) AS median
FROM (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER() AS total
FROM employees
) t
WHERE rn IN (FLOOR((total + 1) / 2.0), CEIL((total + 1) / 2.0));
Pattern: rank salaries, then take middle rows.
12) Users With Highest Activity in a Given Time Window
SELECT user_id, COUNT(*) AS actions
FROM logs
WHERE event_time BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY user_id
ORDER BY actions DESC
LIMIT 5;
Pattern: filter first, then aggregate.
13) Cumulative Distinct Counts (Unique Users per Day Running Total)
SELECT day,
COUNT(DISTINCT user_id) OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS cumulative_users
FROM daily_users;
Pattern: distinct + cumulative window.
14) Join Multiple Tables and Filter
SELECT o.order_id, c.name, p.product_name, o.amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products p ON p.product_id = o.product_id
WHERE o.amount > 100;
Pattern: joining multiple entities with filtering logic.
15) Last Order Per Customer
SELECT customer_id, order_id, order_date
FROM (
SELECT customer_id, order_id, order_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
) t
WHERE rn = 1;
Pattern: row_number + partition.
Quick Checklist for Data Engineer SQL Interviews
- Aggregations: SUM, COUNT, AVG, MIN, MAX with
GROUP BY
- Joins: inner, left, right, full, self joins
- Window functions: ROW_NUMBER, RANK, LAG/LEAD, SUM OVER
- Subqueries & CTEs: for modularity and clarity
- Date handling:
DATE_TRUNC
,INTERVAL
,TIMESTAMPDIFF
- Edge cases: duplicates, nulls, empty groups
Final Note
Almost every real interview combines these patterns: filtering, joining, windowing, and aggregation. Practice writing them fluently and explaining why your query works — that’s what separates a good candidate from a great one.
If you want instant outlines and explanations of SQL patterns while you practice, check out StealthCoder. It can overlay solutions in real time so you can focus on learning the reasoning.