Now that you've mastered the basics of SQL, it's time to level up your database skills. Advanced SQL techniques allow you to handle complex business logic, optimize performance, and extract deeper insights from your data.
In this guide, we'll explore powerful SQL features that will transform you from a beginner into a proficient database developer.
๐ฏ HAVING: Filtering grouped data
While WHERE
filters individual rows, HAVING
filters groups created by GROUP BY
. Think of HAVING
as the "WHERE clause for aggregated data."
Basic HAVING example:
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 5;
This shows only categories with more than 5 products.
HAVING Vs WHERE:
-- โ This won't work - can't use aggregates in WHERE
SELECT category, COUNT(*) as product_count
FROM products
WHERE COUNT(*) > 5 -- ERROR!
GROUP BY category;
-- โ
Correct approach with HAVING
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 5;
Complex HAVING example:
SELECT
user_id,
COUNT(*) as order_count,
SUM(total_amount) as total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY user_id
HAVING COUNT(*) >= 3 AND SUM(total_amount) > 1000;
This finds customers who placed at least 3 orders and spent more than $1000 this year.
๐ Advanced JOIN techniques
LEFT JOIN With NULL checks:
-- Find users who haven't placed any orders
SELECT u.name, u.email
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.user_id IS NULL;
Self JOIN:
-- Find employees and their managers
SELECT
e.name as employee_name,
m.name as manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Multiple JOINs:
SELECT
u.name,
o.order_date,
p.product_name,
oi.quantity
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= '2024-01-01';
๐ช Window functions: advanced analytics
Window functions perform calculations across related rows without collapsing them into groups.
ROW_NUMBER() - Ranking data:
SELECT
name,
salary,
department,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank_in_dept
FROM employees;
RANK() And DENSE_RANK():
SELECT
product_name,
sales_amount,
RANK() OVER (ORDER BY sales_amount DESC) as rank,
DENSE_RANK() OVER (ORDER BY sales_amount DESC) as dense_rank
FROM product_sales;
Running totals with SUM():
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date) as running_total
FROM daily_sales
ORDER BY order_date;
LAG() And LEAD() - comparing with previous/next rows:
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as previous_month,
revenue - LAG(revenue) OVER (ORDER BY month) as growth
FROM monthly_revenue;
๐ Common table expressions (CTES)
CTEs make complex queries more readable by breaking them into logical parts.
Basic CTE:
WITH high_value_customers AS (
SELECT user_id, SUM(total_amount) as total_spent
FROM orders
GROUP BY user_id
HAVING SUM(total_amount) > 1000
)
SELECT u.name, hvc.total_spent
FROM high_value_customers hvc
JOIN users u ON hvc.user_id = u.id;
Recursive CTE - organizational hierarchy:
WITH RECURSIVE employee_hierarchy AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees with managers
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;
๐จ Advanced conditional logic
CASE Statements:
SELECT
name,
age,
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 65 THEN 'Adult'
ELSE 'Senior'
END as age_category,
CASE
WHEN total_orders > 50 THEN 'VIP'
WHEN total_orders > 20 THEN 'Regular'
ELSE 'New'
END as customer_tier
FROM user_summary;
Conditional aggregation:
SELECT
category,
COUNT(*) as total_products,
COUNT(CASE WHEN in_stock = true THEN 1 END) as in_stock_count,
COUNT(CASE WHEN price > 100 THEN 1 END) as premium_count,
AVG(CASE WHEN in_stock = true THEN price END) as avg_in_stock_price
FROM products
GROUP BY category;
๐ Subqueries: queries within queries
Correlated subqueries:
-- Find products that are above average price in their category
SELECT product_name, price, category
FROM products p1
WHERE price > (
SELECT AVG(price)
FROM products p2
WHERE p2.category = p1.category
);
EXISTS Vs IN:
-- Using EXISTS (often more efficient)
SELECT name FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.order_date >= '2024-01-01'
);
-- Using IN
SELECT name FROM users
WHERE id IN (
SELECT DISTINCT user_id FROM orders
WHERE order_date >= '2024-01-01'
);
๐ง Advanced data manipulation
UPSERT (INSERT ... ON CONFLICT):
-- PostgreSQL syntax
INSERT INTO user_stats (user_id, login_count, last_login)
VALUES (123, 1, NOW())
ON CONFLICT (user_id)
DO UPDATE SET
login_count = user_stats.login_count + 1,
last_login = NOW();
Bulk updates with JOINs:
UPDATE products
SET discount_percentage = 15
FROM (
SELECT category
FROM category_performance
WHERE sales_last_month < 1000
) low_performing
WHERE products.category = low_performing.category;
DELETE With JOINs:
-- Delete orders from inactive users
DELETE o FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.active = false AND u.last_login < '2023-01-01';
๐ Performance optimization techniques
Using EXPLAIN to analyze queries:
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5;
Index optimization:
-- Create composite index for common query patterns
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);
-- Partial index for specific conditions
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
Query hints and optimization:
-- Force index usage (MySQL)
SELECT * FROM users USE INDEX (idx_email) WHERE email = 'user@example.com';
-- Optimize with specific JOIN order
SELECT /*+ USE_NL(u o) */ u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id;
๐งฎ Advanced aggregation patterns
ROLLUP For subtotals:
SELECT
COALESCE(category, 'TOTAL') as category,
COALESCE(brand, 'ALL BRANDS') as brand,
SUM(sales) as total_sales
FROM product_sales
GROUP BY ROLLUP(category, brand)
ORDER BY category, brand;
Pivot tables with CASE:
SELECT
product_id,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN amount ELSE 0 END) as jan_sales,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN amount ELSE 0 END) as feb_sales,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN amount ELSE 0 END) as mar_sales
FROM sales
GROUP BY product_id;
๐ Advanced security practices
Row-level security:
-- Create policy to restrict data access
CREATE POLICY user_data_policy ON orders
FOR ALL TO application_role
USING (user_id = current_setting('app.current_user_id')::INTEGER);
Parameterized queries (application level):
// Node.js example with prepared statements
const query = `
SELECT * FROM users
WHERE email = $1 AND created_at >= $2
`;
const result = await client.query(query, [email, startDate]);
โ Advanced best practices checklist
โ๏ธ Use HAVING for filtering grouped data, WHERE for individual rows
โ๏ธ Leverage window functions instead of self-joins when possible
โ๏ธ Break complex queries into CTEs for better readability
โ๏ธ Use EXISTS instead of IN for better performance with large datasets
โ๏ธ Create appropriate indexes for your query patterns
โ๏ธ Always use EXPLAIN to understand query execution plans
โ๏ธ Implement proper error handling and transaction management
โ๏ธ Use parameterized queries to prevent SQL injection
๐ง Conclusion
Advanced SQL techniques unlock the full power of relational databases. By mastering window functions, CTEs, complex joins, and optimization strategies, you can handle sophisticated data analysis and build more efficient applications.
Remember that with great power comes great responsibilityโalways test complex queries thoroughly and consider their performance impact on production systems. The key to SQL mastery is practice and understanding when to apply each technique for maximum effectiveness.