Search test library by skills or roles
⌘ K
SQL Coding interview questions for freshers
1. Imagine you have a box (table) of toys (data) with labels (columns) like 'toy_name' and 'color'. How do you list all the toys that are blue?
2. You have two boxes of toys: one with toy names and another with toy prices. How do you combine them to see each toy's name and its price in one list?
3. If you want to know how many toys of each color you have, how would you find that out?
4. Suppose you have a list of students and their scores. How would you find the student with the highest score?
5. Let's say you have a table of orders. How would you retrieve all orders placed in the last week?
6. You have a list of products. How do you find all products that cost more than $20?
7. If you have a table of customers and their addresses, how do you find all customers who live in 'New York'?
8. Given a table of employees, how do you find the average salary of all employees?
9. Suppose you have a table of songs and their artists. How do you list all the songs by a specific artist?
10. Imagine you need to find customers whose names start with the letter 'A'. How do you do it?
11. If you have a table of books and their authors, how do you count how many books each author has written?
12. Given a table of students and their grades, how do you find all students who passed (grade >= 60)?
13. Let's say you have a table of events and their dates. How do you find all events that happened in January?
14. You have a list of movies. How do you find all movies that are longer than 2 hours?
15. If you have a table of users and their registration dates, how do you find the oldest user?
16. Suppose you have a table of products and their categories. How would you retrieve all products from a specific category, like 'Electronics'?
17. Imagine you need to update the price of a specific product in your product table. How do you do it?
18. If you have a table of employees and you want to give a 10% raise to everyone, how would you update their salaries?
19. Let's say you need to remove a specific order from your orders table. How do you delete that order?
20. You have a table of users. How do you find the names of users without any duplicates?
21. You have two tables: one with customer information and another with order details. How do you find the customers who have not placed any orders?
22. If you have a table of blog posts, how do you retrieve the most recent five posts?
23. Given a table of employees, how do you find the second highest salary?
24. Suppose you have a table of transactions. How would you calculate a running total of all transactions based on their dates?
25. If you have a table of products and their prices, how do you find the product with the lowest price?
26. Given two tables representing students and courses, respectively, write a SQL query to find the names of all students who are enrolled in a particular course.
27. Suppose you have an 'employee' table. Write a SQL query to retrieve all employees whose salary is above the average salary of all employees. Also, order the result by salary in descending order.
SQL Coding interview questions for juniors
1. Imagine you have a table of students and their test scores. How do you find the student with the highest score?
2. If you have a table of products with their prices, how do you find the average price of all products?
3. You've got two tables: one with customer info and another with their orders. How do you combine them to see each customer's information alongside their orders?
4. Let's say you have a table of employees and their salaries. How do you list employees who earn more than a specific amount?
5. If you have a table of items and their categories, how do you count how many items are in each category?
6. Suppose you have a table of transactions with dates. How do you find all transactions that happened within a certain date range?
7. You have a table showing which students are enrolled in which courses. How do you find all students enrolled in a specific course?
8. If you have a table of blog posts, how would you retrieve the most recent post?
9. Imagine a table of songs and artists. How do you find all songs by a specific artist?
10. You have a table of customer support tickets with their status. How would you count the number of open tickets?
11. Consider a table of books with their titles and authors. How do you find books with a specific word in their title?
12. If you have a table tracking website visits, how do you find the number of visits from a specific country?
13. How do you select all the unique values from a column in a table?
14. If you wanted to sort the results of a query, how would you do it?
15. Explain how to retrieve data from a table where a specific column is empty.
16. Suppose there is a table with 'name' and 'age' columns. How can you select all people older than 25, and sort them by name?
17. If you have a table of products with names and prices, how do you show only the top 5 most expensive items?
18. Imagine you have a table tracking users and when they last logged in. How do you find users who haven't logged in the last 30 days?
19. You have two tables: One with customer information and the other with order information. How would you find the customer name and total amount spent for each customer?
20. If you have a 'products' table with price information, how can you calculate the difference between the highest and lowest price?
21. There's a table containing user's sign up dates and their last login dates. How would you find users whose last login date is the same as their sign up date?
22. If you have a table of employees, how would you find the oldest employee in each department?
23. How do you select data from one table that does not exist in another table, based on a common column?
24. If you wanted to update multiple columns in a table at once, what SQL command would you use?
SQL Coding intermediate interview questions
1. Imagine you have a table of customer orders. How would you find the customer who placed the most orders this month?
2. Let's say you have two tables: one with employee info and another with their department info. How can you retrieve the employee's name along with their department name?
3. You have a table with product sales. How would you calculate the rolling average of sales over the last 7 days?
4. Given a table of user activity logs, how can you identify users who have been inactive for more than 30 days?
5. You have a table with transaction data, including transaction date and amount. How would you find the month with the highest total transaction amount?
6. Suppose you have a table of students and their exam scores. How can you determine the rank of each student based on their score?
7. If you have a table of website traffic data, how would you identify the top 5 most visited pages?
8. Let's say you have a table of employees with their salaries. How can you calculate the difference between each employee's salary and the average salary?
9. Given a table of customer data, how would you segment customers based on their spending habits (e.g., low, medium, high spenders)?
10. You have a table with product inventory levels. How would you identify products that are below a certain reorder point?
11. Suppose you have a table of events with start and end times. How can you find events that overlap in time?
12. If you have a table of sales data with date and region, how would you calculate the year-over-year growth for each region?
13. Imagine a table of user subscriptions with start and end dates. How do you find active subscriptions on a specific date?
14. You have a table of items and their categories. Find the category with the most number of items.
15. How can you retrieve records from one table that don't exist in another, based on a common field?
16. Given a table of employees with hierarchy (manager/employee relationship), how do you find all direct and indirect reports of a given manager?
17. You have sales data with order date and amount. How would you calculate the cumulative sales for each day?
18. Suppose you have a table of users with signup date. How do you calculate the number of new users who signed up each month?
19. If you're given a table of articles with view counts. How can you find the articles with view counts greater than the average view count?
20. You have a table with customer ID and purchase date. Determine the number of customers who made their first purchase in the last quarter.
21. Imagine you're given a log table with timestamps. How can you identify consecutive failed login attempts from the same IP address?
22. Consider that you have a table with appointment schedules. How can you find available time slots between appointments?
23. You have a table of survey responses. How would you compute the percentage of respondents who answered a specific question in a certain way?
24. If you have a table of customer reviews. How would you find the top 5 most frequently used words in the reviews?
25. Let's say you have a table of cities and their populations. How would you find the median population across all cities?
26. Suppose you have a table of stock prices. How would you identify the days with the largest price fluctuations?
27. You have a table with products and discounts. Identify the products with discounts greater than 20% compared to the average price of similar products.
28. Given two tables, one with user information and another with subscription details, how would you identify users whose subscription is about to expire within the next 7 days?
SQL Coding interview questions for experienced
1. How would you optimize a slow-running query without adding indexes?
2. Describe your experience with different SQL dialects (e.g., MySQL, PostgreSQL, SQL Server) and how you adapt your code for each.
3. Explain how you would design a database schema for an e-commerce platform, considering scalability and performance.
4. How do you handle data migration between different database systems?
5. Describe a time you had to troubleshoot a deadlock situation in SQL Server. What were the steps you took?
6. Explain how you would implement a custom windowing function if it wasn't available in your SQL dialect.
7. How would you ensure data consistency when performing multiple updates across related tables?
8. Describe your experience with performance tuning SQL queries using execution plans.
9. How do you approach identifying and resolving performance bottlenecks in a database system?
10. Explain how you would design a system to prevent SQL injection attacks.
11. Describe your experience with database replication and its benefits.
12. How do you monitor database performance and identify potential issues proactively?
13. Explain how you would optimize a database for read-heavy vs. write-heavy workloads.
14. Describe your experience with different types of database backups and recovery strategies.
15. How would you implement a system to track changes to data over time (e.g., auditing or versioning)?
16. Explain how you would handle large data sets (e.g., millions or billions of rows) in SQL queries.
17. Describe your experience with using Common Table Expressions (CTEs) for complex queries.
18. How would you implement a full-text search functionality in your database?
19. Explain how you would design a data warehouse schema for business intelligence reporting.
20. Describe your experience with database security best practices.
21. How do you approach testing SQL code to ensure its correctness and performance?
22. Explain the trade-offs between different isolation levels in SQL transactions.
23. Describe a time you had to debug a complex stored procedure. What tools and techniques did you use?
24. How would you design a system for handling user authentication and authorization in a database application?
25. Explain how you would use partitioning to improve query performance on a large table.
26. Describe your experience with using database triggers and their potential impact on performance.

131 SQL Interview Questions to Hire Top Engineers


Siddhartha Gunti Siddhartha Gunti

September 09, 2024


SQL skills are often a non-negotiable requirement for many roles, like data analysts and database administrators. Interviewers need to ask SQL questions to check if candidates can work with databases.

This blog post presents a curated list of SQL coding interview questions categorized by experience level: freshers, juniors, intermediates, and experienced professionals; we also have a section on MCQs. This variety helps recruiters and hiring managers assess candidates across different stages of their SQL expertise.

By using these questions, you can confidently gauge a candidate's practical SQL abilities before they even begin the interview; for a standardized approach, consider using an SQL Online Test to quickly filter candidates.

Table of contents

SQL Coding interview questions for freshers
SQL Coding interview questions for juniors
SQL Coding intermediate interview questions
SQL Coding interview questions for experienced
SQL Coding MCQ
Which SQL Coding skills should you evaluate during the interview phase?
3 Tips for Maximizing Your SQL Coding Interview Questions
Hire Top SQL Talent with Skills Tests and Targeted Interview Questions
Download SQL Coding interview questions template in multiple formats

SQL Coding interview questions for freshers

1. Imagine you have a box (table) of toys (data) with labels (columns) like 'toy_name' and 'color'. How do you list all the toys that are blue?

To list all the blue toys, you would use a SELECT statement with a WHERE clause to filter the results. Specifically, you'd query the 'toys' table and specify that the 'color' column must equal 'blue'.

SELECT toy_name FROM toys WHERE color = 'blue';

2. You have two boxes of toys: one with toy names and another with toy prices. How do you combine them to see each toy's name and its price in one list?

Assuming the toy names and prices are in separate lists or data structures (like arrays or dictionaries), the simplest way to combine them is to iterate through one list and use the index to access the corresponding element in the other list. For example, if you're using Python:

toy_names = ["Teddy Bear", "Toy Car", "Doll"]
toy_prices = [10.00, 5.00, 12.00]

for i in range(len(toy_names)):
  print(f"{toy_names[i]}: ${toy_prices[i]}")

If toy names have different prices, you would want to store this in dictionaries/maps with the names as the keys and the prices as values.

3. If you want to know how many toys of each color you have, how would you find that out?

The simplest approach is to manually count the toys, grouping them by color as you go. You could create a tally sheet or a simple table, listing each color and marking each toy as you count. Alternatively, if you have a lot of toys or need a more automated solution, you could use a script or program to analyze images of the toys and identify the color of each one, counting and categorizing them accordingly. This would require some image processing and potentially machine learning, but could be faster and more accurate for large datasets.

4. Suppose you have a list of students and their scores. How would you find the student with the highest score?

To find the student with the highest score, you can iterate through the list of students and their scores, keeping track of the highest score encountered so far and the corresponding student. Start by assuming the first student has the highest score. Then, for each subsequent student, compare their score to the current highest score. If the student's score is higher, update the highest score and the corresponding student.

In Python:

def find_highest_score(student_scores):
    if not student_scores:
      return None, None

    highest_student = student_scores[0][0]
    highest_score = student_scores[0][1]

    for student, score in student_scores:
        if score > highest_score:
            highest_score = score
            highest_student = student

    return highest_student, highest_score

5. Let's say you have a table of orders. How would you retrieve all orders placed in the last week?

To retrieve all orders placed in the last week, I would use a SELECT statement with a WHERE clause to filter the orders based on their order date. Assuming the table is named orders and the order date column is named order_date, the query would look something like this:

SELECT * 
FROM orders
WHERE order_date >= DATE('now', '-7 days');

This query selects all columns from the orders table where the order_date is greater than or equal to a date that is 7 days prior to the current date. The specific function for getting the current date and subtracting days might vary depending on the database system (e.g., NOW() in MySQL, CURRENT_DATE in PostgreSQL, GETDATE() in SQL Server), but the core logic remains the same.

6. You have a list of products. How do you find all products that cost more than $20?

To find products costing more than $20, I would iterate through the list of products and check the price of each product. If the price is greater than $20, I would add that product to a new list of "expensive products".

For example, in Python:

expensive_products = [product for product in products if product['price'] > 20]

This list comprehension efficiently filters the products list and creates a new list containing only the products that meet the specified price condition.

7. If you have a table of customers and their addresses, how do you find all customers who live in 'New York'?

To find all customers who live in 'New York' from a table of customers and their addresses, you would use a SQL query. Assuming the table is named Customers and has columns CustomerID, Name, and Address, the query would look like this:

SELECT CustomerID, Name, Address
FROM Customers
WHERE Address LIKE '%New York%';

This query selects the CustomerID, Name, and Address columns from the Customers table where the Address column contains 'New York'. The LIKE operator with % wildcards allows for variations in the address format (e.g., '123 Main St, New York, NY 10001').

8. Given a table of employees, how do you find the average salary of all employees?

To find the average salary of all employees, you would use the AVG aggregate function in SQL. Assuming your table is named employees and the salary column is named salary, the query would look like this:

SELECT AVG(salary) AS average_salary
FROM employees;

This query calculates the average value of the salary column from the employees table and aliases the result as average_salary.

9. Suppose you have a table of songs and their artists. How do you list all the songs by a specific artist?

To list all songs by a specific artist, you would use a SELECT statement with a WHERE clause in SQL. Assuming the table is named songs and has columns title and artist, the query would look like this:

SELECT title
FROM songs
WHERE artist = 'Artist Name';

Replace 'Artist Name' with the actual name of the artist you're interested in. This query will return a list of all song titles where the artist matches the specified name.

10. Imagine you need to find customers whose names start with the letter 'A'. How do you do it?

To find customers whose names start with the letter 'A', you would typically use a database query or a programming function that filters records based on a string comparison. For example, in SQL, you might use a query like SELECT * FROM Customers WHERE name LIKE 'A%';. In a programming language like Python, you could iterate through a list of customer names and use the startswith() method to identify names beginning with 'A'.

Here is an example in Python:

customers = ["Alice Smith", "Bob Johnson", "Amy Lee", "David Brown"]
for customer in customers:
    if customer.startswith("A"):
        print(customer)

This code iterates through the customer list and prints those whose names begin with 'A'.

11. If you have a table of books and their authors, how do you count how many books each author has written?

To count the number of books each author has written, you would use a GROUP BY clause in SQL. You group the rows by author and then use the COUNT() aggregate function to count the number of books for each author.

Here's an example SQL query:

SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author;

This query selects the author and the count of books, aliased as book_count, from the books table. The GROUP BY author clause groups the rows by author, so the COUNT(*) function counts the number of books for each distinct author.

12. Given a table of students and their grades, how do you find all students who passed (grade >= 60)?

To find all students who passed (grade >= 60) from a table, you would use a SELECT query with a WHERE clause. The WHERE clause filters the results to only include rows where the grade is greater than or equal to 60. The following SQL statement does this:

SELECT * FROM students WHERE grade >= 60;

This query selects all columns (*) from the students table but only includes rows where the grade column is greater than or equal to 60.

13. Let's say you have a table of events and their dates. How do you find all events that happened in January?

To find all events that happened in January from a table of events and their dates, you would typically use a SQL query. Assuming your table is named events and has a column named event_date of DATE or DATETIME type, the query would look like this:

SELECT * FROM events WHERE MONTH(event_date) = 1;

This query extracts the month number from the event_date column and filters for those records where the month is equal to 1 (January).

14. You have a list of movies. How do you find all movies that are longer than 2 hours?

To find movies longer than 2 hours, you would iterate through the list of movies. For each movie, check its duration. If the duration is greater than 2 hours (120 minutes), add it to a new list of movies that meet the criteria.

Here's a general example (assuming each movie object has a 'duration' field representing duration in minutes):

long_movies = []
for movie in movies:
    if movie.duration > 120:
        long_movies.append(movie)

15. If you have a table of users and their registration dates, how do you find the oldest user?

To find the oldest user, you would typically query the table ordered by the registration date in ascending order. The first record returned would represent the oldest user.

Using SQL, you could achieve this with the following query:

SELECT * FROM users ORDER BY registration_date ASC LIMIT 1;

This query sorts the users table by the registration_date column in ascending order (oldest first) and then limits the result set to the first row, effectively returning the oldest user's information.

16. Suppose you have a table of products and their categories. How would you retrieve all products from a specific category, like 'Electronics'?

To retrieve all products from the 'Electronics' category, you would use a SQL query with a WHERE clause to filter the results based on the category. Assuming your table is named products and has columns product_name and category, the query would look like this:

SELECT product_name
FROM products
WHERE category = 'Electronics';

This query selects the product_name from the products table where the category column is equal to 'Electronics'. You can easily modify the SELECT statement to retrieve other columns as needed, such as price or description.

17. Imagine you need to update the price of a specific product in your product table. How do you do it?

To update the price of a specific product in a product table, I would use an UPDATE SQL statement. The statement would specify the table name, the column to be updated (price), the new price value, and a WHERE clause to identify the specific product to update.

For example:

UPDATE products
SET price = 25.99
WHERE product_id = 123;

This statement updates the price column in the products table to 25.99 for the product with a product_id of 123.

18. If you have a table of employees and you want to give a 10% raise to everyone, how would you update their salaries?

To give a 10% raise to all employees, you would use an UPDATE statement in SQL. Assuming the table is named employees and the salary column is named salary, the query would look like this:

UPDATE employees
SET salary = salary * 1.10;

This statement multiplies each employee's current salary by 1.10 (which is equivalent to increasing it by 10%), effectively giving them the raise. Be sure to back up your data or test in a development environment first!

19. Let's say you need to remove a specific order from your orders table. How do you delete that order?

To delete a specific order from the orders table, I would use the DELETE SQL statement. The basic syntax is:

DELETE FROM orders WHERE order_id = 'specific_order_id';

Replace 'specific_order_id' with the actual ID of the order you want to remove. It's crucial to include a WHERE clause to avoid accidentally deleting all rows in the table. I would also double-check the order_id before executing to ensure the correct order is being deleted. Additionally, before deleting, I would verify there are no foreign key constraints in other tables referencing this order, or handle those constraints appropriately (e.g., cascading deletes or setting foreign keys to NULL).

20. You have a table of users. How do you find the names of users without any duplicates?

To find the names of users without any duplicates from a table, you can use the DISTINCT keyword in SQL. For example, if your table is named users and the column containing names is name, the query would be:

SELECT DISTINCT name FROM users;

This query will return a list of all unique names from the users table, eliminating any duplicates.

21. You have two tables: one with customer information and another with order details. How do you find the customers who have not placed any orders?

To find customers who haven't placed any orders, you can use a LEFT JOIN or a NOT IN / NOT EXISTS clause. The LEFT JOIN approach joins the Customers table with the Orders table on the customer ID. Then, you filter for rows where the order ID is NULL, indicating no matching order.

Here's an example SQL query using LEFT JOIN:

SELECT c.* 
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

22. If you have a table of blog posts, how do you retrieve the most recent five posts?

To retrieve the five most recent blog posts, you would use a SQL query with an ORDER BY clause to sort the posts by a timestamp column (e.g., created_at or published_at) in descending order, and a LIMIT clause to restrict the result set to the top five rows.

SELECT * FROM blog_posts ORDER BY published_at DESC LIMIT 5;

23. Given a table of employees, how do you find the second highest salary?

To find the second highest salary in a table of employees, you can use SQL. Here's a common approach:

SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);

This SQL query first finds the maximum salary using a subquery (SELECT MAX(salary) FROM employees). Then, it selects the maximum salary from the employees whose salary is less than the maximum salary found in the subquery. This effectively gives you the second highest salary. Another approach using DENSE_RANK() or ROW_NUMBER() window functions can be useful when the table contains duplicate salaries and you want to retrieve the next unique highest value.

24. Suppose you have a table of transactions. How would you calculate a running total of all transactions based on their dates?

To calculate a running total of transactions based on their dates, I would typically use a window function in SQL. Specifically, I'd use the SUM() function with the OVER() clause to create a window that orders the transactions by date and cumulatively sums the transaction amounts. The ORDER BY clause within the OVER() function is crucial for defining the order in which the running total is calculated.

For example, if the table is named transactions with columns transaction_date and amount, the SQL query would look like this:

SELECT
    transaction_date,
    amount,
    SUM(amount) OVER (ORDER BY transaction_date) AS running_total
FROM
    transactions;

This query would return each transaction along with its corresponding running total based on the transaction date.

25. If you have a table of products and their prices, how do you find the product with the lowest price?

To find the product with the lowest price, you'd typically use a SQL query like this:

SELECT product_name, price
FROM products
ORDER BY price ASC
LIMIT 1;

This query sorts the products table by the price column in ascending order and then uses LIMIT 1 to select only the first row, which represents the product with the lowest price. You would then access the product_name and price columns of the resulting row.

26. Given two tables representing students and courses, respectively, write a SQL query to find the names of all students who are enrolled in a particular course.

Assuming we have two tables, Students with columns like student_id and student_name, and Enrollments (or Courses) with columns like course_id, course_name, and a foreign key student_id linking back to the Students table, the SQL query would look like this:

SELECT s.student_name
FROM Students s
JOIN Enrollments e ON s.student_id = e.student_id
WHERE e.course_name = 'Specific Course Name';

Replace 'Specific Course Name' with the actual name of the course you're interested in. The query joins the Students and Enrollments tables on the student_id, and then filters the results to only include students enrolled in the specified course, finally selecting the student's name.

27. Suppose you have an 'employee' table. Write a SQL query to retrieve all employees whose salary is above the average salary of all employees. Also, order the result by salary in descending order.

SELECT * 
FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee)
ORDER BY salary DESC;

This SQL query first calculates the average salary of all employees using a subquery: (SELECT AVG(salary) FROM employee). Then, it selects all columns (*) from the employee table where the employee's salary is greater than the calculated average salary. Finally, the result set is ordered in descending order based on the salary column, so the highest-paid employees appear first.

SQL Coding interview questions for juniors

1. Imagine you have a table of students and their test scores. How do you find the student with the highest score?

To find the student with the highest score, you would typically use SQL. Assuming the table is named Students and has columns StudentID and Score, the query would be:

SELECT StudentID FROM Students ORDER BY Score DESC LIMIT 1;

This query orders the students by score in descending order and then selects only the first student (the one with the highest score). If you need the student's name or other details, you could join this result with another table containing student information using the StudentID.

2. If you have a table of products with their prices, how do you find the average price of all products?

To find the average price of all products in a table, you would sum up the prices of all the products and then divide by the total number of products. For example, using SQL:

SELECT AVG(price) FROM products;

This query directly calculates the average price from the 'price' column of the 'products' table using the AVG() function.

3. You've got two tables: one with customer info and another with their orders. How do you combine them to see each customer's information alongside their orders?

To combine customer information and order details from two tables, I would use a JOIN operation in SQL. Assuming we have a customers table with columns like customer_id, name, and email, and an orders table with columns like order_id, customer_id, and order_date, I'd use the customer_id column as the common key to link the tables.

A typical SQL query would look like this:

SELECT
  customers.name,
  customers.email,
  orders.order_id,
  orders.order_date
FROM
  customers
JOIN
  orders ON customers.customer_id = orders.customer_id;

This query retrieves the customer's name, email, order ID, and order date by joining the tables based on the matching customer_id. An INNER JOIN is implied here, showing only customers with orders. Other types of joins (e.g., LEFT JOIN) could be used to include all customers, even those without any orders.

4. Let's say you have a table of employees and their salaries. How do you list employees who earn more than a specific amount?

To list employees who earn more than a specific amount from a table named employees with columns employee_id, employee_name, and salary, you would typically use a SQL query like this:

SELECT employee_name, salary
FROM employees
WHERE salary > specific_amount;

Replace specific_amount with the actual salary threshold you want to use. For example, to find employees earning more than $60,000, you'd use WHERE salary > 60000.

5. If you have a table of items and their categories, how do you count how many items are in each category?

To count the number of items in each category in a table, you would typically use a GROUP BY clause in SQL. You also need to use an aggregate function such as COUNT().

For example, if your table is named items and has columns item_name and category, the query would look like this:

SELECT category, COUNT(*) AS item_count
FROM items
GROUP BY category;

This query selects the category and the count of items in each category, aliased as item_count. The GROUP BY clause groups the rows by category, allowing the COUNT(*) function to count the number of items within each category.

6. Suppose you have a table of transactions with dates. How do you find all transactions that happened within a certain date range?

To find all transactions within a specific date range, you'd typically use a WHERE clause in your SQL query to filter the results based on the date column. The specific syntax would depend on the database system you're using, but a common pattern involves the BETWEEN operator or comparison operators.

For example, in standard SQL, you might use a query like:

SELECT * 
FROM transactions
WHERE date_column BETWEEN '2023-01-01' AND '2023-01-31';

Alternatively, you could use comparison operators:

SELECT * 
FROM transactions
WHERE date_column >= '2023-01-01' AND date_column <= '2023-01-31';

Make sure to replace transactions with the actual table name and date_column with the name of the column containing the transaction dates. Also, ensure the date format in your query matches the format used in your database.

7. You have a table showing which students are enrolled in which courses. How do you find all students enrolled in a specific course?

Assuming the table has columns like student_id and course_id, you can find all students enrolled in a specific course using a simple SQL query. Replace 'specific_course' with the actual course ID or name.

SELECT student_id
FROM enrollment_table
WHERE course_id = 'specific_course';

This query selects all student_id entries from the enrollment_table where the course_id matches the specified course. The result will be a list of student IDs enrolled in that course.

8. If you have a table of blog posts, how would you retrieve the most recent post?

To retrieve the most recent blog post, I would use a query that orders the posts by a timestamp column (e.g., created_at or published_at) in descending order and then limit the result to one row.

SELECT * FROM blog_posts ORDER BY created_at DESC LIMIT 1;

This SQL query selects all columns from the blog_posts table, orders the results by the created_at column in descending order (so the newest posts are first), and then uses LIMIT 1 to retrieve only the first row, which represents the most recent post.

9. Imagine a table of songs and artists. How do you find all songs by a specific artist?

To find all songs by a specific artist, you would use a SELECT query with a WHERE clause. Assuming the table is named songs, and it has columns song_name and artist_name, the SQL query would look like this:

SELECT song_name FROM songs WHERE artist_name = 'Specific Artist Name';

Replace 'Specific Artist Name' with the actual name of the artist you're looking for. This query selects the song_name from the songs table where the artist_name matches the specified artist.

10. You have a table of customer support tickets with their status. How would you count the number of open tickets?

To count the number of open tickets, I would use a SQL query with a WHERE clause to filter the tickets based on their status. Assuming the table is named support_tickets and the status column is named status, the query would look like this:

SELECT COUNT(*) 
FROM support_tickets
WHERE status = 'open';

This query selects all rows from the support_tickets table where the status is 'open', and then uses the COUNT(*) function to count the number of rows that meet this condition. The result is the total number of open tickets.

11. Consider a table of books with their titles and authors. How do you find books with a specific word in their title?

To find books with a specific word in their title, you can use a SQL query with the LIKE operator. Assuming your table is named books with columns title and author, the query would look something like this:

SELECT * FROM books WHERE title LIKE '%keyword%';

Replace keyword with the actual word you're searching for. The % symbols are wildcards, meaning they match any sequence of characters (including zero characters). This will return all rows where the title contains the specified keyword anywhere within it.

12. If you have a table tracking website visits, how do you find the number of visits from a specific country?

Assuming the table has columns like visit_id, visit_time, and country, you can use a SQL query to count the visits from a specific country. For example, to find the number of visits from 'USA', you can use the following query:

SELECT COUNT(*) 
FROM website_visits
WHERE country = 'USA';

This query filters the website_visits table to only include rows where the country column is equal to 'USA' and then counts the number of rows that match this criteria.

13. How do you select all the unique values from a column in a table?

To select all the unique values from a column in a table, you would use the DISTINCT keyword in SQL.

For example, if you want to select all the unique values from a column named 'category' in a table named 'products', the SQL query would be:

SELECT DISTINCT category FROM products;

14. If you wanted to sort the results of a query, how would you do it?

To sort the results of a query, I would use the ORDER BY clause in SQL. This clause allows you to specify one or more columns by which you want to sort the result set. By default, ORDER BY sorts in ascending order. To sort in descending order, you would use the DESC keyword after the column name.

For example:

SELECT * FROM employees ORDER BY salary DESC, last_name ASC;

This query would sort the employees table first by salary in descending order (highest salary first), and then within each salary group, it would sort by last_name in ascending order (alphabetical order).

15. Explain how to retrieve data from a table where a specific column is empty.

To retrieve data from a table where a specific column is empty, you typically use the IS NULL operator in your SQL query. The exact syntax might vary slightly depending on the database system you're using (e.g., MySQL, PostgreSQL, SQL Server), but the core concept remains the same.

For example, if you want to retrieve all rows from a table named employees where the department column is empty (NULL), you would use the following SQL query:

SELECT * FROM employees WHERE department IS NULL;

If the column contains an empty string instead of a NULL value (which is different), you'd use WHERE department = '' instead.

16. Suppose there is a table with 'name' and 'age' columns. How can you select all people older than 25, and sort them by name?

To select all people older than 25 and sort them by name, you would use the following SQL query:

SELECT name, age
FROM table_name
WHERE age > 25
ORDER BY name;

This query first selects the 'name' and 'age' columns from your table. Then, the WHERE clause filters the results to include only those rows where the 'age' is greater than 25. Finally, the ORDER BY clause sorts the resulting rows alphabetically by the 'name' column.

17. If you have a table of products with names and prices, how do you show only the top 5 most expensive items?

To show only the top 5 most expensive items from a table of products with names and prices, you would typically use SQL. The query would involve ordering the table by price in descending order and then limiting the result set to the top 5 rows.

For example, in SQL, you might use the following query:

SELECT product_name, price
FROM products
ORDER BY price DESC
LIMIT 5;

This query selects the product name and price, orders the products table by the price column in descending order (highest price first), and then uses LIMIT 5 to retrieve only the top 5 rows.

18. Imagine you have a table tracking users and when they last logged in. How do you find users who haven't logged in the last 30 days?

Assuming your table is named users and has a column named last_login, you can use the following SQL query to find users who haven't logged in the last 30 days:

SELECT * FROM users WHERE last_login < DATE('now', '-30 days');

This query selects all columns from the users table where the last_login date is earlier than 30 days ago from the current date. The DATE('now', '-30 days') function calculates the date 30 days in the past. This will return users who have a last_login timestamp older than 30 days, effectively identifying those who haven't logged in during that period.

19. You have two tables: One with customer information and the other with order information. How would you find the customer name and total amount spent for each customer?

To find the customer name and total amount spent for each customer, you would typically use a JOIN operation combined with aggregation. Assuming the customer table has columns like customer_id and customer_name, and the order table has columns like customer_id and order_amount, you would join these tables on the customer_id. After joining, you'd group the results by customer name and calculate the sum of the order amounts for each customer.

The SQL query would look something like this:

SELECT c.customer_name, SUM(o.order_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;

20. If you have a 'products' table with price information, how can you calculate the difference between the highest and lowest price?

To calculate the difference between the highest and lowest price in a 'products' table, you can use the following SQL query:

SELECT MAX(price) - MIN(price) AS price_difference FROM products;

This query uses the MAX() and MIN() aggregate functions to find the maximum and minimum prices, respectively. Then, it subtracts the minimum price from the maximum price to calculate the difference, aliasing the result as price_difference.

21. There's a table containing user's sign up dates and their last login dates. How would you find users whose last login date is the same as their sign up date?

To find users whose last login date is the same as their sign-up date, you would typically use a SQL query. Assuming your table is named users and has columns signup_date and last_login, the query would look like this:

SELECT user_id
FROM users
WHERE signup_date = last_login;

This query selects the user_id (or any other relevant user identifier) from the users table where the signup_date column is equal to the last_login column. The dates must be of a compatible data type for the comparison to work correctly.

22. If you have a table of employees, how would you find the oldest employee in each department?

To find the oldest employee in each department, I would use SQL. Assuming the table is named employees with columns department and date_of_birth, the query would be something like:

SELECT department, MIN(date_of_birth) AS oldest_dob
FROM employees
GROUP BY department;

To get the complete employee record, you can use a subquery or a window function like ROW_NUMBER() to partition by department and order by date of birth. This will assign rank '1' to the oldest employee in each department.

23. How do you select data from one table that does not exist in another table, based on a common column?

To select data from one table that does not exist in another table based on a common column, you can use several SQL techniques, including NOT IN, NOT EXISTS, and LEFT JOIN with WHERE IS NULL. The LEFT JOIN approach is often the most performant, especially for large tables.

For example, if you have tables table1 and table2 with a common column id, the query using LEFT JOIN would look like this:

SELECT table1.* 
FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL;

This query returns all rows from table1 where there is no matching id in table2.

24. If you wanted to update multiple columns in a table at once, what SQL command would you use?

To update multiple columns in a table simultaneously, you would use the UPDATE command in SQL.

The UPDATE statement allows you to modify the values of one or more columns in a table for a specified set of rows. The SET clause is used to specify which columns to update and what their new values should be. You can include multiple column-value assignments separated by commas in the SET clause. For example: UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;

SQL Coding intermediate interview questions

1. Imagine you have a table of customer orders. How would you find the customer who placed the most orders this month?

To find the customer who placed the most orders this month, I would use SQL. Assuming the table is named orders and has columns like customer_id and order_date, the query would look something like this:

SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE)
AND order_date < DATE_TRUNC('month', CURRENT_DATE) + INTERVAL '1 month'
GROUP BY customer_id
ORDER BY order_count DESC
LIMIT 1;

This query first filters orders to include only those placed this month. Then, it groups the orders by customer_id and counts the number of orders for each customer. Finally, it orders the results by the order count in descending order and limits the result to the top row, giving the customer with the most orders.

2. Let's say you have two tables: one with employee info and another with their department info. How can you retrieve the employee's name along with their department name?

To retrieve an employee's name along with their department name, you would typically use a JOIN operation in SQL. Assuming both tables share a common column, like department_id, you can link rows based on this shared identifier.

Here's an example SQL query:

SELECT e.employee_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

This query selects the employee_name from the employees table (aliased as e) and the department_name from the departments table (aliased as d). The JOIN clause ON e.department_id = d.department_id specifies that rows should be combined where the department_id values match in both tables. This will provide the employee name alongside their corresponding department name in the result set.

3. You have a table with product sales. How would you calculate the rolling average of sales over the last 7 days?

To calculate the rolling average of sales over the last 7 days, I would use a window function in SQL. Assuming the table has columns like sale_date and sales_amount, the query would look something like this:

SELECT sale_date,
       AVG(sales_amount) OVER (ORDER BY sale_date ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_average_7_days
FROM sales_table;

This calculates the average sales_amount for each sale_date, considering the current row and the 6 preceding rows based on the sale_date order. This provides the 7-day rolling average. Ensure that the sale_date column is of date or timestamp type. Some databases might use slightly different syntax for window functions (e.g., OVER (ORDER BY sale_date ASC RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) if sale_date is a timestamp). If there are missing dates, you might need to generate a series of dates and left join it with the sales data.

4. Given a table of user activity logs, how can you identify users who have been inactive for more than 30 days?

To identify users inactive for more than 30 days, you'd query the user activity logs table. Assuming the table has columns like user_id and last_activity_date, you can use SQL. The query would filter for users whose last_activity_date is older than 30 days from the current date.

SELECT user_id
FROM user_activity_logs
WHERE last_activity_date < DATE('now', '-30 days');

This SQL snippet calculates the date 30 days ago using DATE('now', '-30 days') and then selects the user_id of all users whose last_activity_date is before that calculated date. This provides the list of users inactive for the specified period.

5. You have a table with transaction data, including transaction date and amount. How would you find the month with the highest total transaction amount?

To find the month with the highest total transaction amount, you can use SQL. Assuming your table is named transactions with columns transaction_date (DATE type) and amount (numeric type), the following query would achieve this:

SELECT strftime('%Y-%m', transaction_date) AS transaction_month, SUM(amount) AS total_amount
FROM transactions
GROUP BY transaction_month
ORDER BY total_amount DESC
LIMIT 1;

This query first extracts the year and month from the transaction_date. Then it groups the transactions by month, calculates the sum of amounts for each month, and orders the result in descending order of the total amount. Finally, it uses LIMIT 1 to select only the month with the highest total transaction amount.

6. Suppose you have a table of students and their exam scores. How can you determine the rank of each student based on their score?

To determine the rank of each student based on their exam score, you can use SQL. Assuming the table is named students and has columns student_id and score, you can use window functions for ranking. The RANK() function is commonly used, which assigns ranks based on score with gaps for ties. DENSE_RANK() would assign consecutive ranks, even with ties.

Here's an example SQL query:

SELECT
    student_id,
    score,
    RANK() OVER (ORDER BY score DESC) AS student_rank
FROM
    students;

This query orders the students by score in descending order and assigns a rank to each student.

7. If you have a table of website traffic data, how would you identify the top 5 most visited pages?

To identify the top 5 most visited pages from a website traffic data table, I would use SQL (or a similar query language depending on the data storage system). Assuming the table has columns like page_url and visit_count (or a similar metric), the query would look something like this:

SELECT page_url
FROM website_traffic_table
ORDER BY visit_count DESC
LIMIT 5;

This query sorts the table by visit_count in descending order and then selects only the top 5 page_url entries, thus giving you the top 5 most visited pages. If the data contains individual visit records, you might need to GROUP BY page_url and use COUNT(*) to get the visit counts before ordering.

8. Let's say you have a table of employees with their salaries. How can you calculate the difference between each employee's salary and the average salary?

To calculate the difference between each employee's salary and the average salary, you would typically use SQL. First, calculate the average salary across all employees. Then, for each employee, subtract their individual salary from this average. This can be accomplished using a subquery or a window function in SQL.

Here's an example using a subquery:

SELECT employee_id, salary - (SELECT AVG(salary) FROM employees) AS salary_difference
FROM employees;

Alternatively, using a window function (which is generally more efficient):

SELECT employee_id, salary - AVG(salary) OVER () AS salary_difference
FROM employees;

Both queries achieve the same result: a table showing each employee's ID and the difference between their salary and the average salary.

9. Given a table of customer data, how would you segment customers based on their spending habits (e.g., low, medium, high spenders)?

To segment customers based on spending habits, I'd typically use a combination of quantitative analysis and business understanding. First, calculate metrics like total spending, average order value, and purchase frequency for each customer. Then, apply a segmentation technique like:

  • Quantile-based segmentation: Divide customers into groups (e.g., low, medium, high) based on quantiles of their total spending. For instance, the bottom 25% are low spenders, the next 50% are medium, and the top 25% are high. This can be easily done using SQL:

    SELECT
        customer_id,
        NTILE(3) OVER (ORDER BY total_spending) AS spending_segment
    FROM (
        SELECT
            customer_id,
            SUM(order_total) AS total_spending
        FROM
            orders
        GROUP BY
            customer_id
    ) AS customer_spending;
    
  • K-means clustering: Use a clustering algorithm like K-means on spending-related features to group customers into clusters. The resulting clusters represent different spending segments.

Finally, validate segments with business sense and adjust parameters as needed.

10. You have a table with product inventory levels. How would you identify products that are below a certain reorder point?

To identify products below a reorder point, I would use a SELECT statement with a WHERE clause. The WHERE clause would compare the current inventory level to the reorder point. For example:

SELECT product_id, product_name, inventory_level
FROM products
WHERE inventory_level < reorder_point;

This query would return all products where the inventory_level is less than the reorder_point. I would need to replace products, product_id, product_name, inventory_level, and reorder_point with the actual names of the table and columns in the database.

11. Suppose you have a table of events with start and end times. How can you find events that overlap in time?

To find overlapping events, you can use a self-join. Join the table to itself where the start time of one event is before the end time of the other event, and the end time of the first event is after the start time of the second event. This ensures that the time intervals intersect.

Here's a SQL example:

SELECT e1.*, e2.* 
FROM events e1 
JOIN events e2 ON e1.id != e2.id 
WHERE e1.start_time < e2.end_time AND e1.end_time > e2.start_time;

This query will return all pairs of events that have overlapping time ranges. The e1.id != e2.id condition prevents an event from being matched with itself.

12. If you have a table of sales data with date and region, how would you calculate the year-over-year growth for each region?

To calculate year-over-year (YoY) growth for each region, you'd typically use SQL or a data analysis tool like Python with Pandas. The core idea is to compare sales for a given region in the current year to the sales for the same region in the previous year. Here's a high-level approach:

First, you'll need to group your sales data by region and year. Then, for each region-year combination, calculate the total sales. Finally, to determine YoY growth, you would use a window function (like LAG() in SQL) or a similar technique in Pandas to access the previous year's sales for the same region. The YoY growth would be calculated as ((Current Year Sales - Previous Year Sales) / Previous Year Sales) * 100. This gives you the percentage change in sales compared to the prior year for each specific region.

13. Imagine a table of user subscriptions with start and end dates. How do you find active subscriptions on a specific date?

To find active subscriptions on a specific date, you would query the table to find records where the start date is before or equal to the specific date, AND the end date is after or equal to the specific date. This will include all subscriptions that were active on that particular date.

In SQL, it might look something like this:

SELECT * FROM subscriptions
WHERE start_date <= '2024-10-27' AND end_date >= '2024-10-27';

Replace '2024-10-27' with the specific date you want to query. If end_date can be NULL (representing an ongoing subscription), you would also add a condition to handle NULL values, like OR end_date IS NULL.

14. You have a table of items and their categories. Find the category with the most number of items.

To find the category with the most items, you can use SQL. Here's a query:

SELECT category FROM items GROUP BY category ORDER BY COUNT(*) DESC LIMIT 1;

This query groups the items by category, counts the number of items in each category using COUNT(*), orders the results in descending order based on the count, and then selects the top category using LIMIT 1.

15. How can you retrieve records from one table that don't exist in another, based on a common field?

You can retrieve records from one table that don't exist in another based on a common field using a LEFT JOIN or NOT EXISTS clause in SQL. A LEFT JOIN includes all records from the "left" table (the first table specified) and matching records from the "right" table (the second table). If there's no match in the right table, the columns from the right table will contain NULL.

For example, assuming tables table_a and table_b share a common field named id, the following SQL query retrieves records from table_a where id does not exist in table_b:

SELECT a.* 
FROM table_a a
LEFT JOIN table_b b ON a.id = b.id
WHERE b.id IS NULL;

Alternatively, you can use the NOT EXISTS clause:

SELECT a.*
FROM table_a a
WHERE NOT EXISTS (
    SELECT 1
    FROM table_b b
    WHERE a.id = b.id
);

16. Given a table of employees with hierarchy (manager/employee relationship), how do you find all direct and indirect reports of a given manager?

To find all direct and indirect reports of a given manager, you can use a recursive common table expression (CTE) in SQL. The CTE starts with the direct reports of the manager and then recursively joins the employee table to itself to find the reports of those reports, and so on, until all levels of the hierarchy are traversed.

Here's an example SQL query:

WITH RECURSIVE EmployeeHierarchy AS (
    SELECT employee_id, manager_id, employee_name
    FROM employees
    WHERE manager_id = 'given_manager_id' -- Replace with the actual manager ID
    UNION ALL
    SELECT e.employee_id, e.manager_id, e.employee_name
    FROM employees e
    INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT employee_id, employee_name
FROM EmployeeHierarchy;

This query returns a table of all employees who report directly or indirectly to the specified manager.

17. You have sales data with order date and amount. How would you calculate the cumulative sales for each day?

To calculate cumulative sales for each day, you would typically use a window function if you're working with SQL or a similar analytical tool. In SQL, you could use the SUM() OVER() clause. In pandas (Python), you would group by date and then use the cumsum() function.

For example, in pandas:

import pandas as pd

# Assuming your DataFrame is named 'sales_df' with columns 'order_date' and 'amount'
sales_df['cumulative_sales'] = sales_df.groupby('order_date')['amount'].cumsum()
print(sales_df)

This code snippet groups sales data by 'order_date' and calculates the cumulative sum of the 'amount' for each date.

18. Suppose you have a table of users with signup date. How do you calculate the number of new users who signed up each month?

To calculate the number of new users who signed up each month, you would typically use SQL. Assuming your table is named users and has a column named signup_date of type date or timestamp, the query would look like this:

SELECT
    DATE_TRUNC('month', signup_date) AS signup_month,
    COUNT(DISTINCT user_id) AS new_users
FROM
    users
GROUP BY
    signup_month
ORDER BY
    signup_month;

This query groups the users by the month of their signup date using DATE_TRUNC, and then counts the distinct user IDs (user_id) for each month. The result will give you the number of new users for each month. If you don't have a user_id, you could use any unique identifier such as email.

19. If you're given a table of articles with view counts. How can you find the articles with view counts greater than the average view count?

To find articles with view counts greater than the average view count, you can use a SQL query. First, calculate the average view count of all articles using the AVG() function. Then, select the articles where the view count is greater than this average.

Here's an example SQL query:

SELECT article_id, title, view_count
FROM articles
WHERE view_count > (SELECT AVG(view_count) FROM articles);

20. You have a table with customer ID and purchase date. Determine the number of customers who made their first purchase in the last quarter.

To determine the number of customers who made their first purchase in the last quarter, you would typically use SQL. Here's a general approach:

First, identify the last quarter's date range. Then, find the minimum (first) purchase date for each customer. Finally, count the number of customers whose minimum purchase date falls within the last quarter's date range. Here is a sample SQL:

WITH FirstPurchases AS (
    SELECT 
        customer_id,
        MIN(purchase_date) AS first_purchase_date
    FROM 
        purchases
    GROUP BY 
        customer_id
)

SELECT 
    COUNT(DISTINCT customer_id)
FROM 
    FirstPurchases
WHERE 
    first_purchase_date >= DATE('now', '-3 months') -- Start of last quarter
    AND first_purchase_date < DATE('now'); -- End of last quarter

21. Imagine you're given a log table with timestamps. How can you identify consecutive failed login attempts from the same IP address?

To identify consecutive failed login attempts from the same IP address, I would use a window function in SQL. Assuming the table login_attempts has columns like ip_address, timestamp, and login_status, the query would partition the data by ip_address and order it by timestamp. A LAG() function can then be used to check if the previous login attempt for the same IP address was also a failure. If both the current and previous attempts failed, it indicates a consecutive failure. This can be extended to identify longer sequences using more LAG() functions or recursive queries.

Specifically, the query would look something like this:

SELECT ip_address, timestamp
FROM (
    SELECT ip_address, timestamp, login_status,
           LAG(login_status, 1, 'success') OVER (PARTITION BY ip_address ORDER BY timestamp) AS previous_login_status
    FROM login_attempts
) AS subquery
WHERE login_status = 'failed' AND previous_login_status = 'failed';

22. Consider that you have a table with appointment schedules. How can you find available time slots between appointments?

To find available time slots between appointments in a table, you'll typically query the table to order the appointments by start time. Then, iterate through the sorted appointments, comparing the end time of one appointment with the start time of the next. The difference between these times represents a potential available slot.

Specifically, the logic works as follows:

  1. Sort the appointment records by start time.
  2. Initialize an empty list to store available time slots.
  3. Iterate through the sorted appointments.
  4. Calculate the gap between the end time of the current appointment and the start time of the next appointment.
  5. If the gap is greater than zero (meaning there's a time difference), add it to the list of available time slots. For example, in SQL, you could achieve this with LEAD() function or with application code which iterates over the sorted records to implement this logic.

23. You have a table of survey responses. How would you compute the percentage of respondents who answered a specific question in a certain way?

To compute the percentage of respondents who answered a specific question in a certain way, I would first count the number of respondents who gave that specific answer. Then, I would divide that count by the total number of respondents to the survey. Finally, I'd multiply the result by 100 to express it as a percentage. For example, using SQL:

SELECT (COUNT(CASE WHEN answer = 'specific_answer' THEN 1 END) * 100.0) / COUNT(*) AS percentage
FROM survey_responses
WHERE question_id = 'specific_question';

24. If you have a table of customer reviews. How would you find the top 5 most frequently used words in the reviews?

To find the top 5 most frequent words, I'd use SQL and some scripting. First, I'd extract all the reviews into a single text. Then, using a script (e.g., Python), I'd clean the text by removing punctuation and converting it to lowercase. Next, I'd split the text into words and count the frequency of each word, ignoring common stop words (like 'the', 'a', 'is'). Finally, I'd sort the words by frequency and select the top 5.

A SQL query can help with the initial extraction:

SELECT review_text FROM customer_reviews;

Then, a Python script using libraries like nltk or collections.Counter would handle cleaning, tokenizing, stop word removal, and frequency counting. The Counter object makes the frequency calculation straightforward, after which, you can use Counter.most_common(5) to find the most frequent words.

25. Let's say you have a table of cities and their populations. How would you find the median population across all cities?

Finding the median population depends on the database system. Many databases do not have a built-in median function. Therefore, you typically need to use window functions and ordering to calculate it. Here's a common approach using SQL:

In databases like PostgreSQL or SQL Server that do have a MEDIAN() function (or equivalent percentile functions), you can simply use SELECT MEDIAN(population) FROM cities; If not, you would use window functions to determine the row number and total count. Then, filter for the middle row(s). For example:

WITH OrderedCities AS (
 SELECT 
  city,
  population,
  ROW_NUMBER() OVER (ORDER BY population) AS row_num,
  COUNT(*) OVER () AS total_rows
 FROM 
  cities
)
SELECT 
 AVG(population)
FROM 
 OrderedCities
WHERE 
 row_num BETWEEN (total_rows + 1) / 2.0 AND (total_rows + 2) / 2.0;

This SQL query first orders the cities by population, assigns a row number to each, and calculates the total number of rows. Then, it averages the population of the middle one or two cities, depending on whether the total number of cities is odd or even, respectively. The BETWEEN clause handles both cases correctly.

26. Suppose you have a table of stock prices. How would you identify the days with the largest price fluctuations?

To identify the days with the largest price fluctuations in a stock prices table, I would calculate the price difference for each day. This difference could be the absolute value of (High - Low), (Closing Price - Opening Price), or the difference between consecutive closing prices depending on the specific requirement.

Then, I would rank these price differences to find the days with the largest values. SQL could be used to achieve this. For example:

SELECT Date, ABS(High - Low) AS PriceFluctuation
FROM StockPrices
ORDER BY PriceFluctuation DESC
LIMIT 10; -- Top 10 days with largest fluctuation

27. You have a table with products and discounts. Identify the products with discounts greater than 20% compared to the average price of similar products.

To identify products with discounts greater than 20% compared to the average price of similar products, you would typically use SQL. Assuming you have a table named products with columns like product_id, category, price, and discount, the following SQL query could achieve this:

SELECT product_id
FROM products p1
WHERE discount > 0.2 * (
    SELECT AVG(price)
    FROM products p2
    WHERE p1.category = p2.category
);

This query first calculates the average price for each category of products. Then, it selects the product_id of products where the discount is greater than 20% of the average price of products within the same category.

28. Given two tables, one with user information and another with subscription details, how would you identify users whose subscription is about to expire within the next 7 days?

To identify users whose subscriptions are expiring within the next 7 days, I would use a SQL query joining the user information table and the subscription details table. The query would filter based on the subscription end date, selecting only those records where the end_date is within the next 7 days from the current date.

Specifically, assuming the tables are named users and subscriptions with a common column user_id, the query would look something like this:

SELECT u.*, s.* 
FROM users u
JOIN subscriptions s ON u.user_id = s.user_id
WHERE s.end_date BETWEEN CURRENT_DATE AND DATE_ADD(CURRENT_DATE, INTERVAL 7 DAY);

This query selects all columns from both tables for users whose subscription end_date falls within the specified 7-day window. Note that the exact syntax of DATE_ADD might vary based on the specific database system (e.g., MySQL, PostgreSQL).

SQL Coding interview questions for experienced

1. How would you optimize a slow-running query without adding indexes?

Optimizing a slow query without adding indexes involves several strategies focused on rewriting the query itself and leveraging existing database features. First, analyze the query execution plan to pinpoint bottlenecks, like full table scans or inefficient joins. Rewrite the query to be more efficient; this may involve:

  • Simplifying complex WHERE clauses.
  • Using more efficient join algorithms (if the database allows control).
  • Breaking down large queries into smaller, more manageable ones.
  • Ensuring data types are consistent in comparisons to avoid implicit conversions.

Second, optimize data access patterns by ensuring the most selective filters are applied early in the query. Utilize temporary tables or common table expressions (CTEs) to pre-filter data and reduce the amount of data processed in subsequent steps. Finally, if possible, adjust database configuration parameters (e.g., increasing memory allocation for sorting) to improve performance, although this isn't strictly query optimization. Reviewing database statistics and updating them can help the query optimizer make better decisions based on the existing data distribution. Also consider caching if applicable. For example, using MATERIALIZE keyword in some database systems to store the result of a subquery in a temporary table.

2. Describe your experience with different SQL dialects (e.g., MySQL, PostgreSQL, SQL Server) and how you adapt your code for each.

I have experience working with several SQL dialects including MySQL, PostgreSQL, and SQL Server. My approach to adapting code for each dialect involves understanding their specific syntax and feature sets. For example, date/time functions, string manipulation functions, and even things like auto-incrementing column definitions can vary.

I typically utilize online documentation and resources specific to each database system. When writing SQL, I pay close attention to reserved keywords and data type handling. I try to use ANSI SQL standards where possible to improve portability. If a specific feature isn't supported by a particular dialect, I'll explore alternative approaches, sometimes using conditional logic within stored procedures (if supported) or by handling the logic in the application layer. For example, the syntax for limiting results differs:

  • MySQL/PostgreSQL: LIMIT 10
  • SQL Server: TOP 10. When writing CREATE TABLE statements, I pay careful attention to the specific data types available in each dialect.

3. Explain how you would design a database schema for an e-commerce platform, considering scalability and performance.

For an e-commerce platform, I'd design a relational database schema with tables like users, products, categories, orders, and order_items. The users table would store user information. products would hold product details, linked to categories for organization. orders would track order information, and order_items would link orders to specific products and quantities. To optimize for scalability and performance, I'd use techniques like database normalization to reduce redundancy, indexing frequently queried columns (e.g., product_id, user_id), and potentially consider database sharding or partitioning for very large datasets.

Considerations for performance include using appropriate data types (e.g., integers for IDs), employing caching mechanisms (e.g., Redis) for frequently accessed data, and optimizing SQL queries. For example, using EXPLAIN to analyze query performance and refactor slow queries. Furthermore, I'd implement proper transaction management to ensure data consistency and integrity, especially during order processing. Consider CREATE INDEX idx_product_category ON products (category_id);.

4. How do you handle data migration between different database systems?

Data migration between different database systems involves several steps. First, assess the source and target database schemas and identify any incompatibilities. Then, extract the data from the source database, often using tools like mysqldump, pg_dump, or specialized ETL tools. Transform the data to match the target database's schema and data types. This might involve data cleansing, format conversion, and resolving inconsistencies. Finally, load the transformed data into the target database, verifying data integrity and completeness.

Common strategies include using ETL (Extract, Transform, Load) tools, writing custom scripts (e.g., in Python with libraries like psycopg2 or SQLAlchemy), or using database-specific migration tools. Consider factors like data volume, downtime requirements, and data sensitivity when choosing a migration approach. Thorough testing and validation are crucial to ensure a successful migration.

5. Describe a time you had to troubleshoot a deadlock situation in SQL Server. What were the steps you took?

In a previous role, I encountered a deadlock issue in our production SQL Server database. The application would occasionally hang, and upon investigation, SQL Server's error log revealed deadlock errors. My first step was to identify the processes involved in the deadlock. I used SQL Server Profiler (or Extended Events) to capture the deadlock graph. This graph visually represents the resources involved and the order in which the processes attempted to acquire them. Analyzing the deadlock graph pointed to two stored procedures competing for resources (specifically, updating the same two tables but in opposite order). To resolve the deadlock, I modified one of the stored procedures to acquire locks on the tables in the same order as the other stored procedure. This ensured a consistent locking order, preventing the circular dependency that caused the deadlock. After deploying the fix, the deadlock errors disappeared, and the application's stability improved. Also, it's useful to set SET LOCK_TIMEOUT to some appropriate value. So that at least your queries don't get stuck indefinitely in deadlock situations.

6. Explain how you would implement a custom windowing function if it wasn't available in your SQL dialect.

If a built-in windowing function isn't available, I would simulate it using subqueries or self-joins. For a rolling average, for example, a subquery could calculate the average of the previous 'n' rows for each row. The outer query would then select all columns, including the calculated rolling average. This can be inefficient for large datasets.

Alternatively, I could use a self-join to explicitly compare each row to a range of preceding rows based on a defined ordering criteria (like a timestamp). Again, the average can be calculated in the SELECT statement. This method also has performance drawbacks, especially with larger datasets, as it effectively performs a cartesian product on a subset of the table. Indexes would be crucial in optimizing performance in either approach. Both alternatives would require handling edge cases and null values carefully to ensure correct results.

7. How would you ensure data consistency when performing multiple updates across related tables?

To ensure data consistency across multiple updates in related tables, I would use database transactions. A transaction groups multiple SQL operations into a single logical unit of work. Either all operations within the transaction succeed, or none of them do (atomicity).

Specifically, I would begin a transaction before any updates, perform all necessary updates to the related tables, and then either commit the transaction if all updates were successful or rollback the transaction if any update failed. Rolling back the transaction would revert the database to its state before the transaction began, thereby ensuring data consistency. BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION are the key statements here. For example, using pseudocode:

BEGIN TRANSACTION;
UPDATE table1 SET column1 = value1 WHERE condition1;
UPDATE table2 SET column2 = value2 WHERE condition2;
IF (all updates successful)
  COMMIT TRANSACTION;
ELSE
  ROLLBACK TRANSACTION;
ENDIF;

8. Describe your experience with performance tuning SQL queries using execution plans.

I have extensive experience with performance tuning SQL queries using execution plans. I routinely analyze query execution plans to identify bottlenecks such as missing indexes, inefficient join algorithms (e.g., nested loops where hash joins would be more appropriate), and full table scans. I utilize database-specific tools to visualize and interpret these plans, paying close attention to the estimated cost, cardinality, and operators involved.

Based on the execution plan analysis, I apply various optimization techniques, including:

  • Index optimization: Creating new indexes, modifying existing indexes, or removing unused indexes.
  • Query rewriting: Restructuring the query to use more efficient join strategies or avoid unnecessary computations. This often involves using hints sparingly and carefully.
  • Statistics updates: Ensuring that the query optimizer has accurate statistics about the data distribution.
  • Parameter sniffing issues: I understand the importance of understanding how parameters may impact query plans, and use OPTION (RECOMPILE) when appropriate to mitigate such issues.
  • Hardware Considerations: Understand how resource constraints (memory, CPU) can impact performance and factor this in when optimizing.

9. How do you approach identifying and resolving performance bottlenecks in a database system?

I typically start by monitoring key performance indicators (KPIs) like CPU usage, memory consumption, disk I/O, and query execution times. Tools like pg_stat_statements (for PostgreSQL) or Performance Monitor (for Windows/SQL Server) are invaluable. Slow query logs are a great resource to identify the queries that are taking most time to execute.

Once a potential bottleneck is identified, I'd use tools like query explain plans (EXPLAIN ANALYZE in PostgreSQL) to understand how the database is executing the query. Common resolutions include optimizing indexes, rewriting inefficient queries, adjusting database configuration parameters (e.g., buffer pool size), or upgrading hardware if resources are truly exhausted. For example, adding an index to a frequently queried column, or rewriting a complex JOIN can significantly improve performance.

10. Explain how you would design a system to prevent SQL injection attacks.

To prevent SQL injection attacks, I would primarily focus on using parameterized queries (also known as prepared statements). These queries separate the SQL code from the user-supplied data. Instead of directly embedding user input into the SQL query string, placeholders are used. The database driver then safely handles the substitution of these placeholders with the user-provided values, ensuring that the input is treated as data, not as part of the SQL command. For example:

import sqlite3

conn = sqlite3.connect('example.db')
cursor = conn.cursor()

# Never do this -- vulnerable to SQL injection:
# username = input("Enter username: ")
# cursor.execute("SELECT * FROM users WHERE username = '%s'" % username)

# Do this instead:
username = input("Enter username: ")
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))

results = cursor.fetchall()
print(results)

conn.close()

In addition to parameterized queries, I would also implement input validation and sanitization. While parameterized queries are the first line of defense, validating and sanitizing input provides an extra layer of security by filtering out potentially malicious characters or patterns before they even reach the database. This can include limiting input length, escaping special characters, and using whitelists to only allow known good values.

11. Describe your experience with database replication and its benefits.

I have experience with database replication, primarily in read-replica setups using technologies like MySQL's replication features. I've used it to offload read traffic from the primary database to improve performance and availability. The benefits include improved read performance, increased data availability, disaster recovery capabilities and the possibility of running analytics or backups without impacting the primary database.

Specifically, I've implemented asynchronous replication where changes are propagated to read replicas with a slight delay. This allows for scaling read capacity and providing fault tolerance, as replicas can take over if the primary fails. I have also used replication for migrating databases with minimal downtime.

12. How do you monitor database performance and identify potential issues proactively?

I monitor database performance proactively using a combination of tools and techniques. I set up monitoring dashboards using tools like Prometheus, Grafana, or cloud-specific solutions (e.g., CloudWatch for AWS, Azure Monitor). These dashboards track key metrics such as CPU utilization, memory consumption, disk I/O, query execution time, and connection counts. I also configure alerts based on thresholds for these metrics to notify me of potential problems before they impact users.

To identify potential issues proactively, I regularly review query performance using tools like EXPLAIN PLAN (in SQL databases). I also use database-specific performance analysis tools (e.g., Performance Insights in RDS) to identify slow queries and inefficient indexes. I regularly run database health checks to ensure best practices are followed, and I analyze logs for errors or warnings that could indicate underlying problems.

13. Explain how you would optimize a database for read-heavy vs. write-heavy workloads.

For read-heavy workloads, focus on indexing frequently queried columns, using caching mechanisms (like Redis or Memcached) to store frequently accessed data, and potentially denormalizing the database to reduce the number of joins required for queries. Read replicas can also be employed to distribute the read load across multiple servers.

In contrast, for write-heavy workloads, prioritize efficient data insertion and updates. Consider using techniques like batch processing for writes, optimizing indexes to minimize write overhead (too many indexes can slow down writes), and using message queues (like Kafka or RabbitMQ) to decouple the application from the database. Also, consider using write-through or write-back caching strategies depending on data consistency requirements. Partitioning or sharding can distribute the write load across multiple database instances.

14. Describe your experience with different types of database backups and recovery strategies.

I have experience with several database backup types, including full, differential, and incremental backups. Full backups copy the entire database, providing the most comprehensive recovery point but taking the longest time and requiring the most storage. Differential backups save all changes since the last full backup, offering a faster backup time than full backups but requiring both the last full and the latest differential backup for recovery. Incremental backups store only the changes made since the last backup (full or incremental), resulting in the fastest backup times but the most complex recovery process, as all incremental backups since the last full backup are needed.

My recovery strategies involve using these backups in different scenarios. For complete data loss, I would restore the last full backup followed by the most recent differential backup (if available) or all incremental backups in sequence since the last full backup. For point-in-time recovery, I would restore the necessary full, differential and incremental backups up to the desired point in time, taking into account transaction logs if available. I've also worked with backup validation procedures to ensure the integrity and recoverability of backups, as well as testing disaster recovery plans regularly.

15. How would you implement a system to track changes to data over time (e.g., auditing or versioning)?

One approach is to use database triggers to capture changes. When data is inserted, updated, or deleted, a trigger can fire and record the old and new values in a separate audit table. This audit table would include columns for the table name, column name, operation type (insert, update, delete), timestamp, user, and old/new values.

Another option is to implement versioning directly in the application layer. Before updating data, the application can retrieve the existing record, serialize it, and store it in a versioning table along with a timestamp and user ID. This approach provides more control over what data is versioned and how it's stored, but requires more code. For instance, if you are using relational databases, a new table can be created such as table_name_history to store older versions.

16. Explain how you would handle large data sets (e.g., millions or billions of rows) in SQL queries.

When dealing with large datasets in SQL, performance is key. I'd focus on several strategies. First, ensure proper indexing on columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses; this dramatically speeds up data retrieval. Secondly, optimize query structure by avoiding SELECT * (specify needed columns only), using WHERE clauses to filter data early in the process, and rewriting complex subqueries or correlated subqueries into joins or temporary tables. Also, consider using LIMIT to sample the data if full processing is not required, and explore partitioning the table if the database supports it.

Further, I would monitor query performance using tools like EXPLAIN to identify bottlenecks (e.g., full table scans). For aggregation, consider using summary tables or materialized views to pre-compute aggregations. Finally, for extremely large datasets exceeding database capacity, consider using distributed data processing frameworks like Hadoop/Spark in conjunction with tools like Hive or Impala, or cloud-based data warehousing solutions like Snowflake or BigQuery. When choosing between these options, I will consider the cost, complexity, and required query performance.

17. Describe your experience with using Common Table Expressions (CTEs) for complex queries.

I have used Common Table Expressions (CTEs) extensively to simplify complex SQL queries and improve readability. CTEs are particularly useful when dealing with hierarchical data, recursive queries, or when a subquery is used multiple times within a larger query.

For example, I've used CTEs to:

  • Break down a complex query into smaller, more manageable logical blocks. This makes the overall query easier to understand and debug.
  • Implement recursive queries, such as traversing an organizational chart or a bill-of-materials structure.
  • Avoid code duplication by defining a CTE once and referencing it multiple times within the main query. This improves maintainability. Here's an example of a CTE used for calculating cumulative sales:
WITH CumulativeSales AS (
    SELECT
        sale_date,
        sales_amount,
        SUM(sales_amount) OVER (ORDER BY sale_date) AS cumulative_amount
    FROM
        sales_table
)
SELECT
    sale_date,
    sales_amount,
    cumulative_amount
FROM
    CumulativeSales;

18. How would you implement a full-text search functionality in your database?

To implement full-text search, I would use a dedicated search index alongside my database. Common choices include Elasticsearch or Solr. Data would be synchronized between the database and the search index, often using asynchronous processes or database triggers. When a user searches, the query would be directed to the search index, which is optimized for text-based searches and uses techniques like stemming, tokenization, and inverted indexes to provide fast and relevant results.

Alternatively, if external tools are not an option, most modern databases offer built-in full-text search capabilities (e.g., PostgreSQL's tsvector and tsquery). While not as performant as dedicated search engines for massive datasets, they are sufficient for many use cases. I would create a tsvector column to store the indexed text and use tsquery to perform searches. Appropriate indexes should be added to tsvector column to speed up the query performance.

19. Explain how you would design a data warehouse schema for business intelligence reporting.

I would design a data warehouse schema using a star schema or snowflake schema, depending on the complexity and performance requirements. The star schema is simpler, with a central fact table containing measures (e.g., sales amount) and foreign keys to dimension tables (e.g., customer, product, date). Snowflake schema normalizes the dimension tables further, reducing redundancy but potentially impacting query performance due to more joins. I would identify key business metrics and the dimensions needed to analyze them and choose the schema that best suits the organization's specific needs.

Data modeling techniques like slowly changing dimensions (SCDs) are crucial to handle historical data changes in dimension tables. Appropriate indexing and partitioning strategies can be implemented to optimize query performance for BI reporting tools. ETL (Extract, Transform, Load) processes would be designed to populate the data warehouse from various source systems.

20. Describe your experience with database security best practices.

My experience with database security best practices includes implementing several key measures to protect sensitive data. I've worked with various techniques such as: regularly updating database systems and applying security patches; implementing strong authentication and authorization mechanisms using roles and permissions; encrypting sensitive data both at rest and in transit; using parameterized queries (or ORM equivalents) to prevent SQL injection attacks; and regularly backing up databases and storing backups securely.

Furthermore, I've performed security audits to identify vulnerabilities and misconfigurations, followed the principle of least privilege, and monitored database activity logs for suspicious behavior. I also have experience with tools such as nmap, sqlmap, and various static code analysis tools to ensure database security and data integrity. I am familiar with implementing firewalls and network segmentation to restrict access to the database server.

21. How do you approach testing SQL code to ensure its correctness and performance?

Testing SQL code involves several layers. Unit tests focus on individual components like stored procedures or functions. These use frameworks like tSQLt or custom scripts to verify the correct output for given inputs, covering edge cases and boundary conditions. We can use ASSERT statements to check results. Performance testing includes measuring query execution time using tools like SQL Profiler or extended events, identifying slow queries, and optimizing them with indexing or query rewriting. Data validation tests verify data integrity and consistency across the database, using data comparison tools or custom scripts. Integration tests are important and simulate real-world scenarios that use different SQL queries together to form the whole application. Finally, it is important to test in a non-production environment.

22. Explain the trade-offs between different isolation levels in SQL transactions.

SQL transaction isolation levels control the degree to which concurrent transactions are isolated from each other. Higher isolation levels provide greater data consistency but can reduce concurrency. Key isolation levels include: Read Uncommitted (allows 'dirty reads', highest concurrency, lowest consistency), Read Committed (prevents dirty reads, reads only committed data), Repeatable Read (prevents dirty reads and non-repeatable reads, but may have phantom reads), and Serializable (highest consistency, prevents dirty, non-repeatable, and phantom reads, but lowest concurrency).

The trade-off lies in balancing consistency and concurrency. Serializable offers the strongest guarantees, preventing anomalies but potentially causing significant performance bottlenecks due to locking. Read Uncommitted offers the best concurrency but exposes the application to inconsistencies from uncommitted data. Choosing the appropriate isolation level depends on the specific application's requirements for data integrity and performance. Most applications use Read Committed or Repeatable Read as a middle ground.

23. Describe a time you had to debug a complex stored procedure. What tools and techniques did you use?

During a performance bottleneck investigation, I had to debug a complex stored procedure responsible for generating monthly financial reports. The procedure involved multiple joins across several large tables, complex calculations, and temporary tables. The initial symptom was slow report generation, impacting users across the finance department. To tackle this, I started by using the SQL Server Profiler to capture the execution flow and identify the slowest-running queries within the stored procedure. I focused on queries with high duration and CPU usage. I then used the SET STATISTICS TIME ON and SET STATISTICS IO ON commands to analyze query performance further, pinpointing specific tables and indexes causing bottlenecks. I also used sp_who2 to identify any blocking issues.

To optimize, I reviewed the query execution plans using SQL Server Management Studio, looking for missing indexes, table scans, and inefficient join orders. Based on the execution plans, I added missing indexes, rewrote some queries to improve join performance (e.g., using INNER JOIN instead of OUTER JOIN where appropriate, or rewriting subqueries as joins), and optimized the usage of temporary tables. I also used the SQL Server's query hints feature judiciously, after thoroughly testing their impact. After each change, I re-ran the profiler and statistics commands to measure the improvement. This iterative process, combining profiling, execution plan analysis, and targeted optimizations, significantly reduced the stored procedure's execution time.

24. How would you design a system for handling user authentication and authorization in a database application?

User authentication and authorization in a database application typically involves several key components. Authentication verifies the user's identity, often using usernames and passwords (hashed and salted, stored securely) or multi-factor authentication. Authorization determines what resources and actions a user is permitted to access, enforced through role-based access control (RBAC) or attribute-based access control (ABAC).

I would design a system that separates authentication from authorization. Authentication might involve a dedicated authentication service using protocols like OAuth 2.0 or JWT. Upon successful authentication, the user receives a token. Subsequent requests to the database include this token. The authorization layer (possibly within the application server or database itself) uses the token to determine the user's roles or attributes and enforce access policies, such as only allowing 'admin' role users to execute CREATE TABLE statements or restricting access to specific data based on user group membership. The database user should have only the bare minimum permissions required to work (least privilege principle).

25. Explain how you would use partitioning to improve query performance on a large table.

Partitioning divides a large table into smaller, more manageable pieces based on a partition key. This improves query performance because the database can scan only the relevant partitions instead of the entire table. For example, if you partition a table by date, queries that filter by date will only scan the partitions containing the matching dates.

There are several types of partitioning: range partitioning (based on ranges of values), list partitioning (based on specific values), and hash partitioning (based on a hash function). Consider a table sales with columns sale_date and region. Partitioning sales by sale_date using monthly partitions allows queries like SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31' to only scan the January 2023 partition, vastly improving speed.

26. Describe your experience with using database triggers and their potential impact on performance.

I have experience using database triggers in SQL Server and MySQL to automate tasks like auditing data changes, enforcing complex business rules, and maintaining data integrity. For example, I implemented a trigger to automatically log updates to a sensitive table, capturing the old and new values for auditing purposes. Another use case was to automatically update a derived column whenever the base columns were modified.

However, I am also aware of the potential performance impact of triggers. Triggers execute automatically as part of DML operations (INSERT, UPDATE, DELETE), and poorly designed triggers can significantly slow down these operations. It's important to keep triggers as concise and efficient as possible. Things I've done to mitigate performance issues include using set-based operations within the trigger instead of row-by-row processing, avoiding complex logic, and ensuring proper indexing to support the trigger's queries. I also carefully consider whether a trigger is truly the best solution, weighing it against alternatives like application-level logic or scheduled batch processes. In cases where triggers become overly complex or performance-critical, I explore using stored procedures called asynchronously to minimize the impact on the main transaction.

SQL Coding MCQ

Question 1.

Given an employees table with columns employee_id, employee_name, salary, and department_id, which SQL query retrieves all employees whose salary is greater than the average salary of their respective department?

Options:
Question 2.

Which SQL query correctly identifies the employee with the highest salary in each department, using window functions?

Options:
Question 3.

You need to find all departments that have more than 5 employees. Which SQL statement would accomplish this, assuming you have a table named employees with columns employee_id and department_id?

Options:

Options:
Question 4.

Given an Employees table with columns id, name, and salary, which SQL query retrieves the second highest salary from the table?

Options:
Question 5.

Given an Employees table with columns employee_id, first_name, last_name, and manager_id, and another table Managers with columns manager_id, first_name, and last_name, which SQL query retrieves the first and last names of all employees who report to a manager whose last name starts with 'S'?

options:

Options:
Question 6.

Which SQL query retrieves all employees whose salary is equal to the lowest salary in the company?

Options:
Question 7.

Which SQL query retrieves the departments where the average salary is greater than the overall average salary of all employees in the company?

options:

Options:
Question 8.

Given an Employees table with columns emp_id, emp_name, department, and salary, which SQL query retrieves the names of employees who have the highest salary within their respective departments using a subquery?

Options:
Question 9.

You need to retrieve a list of employees who joined the company within the last three months. Assuming you have an 'Employees' table with a 'hire_date' column of type DATE, which SQL query would achieve this?

Options:
Question 10.

Which SQL query retrieves all employees whose salary is between $60,000 and $80,000 (inclusive)?

Options:

Options:
Question 11.

Which SQL query retrieves all employees whose salary is greater than the salary of at least one employee in department 10? Options:

Options:
Question 12.

Which SQL query will return the number of employees in each department, ordered by the number of employees in descending order?

Options:

Options:
Question 13.

Which SQL query retrieves employees who have the lowest salary in their respective departments?

Options:

Options:
Question 14.

Which SQL query retrieves all employees whose first name contains both the letters 'a' and 'e', regardless of case?

options:

Options:
Question 15.

Which SQL query can be used to find the department with the highest total salary expenditure (sum of salaries) across all employees in that department?

Options:

Options:
Question 16.

Consider two tables, Employees (columns: employee_id, employee_name, department_id) and Departments (columns: department_id, department_name). Which SQL query will return a list of all employee names and their corresponding department names?

Options:
Question 17.

Which SQL query retrieves the top 3 highest paid employees in each department, ordered by salary in descending order?

Options:

Options:
Question 18.

Which SQL query retrieves employees whose salary is greater than the salary of all employees in department 50?

Options:
Question 19.

Consider an Employees table with columns employee_id, first_name, last_name, salary, and manager_id (referencing employee_id). Which SQL query retrieves employees who are managers (i.e., have subordinates) and also have at least one subordinate earning more than $80,000?

Options:

Options:
Question 20.

Which SQL query retrieves all employees who earn more than any employee in the 'Sales' department?

options:

Options:
Question 21.

Which of the following SQL queries retrieves employees who have a salary greater than the average salary of their department using a correlated subquery?

Options:

Options:
Question 22.

Which SQL query retrieves employees with a salary of $60,000 or more who have been employed for longer than 5 years (assuming the employees table has columns salary and hire_date)?

options:

Options:
Question 23.

Which SQL query retrieves employees who have the same job title as any employee in department 20, using the EXISTS operator?

Options:
Question 24.

Which SQL query retrieves the departments where the average salary is greater than $70,000 using a subquery in the WHERE clause?

Options:

Options:
Question 25.

Which SQL query retrieves all departments that have no employees assigned to them, using a subquery?

options:

Options:

Which SQL Coding skills should you evaluate during the interview phase?

It's impossible to assess every aspect of a candidate's SQL abilities in a single interview. However, focusing on core skills will help you identify strong candidates. These skills are important for SQL coding and will help you evaluate candidates effectively.

Which SQL Coding skills should you evaluate during the interview phase?

Querying

To quickly gauge a candidate's querying abilities, use an assessment test with relevant MCQs. Our SQL Coding test assesses candidates on their knowledge of querying skills.

You can also directly assess a candidate's querying skills with targeted interview questions. This allows you to see their thought process and problem-solving approach.

Write a query to retrieve all customer names and order dates from the 'Customers' and 'Orders' tables, respectively, where the customer ID matches in both tables.

Look for the candidate's understanding of JOIN operations and how to correctly link tables based on a common key. A candidate should also demonstrate the ability to select the specific columns requested.

Data Manipulation

You can use MCQs to filter candidates on Data Manipulation. Our library has something similar.

Try asking targeted interview questions to judge the candidate's data manipulation skills. This allows you to gauge their ability to write the correct SQL to modify data.

Write an SQL query to update the 'City' field of the 'Customers' table to 'New York' for all customers with a CustomerID between 10 and 20.

Watch for the correct use of the UPDATE statement and the WHERE clause. The candidate should understand how to apply conditions to update specific records.

Database Design

Assessing database design knowledge can be done through MCQs that test understanding of normalization concepts. You can use a test that has some overlapping features for these candidates.

Ask questions focusing on database design to check if the candidate gets the underlying basics.

Describe the importance of normalization in database design and provide examples of different normalization forms.

Look for an understanding of the benefits of normalization, such as reducing redundancy and improving data integrity. The candidate should be able to provide concrete examples of normalization forms (e.g., 1NF, 2NF, 3NF).

3 Tips for Maximizing Your SQL Coding Interview Questions

Before you start putting what you've learned into practice, here are three tips to help you conduct more effective SQL coding interviews. These suggestions will help you refine your approach and make better hiring decisions.

1. Leverage SQL Skills Tests to Streamline Candidate Screening

Skills tests offer a data-driven approach to assess SQL proficiency early in the hiring process. This helps in filtering candidates based on demonstrable skills, saving valuable interview time.

Consider using Adaface's SQL Online Test or specific tests like MySQL Online Test or PostgreSQL Test. These tests allow you to verify skills like writing SQL queries and database design knowledge.

By using skills tests, you ensure that only candidates with the necessary SQL coding abilities progress to the interview stage. This process improves the quality of your candidate pool and allows you to focus on deeper assessments during the interview.

2. Curate Relevant Interview Questions Focused on Key Areas

Interviews are a limited resource, so make every question count. Carefully select SQL coding interview questions that target the most important aspects of the role, such as query optimization or data manipulation.

To further refine your interview, consider incorporating questions that assess related skills such as data modeling. Exploring these adjacent areas provides a more well-rounded view of the candidate's capabilities.

By focusing on the most important skills and related areas, you can maximize the value of your interview time and gain a deeper understanding of each candidate's potential.

3. Ask Follow-Up Questions to Evaluate Depth of Knowledge

Relying solely on initial answers might not reveal the full extent of a candidate's SQL coding expertise. Asking thoughtful follow-up questions helps you uncover the depth of their understanding and identify potential gaps.

For example, after asking a candidate to write a SQL query, follow up with "How would you optimize this query for performance with large datasets?" This reveals their understanding of query optimization strategies and their ability to handle real-world scenarios.

Hire Top SQL Talent with Skills Tests and Targeted Interview Questions

If you're aiming to bring on board individuals with strong SQL skills, verifying their proficiency is key. Utilizing dedicated SQL skills tests is the most accurate way to assess a candidate's abilities. Explore Adaface's range of SQL tests, including SQL Online Test and SQL Coding Test, to identify top performers.

Once you've identified top candidates through skills testing, you can confidently move forward with interviews. Shortlist the best applicants and conduct targeted interviews to evaluate their problem-solving abilities and communication skills. Get started today by signing up or discover more about our Coding Tests.

SQL Online Test

25 mins | 10 MCQs
The SQL online test evaluates a candidate's ability to design and build relational databases and tables from scratch, apply CRUD options, write efficient queries and subqueries to filter data and create efficient indexes for faster SQL queries.
Try SQL Online Test

Download SQL Coding interview questions template in multiple formats

SQL Coding Interview Questions FAQs

What types of SQL interview questions should I ask freshers?

Focus on basic SQL syntax, understanding of different data types, and simple queries like SELECT, INSERT, UPDATE, and DELETE.

What are some challenging SQL questions for experienced candidates?

Consider questions that involve complex joins, subqueries, window functions, performance optimization, and schema design.

How can I use SQL interview questions to assess problem-solving skills?

Present scenarios that require candidates to analyze data, identify patterns, and write SQL queries to extract meaningful insights.

What's the best way to prepare for conducting SQL coding interviews?

Familiarize yourself with common SQL concepts, data structures, and query optimization techniques. Practice writing and reviewing SQL code.

Besides coding, what other SQL-related skills should I evaluate?

Consider assessing candidates' understanding of database design principles, data modeling, and SQL server administration tasks.

How do I effectively use SQL interview questions during remote interviews?

Share coding environments and allow candidates to execute queries in real-time. Focus on collaborative problem-solving and communication of thought processes.

Related posts

Free resources

customers across world
Join 1200+ companies in 80+ countries.
Try the most candidate friendly skills assessment tool today.
g2 badges
logo
40 min tests.
No trick questions.
Accurate shortlisting.