Here is sample data for the transactions
table:
| customer_id | month | amount_spent |
|-------------|------------|--------------|
| 1 | 2022-10-01 | 100.50 |
| 1 | 2022-11-01 | 150.75 |
| 1 | 2022-12-01 | 120.00 |
| 2 | 2022-10-01 | 200.00 |
| 2 | 2022-11-01 | 180.25 |
| 2 | 2022-12-01 | 210.50 |
| 3 | 2022-10-01 | 50.25 |
| 3 | 2022-11-01 | 75.00 |
| 3 | 2022-12-01 | 90.00 |
Additionally, assume the dataset also provides churn information with the following columns:
churn_info
table:
| customer_id | has_churned |
|-------------|-------------|
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
Write a SQL query to:
- Calculate the average monthly spend per customer for the past year.
Write a Python script that uses the dataset to:
2. Calculate the probability of customer churn using logistic regression. Assume you have columns for customer_id
, month
, amount_spent
, and a target column has_churned
.
Please sign-in to view the solution
Write a SQL query to find the top 5 customers based on total expenditure over the last year.
Additionally, create a Python script to plot a bar chart showing total monthly orders over the same period.
Sample Input Data
orders
Table:
| order_id | customer_id | order_date | total_amount |
|----------|-------------|------------|--------------|
| 1 | 1 | 2022-11-20 | 150.75 |
| 2 | 2 | 2022-12-05 | 200.50 |
| 3 | 1 | 2022-12-20 | 75.00 |
| 4 | 3 | 2023-01-05 | 300.00 |
| 5 | 2 | 2023-01-15 | 180.75 |
| 6 | 1 | 2023-02-10 | 120.00 |
| 7 | 4 | 2023-02-20 | 250.00 |
| 8 | 3 | 2023-03-01 | 90.00 |
customers
Table:
| customer_id | customer_name | signup_date |
|-------------|---------------|-------------|
| 1 | Alice | 2021-05-01 |
| 2 | Bob | 2022-03-15 |
| 3 | Carol | 2022-07-22 |
| 4 | Dave | 2022-09-19 |
Please sign-in to view the solution
Write a SQL query to find the top 3 products based on total sales value.
Additionally, create a Python script to plot a line graph showing daily total sales over the past month.
Lastly, explain how you would use Excel to calculate and visualize the sales trend.
Input:
transactions
Table:
| transaction_id | date | product_id | quantity | price |
|----------------|------------|------------|----------|--------|
| 1 | 2023-09-01 | 101 | 2 | 10.00 |
| 2 | 2023-09-01 | 102 | 1 | 20.00 |
| 3 | 2023-09-02 | 101 | 1 | 10.00 |
| 4 | 2023-09-02 | 103 | 3 | 15.00 |
| 5 | 2023-09-03 | 102 | 2 | 20.00 |
| 6 | 2023-09-03 | 101 | 2 | 10.00 |
Please sign-in to view the solution
Given a dataset that contains the following columns: session_id
, user_id
, event_type
, and timestamp
. The event_type
column can have values such as view
, add_to_cart
, and purchase
. Write a Python script to calculate the cart abandonment rate, defined as the percentage of sessions where items were added to the cart, but a purchase was not completed.
Input:
session_data.csv
looks like this:
| session_id | user_id | event_type | timestamp |
|------------|---------|--------------|---------------------|
| 1 | 101 | view | 2023-09-01 10:00:00 |
| 1 | 101 | add_to_cart | 2023-09-01 10:05:00 |
| 2 | 101 | view | 2023-09-01 11:00:00 |
| 2 | 101 | purchase | 2023-09-01 11:15:00 |
| 2 | 101 | add_to_cart | 2023-09-01 11:10:00 |
| 3 | 102 | view | 2023-09-02 09:00:00 |
| 3 | 102 | add_to_cart | 2023-09-02 09:10:00 |
| 4 | 103 | view | 2023-09-02 10:00:00 |
| 4 | 103 | purchase | 2023-09-02 10:15:00 |
Please sign-in to view the solution
Write an SQL query to calculate the probability that a user who logged in also made a purchase. Then, demonstrate how you would visualize this result using Excel.
Input:
user_events
Table:
| user_id | event_type | event_date |
|---------|------------|-------------|
| 1 | login | 2023-01-01 |
| 1 | click | 2023-01-01 |
| 1 | purchase | 2023-01-01 |
| 2 | login | 2023-01-02 |
| 2 | click | 2023-01-02 |
| 3 | login | 2023-01-03 |
| 3 | click | 2023-01-03 |
| 4 | click | 2023-01-04 |
| 4 | purchase | 2023-01-04 |
| 5 | login | 2023-01-05 |
Please sign-in to view the solution
Write an SQL query to assign a rank to each customer based on their total spending using both RANK()
and DENSE_RANK()
functions.
Then, using Python, categorize each customer's total spending into High, Medium, and Low using pandas and numpy, and visualize the distribution.
Input:
sales_data
table:
| customer_id | total_spent | purchase_date |
|-------------|-------------|---------------|
| 1 | 100.00 | 2023-01-01 |
| 2 | 200.00 | 2023-01-02 |
| 3 | 150.00 | 2023-01-03 |
| 4 | 300.00 | 2023-01-04 |
| 5 | 200.00 | 2023-01-05 |
Please sign-in to view the solution
Write an SQL query to assign a RANK
and DENSE_RANK
to each employee within their respective department based on their total sales. Additionally, demonstrate the use of PARTITION BY
to calculate the total sales for each department.
employee_sales
table:
| employee_id | employee_name | department | sales_amount | sales_date |
|-------------|---------------|------------|--------------|------------|
| 1 | Alice | Sales | 500.00 | 2023-01-01 |
| 2 | Bob | Sales | 300.00 | 2023-01-02 |
| 3 | Carol | Sales | 700.00 | 2023-01-03 |
| 4 | Dave | Marketing | 400.00 | 2023-01-04 |
| 5 | Eve | Marketing | 450.00 | 2023-01-05 |
| 6 | Frank | Sales | 300.00 | 2023-01-06 |
Please sign-in to view the solution
Write a SQL query to retrieve the list of employees along with their department names, and assign a rank to each employee based on their salary within their department.
employees
Table:
| employee_id | employee_name | department_id | salary |
|-------------|---------------|---------------|--------|
| 1 | Alice | 10 | 60000 |
| 2 | Bob | 10 | 50000 |
| 3 | Carol | 20 | 70000 |
| 4 | Dave | 20 | 85000 |
| 5 | Eve | 30 | 90000 |
departments
Table:
| department_id | department_name |
|---------------|-----------------|
| 10 | Sales |
| 20 | Marketing |
| 30 | Finance |
Please sign-in to view the solution
You are given two datasets: employee_details
with columns employee_id
, employee_name
, hire_date
, and department_id
; and sales_performance
with columns employee_id
, sales_amount
, and sales_date
. During a panel interview, you are asked to analyze the performance of employees by joining these datasets and calculating the total sales for each department, as well as identifying top and bottom performers in each department.
Please sign-in to view the solution
In a panel interview, you're given two mathematical problems. The first involves solving an algebra equation, and the second involves calculating speed, distance, or time given two of these variables. Provide solutions for the following problems:
- Solve the algebraic equation: 3x + 5 = 2x - 7
- If a car travels 150 kilometers in 3 hours, what is its average speed in kilometers per hour?
Please sign-in to view the solution
You have a dataset driver_trips
with columns trip_date
, fare
, driver_id
, and status
. Write an SQL query to find details of drivers who have had earnings in consecutive months. Only consider trips with status
as 'completed'.
Please sign-in to view the solution
You are given a dataset sales_data
with a column sales_amount
representing the sales figures for different transactions. Write Python code to calculate the mean, median, and standard deviation of the sales amounts.
Please sign-in to view the solution
Write an SQL query to merge these tables, sort the result by department_name
and employee_name
, and rank employees within their departments.
Additionally, write a small Python program using list comprehensions to filter and sort a list of dictionaries representing employees.
employees
Table:
| employee_id | employee_name | department_id |
|-------------|---------------|---------------|
| 1 | Alice | 10 |
| 2 | Bob | 10 |
| 3 | Carol | 20 |
| 4 | Dave | 20 |
| 5 | Eve | 30 |
departments
Table:
| department_id | department_name |
|---------------|-----------------|
| 10 | Sales |
| 20 | Marketing |
| 30 | Finance |
Please sign-in to view the solution
You are given a dataset orders
with columns order_id
, product_name
, quantity
, price
, and order_date
. Provide SQL solutions to illustrate the following:
- The difference between
WHERE
andHAVING
. - How to filter out
NULL
values in theprice
column. - The correct order of execution for a complex SQL query.
- Using
CASE WHEN
statements to categorize orders based onquantity
.
Please sign-in to view the solution
You are given 2 eggs and a building with n floors. The goal is to determine the highest floor from which an egg can be dropped without breaking. Describe a strategy to minimize the number of drops required in the worst-case scenario and write a Python program to implement this strategy for a building with a given number of floors n.
Please sign-in to view the solution
You have a CSV file driver_requests.csv
with columns driver_id
, num_requests
, and hours_driven
. Write a Python program to calculate the weighted average of requests per driver, weighted by the number of hours driven.
Please sign-in to view the solution
Write an SQL nested query to find the top-performing product (the product with the highest total sales amount) and display its product_id
and product_name
.
sales
Table:
| product_id | product_name | sales_amount | sales_date |
|------------|--------------|--------------|------------|
| 1 | Product A | 200.00 | 2023-01-01 |
| 2 | Product B | 500.00 | 2023-01-02 |
| 1 | Product A | 300.00 | 2023-01-03 |
| 3 | Product C | 400.00 | 2023-01-04 |
| 2 | Product B | 100.00 | 2023-01-05 |
Please sign-in to view the solution