Latest Real Interview Questions & Interactive Labs

# Data Analysis @ Uber

Customer Churn Prediction Using SQL and Python Data Analyst @ Uber Difficulty Medium

Here is sample data for the transactions table:

| customer_id | month      | amount_spent |
|-------------|------------|--------------|
| 1           | 2022-10-01 | 100.50       |
| 1           | 2022-11-01 | 150.75       |
| 1           | 2022-12-01 | 120.00       |
| 2           | 2022-10-01 | 200.00       |
| 2           | 2022-11-01 | 180.25       |
| 2           | 2022-12-01 | 210.50       |
| 3           | 2022-10-01 | 50.25        |
| 3           | 2022-11-01 | 75.00        |
| 3           | 2022-12-01 | 90.00        |

Additionally, assume the dataset also provides churn information with the following columns: churn_info table:

| customer_id | has_churned |
|-------------|-------------|
| 1           | 0           |
| 2           | 1           |
| 3           | 0           |

Write a SQL query to:

Calculate the average monthly spend per customer for the past year.

Write a Python script that uses the dataset to: 2. Calculate the probability of customer churn using logistic regression. Assume you have columns for customer_id, month, amount_spent, and a target column has_churned.

Solution:

Please sign-in to view the solution

Data Analysis on Provided Datasets Data Analyst @ Uber Difficulty Hard

Write a SQL query to find the top 5 customers based on total expenditure over the last year.

Additionally, create a Python script to plot a bar chart showing total monthly orders over the same period.

Sample Input Data

orders Table:

| order_id | customer_id | order_date | total_amount |
|----------|-------------|------------|--------------|
| 1        | 1           | 2022-11-20 | 150.75       |
| 2        | 2           | 2022-12-05 | 200.50       |
| 3        | 1           | 2022-12-20 | 75.00        |
| 4        | 3           | 2023-01-05 | 300.00       |
| 5        | 2           | 2023-01-15 | 180.75       |
| 6        | 1           | 2023-02-10 | 120.00       |
| 7        | 4           | 2023-02-20 | 250.00       |
| 8        | 3           | 2023-03-01 | 90.00        |

customers Table:

| customer_id | customer_name | signup_date |
|-------------|---------------|-------------|
| 1           | Alice         | 2021-05-01  |
| 2           | Bob           | 2022-03-15  |
| 3           | Carol         | 2022-07-22  |
| 4           | Dave          | 2022-09-19  |

Solution:

Please sign-in to view the solution

Sales Performance Analysis Data Analyst @ Uber Difficulty Medium

Write a SQL query to find the top 3 products based on total sales value.

Additionally, create a Python script to plot a line graph showing daily total sales over the past month.

Lastly, explain how you would use Excel to calculate and visualize the sales trend.

Input:

transactions Table:

| transaction_id | date       | product_id | quantity | price  |
|----------------|------------|------------|----------|--------|
| 1              | 2023-09-01 | 101        | 2        | 10.00  |
| 2              | 2023-09-01 | 102        | 1        | 20.00  |
| 3              | 2023-09-02 | 101        | 1        | 10.00  |
| 4              | 2023-09-02 | 103        | 3        | 15.00  |
| 5              | 2023-09-03 | 102        | 2        | 20.00  |
| 6              | 2023-09-03 | 101        | 2        | 10.00  |

Solution:

Please sign-in to view the solution

Using Python to Calculate Cart Abandonment Rates Data Analyst @ Uber Difficulty Medium

Given a dataset that contains the following columns: session_id, user_id, event_type, and timestamp. The event_type column can have values such as view, add_to_cart, and purchase. Write a Python script to calculate the cart abandonment rate, defined as the percentage of sessions where items were added to the cart, but a purchase was not completed.

Input: session_data.csv looks like this:

| session_id | user_id | event_type   | timestamp           |
|------------|---------|--------------|---------------------|
| 1          | 101     | view         | 2023-09-01 10:00:00 |
| 1          | 101     | add_to_cart  | 2023-09-01 10:05:00 |
| 2          | 101     | view         | 2023-09-01 11:00:00 |
| 2          | 101     | purchase     | 2023-09-01 11:15:00 |
| 2          | 101     | add_to_cart  | 2023-09-01 11:10:00 |
| 3          | 102     | view         | 2023-09-02 09:00:00 |
| 3          | 102     | add_to_cart  | 2023-09-02 09:10:00 |
| 4          | 103     | view         | 2023-09-02 10:00:00 |
| 4          | 103     | purchase     | 2023-09-02 10:15:00 |

Solution:

Please sign-in to view the solution

Calculate Probability Using SQL and Visualize with Excel Data Analyst @ Uber Difficulty Medium

Write an SQL query to calculate the probability that a user who logged in also made a purchase. Then, demonstrate how you would visualize this result using Excel.

Input:

user_events Table:

| user_id | event_type | event_date  |
|---------|------------|-------------|
| 1       | login      | 2023-01-01  |
| 1       | click      | 2023-01-01  |
| 1       | purchase   | 2023-01-01  |
| 2       | login      | 2023-01-02  |
| 2       | click      | 2023-01-02  |
| 3       | login      | 2023-01-03  |
| 3       | click      | 2023-01-03  |
| 4       | click      | 2023-01-04  |
| 4       | purchase   | 2023-01-04  |
| 5       | login      | 2023-01-05  |

Solution:

Please sign-in to view the solution

Calculate Customer Rank and Categorize Spending Using SQL and Python Data Analyst @ Uber Difficulty Medium

Write an SQL query to assign a rank to each customer based on their total spending using both RANK() and DENSE_RANK() functions.

Then, using Python, categorize each customer's total spending into High, Medium, and Low using pandas and numpy, and visualize the distribution.

Input:

sales_data table:

| customer_id | total_spent | purchase_date |
|-------------|-------------|---------------|
| 1           | 100.00      | 2023-01-01    |
| 2           | 200.00      | 2023-01-02    |
| 3           | 150.00      | 2023-01-03    |
| 4           | 300.00      | 2023-01-04    |
| 5           | 200.00      | 2023-01-05    |

Solution:

Please sign-in to view the solution

Using Window Functions (RANK, DENSE_RANK, PARTITION BY) in SQL Data Analyst @ Uber Difficulty Medium

Write an SQL query to assign a RANK and DENSE_RANK to each employee within their respective department based on their total sales. Additionally, demonstrate the use of PARTITION BY to calculate the total sales for each department.

employee_sales table:

| employee_id | employee_name | department | sales_amount | sales_date |
|-------------|---------------|------------|--------------|------------|
| 1           | Alice         | Sales      | 500.00       | 2023-01-01 |
| 2           | Bob           | Sales      | 300.00       | 2023-01-02 |
| 3           | Carol         | Sales      | 700.00       | 2023-01-03 |
| 4           | Dave          | Marketing  | 400.00       | 2023-01-04 |
| 5           | Eve           | Marketing  | 450.00       | 2023-01-05 |
| 6           | Frank         | Sales      | 300.00       | 2023-01-06 |

Solution:

Please sign-in to view the solution

SQL Joins and Ranking in Employee and Department Data Data Analyst @ Uber Difficulty Medium

Write a SQL query to retrieve the list of employees along with their department names, and assign a rank to each employee based on their salary within their department.

employees Table:

| employee_id | employee_name | department_id | salary |
|-------------|---------------|---------------|--------|
| 1           | Alice         | 10            | 60000  |
| 2           | Bob           | 10            | 50000  |
| 3           | Carol         | 20            | 70000  |
| 4           | Dave          | 20            | 85000  |
| 5           | Eve           | 30            | 90000  |

departments Table:

| department_id | department_name |
|---------------|-----------------|
| 10            | Sales           |
| 20            | Marketing       |
| 30            | Finance         |

Solution:

Please sign-in to view the solution

Analyzing Performance Based on Two Datasets in a Panel Interview Data Analyst @ Uber Difficulty Medium

You are given two datasets: employee_details with columns employee_id, employee_name, hire_date, and department_id; and sales_performance with columns employee_id, sales_amount, and sales_date. During a panel interview, you are asked to analyze the performance of employees by joining these datasets and calculating the total sales for each department, as well as identifying top and bottom performers in each department.

Solution:

Please sign-in to view the solution

Solving Algebra and Speed vs Distance Problems Data Analyst @ Uber Difficulty Easy

In a panel interview, you're given two mathematical problems. The first involves solving an algebra equation, and the second involves calculating speed, distance, or time given two of these variables. Provide solutions for the following problems:

Solve the algebraic equation: 3x + 5 = 2x - 7
If a car travels 150 kilometers in 3 hours, what is its average speed in kilometers per hour?

Solution:

Please sign-in to view the solution

Finding Drivers with Earnings in Consecutive Months Data Analyst @ Uber Difficulty Medium

You have a dataset driver_trips with columns trip_date, fare, driver_id, and status. Write an SQL query to find details of drivers who have had earnings in consecutive months. Only consider trips with status as 'completed'.

Solution:

Please sign-in to view the solution

Calculating Mean, Median, and Standard Deviation Business Analyst @ Uber Difficulty Easy

You are given a dataset sales_data with a column sales_amount representing the sales figures for different transactions. Write Python code to calculate the mean, median, and standard deviation of the sales amounts.

Solution:

Please sign-in to view the solution

Merging, Sorting, and Ranking Tables in SQL and Python List Comprehensions Business Analyst @ Uber Difficulty Medium

Write an SQL query to merge these tables, sort the result by department_name and employee_name, and rank employees within their departments.

Additionally, write a small Python program using list comprehensions to filter and sort a list of dictionaries representing employees.

employees Table:

| employee_id | employee_name | department_id |
|-------------|---------------|---------------|
| 1           | Alice         | 10            |
| 2           | Bob           | 10            |
| 3           | Carol         | 20            |
| 4           | Dave          | 20            |
| 5           | Eve           | 30            |

departments Table:

| department_id | department_name |
|---------------|-----------------|
| 10            | Sales           |
| 20            | Marketing       |
| 30            | Finance         |

Solution:

Please sign-in to view the solution

SQL Queries Involving WHERE vs HAVING, NULL Values Filtering, Order of Execution, and CASE WHEN Statements Business Analyst @ Uber Difficulty Medium

You are given a dataset orders with columns order_id, product_name, quantity, price, and order_date. Provide SQL solutions to illustrate the following:

The difference between WHERE and HAVING.
How to filter out NULL values in the price column.
The correct order of execution for a complex SQL query.
Using CASE WHEN statements to categorize orders based on quantity.

Solution:

Please sign-in to view the solution

Solving the 2-Egg Problem Business Analyst @ Uber Difficulty Hard

You are given 2 eggs and a building with n floors. The goal is to determine the highest floor from which an egg can be dropped without breaking. Describe a strategy to minimize the number of drops required in the worst-case scenario and write a Python program to implement this strategy for a building with a given number of floors n.

Solution:

Please sign-in to view the solution

Calculating Weighted Average of Requests per Driver Based on a CSV Dataset Business Analyst @ Uber Difficulty Medium

You have a CSV file driver_requests.csv with columns driver_id, num_requests, and hours_driven. Write a Python program to calculate the weighted average of requests per driver, weighted by the number of hours driven.

Solution:

Please sign-in to view the solution

Writing a Nested Query to Find Top Performing Products Business Analyst @ Uber Difficulty Medium

Write an SQL nested query to find the top-performing product (the product with the highest total sales amount) and display its product_id and product_name.

sales Table:

| product_id | product_name | sales_amount | sales_date |
|------------|--------------|--------------|------------|
| 1          | Product A    | 200.00       | 2023-01-01 |
| 2          | Product B    | 500.00       | 2023-01-02 |
| 1          | Product A    | 300.00       | 2023-01-03 |
| 3          | Product C    | 400.00       | 2023-01-04 |
| 2          | Product B    | 100.00       | 2023-01-05 |

Solution:

Please sign-in to view the solution