Analyzing Pharmaceutical Equipment Maintenance
Beginner Mode

Start your terminal to use beginner mode.

Objective

A pharmaceutical company tracks its equipment data in two different DataFrames. The first, df1, contains metadata about the equipment (like purchase dates). The second, df2, acts as a ledger keeping track of the maintenance history and costs for each piece of equipment.

Task

Write a PySpark function that extracts the most recent maintenance record for each piece of equipment and evaluates its cost relative to all past maintenance costs for that same equipment.

  1. Filter df2 to only keep the latest maintenance date per equipment.
  2. Calculate the maintenance_cost_rank using a dense rank (where rank 1 represents the highest maintenance cost for that specific piece of equipment across its entire history).
  3. Join the filtered records with df1. Ensure the output only contains equipment that has at least one maintenance record.

Save your resulting DataFrame as result_df. Ensure the output matches the exact schema order requested, and order the final output by equipment_id in ascending order.

File Path

  • Equipment Dataset: /home/interview/df1.csv
  • Maintenance Dataset: /home/interview/df2.csv
  • Starter script: /home/interview/pharmaceutical_maintenance.py

Schema

df1.csv

Column Name Data Type
equipment_id string
equipment_name string
purchase_date date

df2.csv

Column Name Data Type
equipment_id string
maintenance_date date
maintenance_cost double

Expected Output Schema

Column Name Data Type
equipment_id string
equipment_name string
purchase_date date
latest_maintenance_date date
maintenance_cost_rank integer

Example

Given this sample input:

df1

equipment_id equipment_name purchase_date
EQ001 Mixer 2020-01-01
EQ002 Centrifuge 2020-02-01
EQ003 Pipette 2020-03-01

df2

equipment_id maintenance_date maintenance_cost
EQ001 2021-06-01 500.0
EQ002 2021-07-01 400.0
EQ001 2021-07-02 600.0

The expected output would be:

equipment_id equipment_name purchase_date latest_maintenance_date maintenance_cost_rank
EQ001 Mixer 2020-01-01 2021-07-02 1
EQ002 Centrifuge 2020-02-01 2021-07-01 1

Terminal requires a larger screen

Open this page on a desktop or tablet (≥ 768px) to launch the terminal and practice hands-on.

Linux Terminal Environment

Write and execute your solution in the terminal below.

Sign In

Track

Question Difficulty Company Access
Need more practice in this area? Explore more questions →