Filtering Top Herpetology Observations
Stripe · Medium · Joins · Spark

Objective

As a herpetologist studying reptiles and amphibians, you have two DataFrames at your disposal: observations (containing sighting logs) and species (containing the reference catalog of animals).

Task

Write a PySpark function that joins the observations and species DataFrames on the species_id column.

After joining, return the 3 rows with the highest count (the number of individuals observed), sorted by count in descending order. Save the resulting DataFrame as result_df, with its columns in the order listed under Expected Output Schema.

File Paths

  • Observations Dataset: /home/interview/observations.csv
  • Species Dataset: /home/interview/species.csv
  • Starter script: /home/interview/herpetology.py

Schema

observations.csv

Column Name   Data Type   Description
obs_id        Integer     The unique identifier of the observation
species_id    Integer     The unique identifier of the species observed
location_id   Integer     The unique identifier of the location where the observation was made
count         Integer     The number of individuals observed

species.csv

Column Name    Data Type   Description
species_id     Integer     The unique identifier of the species
species_name   String      The common name of the species

Expected Output Schema

Column Name    Data Type   Description
obs_id         Integer     The unique identifier of the observation
species_id     Integer     The unique identifier of the species
species_name   String      The common name of the species
location_id    Integer     The unique identifier of the location where the observation was made
count          Integer     The number of individuals observed

Example

Given this sample input:

observations

obs_id   species_id   location_id   count
1        100          1             55
2        101          2             35
3        100          1             45

species

species_id   species_name
100          Python
101          Gecko
102          Frog

The expected output would be:

obs_id   species_id   species_name   location_id   count
1        100          Python         1             55
3        100          Python         1             45
2        101          Gecko          2             35
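The sample can be cross-checked without Spark. The plain-Python sketch below is not the PySpark solution; it only replays the same join, sort, and limit steps on the sample rows, assuming inner-join semantics (the variable names are illustrative).

```python
# Sample rows copied from the example tables.
observations = [
    {"obs_id": 1, "species_id": 100, "location_id": 1, "count": 55},
    {"obs_id": 2, "species_id": 101, "location_id": 2, "count": 35},
    {"obs_id": 3, "species_id": 100, "location_id": 1, "count": 45},
]
species = [
    {"species_id": 100, "species_name": "Python"},
    {"species_id": 101, "species_name": "Gecko"},
    {"species_id": 102, "species_name": "Frog"},
]

# Lookup table for the join key.
name_by_id = {s["species_id"]: s["species_name"] for s in species}

# Inner join: keep only observations whose species_id is in the catalog,
# building each row in the expected output column order.
joined = [
    {
        "obs_id": o["obs_id"],
        "species_id": o["species_id"],
        "species_name": name_by_id[o["species_id"]],
        "location_id": o["location_id"],
        "count": o["count"],
    }
    for o in observations
    if o["species_id"] in name_by_id
]

# Sort by count, largest first, and keep the top 3 rows.
top3 = sorted(joined, key=lambda row: row["count"], reverse=True)[:3]

for row in top3:
    print(tuple(row.values()))
# (1, 100, 'Python', 1, 55)
# (3, 100, 'Python', 1, 45)
# (2, 101, 'Gecko', 2, 35)
```

Frog (species_id 102) has no observations, so it never appears in the joined rows, which matches the expected output.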
