Calculating Annual GDP Growth Rates

Objective

As an economist, you need to compute the annual growth rate of GDP from multiple economic DataFrames. The GDP growth rate is the percentage change in a country's GDP from one year to the next (it is negative when GDP shrinks). It is calculated with the formula:

GDP growth rate = [(GDP this year - GDP last year) / GDP last year] * 100
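As a quick arithmetic check, the formula translates directly into a small Python helper. The name gdp_growth_rate and the GDP figures are hypothetical, chosen only for illustration.

```python
def gdp_growth_rate(gdp_last_year: float, gdp_this_year: float) -> float:
    """Return the year-over-year GDP growth rate as a percentage."""
    # (GDP this year - GDP last year) / GDP last year * 100
    return (gdp_this_year - gdp_last_year) / gdp_last_year * 100


# Hypothetical figures: GDP rises from 2000.0 to 2100.0, a 5% increase.
print(round(gdp_growth_rate(2000.0, 2100.0), 2))  # 5.0
```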

You have been provided with two DataFrames containing economic data for different countries and different years.

Task

Write a PySpark function that combines these DataFrames and returns the annual GDP growth rate for each country and each year.

Constraints:

  • The output should be sorted in ascending order first by country name and then by year.
  • The GDP growth rate should be rounded to two decimal places.
  • If GDP data for the immediately preceding year is not available (for example, a country's first recorded year or a gap in the records), the GDP growth rate for the current year should be null.
  • You can assume that both input DataFrames are clean (no missing values, GDP >= 0).

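Before writing the PySpark version, it can help to pin the constraints down on plain Python data. The sketch below is a reference under stated assumptions, not the expected solution: growth_rates and the sample rows are hypothetical, each (Country, Year) pair is assumed to appear at most once across both inputs, and a zero base-year GDP is treated as null because the task does not say what should happen there.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int, float]  # (Country, Year, GDP)


def growth_rates(rows: List[Row]) -> List[Tuple[str, int, Optional[float]]]:
    """Apply the task's constraints to plain (Country, Year, GDP) tuples."""
    # Assumption: each (Country, Year) pair appears at most once across both inputs.
    gdp_by_key: Dict[Tuple[str, int], float] = {(c, y): g for c, y, g in rows}
    result = []
    # Sorting the (Country, Year) keys gives ascending country, then year.
    for country, year in sorted(gdp_by_key):
        prev = gdp_by_key.get((country, year - 1))
        if prev is None or prev == 0:
            # No GDP for the exact previous year (first year or a gap) -> null.
            # Assumption (not in the task): a zero base year is also null.
            rate = None
        else:
            current = gdp_by_key[(country, year)]
            rate = round((current - prev) / prev * 100, 2)
        result.append((country, year, rate))
    return result


# Hypothetical sample: India has no 2021 record, so 2022 falls after a gap.
rows = [
    ("India", 2019, 100.0),
    ("India", 2020, 110.0),
    ("India", 2022, 121.0),
    ("Brazil", 2020, 50.0),
    ("Brazil", 2021, 45.0),
]
for r in growth_rates(rows):
    print(r)
```

India 2022 gets None here because its previous record is 2020, not 2021, so the exact-previous-year rule applies even though an earlier value exists.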
Save the resulting DataFrame as result_df, and ensure that it exactly matches the Expected Output Schema.

File Path

  • Dataset 1: /home/interview/df1.csv
  • Dataset 2: /home/interview/df2.csv
  • Starter script: /home/interview/gdp_growth.py

Schema

df1.csv & df2.csv

Column Name | Data Type
Country     | String
Year        | Integer
GDP         | Double

Expected Output Schema

Column Name     | Data Type
Country         | String
Year            | Integer
GDP_growth_rate | Double
