178. Select Specific Columns from Parquet File

Scenario

A Parquet file contains customer data with many columns, but only a subset of columns is needed for analysis.

Task

Write a Python script at /home/interview/select_columns.py using pandas that reads /home/interview/customers.parquet, selects only the columns id, first_name, last_name, email, and total_purchases, and writes the result to /home/interview/selected_data.parquet.

Note: pandas and pyarrow are already installed.
