Big_data

Finding the 95th Percentile of URL Sizes
Data Engineer @ Yahoo | Difficulty: hard

Given a dataset of 2 billion URLs and their sizes, describe an efficient algorithm to find the 95th percentile of all the sizes. Provide a brief explanation and a sample implementation using a distributed computing framework.
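
One possible approach, sketched under stated assumptions: sizes are non-negative integers (for example, bytes), and the number of distinct sizes is small enough that a merged histogram fits on one machine. Each partition counts its own values, the partial histograms are added together, and one pass over the sorted counts finds the nearest-rank percentile. The function names and the sample data are illustrative, and plain Python lists stand in for the partitions that a framework such as Spark or MapReduce would provide.

```python
from collections import Counter
from functools import reduce


def partition_histogram(sizes):
    """Map step: count how often each size occurs in one partition."""
    return Counter(sizes)


def merge_histograms(a, b):
    """Reduce step: add two partial histograms (associative and commutative)."""
    return a + b


def percentile_from_histogram(hist, pct):
    """Nearest-rank percentile for an integer pct in 0..100.

    Returns the smallest value whose cumulative count reaches ceil(pct/100 * N).
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("no data")
    rank = (pct * total + 99) // 100  # integer ceiling, avoids float rounding
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= rank:
            return value


# Hypothetical sample: three partitions of URL sizes.
partitions = [[120, 300, 300, 4500], [80, 300, 950], [2000, 120, 60000]]
merged = reduce(merge_histograms, map(partition_histogram, partitions), Counter())
p95 = percentile_from_histogram(merged, 95)
```

If there are too many distinct sizes, the same idea can be applied with fixed-width buckets, followed by a second pass over only the bucket that contains the target rank. Where an approximate answer is acceptable, Spark's `DataFrame.approxQuantile` is another option.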

Calculating Average of Large Dataset Across Multiple Computers
Data Engineer @ Yahoo | Difficulty: medium

Given a huge dataset of numbers distributed across multiple computers, describe an efficient algorithm to find the average of all the numbers. Provide a brief explanation and a sample implementation using a distributed computing framework.
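
A minimal sketch of one common approach: each machine reduces its local data to a `(sum, count)` pair, the pairs are merged with an associative operation, and a single division happens at the end. The names `partial_aggregate`, `merge`, and `finalize` are hypothetical, and the `machines` list simulates data that would really live on separate nodes.

```python
from functools import reduce


def partial_aggregate(numbers):
    """Runs on each machine: reduce local data to a (sum, count) pair."""
    return (sum(numbers), len(numbers))


def merge(a, b):
    """Combine two partial results; order and grouping do not matter."""
    return (a[0] + b[0], a[1] + b[1])


def finalize(aggregate):
    """Turn the global (sum, count) pair into the average."""
    total, count = aggregate
    if count == 0:
        raise ValueError("cannot average an empty dataset")
    return total / count


# Simulated data on three machines of unequal size.
machines = [[1, 2, 3], [4, 5], [6]]
partials = [partial_aggregate(chunk) for chunk in machines]
average = finalize(reduce(merge, partials, (0, 0)))

# Averaging the per-machine averages ignores how many numbers each machine
# holds, so it generally gives a different (wrong) answer.
naive = sum(s / c for s, c in partials) / len(partials)
```

Only two numbers per machine cross the network, which is why this pattern scales. For very large or very uneven values, a numerically stable summation may be worth considering.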

Accelerating Processing of Large Data Set
Data Engineer @ Uber | Difficulty: hard

Given a data set of 60 million records and an O(n) computation that takes 2 weeks to complete on a single computer, how would you speed up the processing so that it completes within 24 hours? Describe your approach and the technologies you would use.
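
The arithmetic behind the target can be sketched as follows. Two weeks is 336 hours, so a 24-hour deadline needs at least a 14x speedup. The sketch assumes the records can be processed independently of one another, which the question does not state, and it uses a hypothetical `efficiency` factor to account for scheduling and I/O overhead. Both helper names are illustrative.

```python
import math


def workers_needed(serial_hours, deadline_hours, efficiency=1.0):
    """Minimum number of equal workers, given a parallel efficiency in (0, 1]."""
    return math.ceil(serial_hours / (deadline_hours * efficiency))


def chunk_ranges(n_records, n_workers):
    """Split record indices into contiguous, near-equal [start, end) ranges."""
    base, extra = divmod(n_records, n_workers)
    ranges = []
    start = 0
    for i in range(n_workers):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


SERIAL_HOURS = 14 * 24      # 2 weeks on one machine
N_RECORDS = 60_000_000

ideal = workers_needed(SERIAL_HOURS, 24)            # perfect scaling
realistic = workers_needed(SERIAL_HOURS, 24, 0.75)  # assumed 75% efficiency
seconds_per_record = SERIAL_HOURS * 3600 / N_RECORDS
ranges = chunk_ranges(N_RECORDS, realistic)
```

At roughly 20 ms per record, the single-machine cost is dominated by per-record work, so partitioning the records across workers (for example, with Spark or a job queue) is a natural fit. Profiling the per-record computation first may reduce the number of machines needed.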

Processing Large Data Sets Using Apache Spark
Data Engineer @ Google | Difficulty: hard

You have a large dataset stored in a distributed file system like HDFS, and you need to perform complex transformations and aggregations. Explain how you would use Apache Spark to process this dataset. Provide an example of a Spark job that calculates the average value of a specific column.
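
A minimal PySpark sketch of the requested job, under several assumptions: the data is stored as Parquet, and the HDFS path and the column name passed to `run_job` are hypothetical placeholders. `average_of_column` uses the dictionary form of `DataFrame.agg`, so Spark computes partial sums and counts per partition and merges them; null values are skipped by the `avg` aggregate.

```python
def average_of_column(df, column):
    """Return the mean of `column`, or None when it has no non-null values."""
    # agg() without groupBy() always yields exactly one row.
    return df.agg({column: "avg"}).first()[0]


def run_job(path, column):
    """Read a Parquet dataset and print the average of one column."""
    # Imported here so average_of_column can be exercised with a stand-in
    # DataFrame on machines where Spark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("column-average").getOrCreate()
    try:
        df = spark.read.parquet(path)
        print(average_of_column(df, column))
    finally:
        spark.stop()
```

When submitted with `spark-submit`, the script would end with a call such as `run_job("hdfs:///data/events", "amount")`, where both arguments are placeholders. More complex transformations (filters, joins, grouped aggregations) would be chained on `df` before the final aggregate.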

Building a Scalable Data Engineering Solution for YouTube
Data Engineer @ Google | Difficulty: hard

Describe the types of technologies and architecture you would need to build a scalable data engineering solution for a platform like YouTube. Focus on data ingestion, storage, processing, and analytics.

Understanding Big Data and Hadoop
Data Analyst @ Google | Difficulty: medium

What is Big Data and what is Hadoop used for?
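
As a rough illustration of the MapReduce programming model that Hadoop is known for, here is a toy word count in plain Python. It is an in-memory imitation, not the Hadoop API: the three phases that Hadoop would run across many machines are simulated in one process, and the function names and input lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return (key, sum(values))


lines = ["big data needs big storage", "hadoop stores big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the input lines would be blocks of files in HDFS, the map and reduce functions would run on different nodes, and the framework would handle the shuffle and recover from node failures.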
