Site Reliability Engineer, System Eng. @ Google

  1. Layer 2 and Layer 3 Networking

    What are the primary differences between the Data Link Layer (Layer 2) and the Network Layer (Layer 3) in the OSI model, particularly in terms of addressing and device types used at each layer?

  2. How can we enable IP forwarding in the Linux kernel, and why is it disabled by default?
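
One way to approach this, as a minimal sketch: IPv4 forwarding is controlled by the `net.ipv4.ip_forward` sysctl, which defaults to 0 because a typical host is an endpoint rather than a router, and forwarding packets between interfaces by default would be a security risk. The function name `enable_ip_forward` is hypothetical, and the commands need root privileges.

```shell
# Hypothetical helper: turn on IPv4 forwarding for the running kernel.
enable_ip_forward() {
    # Takes effect immediately but does not survive a reboot.
    sysctl -w net.ipv4.ip_forward=1
    # Read the value back to confirm the change.
    sysctl net.ipv4.ip_forward
}
```

To keep the setting across reboots, the usual approach is to put `net.ipv4.ip_forward = 1` in a file under `/etc/sysctl.d/` and reload with `sysctl --system`.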

  3. For a containerization solution that requires isolated yet connected network environments for each container, which Virtual Network Interface (VNI) type would best suit this need? Provide a command example to create this VNI type in a Linux environment.
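
A veth (virtual Ethernet) pair is a common choice here, since it acts like a virtual cable between a container's network namespace and the host. The sketch below assumes that answer; the function name and the namespace and interface names passed to it are hypothetical, and the commands need root privileges.

```shell
# Hypothetical helper: connect a new network namespace to the host with a veth pair.
create_veth_pair() {
    local ns="$1" host_if="$2" peer_if="$3"
    # Create the isolated network namespace.
    ip netns add "$ns"
    # Create both ends of the virtual cable.
    ip link add "$host_if" type veth peer name "$peer_if"
    # Move one end into the namespace; the other end stays on the host.
    ip link set "$peer_if" netns "$ns"
    ip link set "$host_if" up
}
```

For example, `create_veth_pair ns1 veth0 veth1` would leave `veth0` on the host and `veth1` inside `ns1`. The host end is often attached to a bridge so that several containers can reach each other.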

  4. How can you configure a Linux system to route traffic destined for a specific subnet through a different gateway, while keeping this routing rule isolated from the system’s main routing table? Provide the commands necessary to create a custom routing table and add a route to this table that directs traffic for the subnet 192.168.2.0/24 to pass through the gateway at 192.168.1.2 via the eth0 network interface.
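
A possible sketch using policy routing: the route goes into a separate table, and an `ip rule` tells the kernel to consult that table for the subnet. The table ID 200 and the function name are assumptions chosen for illustration; the commands need root privileges.

```shell
# Hypothetical helper: route 192.168.2.0/24 through 192.168.1.2 using a custom table.
add_custom_route() {
    local table_id=200   # arbitrary unused table ID (assumption)
    # Add the route to the custom table instead of the main table.
    ip route add 192.168.2.0/24 via 192.168.1.2 dev eth0 table "$table_id"
    # Without a rule, the custom table is never consulted.
    ip rule add to 192.168.2.0/24 table "$table_id"
    # Show the table contents to confirm.
    ip route show table "$table_id"
}
```

To refer to the table by name rather than by number, a line such as `200 custom` is typically added to `/etc/iproute2/rt_tables`; the name `custom` is only an example.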

  5. Address Resolution Protocol (ARP)

    You need to verify the MAC address associated with an IP address on your local network. Which command would you use to check the ARP cache and possibly refresh it if the address is not found?
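
One way to sketch this with iproute2: `ip neigh show` reads the neighbor (ARP) cache, and a single `ping` prompts the kernel to resolve the address again if no entry is present. The function name `lookup_mac` is hypothetical, and the `-W` timeout flag assumes the Linux iputils version of `ping`.

```shell
# Hypothetical helper: print the neighbor-cache entry for an IP, probing once if missing.
lookup_mac() {
    local ip_addr="$1" entry
    entry=$(ip neigh show to "$ip_addr")
    if [ -z "$entry" ]; then
        # No cached entry: send one probe so the kernel performs ARP resolution.
        ping -c 1 -W 1 "$ip_addr" > /dev/null 2>&1 || true
        entry=$(ip neigh show to "$ip_addr")
    fi
    if [ -n "$entry" ]; then
        echo "$entry"
    else
        echo "no neighbor entry for $ip_addr" >&2
        return 1
    fi
}
```

On systems with net-tools installed, `arp -n` shows the same cache in the older format.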


Google’s official SRE/SE preparation guide states that the coding exercise will assess simple algorithm and data structure implementation. We are looking for a solution that shows you understand your language usage well, with a clean, working, and efficient implementation. On top of this, you should be familiar with practical Linux scripting in bash.

  1. You have a server access log file access.log that follows this format:

    10.0.0.1 - - [10/Jul/2023:11:45:07 +0000] "GET /index.html HTTP/1.1" 200 2326
    10.0.0.2 - - [10/Jul/2023:11:45:08 +0000] "POST /submit.php HTTP/1.1" 403 182
    10.0.0.3 - - [10/Jul/2023:11:45:09 +0000] "GET /about.html HTTP/1.1" 200 478
    10.0.0.9 - - [10/Jul/2023:11:45:09 +0000] "GET /login.html HTTP/1.1" 200 223

    Write an awk command that:

    • Counts the number of successful (HTTP status code 200) GET requests for each unique resource (e.g., /index.html, /about.html).
    • Displays the count and the resource path.
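
A minimal sketch of one possible answer, assuming the log format shown above, where field 6 is the quoted method (`"GET`), field 7 is the resource path, and field 9 is the status code. The wrapper function `count_ok_gets` is hypothetical and exists only so the command can be reused with any log file.

```shell
# Hypothetical wrapper: count successful GET requests per resource in an access log.
count_ok_gets() {
    awk '$6 == "\"GET" && $9 == 200 { hits[$7]++ }
         END { for (path in hits) print hits[path], path }' "$1" | sort -k2
}
```

Running `count_ok_gets access.log` prints one line per resource with the count first. The `sort -k2` is there because awk does not guarantee any order when iterating over an array.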
  2. Write a bash script that monitors system health and sends an alert if any of the following conditions are met:

    • The CPU usage exceeds 80% for more than 5 minutes.
    • The available disk space on the root partition is less than 10%.
    • If any condition is met, the script should output an appropriate message to the standard error (stderr) indicating the issue.

    You can use the following template to create your script:

    check_cpu() {
        local threshold=80
        #... other variables
        while true; do
            # Check CPU usage
            # Your CPU usage check logic here
            sleep 60  # placeholder so the loop body is valid bash; replace with your logic
        done
    }

    check_disk_space() {
        local threshold=10
        #... other variables
        # Check disk space
        # Your disk space check logic here
    }

    check_disk_space
    check_cpu
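
One way the template could be filled in, as a sketch rather than a reference solution. It assumes a Linux host with `/proc/stat`, samples CPU usage once a minute, and treats five consecutive samples above the threshold as "more than 5 minutes". The helper names `cpu_busy_percent` and `free_percent`, and the variables `interval` and `needed`, are assumptions introduced here; free space is approximated as 100 minus the capacity column of `df -P`.

```shell
# Print the busy CPU percentage between two aggregate "cpu" lines from /proc/stat.
cpu_busy_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '{
        total = 0
        for (i = 2; i <= 9; i++) total += $i   # user through steal
        idle = $5 + $6                         # idle plus iowait
        if (NR == 2 && total > ptotal)
            printf "%d\n", 100 * (1 - (idle - pidle) / (total - ptotal))
        ptotal = total; pidle = idle
    }'
}

# Print the free-space percentage, given `df -P` output on stdin.
free_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

check_disk_space() {
    local threshold=10 free
    free=$(df -P / | free_percent)
    if [ "$free" -lt "$threshold" ]; then
        echo "ALERT: only ${free}% free on /" >&2
    fi
}

check_cpu() {
    local threshold=80 interval=60 needed=5 breaches=0 prev cur busy
    prev=$(grep '^cpu ' /proc/stat)
    while true; do
        sleep "$interval"
        cur=$(grep '^cpu ' /proc/stat)
        busy=$(cpu_busy_percent "$prev" "$cur")
        prev="$cur"
        # Count consecutive samples above the threshold; reset on any dip.
        if [ "${busy:-0}" -gt "$threshold" ]; then
            breaches=$((breaches + 1))
        else
            breaches=0
        fi
        if [ "$breaches" -ge "$needed" ]; then
            echo "ALERT: CPU above ${threshold}% for $((breaches * interval / 60)) minutes" >&2
        fi
    done
}
```

As in the template, `check_disk_space` would be called first and `check_cpu` last, because `check_cpu` loops until the script is interrupted.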
  3. Create a script that backs up a specified directory (including all subdirectories) to a tarball, appending the current date to the filename. The script should also delete backups older than 30 days.

    You can use the following template to create your backup script:

    # Directory to be backed up
    backup_source="/path/to/your/directory"
    # Backup storage directory
    backup_destination="/path/to/your/backup/directory"
    # Backup filename format: backup-YYYYMMDD.tar.gz
    backup_filename="backup-$(date +%Y%m%d).tar.gz"
    # Number of days to keep the backup
    days_to_keep=30
    # Creating the backup
    ## Your backup creation command here
    # Check if the backup was created successfully
    ## Success/Failure logic here
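
A sketch of how the two placeholders might be completed, wrapped in a hypothetical function `run_backup` so the paths can be passed as arguments. It assumes a `find` that supports `-maxdepth` and `-delete` (GNU and BSD versions do), and it only prunes files matching the `backup-*.tar.gz` pattern from the template.

```shell
# Hypothetical wrapper: back up a directory and prune old backups.
# Usage: run_backup <source-dir> <destination-dir> [days-to-keep]
run_backup() {
    local backup_source="$1" backup_destination="$2" days_to_keep="${3:-30}"
    local backup_filename
    backup_filename="backup-$(date +%Y%m%d).tar.gz"

    mkdir -p "$backup_destination" || return 1

    # Archive the directory relative to its parent so the tarball has no absolute paths.
    if tar -czf "$backup_destination/$backup_filename" \
        -C "$(dirname "$backup_source")" "$(basename "$backup_source")"; then
        echo "Backup created: $backup_destination/$backup_filename"
    else
        echo "Backup failed for $backup_source" >&2
        return 1
    fi

    # Delete backups whose modification time is more than days_to_keep days ago.
    find "$backup_destination" -maxdepth 1 -type f -name 'backup-*.tar.gz' \
        -mtime +"$days_to_keep" -delete
}
```

Pruning runs only after a successful backup, so a failed run never removes the last good archives.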
  4. You have a directory full of text files. Write a script to find and display all files that contain a specific keyword, along with the count of how many times that keyword appears in each file.
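
A minimal sketch of one approach, assuming the keyword should be matched as a fixed string and only files directly inside the directory are scanned. The function name `count_keyword` is hypothetical. It uses `grep -o` so that every occurrence is counted, whereas `grep -c` would count matching lines only.

```shell
# Hypothetical helper: print "file: count" for each file containing the keyword.
# Usage: count_keyword <keyword> [directory]
count_keyword() {
    local keyword="$1" dir="${2:-.}" file count
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # -o prints each match on its own line; -F treats the keyword literally.
        count=$(grep -oF -- "$keyword" "$file" | wc -l | tr -d ' ')
        if [ "$count" -gt 0 ]; then
            printf '%s: %s\n' "$file" "$count"
        fi
    done
}
```

For example, `count_keyword error /var/log/myapp` would list each file in that (hypothetical) directory that mentions `error`, followed by the number of occurrences.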

  5. For Site Reliability Engineer, Systems Engineer (SRE, SE) roles at Google, it is less common to encounter hard Data Structures and Algorithms (DSA) problems during the interview process. While medium-complexity problems may occasionally arise, the focus is predominantly on ensuring candidates are comfortable and proficient in solving easy-level DSA problems.

    Below are some examples of easy-level DSA problems that you might encounter during a Google SRE, SE interview. The list is not exhaustive, but it gives you an idea of the types of problems you might face. Once you are comfortable with these, you can find more Easy or Medium DSA problems on coding platforms like LeetCode.

    Two Sum Problem: Given an array of integers, return indices of the two numbers such that they add up to a specific target.

    Valid Palindrome: Given a string, determine if it is a palindrome, considering only alphanumeric characters and ignoring cases.

    Valid Parentheses: Given a string containing just the characters ’(’, ’)’, ’{’, ’}’, ’[’ and ’]’, determine if the input string is valid.

    Roman to Integer: Given a roman numeral, convert it to an integer.

    Longest Common Prefix: Write a function to find the longest common prefix string amongst an array of strings.

    Reverse Integer: Given a 32-bit signed integer, reverse digits of an integer.

    Merge Two Sorted Lists: Merge two sorted linked lists and return it as a new sorted list.

    Remove Duplicates from Sorted Array: Given a sorted array nums, remove the duplicates in-place such that each element appears only once, and return the new length.

  1. Design a global video streaming service similar to Netflix or YouTube, focusing on scalability, reliability, and low latency. Consider the following:

    • How would you architect the system to support millions of concurrent users globally?
    • What strategies would you employ for content delivery and network efficiency?
    • Discuss how you would handle metadata storage, search functionality, and user personalization at scale.
  2. Design a scalable real-time messaging system like WhatsApp or Telegram that can support high-volume, low-latency messaging across the globe. Address the following points:

    • Describe the system architecture needed to ensure message delivery with minimal delay.
    • How would you design the data model to store conversations, ensure data consistency, and manage user presence status?
    • Explain the trade-offs between consistency, availability, and partition tolerance (CAP theorem) in your design.
  3. You are tasked by a Cloud Provider to create a CDN Product similar to Cloudflare. Design the architecture for the CDN service, focusing on content caching, load balancing, and global content delivery.

  4. Distributed File Storage System

    Design a distributed file storage system similar to Google Drive or Dropbox, capable of storing and retrieving large amounts of data across a distributed network. Consider the following aspects:

    • Outline the system architecture, focusing on data distribution, redundancy, and fault tolerance.
    • How would you ensure fast and reliable access to files for users worldwide?
    • Discuss the security measures and encryption strategies to protect user data.
  1. Understanding Threads and Processes

    Explain the difference between a process and a thread in a Linux operating system. How do threads differ from processes in terms of resource allocation and execution?

    Write a command to list all threads of a specific process using the process ID (PID).
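
For the command part, one option on systems with procps is `ps -T`, which prints one row per thread of the given process. The wrapper name `list_threads` is hypothetical.

```shell
# Hypothetical wrapper: list every thread of the process with the given PID.
list_threads() {
    # -T adds an SPID (thread ID) column and prints one row per thread.
    ps -T -p "$1"
}
```

Alternatives include `ls /proc/<PID>/task`, which shows one directory per thread ID, and `top -H -p <PID>` for a live view.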

  2. Describe a scenario where a deadlock could occur in a system.

    Write a command to detect and trace a deadlock in a Linux system.

  3. Context Switching and Scheduling

    Explain the concept of context switching in the Linux operating system. Discuss how context switching impacts system performance and how the scheduler plays a role in managing the execution of processes and threads. Additionally, describe the role of modern concurrency constructs like mutexes and semaphores in minimizing the cost of context switches.

    Also describe the steps you would take to diagnose a high context-switching rate on a server.
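
As one possible diagnostic step, the per-process counters in `/proc/<PID>/status` separate voluntary switches (the task blocked or yielded) from nonvoluntary ones (the scheduler preempted it). The sketch below extracts both; the function name `ctxt_switches` is hypothetical, and it takes the status file path as an argument so it can be pointed at any process.

```shell
# Hypothetical helper: print the context-switch counters from a /proc status file.
# Usage: ctxt_switches /proc/<PID>/status
ctxt_switches() {
    local status_file="${1:-/proc/self/status}"
    # Lines look like "voluntary_ctxt_switches:<TAB>150".
    awk -F'[: \t]+' '/^(non)?voluntary_ctxt_switches/ { print $1, $2 }' "$status_file"
}
```

A typical workflow would start system-wide with `vmstat 1` (the `cs` column), then narrow down to individual processes with `pidstat -w 1` or the counters above.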

  4. System Calls in Unix/Linux for Containerization

    Describe how the clone() and unshare() system calls and the cgroups (control groups) mechanism contribute to the underlying functionality of containerization.

  5. In Unix/Linux systems, what is the difference between static and dynamic linking? Describe a scenario where you would prefer one over the other.

  6. Explain how a Unix/Linux operating system handles memory overcommitment.

    What mechanism does it use to decide which processes to terminate when the system runs out of physical memory and swap space (OOM condition)?
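
The kernel's OOM killer exposes a per-process badness score in `/proc/<PID>/oom_score`, where a higher score means the process is more likely to be killed. The sketch below lists the highest-scoring processes; the function name `top_oom_candidates` and its two optional arguments (a proc root and a row limit) are assumptions made for illustration.

```shell
# Hypothetical helper: print "score pid" for the processes most likely to be OOM-killed.
# Usage: top_oom_candidates [proc-root] [limit]
top_oom_candidates() {
    local proc_root="${1:-/proc}" limit="${2:-5}" f pid score
    for f in "$proc_root"/[0-9]*/oom_score; do
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read; skip it if so.
        score=$(cat "$f" 2>/dev/null) || continue
        pid="${f%/oom_score}"
        pid="${pid##*/}"
        printf '%s %s\n' "$score" "$pid"
    done | sort -rn | head -n "$limit"
}
```

The score can be biased per process through `/proc/<PID>/oom_score_adj` (range -1000 to 1000), and the overcommit policy itself is selected with the `vm.overcommit_memory` sysctl.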
