Design Uber
Understanding the Problem
Functional Requirements
Core Requirements
- Riders should be able to input their current location and desired destination to see an estimated fare.
- Riders should be able to request a ride and be matched with a nearby available driver in real-time.
- Drivers should be able to accept ride requests and navigate to the user's location and destination.
Below the line (out of scope)
- Riders should be able to rate their ride and driver post-trip.
- Drivers should be able to rate passengers.
- Riders should be able to schedule rides in advance.
- Riders should be able to request different categories of rides (e.g., X, XL, Comfort).
Non-Functional Requirements
Core Requirements
- The system should prioritize low latency for ride matching to ensure quick response times for users and drivers.
- The system should ensure strong consistency in ride matching to prevent any driver from being assigned multiple rides simultaneously.
- The system should be highly available and reliable, minimizing downtime and ensuring that ride requests can be processed 24/7.
- The system should be able to handle high throughput, especially during peak hours or special events.
Below the line (out of scope)
- The system should ensure the security and privacy of user and driver data, complying with regulations like GDPR.
- The system should be resilient to failures, with redundancy and failover mechanisms in place.
- The system should have robust monitoring, logging, and alerting to quickly identify and resolve issues.
- The system should facilitate easy updates and maintenance without significant downtime (CI/CD pipelines).
The Set Up
Planning the Approach
Before you move on to designing the system, it's important to take a moment to plan your strategy. Fortunately, for these common user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don't get lost in the weeds as you go. Once you've satisfied the functional requirements, you'll rely on your non-functional requirements to guide you through the deep dives.
Defining the Core Entities
I like to begin with a broad overview of the primary entities. At this stage, it is not necessary to know every specific column or detail. We will focus on the intricacies, such as columns and fields, later when we have a clearer grasp. Initially, establishing these key entities will guide our thought process and lay a solid foundation as we progress towards defining the API.
To satisfy our key functional requirements, we'll need the following entities:
- Rider: This is any user who uses the platform to request rides. It includes personal information such as name and contact details, preferred payment methods for ride transactions, etc.
- Driver: This is any user who is registered as a driver on the platform and provides transportation services. It includes their personal details, vehicle information (make, model, year, etc.), preferences, and availability status.
- Ride: This entity represents an individual ride from the moment a rider requests an estimated fare all the way until its completion. It records all pertinent details of the ride, including the identities of the rider and the driver, vehicle details, state, the planned route, the actual fare charged at the end of the trip, and timestamps marking the pickup and drop-off.
- Location: This entity stores the real-time location of drivers. It includes the latitude and longitude coordinates, as well as the timestamp of the last update. This entity is crucial for matching riders with nearby drivers and for tracking the progress of a ride.
In the actual interview, this can be as simple as a short list like this. Just make sure you talk through the entities with your interviewer to ensure you are on the same page.
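If it helps to see the entities as types, here is a minimal sketch. The field names and the choice of dataclasses are my own assumptions for illustration; only the entity names and the status values ("fare_estimated", "accepted", "picked_up_rider", "completed") come from this write-up.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class RideStatus(Enum):
    FARE_ESTIMATED = "fare_estimated"
    ACCEPTED = "accepted"
    PICKED_UP_RIDER = "picked_up_rider"
    COMPLETED = "completed"


@dataclass
class Rider:
    id: str
    name: str
    contact: str
    payment_method: str


@dataclass
class Driver:
    id: str
    name: str
    vehicle: str            # make, model, year collapsed into one field for brevity
    available: bool = True


@dataclass
class Location:
    driver_id: str
    latitude: float
    longitude: float
    updated_at: datetime    # timestamp of the last location update


@dataclass
class Ride:
    id: str
    rider_id: str
    pickup: tuple[float, float]        # (latitude, longitude)
    destination: tuple[float, float]   # (latitude, longitude)
    status: RideStatus = RideStatus.FARE_ESTIMATED
    estimated_fare: Optional[float] = None
    driver_id: Optional[str] = None    # unset until a driver accepts
```

A ride starts life at the fare estimate, so the defaults mirror that state: no driver yet and a status of "fare_estimated".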
Now, let's proceed to design our system, tackling each functional requirement in sequence. This step-by-step approach will help us maintain focus and manage scope effectively, ensuring a cohesive build-up of the system's architecture.
The API
The API for retrieving a fare estimate is straightforward. We define a simple POST endpoint that takes in the user's current location and desired destination and returns a partial Ride object with the estimated fare and eta. We use POST here because we will be creating a new ride object in the database.
POST /fare-estimate?pickupLocation={pickup}&destination={destination} -> Partial<Ride>
Partial<Ride>: {
  "id": string,
  "price": number,
  "eta": DateTime
}
Request Ride Endpoint: This endpoint is used by riders to confirm their ride request after reviewing the estimated fare. It initiates the ride matching process by signaling the backend to find a suitable driver.
PATCH /rides/request
Body: {
  "rideId": string // This is the id returned from the fare estimate endpoint
}
Update Driver Location Endpoint: Before we can do any matching, we need to know where our drivers are. This endpoint is used by drivers to update their location in real-time. It is called periodically by the driver client to ensure that the driver's location is always up to date.
POST /drivers/location/update
Body: {
  "latitude": double,
  "longitude": double
}
Note: the driverId is present in the session cookie or JWT, not in the body.
Accept Ride Request Endpoint: This endpoint allows drivers to accept a ride request. Upon acceptance, the system updates the ride status and provides the driver with the pickup location coordinates.
PATCH /rides/accept
Body: {
  "rideId": string
}
Response: {
  "status": "success" | "error",
  "pickupLatitude": double,
  "pickupLongitude": double
}
Update Ride Status Endpoint: This endpoint allows drivers to update the status of a ride. It is used to indicate that the driver has picked up the rider and that the ride is in progress, as well as to indicate that the ride has been completed.
PATCH /rides/status/update
Body: {
  "rideId": string,
  "status": "picked_up_rider" | "completed"
}
High-Level Design
1) Riders should be able to input their current location and desired destination to see an estimated fare
The first thing that users will do when they open the app to request a ride is search for their desired destination. At this point, the client will make a request to our service to get an estimated price for the ride. The user will then have a chance to request a ride with this fare or do nothing.
Let's lay out the necessary components for communicating between the client and our microservices, adding our first service, the “Ride Service”, which will handle fare estimations.
The core components necessary to fulfill fare estimation are:
- Rider Client: The primary touchpoint for users is the Rider Client, available on iOS and Android. This client interfaces with the system's backend services to provide a seamless user experience.
- Load Balancer: This component is responsible for evenly distributing incoming traffic to the API Gateway. It enhances the system's fault tolerance and availability by preventing any single point of overload.
- API Gateway: Acting as the entry point for client requests, the API Gateway routes requests to the appropriate microservices. It also manages cross-cutting concerns such as authentication and rate limiting.
- Ride Service: This microservice is tasked with managing ride state, starting with calculating fare estimates. It interacts with third-party mapping APIs to determine the distance and travel time between locations and applies the company's pricing model to generate a fare estimate. For the sake of this interview, we abstract this complexity away.
- Third Party Mapping API: We use a third-party service (like Google Maps) to provide mapping and routing functionality. It is used by the Ride Service to calculate the distance and travel time between locations.
- Ride Database: The ride database is responsible for storing Ride entities. In this case, it creates a ride with a status of "fare_estimated" and the estimated fare and ETA. This allows us to track the ride and ensure that the user is charged the correct amount at the end of the trip.
Let's walk through exactly how these components interact when a rider requests a fare estimate.
- The rider enters their pickup location and desired destination into the client app, which sends a POST request to our backend system via /fare-estimate?pickupLocation={pickup}&destination={destination}
- Our load balancer receives the request and routes it to the API gateway, choosing the path with the least traffic.
- The API gateway handles any necessary authentication and rate limiting before forwarding the request to the Ride Service.
- The Ride Service makes a request to the Third Party Mapping API to calculate the distance and travel time between the pickup and destination locations and then applies the company's pricing model to the distance and travel time to generate a fare estimate.
- The Ride Service creates a new Ride entity in the Ride Database with a status of "fare_estimated" and the estimated fare.
- The service then returns the relevant fields of the Ride entity to the API Gateway, which forwards it to the Rider Client so they can make a decision about whether to request a ride.
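To make the pricing step concrete, here is a rough sketch of what the Ride Service might do once it has a distance and travel time. Everything specific in it is an assumption: the rates, the function names, and the use of a straight-line (haversine) distance with a fixed average speed as a stand-in for the Third Party Mapping API, which would return real route distance and travel time.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_fare(pickup, destination, base=2.50, per_km=1.20, per_min=0.30,
                  avg_speed_kmh=30.0):
    """Return (fare, minutes). All rates are made-up illustration values."""
    distance_km = haversine_km(pickup, destination)  # stand-in for the mapping API
    minutes = distance_km / avg_speed_kmh * 60       # assumed constant average speed
    fare = base + per_km * distance_km + per_min * minutes
    return round(fare, 2), round(minutes, 1)
```

The returned pair would be stored on the new Ride entity alongside the "fare_estimated" status before being sent back to the client.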
2) Riders should be able to request a ride and be matched with a nearby available driver in real-time
Once a user reviews the estimated fare and ETA, they can request a ride. By building upon our existing design, we can extend it to support ride matching. This requires four new components.
- Driver Client: In addition to the Rider Client, we introduce the Driver Client, which is the interface for drivers to receive ride requests and provide location updates. The Driver Client communicates with the Location Service to send real-time location updates.
- Location Service: Manages the real-time location data of drivers. It is responsible for receiving location updates from drivers, storing this information in the Driver Location Database, and providing the Ride Matching Service with the latest location data to facilitate accurate and efficient driver matching.
- Driver Location Database: Stores the real-time location data of drivers. It allows for quick identification of available drivers near the ride request location.
- Ride Matching Service: Handles incoming ride requests and utilizes a sophisticated algorithm (abstracted away for the purpose of this interview) to match these requests with the best available drivers based on proximity, availability, driver rating, and other relevant factors.
Let's walk through the sequence of events that occur when a user requests a ride and the system matches them with a nearby driver:
- The user confirms their ride request in the client app, which sends a PATCH request to /rides/request with the rideId returned by the fare estimate.
- The load balancer receives the request and efficiently routes it to the API gateway, ensuring balanced traffic distribution.
- The API gateway performs necessary authentication and rate limiting before forwarding the request to the Ride Matching Service.
- The Ride Matching Service receives the request and first loads the Ride object from the Ride DB to get the rider's pickup location. It then uses the Location Service to identify nearby available drivers and selects the best match based on the drivers' current state and location.
- Throughout this process, the Location Service continuously receives location updates from drivers, ensuring that the Driver Location Database is up-to-date for accurate matching.
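The matching step can be sketched as a ranking over available drivers. This is deliberately naive: it scans every driver and sorts by straight-line distance, and it ignores rating and the other factors mentioned above. The function names, the dictionary shape, and the 5 km cutoff are all assumptions for illustration, not the actual matching algorithm.

```python
import math


def distance_km(a, b):
    """Haversine distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_drivers(pickup, drivers, max_km=5.0):
    """Return ids of available drivers within max_km of pickup, nearest first.

    `drivers` maps driver id -> {"location": (lat, lng), "available": bool}.
    """
    candidates = [
        (distance_km(pickup, info["location"]), driver_id)
        for driver_id, info in drivers.items()
        if info["available"]
    ]
    return [driver_id for dist, driver_id in sorted(candidates) if dist <= max_km]
```

A full scan like this is fine for a whiteboard walkthrough but does not scale to many drivers, which is exactly what the first deep dive is about.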
3) Drivers should be able to accept ride requests and navigate to the user's location and destination
Once a driver is matched with a rider, they can accept the ride request and navigate to the pickup location.
We only need to add one additional service to our existing design.
- Notification Service: Responsible for dispatching real-time notifications to drivers when a new ride request is matched to them. It ensures that drivers are promptly informed so they can accept ride requests in a timely manner, thus maintaining a fluid user experience. Notifications are sent via APN (Apple Push Notification) and FCM (Firebase Cloud Messaging) for iOS and Android devices, respectively.
Let's walk through the sequence of events that occur when a driver accepts a ride request and completes the ride:
- After the Ride Matching Service determines the ranked list of eligible drivers, it sends a notification to the top driver on the list via APN or FCM.
- The driver receives a notification (via APN or FCM) that a new ride request is available. They open the Driver Client app and accept the ride request, which sends a PATCH request to our backend system with the rideId.
  a) If they decline the ride instead, the system will send a notification to the next driver on the list.
- The load balancer receives the request and efficiently routes it to the API gateway, ensuring balanced traffic distribution.
- The API gateway performs necessary authentication and rate limiting before forwarding the request to the Ride Service.
- The Ride Service receives the request and updates the status of the ride to "accepted" and updates the assigned driver accordingly. It then returns the pickup location coordinates to the Driver Client.
- The driver navigates to the pickup location and picks up the rider. Once the rider is in the vehicle, the driver updates the status of the ride to "picked_up_rider" via the Driver Client.
- Lastly, after navigating to the destination using on-client navigation, the driver updates the status of the ride to "completed" via the Driver Client.
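The status updates in this flow form a small state machine, and it can help to write the allowed transitions down explicitly. In the sketch below, "fare_estimated", "accepted", "picked_up_rider", and "completed" are the statuses used in this write-up; the intermediate "requested" status and the `advance` helper are assumptions I've added for illustration.

```python
# Allowed status transitions for a Ride. "requested" is a hypothetical
# intermediate status covering the time between request and driver acceptance.
ALLOWED_TRANSITIONS = {
    "fare_estimated": {"requested"},
    "requested": {"accepted"},
    "accepted": {"picked_up_rider"},
    "picked_up_rider": {"completed"},
    "completed": set(),
}


def advance(current, new):
    """Return the new status, or raise ValueError if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {new!r}")
    return new
```

Rejecting illegal transitions on the server means a buggy or replayed client request cannot, for example, mark a ride "completed" before the rider was ever picked up.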
Potential Deep Dives
With the core functional requirements met, it's time to dig into the non-functional requirements via deep dives. These are the main deep dives I like to cover for this question.
1) How do we handle frequent driver location updates and efficient proximity searches on location data?
Managing the high volume of location updates from drivers and performing efficient proximity searches to match them with nearby ride requests is a complex challenge. Here's a tiered approach to addressing it:
Bad Solution: Direct Database Writes and Proximity Queries
Good Solution: Batch Processing and Specialized Geospatial Database
Great Solution: Real-Time In-Memory Geospatial Data Store
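As a rough illustration of why a spatial index helps with both halves of this problem, here is a minimal in-process grid index. It is a simplified stand-in, not a real geospatial data store: the `GridIndex` class, the 0.01 degree cell size, and the method names are all assumptions. The idea it shows is that a location update only touches one or two buckets, and a proximity search only reads the handful of cells around the rider instead of scanning every driver.

```python
import math
from collections import defaultdict


class GridIndex:
    """Buckets drivers into fixed-size lat/lng cells for fast neighbourhood lookups."""

    def __init__(self, cell_deg=0.01):    # 0.01 degrees of latitude is about 1.1 km
        self.cell_deg = cell_deg
        self.cells = defaultdict(set)     # (row, col) -> driver ids in that cell
        self.where = {}                   # driver id -> (row, col) of last update

    def _cell(self, lat, lng):
        return (math.floor(lat / self.cell_deg), math.floor(lng / self.cell_deg))

    def update(self, driver_id, lat, lng):
        """Move a driver to the cell containing their latest position."""
        new_cell = self._cell(lat, lng)
        old_cell = self.where.get(driver_id)
        if old_cell == new_cell:
            return                        # still in the same cell, nothing to move
        if old_cell is not None:
            self.cells[old_cell].discard(driver_id)
        self.cells[new_cell].add(driver_id)
        self.where[driver_id] = new_cell

    def nearby(self, lat, lng, rings=1):
        """Return driver ids in the query cell and the cells `rings` steps around it."""
        row, col = self._cell(lat, lng)
        found = set()
        for dr in range(-rings, rings + 1):
            for dc in range(-rings, rings + 1):
                found |= self.cells.get((row + dr, col + dc), set())
        return found
```

The candidates returned by `nearby` would still need an exact distance check and an availability filter before ranking.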
2) How can we manage system overload from frequent driver location updates while ensuring location accuracy?
High-frequency location updates from drivers can lead to system overload, straining server resources and network bandwidth. This overload risks slowing down the system, leading to delayed location updates and potentially impacting user experience. In most candidates' original designs, drivers ping a new location every 5 seconds or so. This follow-up question is designed to see if they can intelligently reduce the number of pings while maintaining accuracy.
Great Solution: Adaptive Location Update Intervals
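One way to picture adaptive intervals is as a small decision function running on the driver client. The sketch below is only an assumption about how such a rule could look: the inputs, the speed thresholds, and every interval other than the 5 second baseline mentioned above are made up for illustration.

```python
def next_update_interval(speed_kmh, on_trip, near_pending_request=False):
    """Seconds until the driver client should send its next location ping."""
    if on_trip or near_pending_request:
        return 5     # position matters most here, keep the 5 second cadence
    if speed_kmh < 1:
        return 60    # parked or idling, position barely changes
    if speed_kmh < 30:
        return 15    # slow city driving
    return 10        # moving quickly while available
```

For example, `next_update_interval(0, on_trip=False)` backs a parked driver off to one ping per minute, while any driver on a trip stays at the 5 second cadence.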
3) How do we prevent multiple ride requests from being sent to the same driver simultaneously?
We defined consistency in ride matching as a key non-functional requirement. This means that we only request one driver at a time for a given ride request AND that each driver only receives one ride request at a time. That driver would then have 10 seconds to accept or deny the request before we move on to the next driver if necessary. If you've solved Ticketmaster before, you know this problem well, as it's almost exactly the same as ensuring that a ticket is only sold once.
Bad Solution: Application-Level Locking with Manual Timeout Checks
Good Solution: Database Status Update with Timeout Handling
Great Solution: Distributed Lock with TTL
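To show the semantics a lock with a TTL gives you, here is an in-process imitation. It is not distributed: in a real system the dictionary would live in a shared store that offers an atomic "set if absent, with expiry" operation (Redis's SET with the NX and EX options is one example). The `TTLLock` class and its method names are assumptions; the 10 second default mirrors the acceptance window described above.

```python
import time


class TTLLock:
    """In-process imitation of a 'set if absent, with expiry' lock store."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.held = {}    # driver id -> (ride id, expiry time)

    def acquire(self, driver_id, ride_id, ttl_seconds=10):
        """Lock the driver for this ride unless an unexpired lock already exists."""
        now = self.clock()
        current = self.held.get(driver_id)
        if current is not None and current[1] > now:
            return False                  # driver is already considering a ride
        self.held[driver_id] = (ride_id, now + ttl_seconds)
        return True

    def release(self, driver_id, ride_id):
        """Release the lock only if this ride still owns it."""
        current = self.held.get(driver_id)
        if current is not None and current[0] == ride_id:
            del self.held[driver_id]
            return True
        return False
```

Because the lock expires on its own, a driver who never responds becomes eligible again after the TTL without any cleanup job, and a crashed matching instance cannot leave a driver locked forever.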
4) How can we ensure no ride requests are dropped during peak demand periods?
During peak demand periods, the system may receive a high volume of ride requests, which can lead to dropped requests. This is particularly problematic during special events or holidays when demand is high and the system is under stress. We also need to protect against the case where an instance of the Ride Matching Service crashes or is restarted, leading to dropped rides.
Bad Solution: First-Come, First-Served with No Queue
Great Solution: Queue with Dynamic Scaling
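The two properties that matter here are that a request is not lost if the worker handling it dies, and that the worker pool grows with the backlog. The sketch below imitates both in a single process; a real deployment would use a managed message queue and an autoscaler. The class, the method names, and the numbers in `desired_workers` are all assumptions for illustration.

```python
import math
from collections import deque


class RideRequestQueue:
    """FIFO queue where a request is only removed once a worker acknowledges it."""

    def __init__(self):
        self.pending = deque()
        self.in_flight = {}    # ride id -> request currently being processed

    def publish(self, ride_id, request):
        self.pending.append((ride_id, request))

    def poll(self):
        """Hand the oldest request to a worker, keeping a copy until it is acked."""
        if not self.pending:
            return None
        ride_id, request = self.pending.popleft()
        self.in_flight[ride_id] = request
        return ride_id, request

    def ack(self, ride_id):
        """Mark a request as fully processed so it is never redelivered."""
        self.in_flight.pop(ride_id, None)

    def requeue_unacked(self):
        """Put every unacknowledged request back at the front, e.g. after a crash."""
        for ride_id, request in reversed(list(self.in_flight.items())):
            self.pending.appendleft((ride_id, request))
        self.in_flight.clear()


def desired_workers(queue_depth, per_worker=50, min_workers=2, max_workers=100):
    """Scale the matching workers with the backlog, within fixed bounds."""
    return max(min_workers, min(max_workers, math.ceil(queue_depth / per_worker)))
```

If a Ride Matching Service instance polls a request and crashes before acknowledging it, `requeue_unacked` makes that request available to the next healthy worker instead of dropping the ride.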
5) How can you further scale the system to reduce latency and improve throughput?
Bad Solution: Vertical Scaling
Great Solution: Geo-Sharding with Read Replicas
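Geo-sharding boils down to routing each request to the shard that owns its region, plus handling searches that straddle a boundary. Here is a minimal sketch of that routing. The region names, the bounding boxes, and the 0.1 degree search margin are hypothetical values chosen only to make the example runnable.

```python
# Hypothetical regions as (min_lat, max_lat, min_lng, max_lng) bounding boxes.
REGIONS = {
    "us-west": (32.0, 49.0, -125.0, -114.0),
    "us-east": (25.0, 47.0, -85.0, -67.0),
}


def shard_for(lat, lng):
    """Return the name of the shard that owns this coordinate."""
    for name, (min_lat, max_lat, min_lng, max_lng) in REGIONS.items():
        if min_lat <= lat < max_lat and min_lng <= lng < max_lng:
            return name
    return "default"    # catch-all shard for everything outside the named regions


def shards_for_search(lat, lng, radius_deg=0.1):
    """Shards a proximity search must query, including neighbours near a boundary."""
    offsets = (-radius_deg, 0.0, radius_deg)
    probes = [(lat + dy, lng + dx) for dy in offsets for dx in offsets]
    return sorted({shard_for(p_lat, p_lng) for p_lat, p_lng in probes})
```

Most searches hit a single shard, which keeps latency low. Only riders near a region edge pay the cost of a scatter-gather across two or more shards.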
After applying the "Great" solutions, your updated whiteboard should look something like this:
What is Expected at Each Level?
Ok, that was a lot. You may be thinking, “how much of that is actually required from me in an interview?” Let’s break it down.
Mid-level
Breadth vs. Depth: A mid-level candidate will be mostly focused on breadth (80% vs 20%). You should be able to craft a high-level design that meets the functional requirements you've defined, but many of the components will be abstractions with which you only have surface-level familiarity.
Probing the Basics: Your interviewer will spend some time probing the basics to confirm that you know what each component in your system does. For example, if you add an API Gateway, expect that they may ask you what it does and how it works (at a high level). In short, the interviewer is not taking anything for granted with respect to your knowledge.
Mixture of Driving and Taking the Backseat: You should drive the early stages of the interview in particular, but the interviewer doesn’t expect that you are able to proactively recognize problems in your design with high precision. Because of this, it’s reasonable that they will take over and drive the later stages of the interview while probing your design.
The Bar for Uber: For this question, an E4 candidate will have clearly defined the API endpoints and data model and landed on a high-level design that is functional and meets the requirements. They would have understood the need for some spatial index to speed up location searches, but may not have landed on a specific solution. They would have also implemented at least the "good solution" for the ride request locking problem.
Senior
Depth of Expertise: As a senior candidate, expectations shift towards more in-depth knowledge — about 60% breadth and 40% depth. This means you should be able to go into technical details in areas where you have hands-on experience. It's crucial that you demonstrate a deep understanding of key concepts and technologies relevant to the task at hand.
Advanced System Design: You should be familiar with advanced system design principles. For example, knowing how to use a geospatial index or in-memory geospatial data store for proximity searches is essential. You’re also expected to understand the use of a distributed lock with a TTL to keep a driver from receiving multiple ride requests at once and to discuss detailed scaling strategies (it’s ok if this took some probing/hints from the interviewer), including sharding and replication. Your ability to navigate these advanced topics with confidence and clarity is key.
Articulating Architectural Decisions: You should be able to clearly articulate the pros and cons of different architectural choices, especially how they impact scalability, performance, and maintainability. You justify your decisions and explain the trade-offs involved in your design choices.
Problem-Solving and Proactivity: You should demonstrate strong problem-solving skills and a proactive approach. This includes anticipating potential challenges in your designs and suggesting improvements. You need to be adept at identifying and addressing bottlenecks, optimizing performance, and ensuring system reliability.
The Bar for Uber: For this question, as an E5 candidate you are expected to speed through the initial high-level design so you can spend time discussing, in detail, at least two of the following: speeding up location searches, the ride request locking problem, or the ride request queueing problem. You should also be able to discuss the pros and cons of different architectural choices, especially how they impact scalability, performance, and maintainability.
Staff+
Emphasis on Depth: As a staff+ candidate, the expectation is a deep dive into the nuances of system design — I'm looking for about 40% breadth and 60% depth in your understanding. This level is all about demonstrating that, while you may not have solved this particular problem before, you have solved enough problems in the real world to be able to confidently design a solution backed by your experience.
You should know which technologies to use, not just in theory but in practice, and be able to draw from your past experiences to explain how they’d be applied to solve specific problems effectively. The interviewer knows you know the small stuff (REST API, data normalization, etc) so you can breeze through that at a high level so you have time to get into what is interesting.
High Degree of Proactivity: At this level, an exceptional degree of proactivity is expected. You should be able to identify and solve issues independently, demonstrating a strong ability to recognize and address the core challenges in system design. This involves not just responding to problems as they arise but anticipating them and implementing preemptive solutions. Your interviewer should intervene only to focus, not to steer.
Practical Application of Technology: You should be well-versed in the practical application of various technologies. Your experience should guide the conversation, showing a clear understanding of how different tools and systems can be configured in real-world scenarios to meet specific requirements.
Complex Problem-Solving and Decision-Making: Your problem-solving skills should be top-notch. This means not only being able to tackle complex technical challenges but also making informed decisions that consider various factors such as scalability, performance, reliability, and maintenance.
Advanced System Design and Scalability: Your approach to system design should be advanced, focusing on scalability and reliability, especially under high load conditions. This includes a thorough understanding of distributed systems, load balancing, caching strategies, and other advanced concepts necessary for building robust, scalable systems.
The Bar for Uber: For a staff+ candidate, expectations are high regarding depth and quality of solutions, particularly for the complex scenarios discussed earlier. Great candidates dive deep into at least 3 key areas, showcasing not just proficiency but also innovative thinking and optimal solution-finding abilities. A crucial indicator of a staff+ candidate's caliber is the level of insight and knowledge they bring to the table. A good measure for this is whether the interviewer comes away from the discussion having gained new understanding or perspectives.