Design a Ticket Booking Site Like Ticketmaster
Understanding the Problem
Functional Requirements
Core Requirements
- Users should be able to view events
- Users should be able to book tickets to events
- Users should be able to search for events
Below the line (out of scope):
- Users should be able to view their booked events
- Admins or event coordinators should be able to add events
- Popular events should have dynamic pricing
Non-Functional Requirements
Core Requirements
- The system should prioritize availability for searching & viewing events, but should prioritize consistency for booking events (no double booking)
- The system should be scalable and able to handle high throughput in the form of popular events
- The system is read heavy, and thus needs to be able to support high read throughput
Below the line (out of scope):
- The system should protect user data and adhere to GDPR
- The system should be fault tolerant
- The system should provide secure transactions for purchases
- The system should be well tested and easy to deploy (CI/CD pipelines)
- The system should have regular backups
Here's how it might look on your whiteboard:
The Set Up
Planning the Approach
Before you move on to designing the system, it's important to start by taking a moment to plan your strategy. Fortunately, for these common user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don't get lost in the weeds as you go. Once you've satisfied the functional requirements, you'll rely on your non-functional requirements to guide you through the deep dives.
Defining the Core Entities
I like to begin with a broad overview of the primary entities. At this stage, it is not necessary to know every specific column or detail. We will focus on the intricacies, such as columns and fields, later when we have a clearer grasp. Initially, establishing these key entities will guide our thought process and lay a solid foundation as we progress towards defining the API.
To satisfy our key functional requirements, we'll need the following entities:
- Event: This entity stores essential information about an event, including details like the date, description, type, and the performer or team involved. It acts as the central point of information for each unique event.
- Performer: Represents the individual or group performing or participating in the event. Key attributes for this entity include the performer's name, a brief description, and potentially links to their work or profiles. (Note: this could be an artist, a company, or a collective, among many other kinds of entities. The term “performer” is intended to be general enough to cover all possible groups.)
- Venue: Represents the physical location where an event is held. Each venue entity includes details such as address, capacity, and a specific seat map, providing a layout of seating arrangements unique to the venue.
- Ticket: Contains information related to individual tickets for events. This includes attributes such as the associated event ID, seat details (like section, row, and seat number), pricing, and status (available or sold).
- Booking: Records the details of a user's ticket purchase. It typically includes the user ID, a list of ticket IDs being booked, total price, and booking status (such as in-progress or confirmed). This entity is key in managing the transaction aspect of the ticket purchasing process.
In the actual interview, this can be as simple as a short list like this. Just make sure you talk through the entities with your interviewer to ensure you are on the same page.
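If it helps to make the list concrete, the entities above could be sketched as plain data classes. This is only an illustration: the field names, the use of cents for prices, and the ISO-8601 date string are assumptions, not part of the original design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TicketStatus(Enum):
    AVAILABLE = "available"
    SOLD = "sold"


class BookingStatus(Enum):
    IN_PROGRESS = "in-progress"
    CONFIRMED = "confirmed"


@dataclass
class Performer:
    id: str
    name: str
    description: str = ""


@dataclass
class Venue:
    id: str
    address: str
    capacity: int
    # The seat map format is an assumption; a dict keeps the sketch generic.
    seat_map: dict = field(default_factory=dict)


@dataclass
class Event:
    id: str
    venue_id: str
    performer_id: str
    date: str  # ISO-8601 string, assumed for simplicity
    description: str
    type: str


@dataclass
class Ticket:
    id: str
    event_id: str
    section: str
    row: str
    seat: str
    price_cents: int  # integer cents is an assumption
    status: TicketStatus = TicketStatus.AVAILABLE


@dataclass
class Booking:
    id: str
    user_id: str
    ticket_ids: List[str]
    total_price_cents: int
    status: BookingStatus = BookingStatus.IN_PROGRESS
```

New tickets default to available and new bookings default to in-progress, which matches the statuses named in the entity list.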
The API
The API for viewing events is straightforward. We create a simple GET endpoint that takes in an eventId and returns the details of that event.
GET /events/details/{eventId} -> Event & Venue & Performer & Ticket[] - tickets are to render the seat map on the Client
The ticket booking process is divided into two steps:
- The user first chooses a specific seat or ticket to buy.
- They then progress to a payment page to complete the purchase.
We want to guarantee that once a user selects a ticket, it is reserved for them until they complete the purchase, preventing it from being bought by someone else while they try to check out.
We’re going to need two API endpoints, one for each of the steps in the process:
POST /booking/checkout -> bookingId
{
  "ticketIds": string[]
}
POST /booking/confirm -> Success / Failure
{
  "bookingId": string
  "paymentDetails": ...
}
Lastly, for search, we just need a single GET endpoint that takes in a set of search parameters and returns a list of events that match those parameters.
GET /events/search?keyword={keyword}&start={start_date}&end={end_date}&pageSize={page_size}&page={page_number} -> Event[]
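As a rough illustration of how a client might build this request and how the server might translate the paging parameters, here is a small sketch. The helper names, the default page size of 20, and the choice of 1-indexed pages are assumptions.

```python
from urllib.parse import urlencode


def build_search_url(keyword, start, end, page_size=20, page=1):
    """Build the search request path using the parameter names from the endpoint."""
    params = {
        "keyword": keyword,
        "start": start,
        "end": end,
        "pageSize": page_size,
        "page": page,
    }
    return "/events/search?" + urlencode(params)


def page_to_offset(page, page_size):
    """Translate a 1-indexed page number into a row offset (assumed convention)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size
```

For example, page 3 with a page size of 20 maps to an offset of 40, so the server would skip the first 40 matches.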
High-Level Design
1) Users should be able to view events
When a user navigates to www.yourticketmaster.com/event/{eventId} they should see details about that event. Crucially, this should include a seatmap showing seat availability. The page will also display the event's name, along with a description. Key information such as the location (including venue details), event dates, and facts about the performers or teams involved could be outlined.
We start by laying out the core components for communicating between the client and our microservices. We add our first service, "Event CRUD Service," which connects to a database to store the event, venue, and performer data outlined in the core entities above. This service will handle the reading/viewing of events. Although we are not currently focused on the admin flow for this design, this service would theoretically also be responsible for creating, updating, and deleting events.
- Clients: There are three primary clients - WebApp, iOS, and Android. Users will interact with the system through these clients. All client requests will be routed to the system's backend through a Load Balancer.
- Load Balancer: Its primary purpose is to distribute incoming application traffic across multiple targets, such as the API Gateway, in this case. This increases the availability and fault tolerance of the application.
- API Gateway: This serves as an entry point for clients to access the different microservices of the system. This layer can also handle authentication, rate limiting, and other cross-cutting concerns.
- Event CRUD Service: Our first microservice is responsible for creating, reading, updating, and deleting events. In relation to our requirement to view events, this service is responsible for handling view API requests by fetching the necessary event, venue, and performer information from the database and returning the results to the client.
- Events DB: Stores tables for events, performers, and venues.
Let's walk through exactly what happens when a user makes a request to www.yourticketmaster.com/event/{eventId} to view an event.
- The client makes a REST GET request with the eventId
- Our load balancer accepts the request and routes it to the API gateway instance with the fewest current connections.
- The API gateway then, after handling basic authentication and rate limiting, forwards the request on to our Event CRUD Service.
- The Event CRUD Service then queries the Events DB for the event, venue, and performer information and returns it to the client.
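The last step of this walkthrough, the read inside the Event CRUD Service, might look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the Events DB; the schema, column names, and sample rows are assumptions made only so the example runs.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory Events DB; schema and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    conn.executescript("""
        CREATE TABLE venues (id TEXT PRIMARY KEY, address TEXT, capacity INTEGER);
        CREATE TABLE performers (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE events (
            id TEXT PRIMARY KEY, name TEXT, date TEXT,
            venue_id TEXT REFERENCES venues(id),
            performer_id TEXT REFERENCES performers(id)
        );
        INSERT INTO venues VALUES ('v1', '1 Main St', 20000);
        INSERT INTO performers VALUES ('p1', 'Example Band');
        INSERT INTO events VALUES ('e1', 'Example Tour', '2025-06-01', 'v1', 'p1');
    """)
    return conn


def get_event_details(conn, event_id):
    """Return the event joined with its venue and performer, or None if not found."""
    row = conn.execute(
        """
        SELECT e.id, e.name, e.date,
               v.address AS venue_address, v.capacity AS venue_capacity,
               p.name AS performer_name
        FROM events e
        JOIN venues v ON v.id = e.venue_id
        JOIN performers p ON p.id = e.performer_id
        WHERE e.id = ?
        """,
        (event_id,),
    ).fetchone()
    return dict(row) if row is not None else None
```

A handler for the view endpoint would call `get_event_details` and return a 404 when it yields `None`.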
2) Users should be able to book tickets to events
This time we have a bit more to consider though. We need to ensure that the ticket is locked for the user while they are checking out. We also need to ensure that if the user abandons the checkout process, the ticket is released for other users to purchase. Finally, we need to ensure that if the user completes the checkout process, the ticket is marked as sold and the booking is confirmed. Here are a couple ways we could do this:
Bad Solution: Pessimistic Locking
Good Solution: Status & Expiration Time on Ticket Table
Great Solution: Distributed Lock with TTL
In this case, let's go with the great solution and use a distributed lock. We can now update our design to support this flow.
Here is what we added:
- New Tables in Events DB: First we add two new tables to our database, Bookings and Tickets. The Bookings table will store the details of each booking, including the user ID, ticket IDs, total price, and booking status. The Tickets table will store the details of each ticket, including the event ID, seat details, pricing, and status. The Tickets table will also have a bookingId column that links it to the Bookings table.
- Booking Service: This microservice is responsible for the core functionality of the ticket booking process. It interacts with databases that store data on bookings and tickets.
- It interfaces with the Payment Processor (Stripe) for transactions. Once a payment is confirmed, the booking service updates the ticket status to "sold".
- It communicates with the Bookings and Tickets tables to fetch, update, or store relevant data.
- It utilizes a Redis Distributed Lock to ensure that a ticket is locked for a user while they are checking out. This lock is released once the user completes the purchase or if the TTL expires.
- Payment Processor (Stripe): An external service responsible for handling payment transactions. Once a payment is processed, it notifies the booking service of the transaction status.
- Ticket Lock: A distributed lock is implemented using Redis to ensure that a ticket is locked for a user while they are checking out. This lock is released once the user completes the purchase or if the TTL of 10 minutes expires.
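To make the lock semantics concrete, here is a minimal in-memory sketch of a per-ticket lock with a TTL. It is a stand-in, not a Redis client: the class name, method names, and injectable clock are assumptions. With a real Redis client, the same acquire is typically a single SET with the NX and EX options, which sets the key only if it does not already exist and attaches an expiry.

```python
import time


class TicketLock:
    """In-memory stand-in for a Redis-style lock with a TTL (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # ticket_id -> (owner, expires_at)

    def acquire(self, ticket_id, owner, ttl_seconds):
        """Take the lock unless another unexpired lock already exists."""
        now = self._clock()
        held = self._locks.get(ticket_id)
        if held is not None and held[1] > now:
            return False  # still held and not yet expired
        self._locks[ticket_id] = (owner, now + ttl_seconds)
        return True

    def release(self, ticket_id, owner):
        """Release the lock, but only if this owner is the current holder."""
        held = self._locks.get(ticket_id)
        if held is not None and held[0] == owner:
            del self._locks[ticket_id]


HOLD_SECONDS = 10 * 60  # the 10-minute hold described above
```

Checking the owner on release matters: a user whose hold has already expired should not be able to delete the lock that now belongs to someone else.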
Now, when a user wants to book a ticket:
- A user will select a seat from the interactive seat map. This will trigger a POST /booking/checkout with the ticketId associated with that seat.
- The request will be forwarded from our load balancer, to our API gateway, and onto the Booking Service.
- The Booking Service will lock that ticket by adding it to our Redis Distributed Lock with a TTL of 10 minutes (this is how long we will hold the ticket for).
- The Booking Service will also write a new booking entry in the DB with a status of in-progress.
- We will then respond to the user with their newly created bookingId and route the client to the payment page.
- If the user stops here, then after 10 minutes the lock is auto-released and the ticket is available for another user to purchase.
- The user will fill out their payment details and click “Purchase.” In doing so, the payment (along with the bookingId) gets sent to Stripe for processing and Stripe responds via webhook that the payment was successful.
- Upon successful payment confirmation from Stripe, our system's webhook handler retrieves the bookingId embedded within the Stripe metadata. With this bookingId, the handler runs a single database transaction that atomically updates the Ticket and Booking tables. Specifically, the status of the ticket linked to the booking is changed to "sold" in the Ticket table and, in the same transaction, the corresponding booking entry in the Booking table is marked as "confirmed."
- Now the ticket is booked!
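The final confirmation step can be sketched as one database transaction. The example below uses in-memory SQLite in place of the real database and leaves out the Stripe call entirely; the schema, the sample rows, and the assumption that the ticket's bookingId was already set during checkout are all illustrative.

```python
import sqlite3


def make_demo_db():
    """Tiny Bookings/Tickets schema; columns and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bookings (id TEXT PRIMARY KEY, user_id TEXT, status TEXT);
        CREATE TABLE tickets (
            id TEXT PRIMARY KEY, event_id TEXT, status TEXT, booking_id TEXT
        );
        INSERT INTO bookings VALUES ('b1', 'u1', 'in-progress');
        INSERT INTO tickets VALUES ('t1', 'e1', 'available', 'b1');
        INSERT INTO tickets VALUES ('t2', 'e1', 'available', NULL);
    """)
    return conn


def confirm_booking(conn, booking_id):
    """Mark the booking confirmed and its tickets sold in one transaction.

    Returns False without changing anything if the booking is not in-progress,
    for example when the same payment notification is delivered twice.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE bookings SET status = 'confirmed' "
            "WHERE id = ? AND status = 'in-progress'",
            (booking_id,),
        )
        if cur.rowcount != 1:
            return False
        conn.execute(
            "UPDATE tickets SET status = 'sold' WHERE booking_id = ?",
            (booking_id,),
        )
        return True
```

Because both updates share a transaction, a failure between them cannot leave a confirmed booking whose ticket is still marked available.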
3) Users should be able to search for events
Sweet, we now have the core functionality in place to book a ticket! But how are users supposed to find events in the first place? When users first open your site, they expect to be able to search for upcoming events. This search will be parameterized based on any combination of keywords, artists/teams, location, date, or event type.
Let’s start with the most basic thing you could do. Just connect your search service to your DB and query it by filtering for the fields in the API request. This has issues, but it’s a good starting point. We will dig into better options in the deep dives below.
When a user makes a search request, it's straightforward:
- The client makes a REST GET request with the search parameters
- Our load balancer accepts the request and routes it to the API gateway instance with the fewest current connections.
- The API gateway then, after handling basic authentication and rate limiting, forwards the request on to our Search Service.
- The Search Service then queries the Events DB for the events matching the search parameters and returns them to the client.
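A minimal sketch of this naive approach, again using in-memory SQLite as a stand-in for the Events DB, is shown below. The schema, sample rows, and function name are assumptions. Note the leading-wildcard LIKE: an ordinary B-tree index generally cannot help with that pattern, which is one reason this approach has issues at scale.

```python
import sqlite3


def make_demo_db():
    """Small events table for illustration; schema and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (id TEXT PRIMARY KEY, name TEXT, description TEXT, date TEXT);
        INSERT INTO events VALUES ('e1', 'Taylor Swift Eras Tour', 'Pop concert', '2025-06-01');
        INSERT INTO events VALUES ('e2', 'Lakers vs Celtics', 'Basketball game', '2025-07-10');
        INSERT INTO events VALUES ('e3', 'Swift Conference', 'Developer event', '2025-09-01');
    """)
    return conn


def search_events(conn, keyword=None, start=None, end=None, page_size=20, page=1):
    """Filter events directly in SQL; every filter is optional."""
    clauses, params = [], []
    if keyword:
        # Leading wildcard forces a scan of the text columns.
        clauses.append("(name LIKE ? OR description LIKE ?)")
        params += [f"%{keyword}%", f"%{keyword}%"]
    if start:
        clauses.append("date >= ?")
        params.append(start)
    if end:
        clauses.append("date <= ?")
        params.append(end)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT id, name, date FROM events {where} ORDER BY date LIMIT ? OFFSET ?"
    params += [page_size, (page - 1) * page_size]
    return conn.execute(sql, params).fetchall()
```

User input only ever reaches the query through bound parameters; the interpolated `where` string is assembled from fixed fragments.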
Potential Deep Dives
With the core functional requirements met, it's time to dig into the non-functional requirements via deep dives. These are the main deep dives I like to cover for this question:
1) How is the view API going to scale to support tens of millions of concurrent requests during popular events?
In our non-functional requirements we mentioned that our view and search paths need to be highly available, including during peak traffic scenarios. To accomplish this, we need a combination of load balancing, horizontal scaling, and caching.
Great Solution: Caching, Load Balancing, and Horizontal Scaling
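One way to picture the caching part is a read-through cache with a TTL in front of the event lookup. The sketch below is an in-memory illustration only; the class name, the injectable clock, and the idea of passing in a loader function are assumptions, and a production system would more likely use a shared cache such as Redis.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory and fall back to a loader on a miss."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader  # e.g. a function that queries the Events DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, no database call
        value = self._loader(key)  # cache miss or expired entry
        self._entries[key] = (value, now + self._ttl)
        return value
```

Event details such as name, venue, and performer change rarely, so even a short TTL can absorb most of the read traffic for a popular event. Seat availability changes quickly and would need a much shorter TTL or a separate path.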
2) How will the system ensure a good user experience during high-demand events with millions simultaneously booking tickets?
With popular events, the loaded seat map will go stale quickly. Users will grow frustrated as they repeatedly click on a seat, only to find out it has already been booked. We need to ensure that the seat map is always up to date and that users are notified of changes in real-time.
Good Solution: SSE for Real-Time Seat Updates
Great Solution: Virtual Waiting Queue for Extremely Popular Events
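As one plausible reading of a virtual waiting queue, the sketch below admits a fixed number of users into the booking flow and holds everyone else in arrival order. The class, its method names, and the fixed `max_active` limit are all assumptions made for illustration.

```python
from collections import deque


class WaitingRoom:
    """Admit at most max_active users at once; queue the rest in FIFO order."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()
        self.waiting = deque()

    def join(self, user_id):
        """Admit the user if there is room and nobody is ahead of them."""
        if len(self.active) < self.max_active and not self.waiting:
            self.active.add(user_id)
            return "admitted"
        self.waiting.append(user_id)
        return "queued"

    def leave(self, user_id):
        """Free a slot and return the users admitted from the queue as a result."""
        self.active.discard(user_id)
        admitted = []
        while self.waiting and len(self.active) < self.max_active:
            next_user = self.waiting.popleft()
            self.active.add(next_user)
            admitted.append(next_user)
        return admitted

    def position(self, user_id):
        """1-based place in line, or None if the user is not waiting."""
        try:
            return self.waiting.index(user_id) + 1
        except ValueError:
            return None
```

The list returned by `leave` is what a notification channel would use to tell queued users that it is their turn.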
3) How can you improve search to handle complex queries and high-volume traffic more efficiently?
Good Solution: Indexing & SQL Query Optimization
Great Solution: Use a Full-text Search Engine like Elasticsearch
4) How can you speed up frequently repeated search queries and reduce load on our search infrastructure?
Good Solution: Implement Caching Strategies Using Redis or Memcached
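Whatever cache is used, repeated queries only hit it if equivalent requests map to the same key. Below is a small sketch of one way to normalize search parameters into a cache key; the function name, the `search:` prefix, and the specific normalization rules are assumptions.

```python
from urllib.parse import urlencode


def search_cache_key(params):
    """Build a stable cache key so equivalent searches share one entry."""
    normalized = {}
    for name, value in params.items():
        if value is None or value == "":
            continue  # drop unset filters so they do not change the key
        text = str(value).strip()
        if name == "keyword":
            # Lowercase and collapse whitespace in free-text keywords.
            text = " ".join(text.lower().split())
        normalized[name] = text
    # Sort so parameter order in the request does not matter.
    return "search:" + urlencode(sorted(normalized.items()))
```

With this, a search for "  Taylor  SWIFT" on page 1 and a search for "taylor swift" on page "1" resolve to the same cached result.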
Great Solution: Implement Query Result Caching and Edge Caching Techniques
As you progress through the deep dives, you should be updating your design to reflect the changes you are making. After doing so, you could have a final design that looks something like this:
What is Expected at Each Level?
Ok, that was a lot. You may be thinking, “how much of that is actually required from me in an interview?” Let’s break it down.
Mid-level
Breadth vs. Depth: A mid-level candidate will be mostly focused on breadth (80% vs 20%). You should be able to craft a high-level design that meets the functional requirements you've defined, but many of the components will be abstractions with which you only have surface-level familiarity.
Probing the Basics: Your interviewer will spend some time probing the basics to confirm that you know what each component in your system does. For example, if you add an API Gateway, expect that they may ask you what it does and how it works (at a high level). In short, the interviewer is not taking anything for granted with respect to your knowledge.
Mixture of Driving and Taking the Backseat: You should drive the early stages of the interview in particular, but the interviewer doesn’t expect that you are able to proactively recognize problems in your design with high precision. Because of this, it’s reasonable that they will take over and drive the later stages of the interview while probing your design.
The Bar for Ticketmaster: For this question, an E4 candidate will have clearly defined the API endpoints and data model and landed on a high-level design that is functional for at least viewing and booking events. They are able to solve the “No Double Booking” problem with at least the "Good Solution," which uses a status field, a timeout, and a cron job. Any additional depth would be a bonus, but further deep dives wouldn’t be expected.
Senior
Depth of Expertise: As a senior candidate, expectations shift towards more in-depth knowledge — about 60% breadth and 40% depth. This means you should be able to go into technical details in areas where you have hands-on experience. It's crucial that you demonstrate a deep understanding of key concepts and technologies relevant to the task at hand.
Advanced System Design: You should be familiar with advanced system design principles. For example, knowing how to use a search-optimized data store like Elasticsearch for event searching is essential. You’re also expected to understand the use of a distributed cache for locking tickets and to discuss detailed scaling strategies (it’s ok if this took some probing/hints from the interviewer), including sharding and replication. Your ability to navigate these advanced topics with confidence and clarity is key.
Articulating Architectural Decisions: You should be able to clearly articulate the pros and cons of different architectural choices, especially how they impact scalability, performance, and maintainability. You justify your decisions and explain the trade-offs involved in your design choices.
Problem-Solving and Proactivity: You should demonstrate strong problem-solving skills and a proactive approach. This includes anticipating potential challenges in your designs and suggesting improvements. You need to be adept at identifying and addressing bottlenecks, optimizing performance, and ensuring system reliability.
The Bar for Ticketmaster: For this question, an E5 candidate is expected to speed through the initial high-level design in order to spend time discussing, in detail, optimizing search and handling no double booking (landing on a distributed lock or other quality solution). They should also be able to discuss handling popular events, showcasing depth of understanding in managing scalability and reliability under high-load conditions.
Staff+
Emphasis on Depth: As a staff+ candidate, the expectation is a deep dive into the nuances of system design — I'm looking for about 40% breadth and 60% depth in your understanding. This level is all about demonstrating that, while you may not have solved this particular problem before, you have solved enough problems in the real world to be able to confidently design a solution backed by your experience.
You should know which technologies to use, not just in theory but in practice, and be able to draw from your past experiences to explain how they’d be applied to solve specific problems effectively. The interviewer knows you know the small stuff (REST API, data normalization, etc) so you can breeze through that at a high level so you have time to get into what is interesting.
High Degree of Proactivity: At this level, an exceptional degree of proactivity is expected. You should be able to identify and solve issues independently, demonstrating a strong ability to recognize and address the core challenges in system design. This involves not just responding to problems as they arise but anticipating them and implementing preemptive solutions. Your interviewer should intervene only to focus, not to steer.
Practical Application of Technology: You should be well-versed in the practical application of various technologies. Your experience should guide the conversation, showing a clear understanding of how different tools and systems can be configured in real-world scenarios to meet specific requirements.
Complex Problem-Solving and Decision-Making: Your problem-solving skills should be top-notch. This means not only being able to tackle complex technical challenges but also making informed decisions that consider various factors such as scalability, performance, reliability, and maintenance.
Advanced System Design and Scalability: Your approach to system design should be advanced, focusing on scalability and reliability, especially under high load conditions. This includes a thorough understanding of distributed systems, load balancing, caching strategies, and other advanced concepts necessary for building robust, scalable systems.
The Bar for Ticketmaster: For a staff+ candidate, expectations are high regarding depth and quality of solutions, particularly for the complex scenarios discussed earlier. Great candidates are diving deep into at least 2-3 key areas, showcasing not just proficiency but also innovative thinking and optimal solution-finding abilities. A crucial indicator of a staff+ candidate's caliber is the level of insight and knowledge they bring to the table. A good measure for this is if the interviewer comes away from the discussion having gained new understanding or perspectives.