High-Level System Design
Before we go into details, it's important to articulate a high-level overview of the architecture and major components of your system. Think of these components as significant landmarks on a city map; each with its own specific purpose and role within the overall system.
- Choose an architecture.
- List off the relevant system components and give a brief, 1-2 sentence description of each.
You will go into more detail on core components later in the interview. Your goal here is to propose a high-level design and get buy-in from your interviewer before delving into the details. In many interviews, you will draw the system on a shared whiteboard such as Excalidraw (https://excalidraw.com/).
Every interviewer's expectations are slightly different. Be sure to clearly communicate your plan and give them the opportunity to guide you in a different direction if they choose. For example, "I am going to provide a high-level overview of the system's components, and then we can delve into detail on the most important ones."
Choosing a System Architecture
When discussing the high-level overview, it's important to consider the trade-offs of different architectures and settle on one that best fits your needs. Here are some common architectures and their trade-offs:
|Architecture||Pros||Cons||When to use|
|Monolithic Architecture||Simplicity: Easier to develop and deploy because all components are interwoven and deployed together. Consistency: Since there's only one codebase, it's easier to maintain consistency in terms of tools, libraries, and processes used. Efficiency: Inter-component communication can be faster because all components reside in the same process.||Scalability: Monolithic applications can be harder to scale because you often have to scale the entire application, rather than individual components. Reliability: If a single component fails, it can cause the entire application to fail. Codebase Size: As the application grows, the codebase can become large and unwieldy, slowing down development and testing.||Ideal for small-scale applications where the simplicity of design and development outweighs the potential limitations in scalability and resilience. Unlikely choice for a System Design Interview.|
|Microservices Architecture||Scalability: Each service is independent, so they can be scaled individually based on demand. Flexibility: Different services can use different technologies that best suit their requirements. Isolation: Failure in one service doesn't directly affect the others.||Complexity: This architecture can be more complex to develop, deploy, and manage due to the distribution of services. Communication: Inter-service communication is slower and more complicated compared to monolithic applications.||Suited for large-scale applications where scalability, flexibility, and high availability are key requirements.|
|Serverless Architecture||Scalability: The infrastructure scales up and down automatically based on the demand. Cost-effectiveness: You only pay for the compute time you consume. Reduced Operational Effort: Server management and capacity planning are handled by the cloud provider.||Vendor Lock-in: Moving to a new cloud provider can be challenging due to the use of provider-specific features and services. Cold Start: There could be a delay in the execution of serverless functions after idle time, known as a cold start. Debugging and Monitoring: Traditional debugging and monitoring tools don't usually work well with serverless architectures.||Suitable for applications with unpredictable or highly variable workloads, and when you want to reduce operational efforts and costs.|
|Event-Driven Architecture||Scalability: Can handle high loads and spikes in traffic. Flexibility: Components are decoupled, promoting flexibility and evolution of individual parts. Real-time: Can react to events in real-time.||Complexity: Can be complex to set up and manage due to asynchronous communication and event handling. Debugging: Debugging and tracing through the system can be challenging.||Suitable for real-time data processing, complex business processes, and microservices-based applications.|
|Service-Oriented Architecture (SOA)||Reusability: Services can be reused across different applications, improving development efficiency. Flexibility: Allows for loose coupling of services, enabling changes to be made to one service without affecting others.||Performance: The overhead of service communication can potentially affect performance. Complexity: Designing, implementing, and managing a SOA can be complex.||Ideal for business applications that need to reuse business functionality across different modules and integrate diverse systems.|
Each architecture has its unique trade-offs and is suited to different kinds of applications and business requirements. In a system design interview, understanding these trade-offs helps you choose the most suitable architecture for the system you're designing based on factors such as scale, performance, cost, and the specific use case.
It's quite possible that you choose a combination of architectures. For example, a system could leverage both microservices for its modular components and serverless for its unpredictable workloads or occasional tasks. Such a hybrid approach can harness the strengths of multiple architectures while offsetting their individual limitations. Keep in mind, in real-world applications, a one-size-fits-all methodology seldom applies. It's about understanding the nuances of each architectural paradigm and intelligently applying them based on the challenges and constraints at hand.
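To make the decoupling attributed to event-driven designs more concrete, here is a minimal in-process sketch. The `EventBus` class, the `tweet.created` topic, and the event fields are hypothetical names chosen for illustration. A real system would typically place a message broker between publisher and subscribers and deliver events asynchronously, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on the topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many were notified."""
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
timeline_updates: List[str] = []
notifications: List[str] = []

# Two independent consumers react to the same event. The publisher knows
# nothing about either of them, which is the decoupling the table describes.
bus.subscribe("tweet.created", lambda e: timeline_updates.append(e["tweet_id"]))
bus.subscribe("tweet.created", lambda e: notifications.append(e["author"]))

delivered = bus.publish("tweet.created", {"tweet_id": "t1", "author": "alice"})
```

Adding a third consumer here means one more `subscribe` call and no change to the publisher. The cost is the one the table lists: the flow of control is harder to trace than a direct function call.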
Head to the Whiteboard
Why Use a Whiteboard?
The act of sketching a system design on a whiteboard (or a virtual equivalent like Excalidraw) is a powerful way to visually represent complex architectures. It allows both interviewers and peers to follow along with your thought process, easily identify gaps or flaws, and foster collaborative discussions.
- Getting Started (The Blank Slate): Staring at an empty whiteboard can be intimidating. Begin by listing the requirements and constraints that you identified in the earlier discussion. This becomes your reference point and ensures you remain aligned with the problem at hand.
- Top-Down Approach: Start high-level, focusing on the primary components of the system. For instance, a basic web application might have a Web UI, Application Server, Database, and maybe a Cache. Sketch these out in big blocks.
- Component Relationships: Once primary components are in place, start drawing relationships or flows. How does a user request travel through this system? Which components communicate with each other? Use arrows to indicate directionality.
- Zoom In: With the high-level design laid out, start zooming into each component. For instance, if you've identified the need for caching, where is the cache placed? Is it on the user's browser (client-side), or is it sitting in front of the database (server-side)?
- Scalability and Resilience: As the problem demands, introduce load balancers, replicas, or sharding strategies. Remember, any addition to the system should have a justification, whether it's a load balancer to distribute incoming user traffic or database replicas to allow for faster reads.
- Feedback Loops: One major advantage of the whiteboarding process is the feedback loop it creates. It’s not just a one-way flow; as you draw and explain, expect questions or suggestions. Use this as an opportunity to refine and iterate.
- Incorporate Common Components: As you dive deeper into the system, remember to refer to the list of common components from the cheat sheet below. Identify which ones are relevant to your current design and integrate them. Ensure each addition serves a clear purpose.
- Optimization and Trade-offs: As the design progresses, you might find opportunities for optimizations. Maybe certain components can be combined, or perhaps there's redundancy that can be eliminated. However, every decision usually comes with trade-offs. Be ready to discuss them. Why did you prioritize one approach over another?
- End with a Walkthrough: Once you believe your design is complete, do a walkthrough. Pretend you're a user request navigating through the system or simulate how the system would handle a spike in traffic. This often reveals overlooked details or bottlenecks.
- Final Touches: Remember to label every component clearly. If you introduced any new terminology or concepts, add a small key or legend on the side for clarity.
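The "Zoom In" and walkthrough steps can be practised on a small scale in code. The sketch below traces a read request through a server-side cache placed in front of the database, a pattern often called cache-aside. The `Database` and `CachedStore` classes and the `user:1` key are hypothetical stand-ins, and the sketch deliberately leaves out expiry and invalidation, which a real design would need to address.

```python
from typing import Dict, Optional


class Database:
    """Stand-in for the primary data store; counts reads so the effect is visible."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)


class CachedStore:
    """Cache-aside read path: check the cache first, fall back to the database."""

    def __init__(self, db: Database) -> None:
        self._db = db
        self._cache: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]      # cache hit: the database is not touched
        value = self._db.get(key)        # cache miss: read from the database
        if value is not None:
            self._cache[key] = value     # populate the cache for later reads
        return value


db = Database({"user:1": "alice"})
store = CachedStore(db)

first = store.get("user:1")   # miss, goes to the database
second = store.get("user:1")  # hit, served from the cache
```

After both calls `db.reads` is 1, which is the kind of concrete justification ("this cache removes repeat reads from the database") an interviewer will expect when you add a box to the diagram.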
The essential components of a system may vary depending on the specific requirements and use case. However, most systems you design will share the majority of these common components.
|User Interface||The user interface is the part of the system that users interact with. It's responsible for presenting information to the user and receiving input from them. Examples include web interfaces, mobile apps, and desktop applications.|
|Application Server||The application server is responsible for processing user requests and generating responses. It can handle business logic, data access, and other application-specific tasks. Examples include Apache Tomcat, Microsoft IIS, and Node.js.|
|Database||The database is responsible for storing and retrieving data as requested by other components. It's an essential part of most systems and can be implemented using various technologies such as MySQL, PostgreSQL, MongoDB, and Cassandra.|
|Cache||A cache is a temporary storage location that stores frequently accessed data to improve performance. It can be implemented using various technologies such as Redis, Memcached, and Hazelcast.|
|Load Balancer||A load balancer distributes incoming network traffic across multiple servers to improve performance and reliability. It can be implemented using various technologies such as HAProxy, Nginx, and Amazon ELB.|
|API||An API (Application Programming Interface) is a set of rules and protocols that allow different software applications to communicate with each other. Common API styles and protocols include REST, SOAP, and GraphQL.|
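As an illustration of what the load balancer in the table does, here is a minimal sketch of round-robin distribution, one of the simplest balancing strategies. The `RoundRobinBalancer` class and the server names are hypothetical. Products such as HAProxy or Nginx offer several strategies along with health checks, none of which is modelled here.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotation (illustrative only)."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers: Iterator[str] = cycle(servers)

    def next_server(self) -> str:
        """Return the server that should receive the next request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])

# Five incoming requests are spread evenly and wrap around after the third.
assignments = [balancer.next_server() for _ in range(5)]
# assignments == ["app-1", "app-2", "app-3", "app-1", "app-2"]
```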
Additional Important Components
There are also additional components that are not found in every system but are crucial in certain use cases and are important to know.
|Component||Description||When to Use||Example|
|Object Storage||Manages data as objects, used for storing large or unstructured data.||When storing large amounts of unstructured data, such as multimedia files.||Amazon S3, Google Cloud Storage|
|Search Engine||Optimized for search operations over a large dataset.||When the application involves complex search queries or needs to handle large volumes of data.||Elasticsearch, Solr|
|Data Warehouse||Used for reporting and data analysis, it's a core component of business intelligence.||When analytics or insights from large datasets are needed.||Amazon Redshift, Google BigQuery|
|Authentication Server||Manages user authentication and implements protocols like OAuth2 or OpenID Connect.||When the system needs to securely manage user identities and sessions.||Auth0, Okta|
|DNS Server||Translates human-friendly domain names into machine-friendly IP addresses.||When setting up websites or web services accessible via domain names.||Google DNS, Amazon Route 53|
|Email Server||Manages and sends emails.||When your system needs to send emails as part of its functionality.||Sendgrid, Amazon SES|
|Task Scheduler||Manages the execution of jobs, tasks or scripts at predefined times or intervals.||When certain tasks in the system need to be executed periodically.||Cron, Google Cloud Scheduler|
|Log Management System||Collects and manages logs from various sources for auditing and debugging.||When you need centralized logging for easier debugging and auditing.||Splunk, ELK Stack|
|Web Proxy||Acts as an intermediary for requests from clients seeking resources from other servers.||When you want to control, optimize or secure outgoing requests from your system.||Nginx, Apache HTTP Server|
|API Gateway||Handles and manages all the API calls, takes care of functionalities like rate limiting, caching, and authentication.||When you have multiple services and you want to expose a single entry point.||Amazon API Gateway, Kong|
|File Server||Provides a location on a network where users can store and share files.||When you need a centralized location for storing files that can be accessed by multiple users or services.||Windows Server, Samba|
|Container Orchestration System||Automates the deployment, scaling, and management of containerized applications.||When you're using containers and need automation for managing them.||Kubernetes, Docker Swarm|
|Serverless Functions (FaaS)||Lets developers build and run applications without thinking about servers.||When you need to run small pieces of code without provisioning or managing servers.||AWS Lambda, Google Cloud Functions|
|Real-time Communication Server||Enables real-time communication functionalities like chat and video call.||When the system requires real-time communication between users.||Twilio, WebRTC|
|Rate Limiter||Limits the number of requests a client can make to a service in a particular amount of time.||When you need to protect your services from being overwhelmed by too many requests.||Redis + custom script, Nginx rate limiting|
|Notification Service||Enables the system to send notifications via various channels (like email, SMS, push notifications etc.).||When your application needs to notify users about updates, events, or other important information.||Amazon SNS, Firebase Cloud Messaging|
|Message Queue||Temporarily holds messages before they are processed, ensuring message delivery and decoupling services in a distributed architecture.||When you need to ensure reliable communication between different parts of a system, especially in a distributed environment.||RabbitMQ, Apache Kafka|
|Pub/Sub System||A messaging pattern where messages are published to a topic and subscribers to that topic receive the messages.||When implementing event-driven architectures or when multiple services need to be notified of events without creating tight couplings.||Google Cloud Pub/Sub, Apache Kafka|
|Distributed Cache||A high-speed data storage layer that stores a subset of data, typically transient in nature, to improve read access time.||When you need fast access to frequently read data and want to reduce the load on the primary database.||Redis, Memcached|
|Distributed Lock||Mechanism to ensure that multiple processes or threads do not access a shared resource concurrently in a distributed environment.||When you need to synchronize access to resources in a distributed system.||Apache ZooKeeper, Redis Lock|
|Content Delivery Network (CDN)||A network of servers used to distribute the delivery of content, typically used for large-scale web content distribution.||When you want to improve the performance and availability of web content to users across different geographical locations.||Cloudflare, Amazon CloudFront|
|Load Balancer||Distributes network or application traffic across multiple servers to ensure availability and reliability.||When you need to manage traffic efficiently and prevent any single server from becoming a bottleneck.||HAProxy, AWS Elastic Load Balancing|
|Data Streaming Platform||Allows for the continuous processing and streaming of large volumes of data.||When dealing with real-time data processing and analytics.||Apache Kafka, Amazon Kinesis|
Of course, this list is not exhaustive. There can be many other components based on the specific requirements of a system. It's important to evaluate each one based on your needs before deciding to use them in your architecture.
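To show how one of these components behaves internally, here is a sketch of a rate limiter using the token-bucket technique. This is only one possible approach, and the `TokenBucket` class and its parameter names are assumptions made for this example. It runs in a single process, whereas the Redis-based option in the table exists precisely so that a limit can be shared across many servers.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity`, then a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock               # injectable so the example is deterministic
        self._tokens = float(capacity)    # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True and consume one token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# A fake clock keeps the example independent of real time.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # [True, True, False]
fake_now[0] = 1.0                           # one second passes, one token refills
after_wait = bucket.allow()                 # True
```

In a real service you would usually keep one bucket per client (for example per API key or IP address) and respond with HTTP 429 when `allow()` returns False.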
Here is an example high-level design for our Twitter example:
Defining the API
The Application Programming Interface (API) is a set of protocols that allows different software applications to communicate with each other. It defines the methods and data formats that a client application can use to interact with the system.
Defining the API is an integral part of designing any system as it serves as the contract between the client and the server. To define the API for a system, consider the following steps:
- Identify the API type: APIs can be broadly classified into REST, SOAP, or GraphQL, each with its own strengths and weaknesses. REST is the most common choice because of its simplicity and wide adoption, SOAP is better suited to enterprise applications with strict security and transactional requirements, and GraphQL gives the client flexibility in choosing which data to fetch.
- Determine the resources: Identify the key entities or resources in your system. These are typically the nouns in your system, like User, Message, ChatRoom in a chat application.
- Define the endpoints: For each resource, define the various actions that can be performed. These correspond to different HTTP methods in a REST API, such as GET for retrieving data, POST for creating data, PUT for updating data, and DELETE for removing data.
- Specify the request/response structure: Each endpoint should clearly define the request structure (including path parameters, query parameters, and the request body) and the response structure (including the status code, headers, and response body).
- Manage errors: Define how your API will handle and report errors. This typically means HTTP status codes (such as 404 for a resource that was not found or 500 for a server error), specific error messages in the response body, or both.
Here are some examples of API endpoints:
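As a rough sketch of what such endpoints could look like, assume a Twitter-like service with a hypothetical `tweets` resource. The paths, field names, and the `handle` function below are illustrative assumptions rather than any real API. The function takes the place of a web framework so that the request/response contract and the error handling are visible in one place.

```python
import json
from itertools import count
from typing import Dict, Optional, Tuple

Response = Tuple[int, dict]  # (HTTP status code, JSON-serialisable body)

_tweets: Dict[str, dict] = {}   # in-memory store standing in for a database
_next_id = count(1)             # simple ID generator for the example


def handle(method: str, path: str, body: Optional[str] = None) -> Response:
    """Route a request to the matching endpoint and return (status, body)."""
    parts = [p for p in path.split("/") if p]

    # POST /tweets  -> create a tweet from a JSON body like {"text": "..."}
    if method == "POST" and parts == ["tweets"]:
        try:
            payload = json.loads(body or "")
        except json.JSONDecodeError:
            return 400, {"error": "request body must be valid JSON"}
        text = payload.get("text") if isinstance(payload, dict) else None
        if not isinstance(text, str) or not text:
            return 400, {"error": "field 'text' is required"}
        tweet_id = str(next(_next_id))
        _tweets[tweet_id] = {"id": tweet_id, "text": text}
        return 201, _tweets[tweet_id]

    # GET /tweets/{id}  -> fetch a single tweet
    if method == "GET" and len(parts) == 2 and parts[0] == "tweets":
        tweet = _tweets.get(parts[1])
        if tweet is None:
            return 404, {"error": "tweet not found"}
        return 200, tweet

    # DELETE /tweets/{id}  -> remove a tweet
    if method == "DELETE" and len(parts) == 2 and parts[0] == "tweets":
        if _tweets.pop(parts[1], None) is None:
            return 404, {"error": "tweet not found"}
        return 204, {}

    # Anything else is treated as an unknown endpoint in this sketch.
    return 404, {"error": "no such endpoint"}


created = handle("POST", "/tweets", json.dumps({"text": "hello"}))
fetched = handle("GET", "/tweets/1")
missing = handle("GET", "/tweets/999")
```

In an interview you would usually write only the endpoint list (method, path, request body, response, error codes) on the whiteboard. The code shows how each of those pieces maps onto behaviour.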
When designing APIs, it's important to consider the principles of good API design such as consistency, simplicity, security, and extensibility. It is also a good practice to follow established conventions and standards like RESTful principles when designing your APIs.