Common Problems
Design Tweet Search
Understanding the Problem
Functional Requirements
Core Requirements
- Users should be able to search for tweets by keyword.
- Users should be able to sort by either Popularity (likes) or Chronologically (time).
- Users should be able to access results in pages, up to 100.
Below the line (out of scope)
- Support fuzzy matching on terms (e.g. a search for "bird" matches "ostrich").
- Personalization in search results (i.e. the results depend on details of the user searching).
- Sophisticated relevance algorithms for ranking.
- Images and media.
- Realtime updates to the search page as new tweets come in.
Non-Functional Requirements
Core Requirements
- The system must be fast: median queries should return in < 500ms.
- The system must support a high volume of requests.
- New tweets must be searchable in a short amount of time, < 30s.
- All tweets must be discoverable, even if old or unpopular tweets take longer to return.
- The system should be highly available.
Below the line (out of scope)
- Privacy rules (e.g. private accounts, blocking, etc).
- The system will not be responsible for the creation or storage of the tweets themselves, only for supporting search functionality. We'll assume a "tweet service" exists to access them.
Here's how these might be shorthanded in an interview. Note that out-of-scope requirements usually stem from questions like "do we need to handle privacy?". Interviewers are usually comfortable with you making assertions like "I'm going to leave privacy out of scope for the start" and will correct you if needed.
Scale estimations
We'll assume Twitter has 100M users. On average each user produces 1 tweet per day (maybe 20% produce 5 and 80% produce 0) and likes 10 tweets. For ease of calculation, we'll assume 100k seconds in a day.
First let's look at the volume of writes:
Tweets created: 100M * 1 tweet/day / (100k seconds/day) = 1k tweets/second
Likes created: 100M * 10 likes/day / (100k seconds/day) = 10k likes/second
Noteworthy here is that the number of likes is significantly higher than the number of tweet creations. Now let's look at reads.
Searches: 100M * 1 search/day / (100k seconds/day) = 1k searches/second
While this is a lot (and may burst to 10x this value or more), our system is write-heavy rather than read-heavy.
Next let's evaluate the storage requirements of the system. First let's assume Twitter is 10 years old and that the full tweet metadata is 1KB (probably an overestimate).
Tweets searchable: 100M tweets/day * 365 days/year * 10 years = 365B tweets
Raw size: 365B tweets * 1KB/tweet = 365 TB
That's a lot of storage, but not a crazy amount by today's standards. For reference, as of late 2023 there is an AWS instance with 24TB of memory, and this data would cost about $5,000 per month to store in Cloudflare's blob storage, R2.
For this particular problem, doing these estimations helps us to determine the nature of the design. Reflections like those above help your interviewer understand how your estimations are affecting your design and demonstrate your experience. Don't estimate for the sake of estimation!
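The back-of-the-envelope arithmetic above is easy to sanity-check with a few lines of code. This is just the estimation redone in Python, using the same rounded figures (100M users, 100k seconds/day, 1KB/tweet):

```python
# Back-of-the-envelope checks for the scale estimates above.
USERS = 100_000_000
SECONDS_PER_DAY = 100_000  # rounded from 86,400 for easy mental math

tweets_per_sec = USERS * 1 // SECONDS_PER_DAY     # 1k tweets/second
likes_per_sec = USERS * 10 // SECONDS_PER_DAY     # 10k likes/second
searches_per_sec = USERS * 1 // SECONDS_PER_DAY   # 1k searches/second

total_tweets = USERS * 365 * 10                   # 365B tweets over 10 years
raw_bytes = total_tweets * 1_000                  # ~1KB of metadata per tweet
raw_tb = raw_bytes / 1_000_000_000_000            # ~365 TB

print(tweets_per_sec, likes_per_sec, searches_per_sec, raw_tb)
```

Rounding 86,400 seconds up to 100k slightly understates the per-second rates, which is fine for this purpose: we only care about orders of magnitude.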
Planning the Approach
As we progress in the interview, we'll want to refer back to the requirements to ensure we're covering everything we established. But to get started with this problem, it helps to first build a backbone which satisfies 80% of the requirements.
To do this, we'll need to solve 3 key challenges:
- Indexing our tweets so we can quickly search for them by keyword.
- Handling the large volume of writes to our system.
- Dealing with pagination.
High-Level Design
For tweet search, the data model and API are fairly simple: our fundamental unit is the tweet! The complexity of this problem is in how the data is arranged and queried.
The API for search can be similarly simple:
Using GET here allows us to use a standard Content Delivery Network (CDN) to cache popular queries at the edge, using standard HTTP cache-control headers, minimizing latency in those instances.
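As a concrete illustration, here is a minimal sketch of what building such a GET request might look like. The endpoint path, parameter names, and limits are illustrative assumptions, not a spec from the original design:

```python
from typing import Optional
from urllib.parse import urlencode

MAX_PAGE_SIZE = 100  # per the functional requirements

def build_search_url(query: str, sort: str = "popularity",
                     page_size: int = 20, cursor: Optional[str] = None) -> str:
    """Construct a GET URL for a hypothetical /search endpoint.

    Sorting the query parameters means identical searches produce
    identical URLs, which is what makes CDN caching effective.
    """
    if sort not in ("popularity", "chronological"):
        raise ValueError(f"unsupported sort: {sort}")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 100")
    params = {"q": query, "sort": sort, "limit": page_size}
    if cursor:
        params["cursor"] = cursor
    return "/search?" + urlencode(sorted(params.items()))

print(build_search_url("birds"))
# /search?limit=20&q=birds&sort=popularity
```

Note the cursor parameter: it anticipates the pagination discussion below, where an opaque cursor is friendlier to both caches and the database than a raw page offset.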
1) Indexing our tweets so we can quickly search for them by keyword.
If we need to search through all 365TB of tweets for every query, we're going to have a bad time. As a starting point, grep with a fast SSD might scan about 1GB/s, which works out to 365k seconds, or roughly 4 days, on a single machine. Even if we spread the work across thousands of machines, each query would still take minutes, far above our 500ms target.
The first aha for an experienced engineer is that this is a place where we can use an inverted index (also called a reverse index). If we have a map from each keyword to the tweets that contain it, we can do a very fast lookup to get to the data we need instead of scanning through every tweet in our archive.
Bad Solution: SQL without fulltext search or using the source Tweet database.
Good Solution: Build a reverse index from scratch.
Great Solution: Use an off-the-shelf solution like ElasticSearch or a fulltext index on a relational database.
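To make the inverted-index idea concrete, here is a toy version in Python. Real systems like ElasticSearch (built on Lucene) add tokenization, stemming, posting-list compression, and relevance scoring on top of this same core structure:

```python
from collections import defaultdict

# Toy inverted index: keyword -> set of tweet IDs containing it.
index: "dict[str, set[int]]" = defaultdict(set)

def index_tweet(tweet_id: int, text: str) -> None:
    """Add each token of the tweet to the index (naive whitespace split)."""
    for token in text.lower().split():
        index[token].add(tweet_id)

def search(keyword: str) -> "set[int]":
    # One hash lookup instead of scanning every tweet in the archive.
    return index.get(keyword.lower(), set())

index_tweet(1, "an ostrich is a bird")
index_tweet(2, "cats are not birds")
print(search("bird"))   # {1}
print(search("birds"))  # {2}
```

Notice that "bird" and "birds" hit different posting lists: without stemming, exact tokens are all you match. That's consistent with fuzzy matching being out of scope here, and it's exactly the kind of gap a production fulltext engine closes for you.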
2) Handling the large volume of writes to our system.
The write volume for our system is large, but it's made even more difficult by the need to index like counts for every tweet (which 10x's our volume). This is a low-hanging opportunity to optimize our system.
Bad Solution: Simple horizontal scaling.
Good Solution: Batch likes before writing to the database.
Great Solution: Two stage architecture.
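The batching idea can be sketched in a few lines. This is an in-memory aggregator, an illustrative sketch rather than the production design: many like events for the same tweet collapse into a single counter update before we touch the index. The flush interval must stay well under the 30s freshness requirement:

```python
from collections import Counter

class LikeBatcher:
    """Aggregate like events in memory and flush them periodically.

    Since likes arrive at ~10x the rate of tweet creations, collapsing
    many increments to the same tweet into one write drastically cuts
    the number of index updates.
    """
    def __init__(self, flush_threshold: int = 1000):
        self.pending = Counter()
        self.flush_threshold = flush_threshold

    def record_like(self, tweet_id: int) -> None:
        self.pending[tweet_id] += 1

    def should_flush(self) -> bool:
        return sum(self.pending.values()) >= self.flush_threshold

    def flush(self) -> "dict[int, int]":
        # One write per tweet, regardless of how many likes it received.
        batch, self.pending = dict(self.pending), Counter()
        return batch

b = LikeBatcher()
for _ in range(500):
    b.record_like(42)  # 500 like events on one viral tweet...
b.record_like(7)
print(b.flush())       # ...become just 2 writes: {42: 500, 7: 1}
```

A real deployment would also need to handle batcher crashes (losing an in-memory batch) by replaying from a durable log, which is part of what the two-stage architecture buys you.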
3) Dealing with pagination.
Users may want to look deeper into the results than the first N tweets that we return to them. When they do, we need to ensure we're (a) not creating unnecessary extra load on our database, and (b) not leaving out tweets which would be displayed if we simply returned the full set of results without pages.
Good Solution: Offset pagination with result cache.
Good Solution: Keyset pagination for chronologically sorted results.
Good Solution: Keyset pagination for chronologically sorted results
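Keyset pagination for the chronological sort can be sketched as follows. The cursor is the (created_at, id) pair of the last tweet on the previous page; the next page starts strictly after it. Unlike OFFSET, this stays cheap for deep pages and never skips or repeats tweets when new ones are inserted above the cursor. The in-memory list stands in for the database query, which is an assumption for illustration:

```python
from typing import Optional, Tuple

Cursor = Tuple[int, int]  # (created_at, id) of the last tweet seen

def next_page(tweets, cursor: Optional[Cursor], limit: int):
    """Return the next page of tweets (sorted newest-first) and a cursor.

    In SQL this filter would be:
      WHERE (created_at, id) < (:cursor_ts, :cursor_id)
      ORDER BY created_at DESC, id DESC LIMIT :limit
    """
    if cursor is not None:
        tweets = [t for t in tweets if (t["created_at"], t["id"]) < cursor]
    page = tweets[:limit]
    new_cursor = (page[-1]["created_at"], page[-1]["id"]) if page else None
    return page, new_cursor

# Five tweets, newest first (ids 4..0).
tweets = [{"id": i, "created_at": 1000 + i} for i in range(5)][::-1]
page1, cur = next_page(tweets, None, 2)  # ids 4, 3
page2, cur = next_page(tweets, cur, 2)   # ids 2, 1
```

The id acts as a tiebreaker so two tweets with the same timestamp can't cause a tweet to be skipped or duplicated across page boundaries.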
Putting it all together
Our full design might look like this:
We have two paths through our service: the write path and the query path. On the write path, we use an Ingestion Service to cut down the volume of events that we need to write to our backend ElasticSearch instance. On the query path, we have two layers of caching: at the request level and in front of our ElasticSearch instance. A re-ranking service fetches up-to-date like counts before returning results to the user, since what's written to ElasticSearch may be (intentionally) stale.
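The re-ranking step mentioned above can be sketched simply. The `fresh_likes` lookup is a stand-in for whatever serves current like counts (the batcher's store, or the tweet service); the function name and shape are illustrative assumptions:

```python
def rerank(candidates: "list[dict]", fresh_likes: "dict[int, int]") -> "list[dict]":
    """Overlay fresh like counts on index results, then re-sort.

    The search index returned `candidates` with possibly stale counts;
    for popularity-sorted queries we fix the ordering just before
    responding to the user.
    """
    for tweet in candidates:
        tweet["likes"] = fresh_likes.get(tweet["id"], tweet["likes"])
    return sorted(candidates, key=lambda t: t["likes"], reverse=True)

candidates = [{"id": 1, "likes": 5}, {"id": 2, "likes": 9}]
print(rerank(candidates, fresh_likes={1: 50}))
# tweet 1, now at 50 likes, ranks first despite its stale indexed count
```

This keeps the expensive-to-update index tolerant of lag while still honoring the popularity sort the user asked for.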
What is Expected at Each Level?
There's a lot of meat to this question! Your interviewer may even have you go deeper on specific sections. What might you expect in an actual assessment?
Mid-Level
- Breadth vs. Depth: A mid-level candidate will be mostly focused on breadth. As an approximation, you'll show 80% breadth and 20% depth in your knowledge. You should be able to craft a high-level design that meets the functional requirements you've defined, but the optimality of your solution will be icing on top rather than the focus.
- Probing the Basics: Your interviewer will spend some time probing the basics to confirm that you know what each component in your system does. For example, if you use ElasticSearch, expect to be asked what the structure of your document is, or for a brief explanation of how it works. Your interviewer will not be taking anything for granted with respect to your knowledge.
- Mixture of Driving and Taking the Backseat: You should drive the early stages of the interview in particular, but your interviewer won't expect that you are able to proactively recognize problems in your design with high precision. Because of this, it's reasonable that they take over and drive the later stages of the interview while probing your design.
- The Bar for Tweet Search: For this question, interviewers expect a mid-level candidate to have clearly defined the API endpoints and data model, and created both sides of the system: ingestion and query. In instances where the candidate uses a "Bad" solution, the interviewer will expect a good discussion, but not that the candidate immediately jumps to a great (or sometimes even good) solution.
Senior
- Depth of Expertise: As a senior candidate, your interviewer expects a shift towards more in-depth knowledge: about 60% breadth and 40% depth. This means you should be able to go into technical details in areas where you have hands-on experience.
- Advanced System Design: You should be familiar with advanced system design principles. Certain aspects of this problem should jump out to experienced engineers (write volume, lots of duplicate inputs), and your interviewer will be expecting you to have reasonable solutions.
- Articulating Architectural Decisions: Your interviewer will want you to clearly articulate the pros and cons of different architectural choices, especially how they impact scalability, performance, and maintainability. You should be able to justify your decisions and explain the trade-offs involved in your design choices.
- Problem-Solving and Proactivity: You should demonstrate strong problem-solving skills and a proactive approach. This includes anticipating potential challenges in your designs and suggesting improvements. You need to be adept at identifying and addressing bottlenecks, optimizing performance, and ensuring system reliability.
- The Bar for Tweet Search: For this question, a senior candidate is expected to speed through the initial high-level design so we can spend time discussing, in detail, how to optimize the critical paths. At a bare minimum, the solution should detail both ingestion and query, with a proper inverted index and an appropriate caching strategy.
Staff+
- Emphasis on Depth: As a staff+ candidate, the expectation is a deep dive into the nuances of system design: the interviewer is looking for about 40% breadth and 60% depth in your understanding. This level is all about demonstrating that "been there, done that" expertise. You should know which technologies to use, not just in theory but in practice, and be able to draw from your past experiences to explain how they'd be applied to solve specific problems effectively. Your interviewer knows you know the small stuff (REST APIs, data normalization, etc.), so you can breeze through that at a high level and leave time to get into what is interesting.
- High Degree of Proactivity: At this level, your interviewer expects an exceptional degree of proactivity. You should be able to identify and solve issues independently, demonstrating a strong ability to recognize and address the core challenges in system design. This involves not just responding to problems as they arise but anticipating them and implementing preemptive solutions.
- Practical Application of Technology: You should be well-versed in the practical application of various technologies. Your experience should guide the conversation, showing a clear understanding of how different tools and systems can be configured in real-world scenarios to meet specific requirements.
- Complex Problem-Solving and Decision-Making: Your problem-solving skills should be top-notch. This means not only being able to tackle complex technical challenges but also making informed decisions that consider factors such as scalability, performance, reliability, and maintenance.
- Advanced System Design and Scalability: Your approach to system design should be advanced, focusing on scalability and reliability, especially under high load. This includes a thorough understanding of distributed systems, load balancing, caching strategies, and other advanced concepts necessary for building robust, scalable systems.
- The Bar for Tweet Search: For a staff+ candidate, expectations are high regarding the depth and quality of solutions, particularly for the complex scenarios discussed earlier. Your interviewer will be looking for you to dive deep into at least 2-3 key areas, showcasing not just proficiency but also innovative thinking and optimal solution-finding. A crucial indicator of a staff+ candidate's caliber is the level of insight and knowledge they bring to the table. A good measure of this is whether the interviewer comes away from the discussion having gained new understanding or perspectives.