AI Inference Request Batching

Given a list of AI inference requests with arrival times and token counts, implement a batching strategy that groups requests to maximize throughput while respecting constraints on batch size, token limits, and per-request SLA wait times.

Asked at: Microsoft


Question Timeline

Early April, 2026: Microsoft (Senior)

Scenario: You own an AI inference endpoint. To reduce cost and improve throughput, requests can be batched. However, batching increases latency, so you must respect an SLA (Service Level Agreement).

Problem Statement: You are given a list of inference requests. Each request has the following attributes:

- id (string)
- arrivalTimeMs (long)
- tokens (int)

You want to create batches to send to a model.

Batch Rules: A batch must satisfy all constraints:

- Max tokens per batch: The sum of tokens in the batch should not exceed maxBatchTokens.
- Max requests per batch: The number of requests in a batch should not exceed maxBatchSize.
- SLA: Each request must start processing no later than arrivalTimeMs + maxWaitMs.
- Processing time: Each batch takes a fixed time, batchProcessMs (independent of its size), once started. A single worker can process only one batch at a time.

Task: Implement the function:

List<Batch> createBatches(requests, maxBatchSize, maxBatchTokens, maxWaitMs, batchProcessMs)

Where each Batch includes:

- startTimeMs
- list of request ids

The output must respect all constraints and be valid for all inputs.
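One possible approach to the rules above is a greedy pass over the requests in arrival order; the sketch below illustrates it and is not a reference solution. It is written in Python because the problem gives only a language-neutral signature. The `Request` and `Batch` dataclasses, the snake_case names, and the choice to raise `ValueError` when no valid schedule is found are all assumptions made for illustration, since the problem does not say what should happen for infeasible inputs. The sketch also assumes the full request list is known up front, that timestamps are non-negative, and that FIFO order is preserved. It is a heuristic and is not proven to minimize the number of batches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    id: str
    arrival_time_ms: int
    tokens: int


@dataclass
class Batch:
    start_time_ms: int
    request_ids: List[str] = field(default_factory=list)


def create_batches(requests: List[Request], max_batch_size: int,
                   max_batch_tokens: int, max_wait_ms: int,
                   batch_process_ms: int) -> List[Batch]:
    """Greedy FIFO batching (illustrative sketch, not proven optimal)."""
    if max_batch_size < 1 or max_wait_ms < 0 or batch_process_ms < 0:
        raise ValueError("invalid limits")

    # Process requests in arrival order; ties are broken by id for determinism.
    pending = sorted(requests, key=lambda r: (r.arrival_time_ms, r.id))
    batches: List[Batch] = []
    worker_free_ms = 0  # assumption: timestamps are non-negative
    i = 0

    while i < len(pending):
        first = pending[i]
        # The oldest request in a batch has the tightest SLA deadline.
        deadline_ms = first.arrival_time_ms + max_wait_ms
        ids: List[str] = []
        tokens = 0
        last_arrival_ms = first.arrival_time_ms

        # Add later arrivals while size, token, and deadline limits all hold.
        j = i
        while (j < len(pending)
               and len(ids) < max_batch_size
               and pending[j].arrival_time_ms <= deadline_ms
               and tokens + pending[j].tokens <= max_batch_tokens):
            ids.append(pending[j].id)
            tokens += pending[j].tokens
            last_arrival_ms = pending[j].arrival_time_ms
            j += 1

        if not ids:
            # Only possible when a single request exceeds the token limit.
            raise ValueError(f"request {first.id} exceeds max_batch_tokens")

        # Start once the worker is idle and every member has arrived.
        start_ms = max(worker_free_ms, last_arrival_ms)
        if start_ms > deadline_ms:
            raise ValueError(f"SLA cannot be met for request {first.id}")

        batches.append(Batch(start_ms, ids))
        worker_free_ms = start_ms + batch_process_ms
        i = j

    return batches
```

For example, take requests a (arrival 0, 10 tokens), b (5, 10), c (40, 10), and d (41, 95) with maxBatchSize = 2, maxBatchTokens = 100, maxWaitMs = 20, and batchProcessMs = 20. The sketch produces three batches: [a, b] starting at 5, [c] starting at 40, and [d] starting at 60. Request d cannot join c because their combined tokens exceed the limit, so it waits for the worker and still starts before its deadline of 61. With batchProcessMs = 30 the worker would not be free until 70, and the sketch raises `ValueError` instead of returning a schedule that breaks the SLA.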

© 2026 Optick Labs Inc. All rights reserved.