AI Inference Request Batching
Given a list of AI inference requests with arrival times and token counts, implement a batching strategy that groups requests to maximize throughput while respecting constraints on batch size, token limits, and per-request SLA wait times.
Asked at: Microsoft
Question Timeline
Early April, 2026: Microsoft, Senior
Scenario: You own an AI inference endpoint. To reduce cost and improve throughput, requests can be batched. However, batching increases latency, so you must respect an SLA (Service Level Agreement).

Problem Statement: You are given a list of inference requests. Each request has the following attributes:
    id (string)
    arrivalTimeMs (long)
    tokens (int)
You want to create batches to send to a model.

Batch Rules: A batch must satisfy all of the following constraints.
    Max tokens per batch: The sum of tokens in the batch must not exceed maxBatchTokens.
    Max requests per batch: The number of requests in the batch must not exceed maxBatchSize.
    SLA: Each request must start processing no later than arrivalTimeMs + maxWaitMs.
    Processing time: Once started, each batch takes a fixed time, batchProcessMs, independent of its size.
    Single worker: The worker can process only one batch at a time.

Task: Implement the function:
    List<Batch> createBatches(requests, maxBatchSize, maxBatchTokens, maxWaitMs, batchProcessMs)
Where each Batch includes:
    startTimeMs
    list of request ids
The output must respect all constraints and be valid for all inputs.
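One possible approach to the task above is a FIFO greedy pass, sketched here in Java. This is a minimal sketch under several assumptions, not a reference solution: the `Request` and `Batch` classes and the `BatchingSketch` wrapper are hypothetical names, requests are batched in arrival order, a batch waits for later requests only if they arrive before the oldest member's deadline, and inputs that this policy cannot schedule raise an exception (the statement does not say how infeasible inputs should be reported). The greedy choice is a heuristic: it always produces a valid schedule when it returns, but it may reject some inputs that a smarter search could schedule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BatchingSketch {
    // Hypothetical types; field names follow the problem statement.
    static final class Request {
        final String id;
        final long arrivalTimeMs;
        final int tokens;
        Request(String id, long arrivalTimeMs, int tokens) {
            this.id = id;
            this.arrivalTimeMs = arrivalTimeMs;
            this.tokens = tokens;
        }
    }

    static final class Batch {
        final long startTimeMs;
        final List<String> requestIds;
        Batch(long startTimeMs, List<String> requestIds) {
            this.startTimeMs = startTimeMs;
            this.requestIds = requestIds;
        }
    }

    static List<Batch> createBatches(List<Request> requests, int maxBatchSize,
                                     int maxBatchTokens, long maxWaitMs, long batchProcessMs) {
        // Work on a copy sorted by arrival time (List.sort is stable, so ties keep input order).
        List<Request> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong((Request r) -> r.arrivalTimeMs));

        List<Batch> batches = new ArrayList<>();
        long workerFreeAt = Long.MIN_VALUE; // the single worker is idle before the first batch
        int i = 0;
        while (i < sorted.size()) {
            Request first = sorted.get(i);
            // Assumption: a request that can never fit in any batch is reported as an error.
            if (maxBatchSize < 1 || first.tokens > maxBatchTokens) {
                throw new IllegalArgumentException("request " + first.id + " cannot fit in any batch");
            }
            // The oldest request has the tightest SLA deadline in a FIFO batch.
            long deadline = first.arrivalTimeMs + maxWaitMs;
            long start = Math.max(first.arrivalTimeMs, workerFreeAt);
            if (start > deadline) {
                // Assumption: the greedy policy gives up instead of backtracking.
                throw new IllegalStateException("SLA cannot be met for request " + first.id);
            }
            List<String> ids = new ArrayList<>();
            long tokens = 0;
            while (i < sorted.size() && ids.size() < maxBatchSize) {
                Request r = sorted.get(i);
                // Stop at the first request that arrives too late or would exceed the token cap.
                if (r.arrivalTimeMs > deadline || tokens + r.tokens > maxBatchTokens) {
                    break;
                }
                ids.add(r.id);
                tokens += r.tokens;
                // The batch cannot start before its last member has arrived.
                start = Math.max(start, r.arrivalTimeMs);
                i++;
            }
            batches.add(new Batch(start, ids));
            workerFreeAt = start + batchProcessMs; // worker is busy for a fixed time per batch
        }
        return batches;
    }
}
```

For example, with requests A (arrival 0, 10 tokens), B (arrival 1, 10 tokens), C (arrival 2, 50 tokens), and D (arrival 20, 10 tokens), and limits maxBatchSize = 2, maxBatchTokens = 60, maxWaitMs = 5, batchProcessMs = 4, this sketch yields three batches: [A, B] starting at 1, [C] starting at 5, and [D] starting at 20. Every member of a batch arrives no later than the oldest member's deadline, and the start time is the later of the worker becoming free and the last member arriving, so each request in the batch meets its own SLA.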