Prepare for the AWS Certified AI Practitioner Exam with flashcards and multiple choice questions. Each question includes hints and explanations to help you succeed on your test. Get ready for certification!



Which SageMaker inference option is suitable for a company needing near real-time latency with large data inputs?

  1. Real-time inference

  2. Serverless inference

  3. Asynchronous inference

  4. Batch transform

The correct answer is: Asynchronous inference

The option that best fits the requirement for near real-time latency with large data inputs is asynchronous inference. Asynchronous inference queues incoming requests and processes them against the model endpoint as capacity allows, writing results to an Amazon S3 location when processing completes. Because the input payload is passed by reference to an object in S3 rather than inline with the request, it supports far larger inputs (up to roughly 1 GB) than real-time inference, while still returning results quickly enough for near real-time use cases. It can also scale down to zero instances when there is no traffic, which helps control cost.

The other choices serve different purposes. Real-time inference offers the lowest latency, with millisecond responses over a persistent endpoint, but it limits request payloads to a few megabytes, which rules it out when inputs are large. Serverless inference auto-scales with demand and suits intermittent traffic, but it has an even smaller payload limit and can add cold-start latency. Batch transform prioritizes throughput over latency: it processes large datasets in bulk as offline jobs rather than per request, so it cannot meet near real-time requirements. Asynchronous inference is therefore the appropriate choice for handling large data inputs while keeping latency close to real time.
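To make the distinction concrete, here is a minimal sketch of invoking an asynchronous SageMaker endpoint with boto3. Note that the request carries an S3 URI (`InputLocation`) instead of an inline body, which is why large inputs are possible. The endpoint name and S3 paths are hypothetical placeholders, and the sketch assumes an async endpoint has already been deployed:

```python
def build_async_request(endpoint_name: str, input_s3_uri: str) -> dict:
    """Build parameters for SageMaker's InvokeEndpointAsync API.

    Unlike real-time invocation, the payload is not sent inline with
    the HTTP request; it is referenced as an object in S3, which is
    how asynchronous inference supports large inputs.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": "application/json",
    }


def invoke_async(endpoint_name: str, input_s3_uri: str) -> str:
    """Submit an async inference request; returns the S3 output URI."""
    import boto3  # imported lazily so the helper above stays dependency-free

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        **build_async_request(endpoint_name, input_s3_uri)
    )
    # The call returns immediately; the result is written to this S3
    # location once processing finishes. The caller polls the location
    # or subscribes to an SNS success/error topic configured on the
    # endpoint, rather than blocking on the response.
    return response["OutputLocation"]


if __name__ == "__main__":
    # Hypothetical names -- replace with your own endpoint and bucket.
    params = build_async_request(
        "my-async-endpoint", "s3://my-bucket/inputs/payload.json"
    )
    print(params["InputLocation"])
```

The contrast with real-time inference is the `InputLocation` field: a real-time `invoke_endpoint` call instead takes a `Body` parameter containing the serialized payload itself, which is where the small-payload limit comes from.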