Prepare for the AWS Certified AI Practitioner Exam with flashcards and multiple choice questions. Each question includes hints and explanations to help you succeed on your test. Get ready for certification!

Which action is typically taken to improve class distribution in a training dataset?

  A. Increase the sample size of underrepresented classes

  B. Limit the dataset to only the most common classes

  C. Use simpler algorithms

  D. Remove outlier data

The correct answer is: A. Increase the sample size of underrepresented classes

Increasing the sample size of underrepresented classes directly addresses class imbalance, where some classes have significantly fewer examples than others. Adding more data points for those classes gives the model a more balanced view of the overall distribution, which can improve generalization and performance on unseen data. With a balanced dataset, machine learning algorithms can learn the characteristics of every class more effectively and are less biased toward the majority class. This matters most in classification tasks, where a model may otherwise perform poorly on minority classes that were not adequately represented during training.

The other options do not improve class distribution. Limiting the dataset to only the most common classes would exacerbate the imbalance rather than reduce it. Using simpler algorithms does not change the class distribution at all and may not improve the model's ability to learn from the data. Removing outlier data could unintentionally discard important information, especially if those outliers belong to underrepresented classes.

Increasing the sample size of underrepresented classes is therefore the most appropriate action for improving class distribution.
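
To make the idea concrete, here is a minimal sketch of random oversampling, one common way to increase the sample size of underrepresented classes when collecting new data is not possible. The function name `oversample_minority`, the toy labels "ok" and "fraud", and the 8-to-2 split are assumptions chosen for illustration; they are not part of the exam material or of any AWS service.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until
    every class has as many examples as the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra examples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy imbalanced dataset (assumed): 8 "ok" examples and 2 "fraud" examples.
samples = list(range(10))
labels = ["ok"] * 8 + ["fraud"] * 2

balanced_samples, balanced_labels = oversample_minority(samples, labels)

print(Counter(labels))           # class counts before oversampling
print(Counter(balanced_labels))  # class counts after oversampling
```

After the call, both classes contain 8 examples. Duplicating existing examples only rebalances the counts and adds no new information, so it is usually applied to the training split alone, and gathering genuinely new examples of the minority class remains preferable when it is feasible.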